| Title: | The Shell Game - Audit Geographic Data Transformations |
| Version: | 0.1.1 |
| Description: | Reveals how data quality silently degrades during geographic transformations while variable labels remain unchanged. Demonstrates that transformation error is agnostic to both the variable (population, income, etc.) and the tool ('R', 'Python', etc.). Provides a reproducible audit framework for quantifying the shift from observed to imputed data at each transformation hop. |
| License: | MIT + file LICENSE |
| URL: | https://github.com/phinnphace/shellgame |
| BugReports: | https://github.com/phinnphace/shellgame/issues |
| Depends: | R (≥ 4.0.0) |
| Imports: | dplyr (≥ 1.0.0), ggplot2, janitor, magrittr, rlang, stringr, tidycensus, utils |
| Suggests: | geoDeltaAudit, knitr, readr, rmarkdown, spelling, testthat (≥ 3.0.0) |
| VignetteBuilder: | knitr |
| Encoding: | UTF-8 |
| Config/testthat/edition: | 3 |
| Config/roxygen2/version: | 8.0.0 |
| Language: | en-US |
| NeedsCompilation: | no |
| Packaged: | 2026-05-20 16:58:57 UTC; phinnmarkson |
| Author: | Phinn Markson |
| Maintainer: | Phinn Markson <markson.2@osu.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-27 20:10:23 UTC |
Audit geographic transformation
Description
Main function to audit a complete geographic transformation pipeline. Quantifies the perturbation introduced at each hop and reveals the "shell game": the variable keeps its label while the values behind it shift from observed to imputed data.
Usage
audit_transformation(
baseline_data,
zip_zcta_map,
hud_crosswalk,
county_fips,
variable_name = "value",
value_col = "estimate"
)
Arguments
| baseline_data | Data frame with baseline data at source geography |
| zip_zcta_map | ZIP-ZCTA association crosswalk |
| hud_crosswalk | HUD ZIP-County crosswalk |
| county_fips | Target county FIPS code |
| variable_name | Name of the variable being tracked (for reporting) |
| value_col | Name of the value column in baseline_data |
Value
An object of class "shellgame_audit" with audit results
Examples
baseline <- data.frame(zcta = c("00001", "00002"), estimate = c(1000, 2000))
zip_zcta <- data.frame(zcta = c("00001", "00002"), zip = c("00010", "00010"))
hud <- data.frame(zip = "00010", county = "99999", tot_ratio = 1)
result <- audit_transformation(
baseline_data = baseline,
zip_zcta_map = zip_zcta,
hud_crosswalk = hud,
county_fips = "99999",
variable_name = "population"
)
summary(result)
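The per-hop perturbation can be pictured as the gap between what a county held at baseline and what the pipeline hands back after a hop. The sketch below illustrates one assumed metric (absolute and percent change); `hop_perturbation()` is a hypothetical helper, not part of the package, and the package's own measures may differ.

```r
# Hypothetical helper, not part of shellgame's API.
# Assumed metric: recovered minus baseline, and that gap as a percent of baseline.
hop_perturbation <- function(baseline, recovered, hop = NA_character_) {
  stopifnot(is.numeric(baseline), is.numeric(recovered), all(baseline != 0))
  data.frame(
    hop        = hop,
    baseline   = baseline,
    recovered  = recovered,
    abs_change = recovered - baseline,
    pct_change = 100 * (recovered - baseline) / baseline
  )
}

# Illustrative inputs only: a baseline of 3000 of which 2700 is routed back
# to the target county after the ZIP -> County hop.
hop_perturbation(baseline = 3000, recovered = 2700, hop = "ZIP -> County")
```

A signed change is kept on purpose: a negative value means the hop moved part of the target county's value elsewhere, a positive one means it pulled value in from neighbors.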
Check for Census API key
Description
Validates that a Census API key is available for tidycensus.
Usage
check_census_key(install = FALSE)
Arguments
| install | Logical, whether to install the key for future sessions |
Value
Invisibly returns TRUE if a key is found; otherwise stops with an error.
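tidycensus reads the key from the `CENSUS_API_KEY` environment variable, and `tidycensus::census_api_key(key, install = TRUE)` writes it to `~/.Renviron` for future sessions. A minimal sketch of such a check, assuming that environment variable is the only place the key is looked up; `has_census_key()` and `require_census_key()` are hypothetical names, not the package's implementation.

```r
# Hypothetical sketch, not the package's check_census_key().
# Assumes the key lives in the CENSUS_API_KEY environment variable, which is
# where tidycensus looks for it.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

require_census_key <- function() {
  if (!has_census_key()) {
    stop("No Census API key found. Request one at ",
         "https://api.census.gov/data/key_signup.html and set it with ",
         "tidycensus::census_api_key(\"YOUR_KEY\", install = TRUE).",
         call. = FALSE)
  }
  invisible(TRUE)  # mirrors the documented "invisible TRUE" contract
}
```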
Create complete audit report
Description
Generates all visualizations for an audit.
Usage
create_audit_report(
audit_result,
zcta_baseline_sf = NULL,
zcta_geometric_sf = NULL,
county_sf = NULL
)
Arguments
| audit_result | A shellgame_audit object |
| zcta_baseline_sf | Optional: sf object with baseline ZCTAs |
| zcta_geometric_sf | Optional: sf object with geometrically intersecting ZCTAs |
| county_sf | Optional: sf object with county boundary |
Value
List of ggplot2 objects
Extract perturbation by receiving county
Description
Returns a data frame of counties that received population redistributed from the target county during the transformation, ordered by magnitude.
Usage
extract_perturbed_population(audit_result, top_n = 10)
Arguments
| audit_result | A shellgame_audit object |
| top_n | Number of top counties to return (default: 10) |
Value
Data frame with columns: county, value
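Ranking receiving counties amounts to dropping the target county and sorting the rest by value. A minimal base R sketch, assuming the county-level results are a data frame with `county` and `value` columns and that the target county is excluded from the list; `top_receiving_counties()` is hypothetical and the FIPS codes are placeholders.

```r
# Hypothetical sketch, not the package's extract_perturbed_population().
# Assumes county-level output with columns county and value; the target
# county is removed so only counties that *received* value remain.
top_receiving_counties <- function(county_values, target_fips, top_n = 10) {
  receivers <- county_values[county_values$county != target_fips,
                             c("county", "value")]
  receivers <- receivers[order(receivers$value, decreasing = TRUE), ]
  rownames(receivers) <- NULL
  head(receivers, top_n)
}

# Placeholder county codes and values for illustration.
county_values <- data.frame(
  county = c("99999", "99001", "99002", "99003"),
  value  = c(120000, 5200, 800, 15000)
)
top_receiving_counties(county_values, target_fips = "99999", top_n = 2)
```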
Get ACS baseline data for ZCTAs
Description
Fetches ACS 5-year estimates for a specified variable at the ZCTA level using the Census API via the tidycensus package. Requires a Census API key (see https://api.census.gov/data/key_signup.html) and the tidycensus package to be installed.
Usage
get_zcta_baseline(variable, year = 2022, zctas = NULL)
Arguments
| variable | ACS variable code (e.g., "B01001_001" for total population) |
| year | End year of the ACS 5-year estimates (default: 2022) |
| zctas | Optional character vector of ZCTAs to filter to |
Value
Data frame with columns: zcta, estimate, moe
Examples
## Not run:
# get_zcta_baseline() retrieves ACS data via the Census API.
# See vignette("data-preparation", package = "geoDeltaAudit") for a full walkthrough.
pop_data <- get_zcta_baseline("B01001_001", year = 2022)
## End(Not run)
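`tidycensus::get_acs()` returns tidy output with columns `GEOID`, `NAME`, `variable`, `estimate`, and `moe`; for ZCTA geography, `GEOID` is the ZCTA code. A minimal sketch of reshaping that output into the documented `zcta, estimate, moe` form, using a mock data frame so it runs without an API key; `tidy_zcta_acs()` is hypothetical, and the live call is shown only as a comment.

```r
# Hypothetical post-processing sketch, not the package's get_zcta_baseline().
# A live call would look like:
#   acs <- tidycensus::get_acs(geography = "zcta", variables = "B01001_001",
#                              year = 2022, survey = "acs5")
tidy_zcta_acs <- function(acs, zctas = NULL) {
  out <- data.frame(
    zcta     = as.character(acs$GEOID),
    estimate = acs$estimate,
    moe      = acs$moe
  )
  if (!is.null(zctas)) {
    out <- out[out$zcta %in% zctas, ]  # optional filter to requested ZCTAs
    rownames(out) <- NULL
  }
  out
}

# Mock get_acs()-shaped input with placeholder ZCTAs.
acs <- data.frame(
  GEOID = c("00001", "00002", "00003"),
  NAME = c("ZCTA5 00001", "ZCTA5 00002", "ZCTA5 00003"),
  variable = "B01001_001",
  estimate = c(10, 20, 30),
  moe = c(1, 2, 3)
)
tidy_zcta_acs(acs, zctas = c("00002", "00003"))
```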
Pad GEOID to 5 digits
Description
Ensures geographic identifiers are zero-padded to 5 digits.
Usage
pad_geoid(geoid)
Arguments
| geoid | Character or numeric vector of geographic identifiers |
Value
Character vector of 5-digit zero-padded GEOIDs
Examples
pad_geoid(c("123", "45678", 789))
#> [1] "00123" "45678" "00789"
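Padding matters because identifiers read as numbers lose their leading zeros (`00123` becomes `123`), which silently breaks joins between crosswalks. A base R sketch of the same idea, assuming identifiers already five or more characters long are left unchanged; `pad_to_5()` is a hypothetical name. `stringr::str_pad(x, width = 5, side = "left", pad = "0")` is an equivalent one-liner.

```r
# Base R sketch of left zero-padding to 5 characters (hypothetical helper).
# Identifiers that are already 5+ characters are returned unchanged.
pad_to_5 <- function(geoid) {
  x <- trimws(as.character(geoid))
  paste0(strrep("0", pmax(0, 5 - nchar(x))), x)
}

pad_to_5(c("123", "45678", 789))
#> [1] "00123" "45678" "00789"
```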
Plot baseline ZCTAs
Description
Creates a map showing the baseline ZCTAs used in the analysis.
Usage
plot_baseline_zctas(zcta_sf, county_sf, title = "Baseline ZCTAs")
Arguments
| zcta_sf | sf object with ZCTA geometries |
| county_sf | sf object with county boundary |
| title | Plot title |
Value
A ggplot2 object
Plot geometric vs relationship membership
Description
Visualizes the discrepancy between geometric intersection and relationship-based membership.
Usage
plot_geometric_vs_relationship(
zcta_baseline_sf,
zcta_geometric_sf,
county_sf,
title = "Geometric vs Relationship Membership"
)
Arguments
| zcta_baseline_sf | sf object with baseline ZCTAs (relationship-based) |
| zcta_geometric_sf | sf object with all geometrically intersecting ZCTAs |
| county_sf | sf object with county boundary |
| title | Plot title |
Value
A ggplot2 object
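Behind the map, the discrepancy is a set comparison between two lists of ZCTA IDs: those assigned to the county by the relationship file and those whose geometry intersects it (for example via `sf::st_intersects()`). A minimal sketch, assuming the IDs have already been extracted as character vectors; `membership_discrepancy()` is hypothetical.

```r
# Hypothetical sketch: compare relationship-based and geometric ZCTA membership
# using plain set operations on ID vectors.
membership_discrepancy <- function(relationship_ids, geometric_ids) {
  list(
    both              = intersect(relationship_ids, geometric_ids),
    geometric_only    = setdiff(geometric_ids, relationship_ids),  # touch the county but are not assigned to it
    relationship_only = setdiff(relationship_ids, geometric_ids)   # assigned but do not intersect
  )
}

membership_discrepancy(
  relationship_ids = c("00001", "00002"),
  geometric_ids    = c("00001", "00002", "00003")
)
```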
Plot transformation perturbation
Description
Creates a simple bar chart showing baseline vs recovered values.
Usage
plot_transformation_perturbation(audit_result)
Arguments
| audit_result | A shellgame_audit object |
Value
A ggplot2 object
Prepare HUD ZIP-County crosswalk data
Description
Standardizes raw HUD ZIP-County crosswalk data to the columns zip, county, and tot_ratio, with consistent formatting.
Usage
prep_hud_crosswalk(data, ratio_col = "TOT_RATIO")
Arguments
| data | Raw HUD crosswalk data frame |
| ratio_col | Name of the ratio column to use (default: "TOT_RATIO") |
Value
Data frame with standardized columns: zip, county, tot_ratio
Examples
raw <- data.frame(ZIP = "00010", COUNTY = "99999", TOT_RATIO = 1)
result <- prep_hud_crosswalk(raw)
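HUD USPS ZIP-County files typically carry `ZIP`, `COUNTY`, and several ratio columns (`RES_RATIO`, `BUS_RATIO`, `OTH_RATIO`, `TOT_RATIO`), which is why the ratio column is a parameter. A base R sketch of the standardization, assuming those uppercase column names and zero-padding both codes to five characters; `standardize_hud()` is hypothetical, not the package's function.

```r
# Hypothetical sketch, not the package's prep_hud_crosswalk().
# Whichever ratio column is chosen, it is returned as tot_ratio to match the
# documented output columns (zip, county, tot_ratio).
standardize_hud <- function(data, ratio_col = "TOT_RATIO") {
  pad5 <- function(x) {
    x <- trimws(as.character(x))
    paste0(strrep("0", pmax(0, 5 - nchar(x))), x)
  }
  data.frame(
    zip       = pad5(data[["ZIP"]]),
    county    = pad5(data[["COUNTY"]]),
    tot_ratio = as.numeric(data[[ratio_col]])
  )
}

# Placeholder input: a numeric ZIP that lost its leading zeros.
raw <- data.frame(ZIP = c(10, 10), COUNTY = c("99999", "99998"),
                  TOT_RATIO = c(0.75, 0.25), RES_RATIO = c(0.9, 0.1))
standardize_hud(raw)
standardize_hud(raw, ratio_col = "RES_RATIO")
```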
Prepare ZIP-ZCTA crosswalk data
Description
Standardizes raw ZIP-ZCTA crosswalk data to the columns zcta and zip, with consistent formatting.
Usage
prep_zip_zcta(data, zip_col = NULL, zcta_col = "zcta")
Arguments
| data | Raw ZIP-ZCTA crosswalk data frame |
| zip_col | Name of the ZIP code column. If NULL (the default), a column named "ZIP_CODE" or "zip" is used |
| zcta_col | Name of the ZCTA column (default: "zcta") |
Value
Data frame with standardized columns: zcta, zip
Examples
raw <- data.frame(ZIP_CODE = c("00010", "00010"), zcta = c("00001", "00002"))
result <- prep_zip_zcta(raw)
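The `zip_col = NULL` default implies a lookup over known column names. A minimal sketch of that fallback, assuming "ZIP_CODE" is tried before "zip" and that a missing column is an error; `standardize_zip_zcta()` is hypothetical.

```r
# Hypothetical sketch, not the package's prep_zip_zcta().
# When zip_col is NULL, use the first of "ZIP_CODE" or "zip" found in data.
standardize_zip_zcta <- function(data, zip_col = NULL, zcta_col = "zcta") {
  if (is.null(zip_col)) {
    candidates <- intersect(c("ZIP_CODE", "zip"), names(data))
    if (length(candidates) == 0) {
      stop("No ZIP column found; supply zip_col explicitly.", call. = FALSE)
    }
    zip_col <- candidates[1]
  }
  data.frame(
    zcta = as.character(data[[zcta_col]]),
    zip  = as.character(data[[zip_col]])
  )
}

raw <- data.frame(ZIP_CODE = c("00010", "00010"), zcta = c("00001", "00002"))
standardize_zip_zcta(raw)
```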
Print method for shellgame_audit
Description
Prints a shellgame_audit object to the console.
Usage
## S3 method for class 'shellgame_audit'
print(x, ...)
Arguments
| x | A shellgame_audit object |
| ... | Additional arguments (ignored) |
Value
Invisibly returns the input object. Called for side effects (console output).
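The documented contract (console output as a side effect, input returned invisibly) is the standard S3 print pattern. A minimal sketch using a hypothetical class `audit_sketch` with hypothetical fields (`variable`, `baseline`, `recovered`); the real shellgame_audit object may be structured differently.

```r
# Hypothetical constructor and fields; not the package's shellgame_audit layout.
new_audit_sketch <- function(variable, baseline, recovered) {
  structure(list(variable = variable, baseline = baseline, recovered = recovered),
            class = "audit_sketch")
}

print.audit_sketch <- function(x, ...) {
  cat(sprintf("Audit of %s\n", x$variable))
  cat(sprintf("  Baseline:  %s\n", format(x$baseline)))
  cat(sprintf("  Recovered: %s\n", format(x$recovered)))
  invisible(x)  # return the object unchanged so calls can be chained or reassigned
}

a <- new_audit_sketch("population", 3000, 2700)
print(a)
```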
Run full transformation pipeline
Description
Executes both hops: ZCTA → ZIP → County. Tracks the complete swap from observed to imputed data.
Usage
run_full_transformation(
baseline_data,
zip_zcta_map,
hud_crosswalk,
value_col = "estimate",
county_fips = NULL
)
Arguments
| baseline_data | Data frame with ZCTA-level baseline data |
| zip_zcta_map | ZIP-ZCTA association table |
| hud_crosswalk | HUD ZIP-County crosswalk |
| value_col | Name of value column in baseline_data |
| county_fips | Optional county FIPS to filter final result |
Value
List with intermediate and final results
Examples
baseline <- data.frame(zcta = c("00001", "00002"), estimate = c(1000, 2000))
zip_zcta <- data.frame(zcta = c("00001", "00002"), zip = c("00010", "00010"))
hud <- data.frame(zip = "00010", county = "99999", tot_ratio = 1)
result <- run_full_transformation(baseline, zip_zcta, hud,
value_col = "estimate", county_fips = "99999")
Summary method for shellgame_audit
Description
Prints a summary of a shellgame_audit object to the console.
Usage
## S3 method for class 'shellgame_audit'
summary(object, ...)
Arguments
| object | A shellgame_audit object |
| ... | Additional arguments (ignored) |
Value
Invisibly returns the input object. Called for side effects (console output).
Transform ZCTA data to ZIP level
Description
Performs the first hop: ZCTA → ZIP using association-based allocation. This is where the first swap occurs: observed data → imputed data.
Usage
transform_zcta_to_zip(baseline_data, zip_zcta_map, value_col = "estimate")
Arguments
| baseline_data | Data frame with a zcta column and a value column |
| zip_zcta_map | Data frame with columns: zcta, zip |
| value_col | Name of the value column in baseline_data (default: "estimate") |
Value
Data frame with columns: zip, value (allocated to ZIP level)
Examples
baseline <- data.frame(zcta = c("00001", "00002"), estimate = c(1000, 2000))
zip_zcta <- data.frame(zcta = c("00001", "00002"), zip = c("00010", "00010"))
result <- transform_zcta_to_zip(baseline, zip_zcta, value_col = "estimate")
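The swap becomes visible once a ZCTA is associated with more than one ZIP: its observed count has to be divided up, and each ZIP receives an imputed share. A minimal sketch, assuming "association-based allocation" is modeled as an equal split across associated ZIPs followed by a sum per ZIP; this rule and `zcta_to_zip_sketch()` are assumptions, not the package's implementation. With the documented example (both ZCTAs mapped to ZIP 00010) it would return 3000 for that ZIP.

```r
# Hypothetical sketch of hop 1 (ZCTA -> ZIP), not transform_zcta_to_zip().
# Assumed rule: split each ZCTA's value equally across its associated ZIPs.
zcta_to_zip_sketch <- function(baseline, zip_zcta, value_col = "estimate") {
  links <- unique(zip_zcta[, c("zcta", "zip")])
  n_links <- table(links$zcta)  # number of ZIPs associated with each ZCTA
  m <- merge(baseline[, c("zcta", value_col)], links, by = "zcta")
  m$value <- m[[value_col]] / as.numeric(n_links[m$zcta])
  aggregate(value ~ zip, data = m, FUN = sum)
}

baseline <- data.frame(zcta = c("00001", "00002"), estimate = c(1000, 2000))
# ZCTA 00002 is associated with two ZIPs, so its 2000 is split 1000 / 1000.
zip_zcta <- data.frame(zcta = c("00001", "00002", "00002"),
                       zip  = c("00010", "00010", "00020"))
zcta_to_zip_sketch(baseline, zip_zcta)
```

The total (3000) is preserved, but the ZIP-level values are no longer observations: they depend entirely on the assumed split rule.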
Transform ZIP data to County level
Description
Performs the second hop: ZIP → County using HUD TOT_RATIO allocation. This is where the second swap occurs: further imputation via proxy.
Usage
transform_zip_to_county(zip_data, hud_crosswalk, county_fips = NULL)
Arguments
| zip_data | Data frame with columns: zip, value |
| hud_crosswalk | Data frame with columns: zip, county, tot_ratio |
| county_fips | Optional FIPS code to filter to specific county |
Value
Data frame with columns: county, value (allocated to county level)
Examples
zip_data <- data.frame(zip = "00010", value = 3000)
hud <- data.frame(zip = "00010", county = "99999", tot_ratio = 1)
result <- transform_zip_to_county(zip_data, hud, county_fips = "99999")
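HUD's TOT_RATIO is the share of a ZIP's total (residential, business and other) addresses that fall in each county, so using it to split another variable is a proxy, which is the second imputation. A minimal weighted-allocation sketch, assuming each ZIP value is multiplied by its ratio for every county it touches and then summed by county; `zip_to_county_sketch()` and the ratios below are hypothetical.

```r
# Hypothetical sketch of hop 2 (ZIP -> County), not transform_zip_to_county().
zip_to_county_sketch <- function(zip_data, hud, county_fips = NULL) {
  m <- merge(zip_data[, c("zip", "value")],
             hud[, c("zip", "county", "tot_ratio")], by = "zip")
  m$value <- m$value * m$tot_ratio  # address-share weight used as a proxy
  out <- aggregate(value ~ county, data = m, FUN = sum)
  if (!is.null(county_fips)) {
    out <- out[out$county %in% county_fips, , drop = FALSE]
    rownames(out) <- NULL
  }
  out
}

zip_data <- data.frame(zip = "00010", value = 3000)
hud <- data.frame(zip = c("00010", "00010"), county = c("99999", "99998"),
                  tot_ratio = c(0.75, 0.25))
zip_to_county_sketch(zip_data, hud)                         # both counties
zip_to_county_sketch(zip_data, hud, county_fips = "99999")  # target county only
```

Filtering to the target county keeps 2250 of the 3000; the remaining 750 is the value that moved to a neighboring county during this hop.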