Title: An R Interface for Downloading, Reading, and Handling IPUMS Data
Version: 0.9.0
Description: An easy way to work with census, survey, and geographic data provided by IPUMS in R. Generate and download data through the IPUMS API and load IPUMS files into R with their associated metadata to make analysis easier. IPUMS data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from the IPUMS website https://www.ipums.org.
License: Mozilla Public License 2.0
URL: https://tech.popdata.org/ipumsr/, https://github.com/ipums/ipumsr, https://www.ipums.org
BugReports: https://github.com/ipums/ipumsr/issues
Depends: R (≥ 3.6)
Imports: dplyr (≥ 0.7.0), haven (≥ 2.2.0), hipread (≥ 0.2.0), httr, jsonlite, lifecycle, purrr, R6, readr, rlang, tibble, tidyselect, xml2, zeallot
Suggests: biglm, covr, crayon, DBI, dbplyr, DT, ggplot2, htmltools, knitr, rmapshaper, rmarkdown, RSQLite (≥ 2.3.3), rstudioapi, scales, sf, shiny, testthat (≥ 3.2.0), tidyr, vcr (≥ 0.6.0), withr
VignetteBuilder: knitr
Contact: ipums@umn.edu
Encoding: UTF-8
RoxygenNote: 7.3.2
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2025-06-04 16:11:38 UTC; robe2037
Author: Greg Freedman Ellis [aut], Derek Burk [aut, cre], Finn Roberts [aut], Joe Grover [ctb], Dan Ehrlich [ctb], Renae Rodgers [ctb], Institute for Social Research and Data Innovation [cph]
Maintainer: Derek Burk <ipums+cran@umn.edu>
Repository: CRAN
Date/Publication: 2025-06-04 16:50:02 UTC
ipumsr: An R Interface for Downloading, Reading, and Handling IPUMS Data
Description
An easy way to work with census, survey, and geographic data provided by IPUMS in R. Generate and download data through the IPUMS API and load IPUMS files into R with their associated metadata to make analysis easier. IPUMS data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from the IPUMS website https://www.ipums.org.
Author(s)
Maintainer: Derek Burk ipums+cran@umn.edu
Authors:
Greg Freedman Ellis
Finn Roberts
Other contributors:
Joe Grover [contributor]
Dan Ehrlich [contributor]
Renae Rodgers [contributor]
Institute for Social Research and Data Innovation ipums@umn.edu [copyright holder]
See Also
Useful links:
Report bugs at https://github.com/ipums/ipumsr/issues
Add values to an existing IPUMS extract definition
Description
Add or replace values in an existing ipums_extract object.
This function is an S3 generic whose behavior will depend on the subclass (i.e. collection) of the extract being modified.
To add to an IPUMS microdata extract definition, see the add_to_extract() method for class 'micro_extract'. This includes:
IPUMS USA
IPUMS CPS
IPUMS International
IPUMS Time Use (ATUS, AHTUS, MTUS)
IPUMS Health Surveys (NHIS, MEPS)
To add to an IPUMS aggregate data extract definition, see the add_to_extract() method for class 'agg_extract'. This includes:
IPUMS NHGIS
IPUMS IHGIS
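As a rough illustration of the S3 dispatch described above, the sketch below builds one microdata and one aggregate data extract definition and passes each to the same add_to_extract() call. It only uses functions and values that appear elsewhere in this manual, and it assumes the ipumsr package is installed. No API key is needed because nothing is submitted.

```r
library(ipumsr)

# A microdata extract definition (IPUMS USA)
usa_extract <- define_extract_micro(
  collection = "usa",
  description = "2013 ACS Data",
  samples = "us2013a",
  variables = c("SEX", "AGE", "YEAR")
)

# An aggregate data extract definition (IPUMS NHGIS)
nhgis_extract <- define_extract_agg(
  "nhgis",
  datasets = ds_spec("1990_STF1", data_tables = "NP1", geog_levels = "county")
)

# Each object carries the subclass that selects the method
inherits(usa_extract, "micro_extract")
inherits(nhgis_extract, "agg_extract")

# The same generic accepts different fields depending on the subclass
usa_plus <- add_to_extract(usa_extract, variables = "MARST")
nhgis_plus <- add_to_extract(nhgis_extract, shapefiles = "us_county_1990_tl2008")
```

The fields accepted in each call are those listed in the usage of the corresponding method.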
This function is marked as experimental because it is typically not the best option for maintaining reproducible extract definitions and may be retired in the future. For reproducibility, users should strive to build extract definitions with define_extract_micro() or define_extract_agg().
If you have a complicated extract definition to revise, but do not have the original extract definition code that created it, we suggest that you save the revised extract as a JSON file with save_extract_as_json(). This will create a stable version of the extract definition that can be used in the future as needed.
To remove existing values from an extract definition, use remove_from_extract().
Learn more about the IPUMS API in vignette("ipums-api").
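The JSON workflow suggested above could look roughly like the following sketch. It assumes the ipumsr package is installed, writes to a temporary file chosen purely for illustration, and reloads the saved definition with define_extract_from_json(), which is listed in the See Also sections of this manual.

```r
library(ipumsr)

original <- define_extract_micro(
  collection = "usa",
  description = "2013 ACS Data",
  samples = "us2013a",
  variables = c("SEX", "AGE", "YEAR")
)

# Revise the definition interactively
revised <- add_to_extract(original, variables = "MARST")

# Save the revised definition so it can be recovered later
# (the temporary path is only for illustration; use a project path in practice)
json_path <- tempfile(fileext = ".json")
save_extract_as_json(revised, file = json_path)

# Later, or in another session, rebuild the definition from the file
restored <- define_extract_from_json(json_path)
names(restored$variables)
```

Keeping the JSON file under version control alongside the analysis code is one way to make the revised definition reproducible.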
Usage
add_to_extract(extract, ...)
Arguments
extract |
An ipums_extract object. |
... |
Additional arguments specifying the extract fields and values to add to the extract definition. All arguments available in |
Value
An object of the same class as extract containing the modified extract definition.
See Also
remove_from_extract() to remove values from an extract definition.
define_extract_micro() or define_extract_agg() to define an extract request manually.
submit_extract() to submit an extract request for processing.
Examples
# Microdata extracts
usa_extract <- define_extract_micro(
collection = "usa",
description = "2013 ACS Data",
samples = "us2013a",
variables = c("SEX", "AGE", "YEAR")
)
# Add new samples and variables
add_to_extract(
usa_extract,
samples = c("us2014a", "us2015a"),
variables = var_spec("MARST", data_quality_flags = TRUE)
)
# Update existing variables
add_to_extract(
usa_extract,
variables = var_spec("SEX", case_selections = "1")
)
# Modify/add multiple variables
add_to_extract(
usa_extract,
variables = list(
var_spec("SEX", case_selections = "1"),
var_spec("RELATE")
)
)
# NHGIS extracts
nhgis_extract <- define_extract_agg(
"nhgis",
datasets = ds_spec(
"1990_STF1",
data_tables = c("NP1", "NP2"),
geog_levels = "county"
)
)
# Add a new dataset or time series table
add_to_extract(
nhgis_extract,
datasets = ds_spec(
"1980_STF1",
data_tables = "NT1A",
geog_levels = c("county", "state")
)
)
# Update existing datasets/time series tables
add_to_extract(
nhgis_extract,
datasets = ds_spec("1990_STF1", c("NP1", "NP2"), "state")
)
# Modify/add multiple datasets or time series tables
add_to_extract(
nhgis_extract,
time_series_tables = list(
tst_spec("CW3", geog_levels = "state"),
tst_spec("CW4", geog_levels = "state")
)
)
# Values that can only take a single value are replaced
add_to_extract(nhgis_extract, data_format = "fixed_width")$data_format
Add values to an existing IPUMS NHGIS extract definition
Description
Add new values to an IPUMS aggregate data extract definition. All fields are optional, and if omitted, will be unchanged. Supplying a value for fields that take a single value, such as description and data_format, will replace the existing value with the supplied value.
This function is marked as experimental because it is typically not the best option for maintaining reproducible extract definitions and may be retired in the future. For reproducibility, users should strive to build extract definitions with define_extract_agg().
If you have a complicated extract definition to revise, but do not have the original extract definition code that created it, we suggest that you save the revised extract as a JSON file with save_extract_as_json(). This will create a stable version of the extract definition that can be used in the future as needed.
To remove existing values from an IPUMS NHGIS extract definition, use remove_from_extract().
Learn more about the IPUMS API in vignette("ipums-api").
Usage
## S3 method for class 'agg_extract'
add_to_extract(
extract,
description = NULL,
datasets = NULL,
time_series_tables = NULL,
geographic_extents = NULL,
shapefiles = NULL,
breakdown_and_data_type_layout = NULL,
tst_layout = NULL,
data_format = NULL,
...
)
Arguments
extract |
An |
description |
Description of the extract. |
datasets |
List of If a dataset already exists in the extract, its new specifications will be added to those that already exist for that dataset. |
time_series_tables |
For NHGIS extracts, list of If a time series table already exists in the extract, its new specifications will be added to those that already exist for that time series table. |
geographic_extents |
For NHGIS extracts, vector of geographic
extents to use for all of the Use |
shapefiles |
For NHGIS extracts, names of any shapefiles to include in the extract request. |
breakdown_and_data_type_layout |
For NHGIS extracts, the desired
layout of any
Required if any |
tst_layout |
For NHGIS extracts, the desired layout of all
Required when an extract definition includes any |
data_format |
For NHGIS extracts, the desired format of the extract data file.
Note that by default, Required when an extract definition includes any |
... |
Ignored |
Details
For extract fields that take a single value, add_to_extract() will replace the existing value with the new value provided for that field. It is not necessary to first remove this value using remove_from_extract().
If the supplied extract definition comes from a previously submitted extract request, this function will reset the definition to an unsubmitted state.
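The difference between fields that accumulate values and fields that are replaced can be seen in a small sketch like the one below. It assumes the ipumsr package is installed and reuses dataset and table names from the examples in this manual.

```r
library(ipumsr)

extract <- define_extract_agg(
  "nhgis",
  datasets = ds_spec("1990_STF1", c("NP1", "NP2"), "county")
)

updated <- add_to_extract(
  extract,
  # Existing dataset: new tables and geographic levels are added to it
  datasets = ds_spec("1990_STF1", "NP4", "nation"),
  # Single-valued field: the previous value is replaced
  data_format = "fixed_width"
)

updated$datasets[["1990_STF1"]]$data_tables
updated$datasets[["1990_STF1"]]$geog_levels
updated$data_format
```

The original `extract` object is left untouched; add_to_extract() returns a modified copy.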
Value
A modified agg_extract object.
See Also
remove_from_extract() to remove values from an extract definition.
define_extract_agg() to create a new extract definition.
submit_extract() to submit an extract request.
download_extract() to download extract data files.
Examples
extract <- define_extract_agg(
"nhgis",
datasets = ds_spec("1990_STF1", c("NP1", "NP2"), "county")
)
# Add a new dataset or time series table to the extract
add_to_extract(
extract,
datasets = ds_spec("1990_STF2a", "NPA1", "county")
)
add_to_extract(
extract,
time_series_tables = tst_spec("A00", "state")
)
# If a dataset/time series table name already exists in the definition
# its specification will be modified by adding the new specifications to
# the existing ones
add_to_extract(
extract,
datasets = ds_spec("1990_STF1", "NP4", "nation")
)
# You can add new datasets and modify existing ones simultaneously by
# providing a list of `ds_spec` objects
add_to_extract(
extract,
datasets = list(
ds_spec("1990_STF1", "NP4", "nation"),
ds_spec("1990_STF2a", "NPA1", "county")
)
)
# Values that can only take a single value are replaced
add_to_extract(extract, data_format = "fixed_width")$data_format
Add values to an existing extract definition for an IPUMS microdata collection
Description
Add new values or replace existing values in an IPUMS microdata extract definition. All fields are optional, and if omitted, will be unchanged. Supplying a value for fields that take a single value, such as description and data_format, will replace the existing value with the supplied value.
This function is marked as experimental because it is typically not the best option for maintaining reproducible extract definitions and may be retired in the future. For reproducibility, users should strive to build extract definitions with define_extract_micro().
If you have a complicated extract definition to revise, but do not have the original extract definition code that created it, we suggest that you save the revised extract as a JSON file with save_extract_as_json(). This will create a stable version of the extract definition that can be used in the future as needed.
To remove existing values from an IPUMS microdata extract definition, use remove_from_extract().
Learn more about the IPUMS API in vignette("ipums-api").
Usage
## S3 method for class 'micro_extract'
add_to_extract(
extract,
description = NULL,
samples = NULL,
variables = NULL,
time_use_variables = NULL,
sample_members = NULL,
data_format = NULL,
data_structure = NULL,
rectangular_on = NULL,
case_select_who = NULL,
data_quality_flags = NULL,
...
)
Arguments
extract |
An |
description |
Description of the extract. |
samples |
Vector of samples to include in the extract
request. Use |
variables |
Character vector of variable names or a list of
If a variable already exists in the extract, its specifications will be added to those that already exist for that variable. |
time_use_variables |
Vector of names of IPUMS-defined time use variables
or a list of specifications for user-defined time use variables
to include in the extract request. Use |
sample_members |
Indication of whether to include additional sample
members in the extract request. If provided, must be one of
Sample member selection is only available for the IPUMS ATUS collection
( |
data_format |
Format for the output extract data file. Either
Note that while |
data_structure |
Data structure for the output extract data.
|
rectangular_on |
If Defaults to |
case_select_who |
Indication of how to interpret any case selections included for variables in the extract definition.
Defaults to |
data_quality_flags |
Set to Use |
... |
Ignored |
Details
If the supplied extract definition comes from a previously submitted extract request, this function will reset the definition to an unsubmitted state.
To modify variable-specific parameters for variables that already exist in the extract, create a new variable specification with var_spec().
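A minimal sketch of that pattern, assuming the ipumsr package is installed: a var_spec() that reuses the name of an existing variable updates that variable in place rather than adding a duplicate.

```r
library(ipumsr)

extract <- define_extract_micro(
  collection = "usa",
  description = "2013 ACS Data",
  samples = "us2013a",
  variables = c("SEX", "AGE", "YEAR")
)

# SEX already exists, so this attaches new specifications to it
updated <- add_to_extract(
  extract,
  variables = var_spec("SEX", case_selections = "2", data_quality_flags = TRUE)
)

# Inspect the variable-specific parameters by name
updated$variables$SEX$case_selections
updated$variables$SEX$data_quality_flags

# The set of variables is unchanged
names(updated$variables)
```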
Value
A modified micro_extract object.
See Also
remove_from_extract() to remove values from an extract definition.
submit_extract() to submit an extract request.
download_extract() to download extract data files.
define_extract_micro() to create a new extract definition from scratch.
Examples
extract <- define_extract_micro(
collection = "usa",
description = "2013 ACS Data",
samples = "us2013a",
variables = c("SEX", "AGE", "YEAR")
)
# Add a single sample
add_to_extract(extract, samples = "us2014a")
# Add samples and variables
extract2 <- add_to_extract(
extract,
samples = "us2014a",
variables = c("MARST", "BIRTHYR")
)
# Modify specifications for variables in the extract by using `var_spec()`
# with the existing variable name:
add_to_extract(
extract,
samples = "us2014a",
variables = var_spec("SEX", case_selections = "2")
)
# You can make multiple modifications or additions by providing a list
# of `var_spec()` objects:
add_to_extract(
extract,
samples = "us2014a",
variables = list(
var_spec("RACE", attached_characteristics = "mother"),
var_spec("SEX", case_selections = "2"),
var_spec("RELATE")
)
)
# Values that only take a single value are replaced
add_to_extract(extract, description = "New description")$description
Define an extract request for an IPUMS aggregate data collection
Description
Define the parameters of an IPUMS aggregate data extract request to be submitted via the IPUMS API.
The IPUMS API currently supports the following aggregate data collections:
IPUMS NHGIS
IPUMS IHGIS
Note that not all extract request parameters and options apply to all collections. For a summary of supported features by collection, see the details below and the IPUMS API documentation.
Use get_metadata_catalog() and get_metadata() to browse and identify data sources for use in an extract definition.
Learn more about the IPUMS API in vignette("ipums-api") and aggregate data extract definitions in vignette("ipums-api-agg").
Usage
define_extract_agg(
collection,
description = "",
datasets = NULL,
time_series_tables = NULL,
shapefiles = NULL,
geographic_extents = NULL,
breakdown_and_data_type_layout = NULL,
tst_layout = NULL,
data_format = NULL
)
Arguments
collection |
Code for the IPUMS collection represented by this
extract request. Currently, |
description |
Description of the extract. |
datasets |
List of dataset specifications for any
datasets to include in the extract request. Use |
time_series_tables |
For NHGIS extracts, list of time series
table specifications for any
time series tables
to include in the extract request. Use |
shapefiles |
For NHGIS extracts, names of any shapefiles to include in the extract request. |
geographic_extents |
For NHGIS extracts, vector of geographic
extents to use for all of the Use |
breakdown_and_data_type_layout |
For NHGIS extracts, the desired
layout of any
Required if any |
tst_layout |
For NHGIS extracts, the desired layout of all
Required when an extract definition includes any |
data_format |
For NHGIS extracts, the desired format of the extract data file.
Note that by default, Required when an extract definition includes any |
Details
IPUMS NHGIS
An NHGIS extract definition (collection = "nhgis") must include at least one dataset, time series table, or shapefile specification.
Create a dataset specification with ds_spec(). Each dataset must be associated with a selection of data_tables and geog_levels. Some datasets also support the selection of years and breakdown_values.
Create an NHGIS time series table specification with tst_spec(). Each time series table must be associated with a selection of geog_levels and may optionally be associated with a selection of years.
IPUMS IHGIS
An IHGIS extract definition (collection = "ihgis") must include a dataset specification. IHGIS does not support time series table or shapefile specifications.
Create a dataset specification with ds_spec(). Each dataset must be associated with a selection of data_tables and tabulation_geographies.
See examples or vignette("ipums-api-agg") for more details about specifying datasets and time series tables in an aggregate data extract definition.
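Because an NHGIS definition needs only one of the three specification types, a definition can consist of a shapefile or a time series table alone. The sketch below illustrates this under the assumption that the ipumsr package is installed. The shapefile and table names are taken from the examples in this manual.

```r
library(ipumsr)

# A definition containing only a shapefile
shape_only <- define_extract_agg(
  "nhgis",
  description = "County boundaries only",
  shapefiles = "us_county_1990_tl2008"
)

# A definition containing only a time series table
tst_only <- define_extract_agg(
  "nhgis",
  description = "Time series table only",
  time_series_tables = tst_spec("CL8", geog_levels = "county")
)

shape_only$shapefiles
names(tst_only$time_series_tables)
```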
Value
An object of class agg_extract containing the extract definition.
See Also
get_metadata_catalog() and get_metadata() to find data to include in an extract definition.
submit_extract() to submit an extract request for processing.
save_extract_as_json() and define_extract_from_json() to share an extract definition.
Examples
# Extract definition for tables from an NHGIS dataset
# Use `ds_spec()` to create an NHGIS dataset specification
nhgis_extract <- define_extract_agg(
"nhgis",
description = "Example NHGIS extract",
datasets = ds_spec(
"1990_STF3",
data_tables = "NP57",
geog_levels = c("county", "tract")
)
)
nhgis_extract
# Extract definition for tables from an IHGIS dataset
define_extract_agg(
"ihgis",
description = "Example IHGIS extract",
datasets = ds_spec(
"KZ2009pop",
data_tables = c("KZ2009pop.AAA", "KZ2009pop.AAB"),
tabulation_geographies = c("KZ2009pop.g0", "KZ2009pop.g1")
)
)
# Use `tst_spec()` to create an NHGIS time series table specification
define_extract_agg(
"nhgis",
description = "Example NHGIS extract",
time_series_tables = tst_spec("CL8", geog_levels = "county"),
tst_layout = "time_by_row_layout"
)
# To request multiple datasets, provide a list of `ds_spec` objects
define_extract_agg(
"nhgis",
description = "Extract definition with multiple datasets",
datasets = list(
ds_spec("2014_2018_ACS5a", "B01001", c("state", "county")),
ds_spec("2015_2019_ACS5a", "B01001", c("state", "county"))
)
)
# If you need to specify the same table or geographic level for
# many datasets, you may want to make a set of datasets before defining
# your extract request:
dataset_names <- c("2014_2018_ACS5a", "2015_2019_ACS5a")
dataset_spec <- purrr::map(
dataset_names,
~ ds_spec(
.x,
data_tables = "B01001",
geog_levels = c("state", "county")
)
)
define_extract_agg(
"nhgis",
description = "Extract definition with multiple datasets",
datasets = dataset_spec
)
# You can request datasets, time series tables, and shapefiles in the same
# definition:
define_extract_agg(
"nhgis",
description = "Extract with datasets and time series tables",
datasets = ds_spec("1990_STF1", c("NP1", "NP2"), "county"),
time_series_tables = tst_spec("CL6", "state"),
shapefiles = "us_county_1990_tl2008"
)
# Geographic extents are applied to all datasets/time series tables in the
# definition
define_extract_agg(
"nhgis",
description = "Extent selection",
datasets = list(
ds_spec("2018_2022_ACS5a", "B01001", "blck_grp"),
ds_spec("2017_2021_ACS5a", "B01001", "blck_grp")
),
geographic_extents = c("010", "050")
)
# Extract specifications can be indexed by name
names(nhgis_extract$datasets)
nhgis_extract$datasets[["1990_STF3"]]
## Not run:
# Use the extract definition to submit an extract request to the API
submit_extract(nhgis_extract)
## End(Not run)
Define an extract request for an IPUMS microdata collection
Description
Define the parameters of an IPUMS microdata extract request to be submitted via the IPUMS API.
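The IPUMS API currently supports the following microdata collections:
IPUMS USA
IPUMS CPS
IPUMS International
IPUMS Time Use (ATUS, AHTUS, MTUS)
IPUMS Health Surveys (NHIS, MEPS)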
Note that not all extract request parameters and options apply to all collections. For a summary of supported features by collection, see the IPUMS API documentation.
Learn more about the IPUMS API in vignette("ipums-api") and microdata extract definitions in vignette("ipums-api-micro").
Usage
define_extract_micro(
collection,
description,
samples,
variables = NULL,
time_use_variables = NULL,
sample_members = NULL,
data_format = "fixed_width",
data_structure = "rectangular",
rectangular_on = NULL,
case_select_who = "individuals",
data_quality_flags = NULL
)
Arguments
collection |
Code for the IPUMS collection represented by this
extract request. See |
description |
Description of the extract. |
samples |
Vector of samples to include in the extract
request. Use |
variables |
Vector of variable names or a list of detailed
variable specifications to include in the extract
request. Use |
time_use_variables |
Vector of names of IPUMS-defined time use variables
or a list of specifications for user-defined time use variables
to include in the extract request. Use Time use variables are only available for IPUMS Time Use collections
( |
sample_members |
Indication of whether to include additional sample
members in the extract request. If provided, must be one of
Sample member selection is only available for the IPUMS ATUS collection
( |
data_format |
Format for the output extract data file. Either
Note that while Defaults to |
data_structure |
Data structure for the output extract data.
Defaults to |
rectangular_on |
If Defaults to |
case_select_who |
Indication of how to interpret any case selections included for variables in the extract definition.
Defaults to |
data_quality_flags |
Set to Use |
Value
An object of class micro_extract containing the extract definition.
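The defaults shown in the usage above (data_format = "fixed_width" and data_structure = "rectangular") are stored on the returned object and can be inspected or overridden. The following is a small sketch assuming the ipumsr package is installed. The value "hierarchical" is not spelled out in the argument text above and is used here on the assumption that it is one of the package's documented data_structure options.

```r
library(ipumsr)

# Rely on the defaults from the usage signature
default_extract <- define_extract_micro(
  collection = "usa",
  description = "Default structure and format",
  samples = "us2013a",
  variables = c("SEX", "AGE")
)
default_extract$data_structure
default_extract$data_format

# Override the data structure (assumed valid option: "hierarchical")
hier_extract <- define_extract_micro(
  collection = "usa",
  description = "Hierarchical structure",
  samples = "us2013a",
  variables = c("SEX", "AGE"),
  data_structure = "hierarchical"
)
hier_extract$data_structure
```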
See Also
submit_extract() to submit an extract request for processing.
save_extract_as_json() and define_extract_from_json() to share an extract definition.
Examples
usa_extract <- define_extract_micro(
collection = "usa",
description = "2013-2014 ACS Data",
samples = c("us2013a", "us2014a"),
variables = c("SEX", "AGE", "YEAR")
)
usa_extract
# Use `var_spec()` to create detailed variable specifications:
usa_extract <- define_extract_micro(
collection = "usa",
description = "Example USA extract definition",
samples = c("us2013a", "us2014a"),
variables = var_spec(
"SEX",
case_selections = "2",
attached_characteristics = c("mother", "father")
)
)
# For multiple variables, provide a list of `var_spec` objects and/or
# variable names.
cps_extract <- define_extract_micro(
collection = "cps",
description = "Example CPS extract definition",
samples = c("cps2020_02s", "cps2020_03s"),
variables = list(
var_spec("AGE", data_quality_flags = TRUE),
var_spec("SEX", case_selections = "2"),
"RACE"
)
)
cps_extract
# To recycle specifications to many variables, it may be useful to
# create variables prior to defining the extract:
var_names <- c("AGE", "SEX")
my_vars <- purrr::map(
var_names,
~ var_spec(.x, attached_characteristics = "mother")
)
ipumsi_extract <- define_extract_micro(
collection = "ipumsi",
description = "Extract definition with predefined variables",
samples = c("br2010a", "cl2017a"),
variables = my_vars
)
# Extract specifications can be indexed by name
names(ipumsi_extract$samples)
names(ipumsi_extract$variables)
ipumsi_extract$variables$AGE
# IPUMS Time Use collections allow selection of IPUMS-defined and
# user-defined time use variables:
define_extract_micro(
collection = "atus",
description = "ATUS extract with time use variables",
samples = "at2007",
time_use_variables = list(
"ACT_PCARE",
tu_var_spec(
"MYTIMEUSEVAR",
owner = "example@example.com"
)
)
)
## Not run:
# Use the extract definition to submit an extract request to the API
submit_extract(usa_extract)
## End(Not run)
Define an IPUMS NHGIS extract request
Description
Define the parameters of an IPUMS NHGIS extract request to be submitted via the IPUMS API.
This function has been deprecated in favor of define_extract_agg(), which can be used to define extracts for both IPUMS aggregate data collections (IPUMS NHGIS and IPUMS IHGIS). Please use that function instead. All NHGIS extract request parameters supported by define_extract_nhgis() are supported by define_extract_agg().
Learn more about the IPUMS API in vignette("ipums-api") and NHGIS extract definitions in vignette("ipums-api-agg").
Usage
define_extract_nhgis(
description = "",
datasets = NULL,
time_series_tables = NULL,
shapefiles = NULL,
geographic_extents = NULL,
breakdown_and_data_type_layout = NULL,
tst_layout = NULL,
data_format = NULL
)
Arguments
description |
Description of the extract. |
datasets |
List of dataset specifications for any
datasets
to include in the extract request. Use |
time_series_tables |
List of time series table specifications for any
time series tables
to include in the extract request. Use |
shapefiles |
Names of any shapefiles to include in the extract request. |
geographic_extents |
Vector of geographic extents to use for
all of the Use |
breakdown_and_data_type_layout |
The desired layout
of any
Required if any |
tst_layout |
The desired layout of all
Required when an extract definition includes any |
data_format |
The desired format of the extract data file.
Note that by default, Required when an extract definition includes any |
Value
An object of class nhgis_extract containing the extract definition.
See Also
get_metadata_catalog() to find data to include in an extract definition.
submit_extract() to submit an extract request for processing.
save_extract_as_json() and define_extract_from_json() to share an extract definition.
Examples
# Previously, you could create an NHGIS extract definition like so:
nhgis_extract <- define_extract_nhgis(
description = "Example NHGIS extract",
datasets = ds_spec(
"1990_STF3",
data_tables = "NP57",
geog_levels = c("county", "tract")
)
)
# Now, use the following:
nhgis_extract <- define_extract_agg(
collection = "nhgis",
description = "Example NHGIS extract",
datasets = ds_spec(
"1990_STF3",
data_tables = "NP57",
geog_levels = c("county", "tract")
)
)
Download a completed IPUMS data extract
Description
Download IPUMS data extract files via the IPUMS API and save them on your computer.
Learn more about the IPUMS API in vignette("ipums-api").
Usage
download_extract(
extract,
download_dir = getwd(),
overwrite = FALSE,
progress = TRUE,
api_key = Sys.getenv("IPUMS_API_KEY")
)
Arguments
extract |
One of:
For a list of codes used to refer to each collection, see
|
download_dir |
Path to the directory where the files should be written. Defaults to current working directory. |
overwrite |
If |
progress |
If |
api_key |
API key associated with your user account. Defaults to the
value of the |
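Since api_key defaults to Sys.getenv("IPUMS_API_KEY") in the usage above, it can help to confirm that the environment variable is set before making a request. The helper below is a hypothetical convenience function written in base R for illustration. It is not part of ipumsr.

```r
# Hypothetical helper (not part of ipumsr): TRUE if IPUMS_API_KEY is non-empty
has_ipums_api_key <- function() {
  nzchar(Sys.getenv("IPUMS_API_KEY"))
}

if (!has_ipums_api_key()) {
  message(
    "IPUMS_API_KEY is not set. Set it for this session with ",
    "Sys.setenv(IPUMS_API_KEY = \"<your key>\") or pass api_key explicitly."
  )
}
```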
Details
For NHGIS extracts, data files and GIS files (shapefiles) will be saved in separate .zip archives. download_extract() will return a character vector including the file paths to all downloaded files.
For microdata extracts, only the file path to the downloaded .xml DDI file will be returned, as it is sufficient for reading the data provided in the associated .dat.gz data file.
Value
The path(s) to the files required to read the data requested in the extract, invisibly.
For NHGIS, paths will be named with either "data" (for tabular data files) or "shape" (for spatial data files) to indicate the type of data the file contains.
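The named return value can be handled like any named character vector. The sketch below uses a stand-in vector with made-up file names in place of a real download_extract() result, so it runs without an API key. Only the "data" and "shape" names come from the description above.

```r
# Stand-in for the value returned by download_extract() for an NHGIS extract.
# The file names are hypothetical and only illustrate the structure.
nhgis_files <- c(
  data = "nhgis0001_csv.zip",
  shape = "nhgis0001_shape.zip"
)

# Select a path by name rather than by position
data_path <- nhgis_files[["data"]]

# An extract without shapefiles would have no "shape" element, so check first
has_shape <- "shape" %in% names(nhgis_files)
shape_path <- if (has_shape) nhgis_files[["shape"]] else NA_character_
```

Selecting by name avoids depending on the order in which the paths are returned.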
See Also
read_ipums_micro() or read_ipums_agg() to read tabular data from an IPUMS extract.
read_ipums_sf() to read spatial data from an IPUMS extract.
ipums_list_files() to list files in an IPUMS extract.
Examples
usa_extract <- define_extract_micro(
collection = "usa",
description = "2013-2014 ACS Data",
samples = c("us2013a", "us2014a"),
variables = c("SEX", "AGE", "YEAR")
)
## Not run:
submitted_extract <- submit_extract(usa_extract)
downloadable_extract <- wait_for_extract(submitted_extract)
# For microdata, the path to the DDI .xml codebook file is provided.
usa_xml_file <- download_extract(downloadable_extract)
# Load with a `read_ipums_micro_*()` function
usa_data <- read_ipums_micro(usa_xml_file)
# You can also download previous extracts with their collection and number:
nhgis_files <- download_extract("nhgis:1")
# NHGIS extracts return a path to both the tabular and spatial data files,
# as applicable.
nhgis_data <- read_ipums_agg(data = nhgis_files["data"])
# Load NHGIS spatial data
nhgis_geog <- read_ipums_sf(data = nhgis_files["shape"])
## End(Not run)
Download IPUMS supplemental data files
Description
Some IPUMS collections provide supplemental data files that are available outside of the IPUMS extract system. Use this function to download these files.
Currently, only IPUMS NHGIS files are supported.
In general, files found on an IPUMS project website that include secure-assets in their URL are available as supplemental data. See the IPUMS developer documentation for more information on available endpoints.
Usage
download_supplemental_data(
collection,
path,
download_dir = getwd(),
overwrite = FALSE,
progress = TRUE,
api_key = Sys.getenv("IPUMS_API_KEY")
)
Arguments
collection |
Code for the IPUMS collection represented by this
extract request. Currently, only |
path |
Path to the supplemental data file to download. See examples. |
download_dir |
Path to the directory where the files should be written. Defaults to current working directory. |
overwrite |
If |
progress |
If |
api_key |
API key associated with your user account. Defaults to the
value of the |
Value
The path to the downloaded supplemental data file
Examples
## Not run:
# Download a state-level tract to county crosswalk from NHGIS
file <- download_supplemental_data(
"nhgis",
"crosswalks/nhgis_tr1990_co2010_state/nhgis_tr1990_co2010_10.zip"
)
read_ipums_agg(file)
# Download 1980 Minnesota block boundary file
file <- download_supplemental_data(
"nhgis",
"blocks-1980/MN_block_1980.zip"
)
read_ipums_sf(file)
## End(Not run)
Create dataset and time series table specifications for IPUMS aggregate data extract definitions
Description
Provide specifications for individual datasets and time series tables when defining an IPUMS aggregate data extract request. This includes extract requests for IPUMS NHGIS and IPUMS IHGIS.
Use get_metadata() to identify available values for dataset and time series table specification parameters.
Learn more about aggregate data extract definitions in vignette("ipums-api-agg").
Usage
ds_spec(
name,
data_tables = NULL,
geog_levels = NULL,
years = NULL,
breakdown_values = NULL,
tabulation_geographies = NULL
)
tst_spec(name, geog_levels = NULL, years = NULL)
Arguments
name |
Name of the dataset or (for IPUMS NHGIS) time series table. |
data_tables |
Vector of summary tables to retrieve for the given dataset. |
geog_levels |
Geographic levels
(e.g. Only applicable for IPUMS NHGIS extract definitions. |
years |
Years for which to obtain the data for the given dataset or time series table. For time series tables, all years are selected by default. For datasets,
use Only applicable for IPUMS NHGIS extract definitions. |
breakdown_values |
Breakdown values to apply to the given dataset. Only applicable for IPUMS NHGIS extract definitions. |
tabulation_geographies |
Tabulation geographies to apply to the given dataset. These represent the level of geographic aggregation for the requested data. Only applicable for IPUMS IHGIS extract definitions. |
Details
For IPUMS NHGIS extract definitions, data_tables and geog_levels are required for all dataset specifications, and geog_levels are required for all time series table specifications.
For IPUMS IHGIS extract definitions, data_tables and tabulation_geographies are required for all dataset specifications.
However, it is possible to make a temporary specification for an incomplete dataset or time series table by omitting required values. This supports the syntax used when modifying an existing extract (see add_to_extract() or remove_from_extract()).
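As an illustration of such temporary specifications, the sketch below passes incomplete ds_spec() objects to remove_from_extract() and add_to_extract(). It assumes the ipumsr package is installed and uses dataset and table names drawn from the examples in this manual, except "NP3", which is an assumed table name used only to show the syntax.

```r
library(ipumsr)

extract <- define_extract_agg(
  "nhgis",
  datasets = ds_spec(
    "1990_STF1",
    data_tables = c("NP1", "NP2"),
    geog_levels = c("county", "state")
  )
)

# Incomplete specification: only the geographic level to drop is given
trimmed <- remove_from_extract(
  extract,
  datasets = ds_spec("1990_STF1", geog_levels = "state")
)
trimmed$datasets[["1990_STF1"]]$geog_levels

# Incomplete specification: only the data table to add is given
# ("NP3" is an assumed table name for illustration)
expanded <- add_to_extract(
  extract,
  datasets = ds_spec("1990_STF1", data_tables = "NP3")
)
expanded$datasets[["1990_STF1"]]$data_tables
```

An incomplete specification on its own would not pass validation in a new extract definition. It is only meaningful when merged into an existing one.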
Value
A ds_spec or tst_spec object.
Examples
dataset <- ds_spec(
"2013_2017_ACS5a",
data_tables = c("B00001", "B01002"),
geog_levels = "state"
)
tst <- tst_spec(
"CW5",
geog_levels = c("county", "tract"),
years = "1990"
)
# Use the dataset and time series table specifications in an extract definition:
define_extract_agg(
"nhgis",
description = "Example extract",
datasets = dataset,
time_series_tables = tst
)
# IHGIS datasets need a `tabulation_geographies` specification:
define_extract_agg(
"ihgis",
description = "Example extract",
datasets = ds_spec(
"AL2001pop",
data_tables = "AL2001pop.ADF",
tabulation_geographies = c("AL2001pop.g0", "AL2001pop.g1")
)
)
Browse definitions of previously submitted extract requests
Description
Retrieve definitions of an arbitrary number of previously submitted extract requests for a given IPUMS collection, starting from the most recent extract request.
To check the status of a particular extract request, use get_extract_info().
Learn more about the IPUMS API in vignette("ipums-api").
Usage
get_extract_history(
collection = NULL,
how_many = 10,
delay = 0,
api_key = Sys.getenv("IPUMS_API_KEY")
)
Arguments
collection |
Character string of the IPUMS collection for which to retrieve extract history. Defaults to the current default collection, if it exists. See set_ipums_default_collection(). For a list of codes used to refer to each collection, see ipums_data_collections(). |
how_many |
The number of extract requests for which to retrieve information. Defaults to the 10 most recent extracts. |
delay |
Number of seconds to delay between successive API requests, if multiple requests are needed to retrieve all records. A delay is highly unlikely to be necessary and is intended only as a fallback in the event that you cannot retrieve your extract history without exceeding the API rate limit. |
api_key |
API key associated with your user account. Defaults to the value of the IPUMS_API_KEY environment variable. |
Value
A list of ipums_extract objects.
See Also
get_extract_info() to get the current status of a specific extract request.
Examples
## Not run:
# Get information for most recent extract requests.
# By default gets the most recent 10 extracts
get_extract_history("usa")
# Return only the most recent 3 extract definitions
get_extract_history("cps", how_many = 3)
# To get the most recent extract (for instance, if you have forgotten its
# extract number), use `get_last_extract_info()`
get_last_extract_info("nhgis")
## End(Not run)
# To browse your extract history by particular criteria, you can
# loop through the extract objects. We'll create a sample list of 2 extracts:
extract1 <- define_extract_micro(
collection = "usa",
description = "2013 ACS",
samples = "us2013a",
variables = var_spec(
"SEX",
case_selections = "2",
data_quality_flags = TRUE
)
)
extract2 <- define_extract_micro(
collection = "usa",
description = "2014 ACS",
samples = "us2014a",
variables = list(
var_spec("RACE"),
var_spec(
"SEX",
case_selections = "1",
data_quality_flags = FALSE
)
)
)
extracts <- list(extract1, extract2)
# `purrr::keep()` is particularly useful for filtering:
purrr::keep(extracts, ~ "RACE" %in% names(.x$variables))
purrr::keep(extracts, ~ grepl("2014 ACS", .x$description))
# You can also filter on variable-specific criteria
purrr::keep(extracts, ~ isTRUE(.x$variables[["SEX"]]$data_quality_flags))
# To filter based on all variables in an extract, you'll need to
# create a nested loop. For instance, to find all extracts that have
# any variables with data_quality_flags:
purrr::keep(
extracts,
function(extract) {
any(purrr::map_lgl(
names(extract$variables),
function(var) isTRUE(extract$variables[[var]]$data_quality_flags)
))
}
)
# To peruse your extract history without filtering, `purrr::map()` is more
# useful
purrr::map(extracts, ~ names(.x$variables))
purrr::map(extracts, ~ names(.x$samples))
purrr::map(extracts, ~ .x$variables[["RACE"]]$case_selections)
# Once you have identified a past extract, you can easily download or
# resubmit it
## Not run:
extracts <- get_extract_history("nhgis")
extract <- purrr::keep(
extracts,
~ "CW3" %in% names(.x$time_series_tables)
)
download_extract(extract[[1]])
## End(Not run)
Retrieve the definition and latest status of an extract request
Description
Retrieve the latest status of an extract request.
get_last_extract_info() is a convenience function to retrieve the most
recent extract for a given collection.
To browse definitions of your previously submitted extract requests, see
get_extract_history().
Learn more about the IPUMS API in vignette("ipums-api").
Usage
get_extract_info(extract, api_key = Sys.getenv("IPUMS_API_KEY"))
get_last_extract_info(collection = NULL, api_key = Sys.getenv("IPUMS_API_KEY"))
Arguments
extract |
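One of:
- An ipums_extract object
- The data collection and extract number, formatted as a string of the form "collection:number" or as a vector of the form c("collection", number)
- An extract number to be associated with your default IPUMS collection
For a list of codes used to refer to each collection, see ipums_data_collections(). |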
api_key |
API key associated with your user account. Defaults to the value of the IPUMS_API_KEY environment variable. |
collection |
Character string of the IPUMS collection for which to retrieve extract history. Defaults to the current default collection, if it exists. See set_ipums_default_collection(). For a list of codes used to refer to each collection, see ipums_data_collections(). |
Value
An ipums_extract object.
See Also
get_extract_history() to browse past extract definitions.
wait_for_extract() to wait for an extract to finish processing.
download_extract() to download an extract's data files.
save_extract_as_json() and define_extract_from_json() to share an
extract definition.
Examples
my_extract <- define_extract_micro(
collection = "usa",
description = "2013-2014 ACS Data",
samples = c("us2013a", "us2014a"),
variables = c("SEX", "AGE", "YEAR")
)
## Not run:
submitted_extract <- submit_extract(my_extract)
# Get latest info for the request associated with a given `ipums_extract`
# object:
updated_extract <- get_extract_info(submitted_extract)
updated_extract$status
# Or specify the extract collection and number:
get_extract_info("usa:1")
get_extract_info(c("usa", 1))
# If you have a default collection, you can use the extract number alone:
set_ipums_default_collection("nhgis")
get_extract_info(1)
# To get the most recent extract (for instance, if you have forgotten its
# extract number), use `get_last_extract_info()`
get_last_extract_info("nhgis")
## End(Not run)
Retrieve detailed metadata about an IPUMS data source
Description
Retrieve metadata containing API codes and descriptions for an IPUMS data source. See the IPUMS developer documentation for details about the metadata provided for individual data collections and API endpoints.
To retrieve a summary of all available data sources of a particular
type, use get_metadata_catalog(). This output can be used to identify the
names of data sources for which to request detailed metadata.
Currently, comprehensive metadata is only available for IPUMS NHGIS
and IPUMS IHGIS. See get_sample_info() to list basic sample information
for IPUMS microdata collections.
Learn more about the IPUMS API in vignette("ipums-api").
Usage
get_metadata(
collection,
dataset = NULL,
data_table = NULL,
time_series_table = NULL,
api_key = Sys.getenv("IPUMS_API_KEY")
)
Arguments
collection |
Character string indicating the IPUMS collection for which to retrieve metadata. |
dataset |
Name of an individual dataset from an IPUMS aggregate data collection for which to retrieve metadata. |
data_table |
Name of an individual data table from an IPUMS aggregate
data collection for which to retrieve metadata. If provided and
|
time_series_table |
If |
api_key |
API key associated with your user account. Defaults to the value of the IPUMS_API_KEY environment variable. |
Value
A named list of metadata for the specified data source.
See Also
get_metadata_catalog() to obtain a summary of available data sources for
a given IPUMS data collection.
define_extract_agg() to create an IPUMS aggregate data extract definition.
Examples
## Not run:
library(dplyr)
# Get detailed metadata for a single source with its associated argument:
cs5_meta <- get_metadata("nhgis", time_series_table = "CS5")
cs5_meta$geog_levels
# Use the available values when defining an NHGIS extract request
define_extract_agg(
"nhgis",
time_series_tables = tst_spec("CS5", geog_levels = "state")
)
# Detailed metadata is also provided for datasets and data tables
get_metadata("nhgis", dataset = "1990_STF1")
get_metadata("nhgis", data_table = "NP1", dataset = "1990_STF1")
get_metadata("ihgis", dataset = "KZ2009pop")
# Iterate over data sources to retrieve detailed metadata for several
# records. For instance, to get variable metadata for a set of data tables:
tables <- c("NP1", "NP2", "NP10")
var_meta <- purrr::map(
tables,
function(dt) {
dt_meta <- get_metadata("nhgis", dataset = "1990_STF1", data_table = dt)
# This ensures you avoid hitting rate limit for large numbers of tables
Sys.sleep(1)
dt_meta$variables
}
)
## End(Not run)
Retrieve a catalog of available data sources for an IPUMS collection
Description
Retrieve summary metadata containing API codes and descriptions for all
available data sources of a given type for an IPUMS data collection.
See the IPUMS developer documentation for details about the metadata
provided for individual data collections and API endpoints. Use
catalog_types() to determine available metadata endpoints by collection.
To retrieve detailed metadata about a particular data source, use
get_metadata().
Currently, comprehensive metadata is only available for IPUMS NHGIS and IPUMS IHGIS, but a listing of samples is available for IPUMS microdata collections.
Learn more about the IPUMS API in vignette("ipums-api").
Usage
get_metadata_catalog(
collection,
metadata_type,
delay = 0,
api_key = Sys.getenv("IPUMS_API_KEY")
)
catalog_types(collection)
Arguments
collection |
Character string indicating the IPUMS collection for which to retrieve metadata. |
metadata_type |
The type of data source for which to retrieve summary metadata. Use catalog_types() to determine the valid types for a given collection. |
delay |
Number of seconds to delay between successive API requests, if multiple requests are needed to retrieve all records. A delay is highly unlikely to be necessary and is intended only as a fallback in the event that you cannot retrieve all metadata records without exceeding the API rate limit. |
api_key |
API key associated with your user account. Defaults to the value of the IPUMS_API_KEY environment variable. |
Value
A tibble containing the catalog of all data sources for the given
collection and metadata_type.
For catalog_types(), a character vector of valid catalog endpoints
for a given collection.
See Also
get_metadata() to obtain detailed metadata for a single data source.
define_extract_agg() to create an IPUMS aggregate data extract definition.
define_extract_micro() to create an IPUMS microdata extract definition.
Examples
# List available metadata catalog endpoints:
catalog_types("nhgis")
catalog_types("ihgis")
## Not run:
# dplyr provides `%>%` and `filter()`, which are used below
library(dplyr)
# Get summary metadata for all available sources of a given data type
get_metadata_catalog("nhgis", "datasets")
get_metadata_catalog("ihgis", "tabulation_geographies")
# Filter to identify data sources of interest by their metadata values
all_tsts <- get_metadata_catalog("nhgis", "time_series_tables")
tsts <- all_tsts %>%
filter(
grepl("Children", description),
grepl("Families", description),
geographic_integration == "Standardized to 2010"
)
tsts$name
## End(Not run)
List available data sources from IPUMS NHGIS
Description
This function has been deprecated because the IPUMS API now supports
metadata endpoints for multiple data collections. To obtain summary metadata,
please use get_metadata_catalog(). To obtain detailed metadata, please use
get_metadata().
Learn more about the IPUMS API in vignette("ipums-api") and aggregate data
extract definitions in vignette("ipums-api-agg").
Usage
get_metadata_nhgis(
type = NULL,
dataset = NULL,
data_table = NULL,
time_series_table = NULL,
delay = 0,
api_key = Sys.getenv("IPUMS_API_KEY")
)
Arguments
type |
One of |
dataset |
Name of an individual dataset for which to retrieve metadata. |
data_table |
Name of an individual data table for which to retrieve metadata. If provided, an associated dataset must also be specified. |
time_series_table |
Name of an individual time series table for which to retrieve metadata. |
delay |
Number of seconds to delay between successive API requests, if multiple requests are needed to retrieve all records. A delay is highly unlikely to be necessary and is intended only as a fallback in the event that you cannot retrieve all metadata records without exceeding the API rate limit. Only used if |
api_key |
API key associated with your user account. Defaults to the value of the IPUMS_API_KEY environment variable. |
Value
If type is provided, a tibble of summary metadata for all data sources of
the provided type.
Otherwise, a named list of metadata for the specified dataset, data_table,
or time_series_table.
Metadata availability
The following sections summarize the metadata fields provided for each data type. Summary metadata include a subset of the fields provided for individual data sources.
Datasets:
- name: The unique identifier for the dataset. This is the value that is used to refer to the dataset when interacting with the IPUMS API.
- group: The group of datasets to which the dataset belongs. For instance, 5 separate datasets are part of the "2015 American Community Survey" group.
- description: A short description of the dataset.
- sequence: Order in which the dataset will appear in the metadata API and extracts.
- has_multiple_data_types: Logical value indicating whether multiple data types exist for this dataset. For example, ACS datasets include both estimates and margins of error.
- data_tables: A tibble containing names, codes, and descriptions for all data tables available for the dataset.
- geog_levels: A tibble containing names, descriptions, and extent information for the geographic levels available for the dataset. The has_geog_extent_selection field contains logical values indicating whether extent selection is allowed for the associated geographic level. See geographic_instances below.
- breakdowns: A tibble containing names, types, descriptions, and breakdown values for all breakdowns available for the dataset.
- years: A vector of years for which the dataset is available. This field is only present if a dataset is available for multiple years. Note that ACS datasets are not considered to be available for multiple years.
- geographic_instances: A tibble containing names and descriptions for all valid geographic extents for the dataset. This field is only present if at least one of the dataset's geog_levels allows geographic extent selection.
Data tables:
- name: The unique identifier for the data table within its dataset. This is the value that is used to refer to the data table when interacting with the IPUMS API.
- description: A short description of the data table.
- universe: The statistical population measured by this data table (e.g. persons, families, occupied housing units, etc.)
- nhgis_code: The code identifying the data table in the extract. Variables in the extract data will include column names prefixed with this code.
- sequence: Order in which the data table will appear in the metadata API and extracts.
- dataset_name: Name of the dataset to which this data table belongs.
- n_variables: Number of variables included in this data table.
- variables: A tibble containing variable descriptions and codes for the variables included in the data table.
Time series tables:
- name: The unique identifier for the time series table. This is the value that is used to refer to the time series table when interacting with the IPUMS API.
- description: A short description of the time series table.
- geographic_integration: The method by which the time series table aligns geographic units across time. "Nominal" integration indicates that geographic units are aligned by name (disregarding changes in unit boundaries). "Standardized" integration indicates that data from multiple time points are standardized to the indicated year's census units.
- sequence: Order in which the time series table will appear in the metadata API and extracts.
- time_series: A tibble containing names and descriptions for the individual time series available for the time series table.
- years: A tibble containing information on the available data years for the time series table.
- geog_levels: A tibble containing names and descriptions for the geographic levels available for the time series table. The has_geog_extent_selection field contains logical values indicating whether extent selection is allowed for the associated geographic level.
- geographic_instances: A tibble containing names and descriptions for all valid geographic extents for the time series table. Includes all states or state equivalents that are valid for any year in the time series table. (Some instances may be valid for some but not all years.)
Shapefiles:
- name: The unique identifier for the shapefile. This is the value that is used to refer to the shapefile when interacting with the IPUMS API.
- year: The survey year in which the shapefile's represented areas were used for tabulations, which may be different than the vintage of the represented areas.
- geographic_level: The geographic level of the shapefile.
- extent: The geographic extent covered by the shapefile.
- basis: The derivation source of the shapefile.
- sequence: Order in which the shapefile will appear in the metadata API and extracts.
See Also
define_extract_agg() to create an IPUMS aggregate data extract definition.
Examples
## Not run:
library(dplyr)
# Get summary metadata for all available sources of a given data type
# Previously:
get_metadata_nhgis("datasets")
# Now:
get_metadata_catalog("nhgis", "datasets")
# Get detailed metadata for a single source with its associated argument
# Previously:
cs5_meta <- get_metadata_nhgis(time_series_table = "CS5")
# Now:
cs5_meta <- get_metadata("nhgis", time_series_table = "CS5")
cs5_meta$geog_levels
# Use the available values when defining an NHGIS extract request
define_extract_agg(
"nhgis",
time_series_tables = tst_spec("CS5", geog_levels = "state")
)
## End(Not run)
List available samples for IPUMS microdata collections
Description
Retrieve sample IDs and descriptions for IPUMS microdata collections.
Currently supported microdata collections are:
- IPUMS USA ("usa")
- IPUMS CPS ("cps")
- IPUMS International ("ipumsi")
- IPUMS Time Use ("atus", "ahtus", "mtus")
- IPUMS Health Surveys ("nhis", "meps")
Learn more about the IPUMS API in vignette("ipums-api").
Usage
get_sample_info(
collection = NULL,
delay = 0,
api_key = Sys.getenv("IPUMS_API_KEY")
)
Arguments
collection |
Character string indicating the IPUMS microdata collection for which to retrieve sample information. |
delay |
Number of seconds to delay between successive API requests, if multiple requests are needed to retrieve all records. A delay is highly unlikely to be necessary and is intended only as a fallback in the event that you cannot retrieve all metadata records without exceeding the API rate limit. |
api_key |
API key associated with your user account. Defaults to the value of the IPUMS_API_KEY environment variable. |
Value
A tibble containing sample IDs and descriptions for the indicated collection.
See Also
define_extract_micro() to create an IPUMS microdata extract definition.
Examples
## Not run:
get_sample_info("usa")
get_sample_info("cps")
get_sample_info("ipumsi")
get_sample_info("atus")
get_sample_info("meps")
## End(Not run)
Bind multiple data frames by row, preserving labelled attributes
Description
Analogous to dplyr::bind_rows(), but preserves the labelled attributes
provided with IPUMS data.
Usage
ipums_bind_rows(..., .id = NULL)
Arguments
... |
Data frames or |
.id |
The name of an optional identifier column. Provide a string to create an output column that identifies each input. The column will use names if available, otherwise it will use positions. |
Value
Returns the same type as the first input. Either a data.frame, tbl_df, or
grouped_df.
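The .id argument can be sketched as follows, reusing the example NHGIS extract that ships with ipumsr. The input names (first, second) and the column name "source" are arbitrary choices made for illustration. The sketch assumes that named inputs are carried into the identifier column, as described for .id above.

```r
library(ipumsr)

file <- ipums_example("nhgis0712_csv.zip")

d1 <- read_ipums_agg(file, file_select = 1, verbose = FALSE)
d2 <- read_ipums_agg(file, file_select = 2, verbose = FALSE)

# Name the inputs so that the identifier column can record which input
# each row came from
d <- ipums_bind_rows(first = d1, second = d2, .id = "source")

# Count rows contributed by each input
table(d$source)
```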
Examples
file <- ipums_example("nhgis0712_csv.zip")
d1 <- read_ipums_agg(
file,
file_select = 1,
verbose = FALSE
)
d2 <- read_ipums_agg(
file,
file_select = 2,
verbose = FALSE
)
# Variables have associated label attributes:
ipums_var_label(d1$PMSAA)
# Preserve labels when binding data sources:
d <- ipums_bind_rows(d1, d2)
ipums_var_label(d$PMSAA)
# dplyr `bind_rows()` drops labels:
d <- dplyr::bind_rows(d1, d2)
ipums_var_label(d$PMSAA)
Callback classes
Description
These classes are used to define callback behaviors for use with
read_ipums_micro_chunked(). They are based on the callback classes from
readr, but have been adapted to include handling of implicit decimal values
and variable/value labeling for use with IPUMS microdata extracts.
Details
- IpumsSideEffectCallback: Callback function that is used only for side effects; no results are returned.
  Initialize with a function that takes 2 arguments. The first argument (x) should correspond to the data chunk and the second (pos) should correspond to the position of the first observation in the chunk.
  If the function returns FALSE, no more chunks will be read.
- IpumsDataFrameCallback: Callback function that combines the results from each chunk into a single output data.frame (or similar) object.
  Initialize the same way as you would IpumsSideEffectCallback. The provided function should return an object that inherits from data.frame.
  The results from each application of the callback function will be added to the output data.frame.
- IpumsListCallback: Callback function that returns a list, where each element contains the result from a single chunk.
  Initialize the same way as you would IpumsSideEffectCallback.
- IpumsBiglmCallback: Callback function that performs a linear regression on a dataset by chunks using the biglm package.
  Initialize with a function that takes 2 arguments: The first argument should correspond to a formula specifying the regression model. The second should correspond to a function that prepares the data before running the regression analysis. This function follows the conventions of the functions used in other callbacks. Any additional arguments passed to this function are passed to biglm.
- IpumsChunkCallback: (Advanced) Callback interface definition. All callback functions for IPUMS data should inherit from this class, and should use private method ipumsify on the data to handle implicit decimals and value labels.
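A minimal sketch of two of these callbacks, using the CPS example extract bundled with ipumsr. The filter value (STATEFIP 27) and the chunk size of 1000 are arbitrary choices made for illustration, not recommended settings.

```r
library(ipumsr)

cps_ddi_file <- ipums_example("cps_00157.xml")

# IpumsDataFrameCallback: keep only records with STATEFIP 27 from each
# chunk; the per-chunk results are combined into a single data frame
mn_callback <- IpumsDataFrameCallback$new(function(x, pos) {
  x[x$STATEFIP == 27, ]
})

mn_data <- read_ipums_micro_chunked(
  cps_ddi_file,
  callback = mn_callback,
  chunk_size = 1000,
  verbose = FALSE
)

# IpumsListCallback: return one list element per chunk, here the number
# of rows that were read in that chunk
count_callback <- IpumsListCallback$new(function(x, pos) nrow(x))

chunk_sizes <- read_ipums_micro_chunked(
  cps_ddi_file,
  callback = count_callback,
  chunk_size = 1000,
  verbose = FALSE
)

# Total number of records across all chunks
sum(unlist(chunk_sizes))
```

Because each chunk is filtered before the results are combined, only the matching records are kept in memory at the end. This is the main reason to use a chunked read on a large extract.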
Collect data into R session with IPUMS attributes
Description
Convenience wrapper around dplyr's collect() and set_ipums_var_attributes().
Use this to attach variable labels when collecting data from a database.
Usage
ipums_collect(data, ddi, var_attrs = c("val_labels", "var_label", "var_desc"))
Arguments
data |
A dplyr |
ddi |
An ipums_ddi object created with |
var_attrs |
Variable attributes to add to the output. Defaults to
all available attributes.
See |
Value
A local tibble with the requested attributes attached.
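This page has no example, so the following is a hedged sketch of one possible workflow rather than a prescribed one. It assumes the DBI, RSQLite, and dbplyr packages are installed, uses an in-memory SQLite database as a stand-in for a real database, and uses a table name ("cps") chosen only for illustration.

```r
library(ipumsr)

ddi <- read_ipums_ddi(ipums_example("cps_00157.xml"))

# Read the data without IPUMS attributes so that the columns are plain
# vectors that can be written to a database table
cps <- read_ipums_micro(ddi, var_attrs = NULL, verbose = FALSE)

# In-memory SQLite database (assumption: stands in for your own database)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "cps", cps)

# Build a lazy query against the database table
cps_db <- dplyr::tbl(con, "cps")
mn_db <- dplyr::filter(cps_db, STATEFIP == 27)

# Collect the result into R and reattach attributes from the codebook
mn <- ipums_collect(mn_db, ddi)
ipums_var_label(mn$STATEFIP)

DBI::dbDisconnect(con)
```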
List IPUMS data collections
Description
List IPUMS data collections with their corresponding codes used by the IPUMS API. Note that some data collections do not yet have API support.
Currently, ipumsr supports extract definitions for the following collections:
- IPUMS USA ("usa")
- IPUMS CPS ("cps")
- IPUMS International ("ipumsi")
- IPUMS Time Use ("atus", "ahtus", "mtus")
- IPUMS Health Surveys ("nhis", "meps")
- IPUMS NHGIS ("nhgis")
- IPUMS IHGIS ("ihgis")
Learn more about the IPUMS API in vignette("ipums-api").
Usage
ipums_data_collections()
Value
A tibble with four columns containing the full collection name, the type
of data the collection provides, the collection code used by the IPUMS API,
and the status of API support for the collection.
Examples
ipums_data_collections()
ipums_ddi class
Description
The ipums_ddi class provides a data structure for storing the metadata
information contained in IPUMS codebook files. These objects are primarily
used when loading IPUMS data, but can also be used to explore metadata for
an IPUMS extract.
For microdata projects, this information is provided in DDI codebook (.xml) files.
For NHGIS, this information is provided in .txt codebook files.
For IHGIS, this information is provided in a collection of .csv files.
The codebook file contains metadata about the extract files themselves, including file name, file path, and extract date as well as information about variables present in the data, including variable names, descriptions, data types, implied decimals, and positions in the fixed-width files.
This information is used to correctly parse IPUMS fixed-width files and attach additional variable metadata to data upon load.
Note that codebook metadata for aggregate data extracts can also be stored in
an ipums_ddi object, even though these codebooks are not distributed as
.xml files. These files do not adhere to the same standards as the DDI
codebook files, so some ipums_ddi fields will be left blank when reading
aggregate data codebooks.
Creating an ipums_ddi object
- To create an ipums_ddi object from an IPUMS microdata extract, use read_ipums_ddi().
- To create an ipums_ddi object from an IPUMS NHGIS extract, use read_nhgis_codebook().
- To create an ipums_ddi object from an IPUMS IHGIS extract, use read_ihgis_codebook().
Loading data
- To load the data associated with an ipums_ddi object, use read_ipums_micro(), read_ipums_micro_chunked(), or read_ipums_micro_yield().
View metadata
- Use ipums_var_info() to explore variable-level metadata for the variables included in a dataset.
- Use ipums_file_info() to explore file-level metadata for an extract.
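The steps above can be sketched with the CPS example extract included in ipumsr. This is an illustrative walk-through, not the only way to combine these functions.

```r
library(ipumsr)

# Create an ipums_ddi object from a microdata DDI codebook
ddi <- read_ipums_ddi(ipums_example("cps_00157.xml"))

# File-level metadata: the date the extract was created
ipums_file_info(ddi, "extract_date")

# Variable-level metadata for every variable in the extract
var_info <- ipums_var_info(ddi)
var_info$var_name

# Load the data file described by the codebook
cps_data <- read_ipums_micro(ddi, verbose = FALSE)
```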
Get path to IPUMS example datasets
Description
Construct file path to example extracts included with ipumsr. These data are used in package examples and can be used to experiment with ipumsr functionality.
Usage
ipums_example(path = NULL)
Arguments
path |
Name of file. If |
Value
The path to a specific example file or a vector of all available files.
Examples
# List all available example files
ipums_example()
# Get path to a specific example file
file <- ipums_example("cps_00157.xml")
read_ipums_micro(file)
ipums_extract class
Description
The ipums_extract class provides a data structure for storing the
extract definition and status of an IPUMS data extract request. Both
submitted and unsubmitted extract requests are stored in ipums_extract
objects.
ipums_extract objects are further divided into microdata and aggregate data
classes, and will also include a collection-specific extract subclass to
accommodate differences in extract options and content across collections.
Currently supported collections are:
IPUMS microdata
IPUMS aggregate data
Learn more about the IPUMS API in vignette("ipums-api").
Properties
Objects of class ipums_extract have:
- A class attribute of the form c("{collection}_extract", "{collection_type}_extract", "ipums_extract"). For instance, c("cps_extract", "micro_extract", "ipums_extract").
- A base type of "list".
- A names attribute that is a character vector the same length as the underlying list.
All ipums_extract objects will include several core fields identifying
the extract and its status:
- collection: the collection for the extract request.
- description: the description of the extract request.
- submitted: logical indicating whether the extract request has been submitted to the IPUMS API for processing.
- download_links: links to the downloadable data, if the extract request was completed at the time it was last checked.
- number: the number of the extract request. With collection, this uniquely identifies an extract request for a given user.
- status: status of the extract request at the time it was last checked. One of "unsubmitted", "queued", "started", "produced", "canceled", "failed", or "completed".
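These properties can be inspected on an unsubmitted extract definition, which requires no API key. In the sketch below, the sample ID and variable names are placeholders chosen for illustration and are not checked against the IPUMS API when the definition is created.

```r
library(ipumsr)

cps_extract <- define_extract_micro(
  collection = "cps",
  description = "Example CPS extract",
  samples = "cps2019_03s", # illustrative sample ID
  variables = c("AGE", "SEX")
)

# Collection-specific subclass, collection-type class, and base class
class(cps_extract)

# The underlying object is a list with named fields
typeof(cps_extract)
names(cps_extract)

# Core status fields for an extract that has not been submitted
cps_extract$submitted
cps_extract$status
```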
Creating or obtaining an extract
- Create an ipums_extract object from scratch with the appropriate define_extract_*() function.
  - For microdata extracts, use define_extract_micro()
  - For aggregate data extracts, use define_extract_agg()
- Use get_extract_info() to get the definition and latest status of a previously-submitted extract request.
- Use get_extract_history() to get the definitions and latest status of multiple previously-submitted extract requests.
Submitting an extract
- Use submit_extract() to submit an extract request for processing through the IPUMS API.
- Use wait_for_extract() to periodically check the status of a submitted extract request until it is ready to download.
- Use is_extract_ready() to manually check whether a submitted extract request is ready to download.
Downloading an extract
- Download the data contained in a completed extract with download_extract().
Saving an extract
- Save an extract to a JSON-formatted file with save_extract_as_json().
- Create an ipums_extract object from a saved JSON-formatted definition with define_extract_from_json().
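Saving and restoring a definition might look like the following sketch, which writes to a temporary file and never contacts the IPUMS API. The extract contents are reused from other examples in this manual, and the temporary file path is an arbitrary choice.

```r
library(ipumsr)

usa_extract <- define_extract_micro(
  collection = "usa",
  description = "2013 ACS",
  samples = "us2013a",
  variables = c("SEX", "AGE")
)

# Save the definition to a JSON-formatted file
json_file <- tempfile(fileext = ".json")
save_extract_as_json(usa_extract, file = json_file)

# Recreate an ipums_extract object from the saved definition
restored <- define_extract_from_json(json_file)
restored$description
names(restored$variables)
```

The JSON file can be shared with a collaborator, who can load it with define_extract_from_json() and submit the same request under their own API key.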
Get file information for an IPUMS extract
Description
Get information about the IPUMS project, date, notes, conditions, and citation requirements for an extract based on an ipums_ddi object.
ipums_conditions() is a convenience function that provides conditions and
citation information for a recently loaded dataset.
Usage
ipums_file_info(object, type = NULL)
ipums_conditions(object = NULL)
Arguments
object |
An For |
type |
Type of file information to display. If |
Value
For ipums_file_info(), if type = NULL, a named list of metadata
information. Otherwise, a string containing the requested information.
Examples
ddi <- read_ipums_ddi(ipums_example("cps_00157.xml"))
ipums_file_info(ddi)
List files contained within a zipped IPUMS extract
Description
Identify the files that can be read from an IPUMS extract.
Usage
ipums_list_files(file, file_select = NULL, types = NULL)
Arguments
file |
Path to a .zip archive containing the IPUMS extract to be examined. |
file_select |
If the path in While less useful, this can also be provided as a string specifying an exact file name or an integer to match files by index position. |
types |
One or more of |
Value
A tibble containing the types and names of the available files.
See Also
read_ipums_micro() or read_ipums_agg() to read tabular data from an IPUMS
extract.
read_ipums_sf() to read spatial data from an IPUMS extract.
Examples
nhgis_file <- ipums_example("nhgis0712_csv.zip")
# 2 available data files in this extract (with codebooks)
ipums_list_files(nhgis_file)
# Look for files that match a particular pattern:
ipums_list_files(nhgis_file, file_select = matches("ds136"))
Join tabular data to geographic boundaries
Description
These functions are analogous to dplyr's joins, except that:
- They operate on a data frame and an sf object
- They retain the variable attributes provided in IPUMS files and loaded by ipumsr data-reading functions
- They handle minor incompatibilities between attributes in spatial and tabular data that emerge in some IPUMS files
Usage
ipums_shape_left_join(
data,
shape_data,
by,
suffix = c("", "SHAPE"),
verbose = TRUE
)
ipums_shape_right_join(
data,
shape_data,
by,
suffix = c("", "SHAPE"),
verbose = TRUE
)
ipums_shape_inner_join(
data,
shape_data,
by,
suffix = c("", "SHAPE"),
verbose = TRUE
)
ipums_shape_full_join(
data,
shape_data,
by,
suffix = c("", "SHAPE"),
verbose = TRUE
)
Arguments
data |
A tibble or data frame. Typically, this will contain data that has been aggregated to a specific geographic level. |
shape_data |
An |
by |
Character vector of variables to join by. See |
suffix |
If there are non-joined duplicate variables in the two data sources, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2. Defaults to adding the |
verbose |
If |
Value
An sf object containing the joined data.
Examples
data <- read_ipums_agg(
ipums_example("nhgis0972_csv.zip"),
verbose = FALSE
)
sf_data <- read_ipums_sf(ipums_example("nhgis0972_shape_small.zip"))
joined_data <- ipums_shape_inner_join(data, sf_data, by = "GISJOIN")
colnames(joined_data)
Get contextual information about variables in an IPUMS data source
Description
Summarize the variable metadata for the variables found in an ipums_ddi
object or data frame. Provides descriptions of variable content (var_label
and var_desc) as well as labels of particular values for each variable
(val_labels).
ipums_var_info() produces a tibble summary of multiple variables at once.
ipums_var_label(), ipums_var_desc(), and ipums_val_labels() provide
specific metadata for a single variable.
Usage
ipums_var_info(object, vars = NULL)
ipums_var_label(object, var = NULL)
ipums_var_desc(object, var = NULL)
ipums_val_labels(object, var = NULL)
Arguments
object |
An ipums_ddi object, a data frame containing variable
metadata (as produced by most ipumsr data-reading functions), or
a |
vars , var |
A tidyselect selection identifying
the variable(s) to include in the output. Only |
Details
For ipums_var_info(), if the provided object is a haven::labelled() vector
(i.e. a single column from a data frame), the summary output will include
the variable label, variable description, and value labels, if applicable.
If it is a data frame, the same information will be provided for all
variables present in the data or to those indicated in vars.
If it is an ipums_ddi object, the summary will also include information used when reading the data from disk, including start/end positions for columns in the fixed-width file, implied decimals, and variable types.
Providing an ipums_ddi object is the most robust way to access variable
metadata, as many data processing operations will remove these attributes
from data frame-like objects.
Value
For ipums_var_info()
, a tibble
containing
variable information.
Otherwise, a length-1 character vector with the requested variable information.
See Also
read_ipums_ddi()
or read_nhgis_codebook()
to read IPUMS metadata files.
Examples
ddi <- read_ipums_ddi(ipums_example("cps_00157.xml"))
# Info for all variables in a data source
ipums_var_info(ddi)
# Metadata for individual variables
ipums_var_desc(ddi, MONTH)
ipums_var_label(ddi, MONTH)
ipums_val_labels(ddi, MONTH)
# NHGIS also supports variable-level metadata, though many fields
# are not relevant and remain blank:
cb <- read_nhgis_codebook(ipums_example("nhgis0972_csv.zip"))
ipums_var_info(cb)
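The Details for this topic note that data frames and single labelled columns are also accepted, and that an ipums_ddi object keeps metadata that processing may strip from data. A minimal sketch of both points, assuming the bundled cps_00157 example extract contains MONTH and STATEFIP (both are used in other examples in this manual):

```r
library(ipumsr)

ddi <- read_ipums_ddi(ipums_example("cps_00157.xml"))
cps <- read_ipums_micro(ddi, verbose = FALSE)

# Data frame input: one row of metadata per selected variable
info <- ipums_var_info(cps, vars = c(MONTH, STATEFIP))
info

# A single labelled column is also accepted
ipums_var_info(cps$MONTH)

# After attributes are removed from the data, the ipums_ddi object
# still holds the metadata
cps_zapped <- zap_ipums_attributes(cps)
ipums_var_label(cps_zapped$MONTH) # label no longer attached to the data
ipums_var_label(ddi, MONTH)       # label still available from the DDI
```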
View a static webpage with variable metadata from an IPUMS extract
Description
For a given ipums_ddi
object or data frame, display metadata about
its contents in the RStudio viewer pane. This includes extract-level
information as well as metadata for the variables included in the
input object.
It is also possible to save the output to an external HTML file without launching the RStudio viewer.
Usage
ipums_view(x, out_file = NULL, launch = TRUE)
Arguments
x |
An Note that file-level information (e.g. extract notes) is only
available when |
out_file |
Optional location to save the output HTML file. If |
launch |
Logical indicating whether to launch the HTML file in the
RStudio viewer pane. If |
Details
ipums_view()
requires that the htmltools, shiny, and DT packages are
installed. If launch = TRUE
, RStudio and the rstudioapi package must
also be available.
Note that if launch = FALSE
and out_file
is unspecified, the output
file will be written to a temporary directory. Some operating systems
may be unable to open the HTML file from the temporary directory; we
suggest that you manually specify the out_file
location in this case.
Value
The file path to the output HTML file (invisibly, if launch = TRUE
)
Examples
ddi <- read_ipums_ddi(ipums_example("cps_00157.xml"))
## Not run:
ipums_view(ddi)
ipums_view(ddi, "codebook.html", launch = FALSE)
## End(Not run)
Launch a browser window to an IPUMS metadata page
Description
Launch the documentation webpage for a given
IPUMS project and variable. The project can be provided in the form
of an ipums_ddi
object or can be manually specified.
This provides access to more extensive variable metadata than may be
contained within an ipums_ddi
object itself.
Note that some IPUMS projects (e.g. IPUMS NHGIS) do not have
variable-specific pages. In these cases, ipums_website()
will launch the
project's main data selection page.
Usage
ipums_website(
x,
var = NULL,
launch = TRUE,
verbose = TRUE,
homepage_if_missing = FALSE
)
Arguments
x |
An |
var |
Name of the variable to load. If |
launch |
If |
verbose |
If |
homepage_if_missing |
If |
Details
If launch = TRUE
, you will need a valid registration for the specified
project to successfully launch the webpage.
Not all IPUMS variables are found at webpages that exactly match the variable
names that are included in completed extract files (and ipums_ddi
objects).
Therefore, there may be some projects and variables for which
ipums_website()
will launch the page for a different variable or an
invalid page.
Value
The URL to the IPUMS webpage for the indicated project and variable
(invisibly if launch = TRUE
)
Examples
ddi <- read_ipums_ddi(ipums_example("cps_00157.xml"))
## Not run:
# Launch webpage for particular variable
ipums_website(ddi, "MONTH")
## End(Not run)
# Can also specify an IPUMS project instead of an `ipums_ddi` object
ipums_website("IPUMS CPS", var = "RECTYPE", launch = FALSE)
# Shorthand project names from `ipums_data_collections()` are also accepted:
ipums_website("ipumsi", var = "YEAR", launch = FALSE)
Report on observations dropped during a join
Description
Helper to display observations that were not matched when joining tabular and spatial data.
Usage
join_failures(join_results)
Arguments
join_results |
A data frame that has just been created by an ipums shape join. |
Value
A list of data frames, where the first element (shape
) includes
the observations dropped from the shapefile and the second (data
)
includes the
observations dropped from the data file.
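This topic has no example of its own, so the following is a minimal sketch of one way to inspect a join. It assumes the sf package is installed and reuses the example files from the shape join topic. Dropping the first three rows of the tabular data is purely illustrative; whether any failures are reported depends on the contents of the example files, so the result is checked for NULL before use.

```r
library(ipumsr)

data <- read_ipums_agg(
  ipums_example("nhgis0972_csv.zip"),
  verbose = FALSE
)
sf_data <- read_ipums_sf(ipums_example("nhgis0972_shape_small.zip"))

# Remove a few tabular rows so that some shapes may be left unmatched
joined <- ipums_shape_inner_join(
  data[-(1:3), ],
  sf_data,
  by = "GISJOIN",
  verbose = FALSE
)

failures <- join_failures(joined)

# When observations were dropped, count them on each side of the join
if (!is.null(failures)) {
  nrow(failures$shape)
  nrow(failures$data)
}
```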
Make a label placeholder object
Description
Define a new label/value pair. For use in functions like lbl_relabel()
and lbl_add()
.
Usage
lbl(...)
Arguments
... |
Either one or two arguments specifying the label ( If arguments are named, they must be named If a single unnamed value is passed, it is used as the |
Details
Several lbl_*()
functions include arguments that can be passed a function
of .val
and/or .lbl
. These refer to the existing values and
labels in the input vector, respectively.
Use .val
to refer to the values in the vector's value labels.
Use .lbl
to refer to the label names in the vector's value labels.
Note that not all lbl_*()
functions support both of these arguments.
Value
A label_placeholder
object
See Also
Other lbl_helpers:
lbl_add()
,
lbl_clean()
,
lbl_define()
,
lbl_na_if()
,
lbl_relabel()
,
zap_ipums_attributes()
Examples
# Label placeholder with no associated value
lbl("New label")
# Label placeholder with a value/label pair
lbl(10, "New label")
# Use placeholders as inputs to other label handlers
x <- haven::labelled(
c(100, 200, 105, 990, 999, 230),
c(`Unknown` = 990, NIU = 999)
)
x <- lbl_add(
x,
lbl(100, "$100"),
lbl(105, "$105"),
lbl(200, "$200"),
lbl(230, "$230")
)
lbl_relabel(x, lbl(9999, "Missing") ~ .val > 900)
Add labels for unlabelled values
Description
Add labels for values that don't already have them in a
labelled
vector.
Usage
lbl_add(x, ...)
lbl_add_vals(x, labeller = as.character, vals = NULL)
Arguments
x |
A |
... |
Arbitrary number of label placeholders created with |
labeller |
A function that takes values being added as an argument and returns the labels to associate with those values. By default, uses the values themselves after converting to character. |
vals |
Vector of values to be labelled. If |
Value
A labelled
vector
See Also
Other lbl_helpers:
lbl()
,
lbl_clean()
,
lbl_define()
,
lbl_na_if()
,
lbl_relabel()
,
zap_ipums_attributes()
Examples
x <- haven::labelled(
c(100, 200, 105, 990, 999, 230),
c(`Unknown` = 990, NIU = 999)
)
# Add new labels manually
lbl_add(
x,
lbl(100, "$100"),
lbl(105, "$105"),
lbl(200, "$200"),
lbl(230, "$230")
)
# Add labels for all unlabelled values
lbl_add_vals(x)
# Update label names while adding
lbl_add_vals(x, labeller = ~ paste0("$", .))
# Add labels for select values
lbl_add_vals(x, vals = c(100, 200))
Clean unused labels
Description
Remove labels that do not appear in the data. When converting labelled values to a factor, this avoids the creation of additional factor levels.
Usage
lbl_clean(x)
Arguments
x |
A |
Value
A labelled
vector
See Also
Other lbl_helpers:
lbl()
,
lbl_add()
,
lbl_define()
,
lbl_na_if()
,
lbl_relabel()
,
zap_ipums_attributes()
Examples
x <- haven::labelled(
c(1, 2, 3, 1, 2, 3, 1, 2, 3),
c(Q1 = 1, Q2 = 2, Q3 = 3, Q4 = 4)
)
lbl_clean(x)
# Compare the factor levels of the normal and cleaned labels after coercion
as_factor(lbl_clean(x))
as_factor(x)
Define labels for an unlabelled vector
Description
Create a labelled
vector from an unlabelled
vector using lbl_relabel()
syntax, allowing for the grouping of multiple
values into a single label. Values not assigned a label remain unlabelled.
Usage
lbl_define(x, ...)
Arguments
x |
An unlabelled vector |
... |
Arbitrary number of two-sided formulas. The left hand side should be a label placeholder created with The right hand side should be a function taking Can be provided as an anonymous function or formula. See Details section. |
Details
Several lbl_*()
functions include arguments that can be passed a function
of .val
and/or .lbl
. These refer to the existing values and
labels in the input vector, respectively.
Use .val
to refer to the values in the vector's value labels.
Use .lbl
to refer to the label names in the vector's value labels.
Note that not all lbl_*()
functions support both of these arguments.
Value
A labelled
vector
See Also
Other lbl_helpers:
lbl()
,
lbl_add()
,
lbl_clean()
,
lbl_na_if()
,
lbl_relabel()
,
zap_ipums_attributes()
Examples
age <- c(10, 12, 16, 18, 20, 22, 25, 27)
# Group age values into two label groups.
# Values not captured by the right hand side functions remain unlabelled
lbl_define(
age,
lbl(1, "Pre-college age") ~ .val < 18,
lbl(2, "College age") ~ .val >= 18 & .val <= 22
)
Convert labelled data values to NA
Description
Convert data values in a labelled
vector
to NA
based on the value labels associated with that vector. Ignores
values that do not have a label.
Usage
lbl_na_if(x, .predicate)
Arguments
x |
A |
.predicate |
A function taking Can be provided as an anonymous function or formula. See Details section. |
Details
Several lbl_*()
functions include arguments that can be passed a function
of .val
and/or .lbl
. These refer to the existing values and
labels in the input vector, respectively.
Use .val
to refer to the values in the vector's value labels.
Use .lbl
to refer to the label names in the vector's value labels.
Note that not all lbl_*()
functions support both of these arguments.
Value
A labelled
vector
See Also
Other lbl_helpers:
lbl()
,
lbl_add()
,
lbl_clean()
,
lbl_define()
,
lbl_relabel()
,
zap_ipums_attributes()
Examples
x <- haven::labelled(
c(10, 10, 11, 20, 30, 99, 30, 10),
c(Yes = 10, `Yes - Logically Assigned` = 11, No = 20, Maybe = 30, NIU = 99)
)
# Convert labelled values of 90 or greater to `NA`
lbl_na_if(x, function(.val, .lbl) .val >= 90)
# Can use purrr-style notation
lbl_na_if(x, ~ .lbl %in% c("Maybe"))
# Or refer to named function
na_function <- function(.val, .lbl) .val >= 90
lbl_na_if(x, na_function)
Modify value labels for a labelled vector
Description
Update the mapping between values and labels in a
labelled
vector. These functions allow you to
simultaneously update data values and the existing value labels.
Modifying data values directly does not result in updated value labels.
Use lbl_relabel()
to manually specify new value/label mappings. This
allows for the addition of new labels.
Use lbl_collapse()
to collapse detailed labels into more general
categories. Values can be grouped together and associated with individual
labels that already exist in the labelled
vector.
Unlabelled values will be converted to NA
.
Usage
lbl_relabel(x, ...)
lbl_collapse(x, .fun)
Arguments
x |
A |
... |
Arbitrary number of two-sided formulas. The left hand side should be a label placeholder created with The right hand side should be a function taking Can be provided as an anonymous function or formula. See Details section. |
.fun |
A function taking Can be provided as an anonymous function or formula. See Details section. |
Details
Several lbl_*()
functions include arguments that can be passed a function
of .val
and/or .lbl
. These refer to the existing values and
labels in the input vector, respectively.
Use .val
to refer to the values in the vector's value labels.
Use .lbl
to refer to the label names in the vector's value labels.
Note that not all lbl_*()
functions support both of these arguments.
Value
A labelled
vector
See Also
Other lbl_helpers:
lbl()
,
lbl_add()
,
lbl_clean()
,
lbl_define()
,
lbl_na_if()
,
zap_ipums_attributes()
Examples
x <- haven::labelled(
c(10, 10, 11, 20, 21, 30, 99, 30, 10),
c(
Yes = 10, `Yes - Logically Assigned` = 11,
No = 20, Unlikely = 21, Maybe = 30, NIU = 99
)
)
# Convert cases with value 11 to value 10 and associate with 10's label
lbl_relabel(x, 10 ~ .val == 11)
lbl_relabel(x, lbl("Yes") ~ .val == 11)
# To relabel using new value/label pairs, use `lbl()` to define a new pair
lbl_relabel(
x,
lbl(10, "Yes/Yes-ish") ~ .val %in% c(10, 11),
lbl(90, "???") ~ .val == 99 | .lbl == "Maybe"
)
# Collapse labels to create new label groups
lbl_collapse(x, ~ (.val %/% 10) * 10)
# These are equivalent
lbl_collapse(x, ~ ifelse(.val == 10, 11, .val))
lbl_relabel(x, 11 ~ .val == 10)
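The Description notes that unlabelled values are converted to NA, which the examples above do not show. A small sketch with a hypothetical vector in which the value 45 has no label, along with one possible way to retain such values by labelling them first with lbl_add_vals():

```r
library(ipumsr)

# 45 has no value label
x <- haven::labelled(
  c(10, 11, 20, 45),
  c(Yes = 10, `Yes - Logically Assigned` = 11, No = 20)
)

# The unlabelled value becomes NA when labels are collapsed
collapsed <- lbl_collapse(x, ~ (.val %/% 10) * 10)
is.na(collapsed)

# Labelling every value first keeps it in the result
kept <- lbl_collapse(lbl_add_vals(x), ~ (.val %/% 10) * 10)
is.na(kept)
```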
Read metadata from an IHGIS extract's codebook files
Description
Read the variable metadata contained in an IHGIS extract into an
ipums_ddi
object.
Because IHGIS variable metadata do not adhere to all the standards of
microdata DDI files, some of the ipums_ddi
fields will not be populated.
This function is marked as experimental while we determine whether there may be a more robust way to standardize codebook reading across IPUMS aggregate data collections.
Usage
read_ihgis_codebook(cb_file, tbls_file = NULL, raw = FALSE)
Arguments
cb_file |
Path to a .zip archive containing an IHGIS extract, an IHGIS
data dictionary ( |
tbls_file |
If |
raw |
If If |
Details
IHGIS extracts store variable and geographic metadata in multiple files:
- _datadict.csv contains the data dictionary with metadata about the variables included across all files in the extract.
- _tables.csv contains metadata about all IHGIS tables included in the extract.
- _geog.csv contains metadata about the tabulation geographies included for any tables in the extract.
- _codebook.txt contains table and variable metadata in human readable form and contains citation information for IHGIS data.
By default, read_ihgis_codebook()
uses information from all these files and
assumes they exist in the provided extract (.zip) file or directory.
If you have unzipped your IHGIS extract and moved the _tables.csv
file,
you will need to provide the path to that file in the tbls_file
argument.
Certain variable metadata can still be loaded without the _geog.csv
or
_codebook.txt
files. However, if raw = TRUE
, the _codebook.txt
file
must be present in the .zip archive or provided to cb_file
.
If you no longer have access to these files, consider resubmitting the extract request that produced the data.
Note that IHGIS codebooks contain metadata for all the datasets contained
in a given extract. Individual data files from the extract may not contain
all of the variables shown in the output of read_ihgis_codebook()
.
Value
If raw = FALSE
, an ipums_ddi
object with metadata about the variables
contained in the data for the extract associated with the given cb_file
.
If raw = TRUE
, a character vector with one element for each line of the
given cb_file
.
Examples
ihgis_file <- ipums_example("ihgis0014.zip")
ihgis_cb <- read_ihgis_codebook(ihgis_file)
# Variable labels and descriptions
ihgis_cb$var_info
# Citation information
ihgis_cb$conditions
# If variable metadata have been lost from a data source, reattach from
# the corresponding `ipums_ddi` object:
ihgis_data <- read_ipums_agg(
ihgis_file,
file_select = matches("AAA_g0"),
verbose = FALSE
)
ihgis_data <- zap_ipums_attributes(ihgis_data)
ipums_var_label(ihgis_data$AAA001)
ihgis_data <- set_ipums_var_attributes(ihgis_data, ihgis_cb)
ipums_var_label(ihgis_data$AAA001)
# Load in raw format
ihgis_cb_raw <- read_ihgis_codebook(ihgis_file, raw = TRUE)
# Use `cat()` to display in the R console in human readable format
cat(ihgis_cb_raw[1:21], sep = "\n")
Read data from an IPUMS aggregate data extract
Description
Read a .csv file from an extract downloaded from an IPUMS aggregate data collection (IPUMS NHGIS or IPUMS IHGIS).
To read spatial data from an NHGIS extract, use read_ipums_sf()
.
Usage
read_ipums_agg(
data_file,
file_select = NULL,
vars = NULL,
col_types = NULL,
n_max = Inf,
guess_max = min(n_max, 1000),
var_attrs = c("val_labels", "var_label", "var_desc"),
remove_extra_header = TRUE,
file_encoding = NULL,
verbose = TRUE
)
Arguments
data_file |
Path to a .zip archive containing an IPUMS NHGIS or IPUMS IHGIS extract or a single .csv file from such an extract. |
file_select |
If |
vars |
Names of variables to include in the output. Accepts a
vector of names or a tidyselect selection.
If |
col_types |
One of
See |
n_max |
Maximum number of lines to read. |
guess_max |
For .csv files, maximum number of lines to use for guessing column types. Will never use more than the number of lines read. |
var_attrs |
Variable attributes to add from the codebook (.txt) file included in the extract. Defaults to all available attributes. See |
remove_extra_header |
If This header row is not
usually needed as it contains similar information to that
included in the |
file_encoding |
Encoding for the file to be loaded. For NHGIS extracts, defaults to ISO-8859-1. For IHGIS extracts, defaults to UTF-8. If the default encoding produces unexpected characters, adjust the encoding here. |
verbose |
Logical controlling whether to display output when loading
data. If Will be overridden by |
Value
A tibble
containing the data found in
data_file
See Also
read_ipums_sf()
to read spatial data from an IPUMS extract.
read_nhgis_codebook()
or read_ihgis_codebook()
to read metadata about
an IPUMS aggregate data extract.
ipums_list_files()
to list files in an IPUMS extract.
Examples
nhgis_file <- ipums_example("nhgis0972_csv.zip")
ihgis_file <- ipums_example("ihgis0014.zip")
# Provide the .zip archive directly to load the data inside:
read_ipums_agg(nhgis_file)
# For extracts that contain multiple files, use `file_select` to specify
# a single file to load. This accepts a tidyselect expression:
read_ipums_agg(ihgis_file, file_select = matches("AAA_g0"), verbose = FALSE)
# Or an index position:
read_ipums_agg(ihgis_file, file_select = 2, verbose = FALSE)
# Variable metadata is automatically attached to data, if available
ihgis_data <- read_ipums_agg(ihgis_file, file_select = 2, verbose = FALSE)
ipums_var_info(ihgis_data)
# Column types are inferred from the data. You can
# manually specify column types with `col_types`. This may be useful for
# geographic codes, which should typically be interpreted as character values
read_ipums_agg(nhgis_file, col_types = list(MSA_CMSAA = "c"), verbose = FALSE)
# You can also read in a subset of the data file:
read_ipums_agg(
nhgis_file,
n_max = 15,
vars = c(GISJOIN, YEAR, D6Z002),
verbose = FALSE
)
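When an extract contains several files, it can help to inspect the file names before writing a file_select expression. A minimal sketch using ipums_list_files(), which is listed under See Also; the exact set of files it reports for this archive is not assumed here:

```r
library(ipumsr)

ihgis_file <- ipums_example("ihgis0014.zip")

# List the files in the archive to decide what to pass to `file_select`
files <- ipums_list_files(ihgis_file)
files
```

The names shown can then be matched with a tidyselect helper such as matches(), as in the examples above.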
Read metadata about an IPUMS microdata extract from a DDI codebook (.xml) file
Description
Reads the metadata about an IPUMS extract from a DDI codebook into an ipums_ddi object.
These metadata contain parsing instructions for the associated fixed-width data file, contextual labels for variables and values in the data, and general extract information.
See Downloading IPUMS files below for information about downloading IPUMS DDI codebook files.
Usage
read_ipums_ddi(ddi_file, lower_vars = FALSE)
Arguments
ddi_file |
Path to a DDI .xml file downloaded from IPUMS. See Downloading IPUMS files below. |
lower_vars |
Logical indicating whether to convert variable names to
lowercase. Defaults to |
Value
An ipums_ddi object with metadata information.
Downloading IPUMS files
The DDI codebook (.xml) file provided with IPUMS microdata extracts can be downloaded through the IPUMS extract interface or (for some collections) within R using the IPUMS API.
If using the IPUMS extract interface:
Download the DDI codebook by right-clicking on the DDI link in the Codebook column of the extract interface and selecting Save as... (on Safari, you may have to select Download Linked File As...). Be sure that the codebook is downloaded in .xml format.
If using the IPUMS API:
For supported collections, use
download_extract()
to download a completed extract via the IPUMS API. This automatically downloads both the DDI codebook and the data file from the extract and returns the path to the codebook file.
See Also
read_ipums_micro()
, read_ipums_micro_chunked()
and
read_ipums_micro_yield()
to read data from IPUMS microdata extracts.
ipums_var_info()
and ipums_file_info()
to view metadata about an
ipums_ddi object.
ipums_list_files()
to list files in an IPUMS extract.
Examples
# Example codebook file
ddi_file <- ipums_example("cps_00157.xml")
# Load metadata into an `ipums_ddi` object
ddi <- read_ipums_ddi(ddi_file)
# Use the object to load its associated data
cps <- read_ipums_micro(ddi)
head(cps)
# Or get metadata information directly
ipums_var_info(ddi)
ipums_file_info(ddi)[1:2]
# If variable metadata have been lost from a data source, reattach from
# its corresponding `ipums_ddi` object:
cps <- zap_ipums_attributes(cps)
ipums_var_label(cps$STATEFIP)
cps <- set_ipums_var_attributes(cps, ddi$var_info)
ipums_var_label(cps$STATEFIP)
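The lower_vars argument is not exercised in the examples above. A short sketch of its effect on the variable names stored in the metadata and, in turn, on the column names of data read with the resulting object:

```r
library(ipumsr)

ddi_file <- ipums_example("cps_00157.xml")

ddi_upper <- read_ipums_ddi(ddi_file)
ddi_lower <- read_ipums_ddi(ddi_file, lower_vars = TRUE)

# Variable names as recorded in the metadata
ddi_upper$var_info$var_name
ddi_lower$var_info$var_name

# Data read with the lowercase metadata use lowercase column names
cps_lower <- read_ipums_micro(ddi_lower, verbose = FALSE)
colnames(cps_lower)
```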
Read data from an IPUMS microdata extract
Description
Read a microdata dataset downloaded from the IPUMS extract system.
Two files are required to load IPUMS microdata extracts:
A DDI codebook file (.xml) used to parse the extract's data file
A data file (either .dat.gz or .csv.gz)
See Downloading IPUMS files below for more information about downloading these files.
read_ipums_micro()
and read_ipums_micro_list()
differ in their handling
of extracts that contain multiple record types. See Data structures
below.
Note that Stata, SAS, and SPSS file formats are not supported by ipumsr readers. Convert your extract to fixed-width or CSV format, or see haven for help loading these files.
Usage
read_ipums_micro(
ddi,
vars = NULL,
n_max = Inf,
data_file = NULL,
verbose = TRUE,
var_attrs = c("val_labels", "var_label", "var_desc"),
lower_vars = FALSE
)
read_ipums_micro_list(
ddi,
vars = NULL,
n_max = Inf,
data_file = NULL,
verbose = TRUE,
var_attrs = c("val_labels", "var_label", "var_desc"),
lower_vars = FALSE
)
Arguments
ddi |
Either a path to a DDI .xml file downloaded from
IPUMS, or an
ipums_ddi object parsed by |
vars |
Names of variables to include in the output. Accepts a
vector of names or a tidyselect selection.
If For hierarchical data, the |
n_max |
The maximum number of lines to read. For
|
data_file |
Path to the data (.gz) file associated with
the provided |
verbose |
Logical indicating whether to display IPUMS conditions and progress information. |
var_attrs |
Variable attributes from the DDI to add to the columns of
the output data. Defaults to all available attributes.
See |
lower_vars |
If reading a DDI from a file,
a logical indicating whether to convert variable names to lowercase.
Defaults to This argument will be ignored if argument If |
Value
read_ipums_micro()
returns a single
tibble
object.
read_ipums_micro_list()
returns a list of tibble
objects with one
entry for each record type.
Data structures
Files from IPUMS projects that contain data for multiple types of records (e.g. household records and person records) may be either rectangular or hierarchical.
Rectangular data are transformed such that each row of data represents only one type of record. For instance, each row will represent a person record, and all household-level information for that person will be included in the same row.
Hierarchical data have records of different types interspersed in a single file. For instance, a household record will be included in its own row followed by the person records associated with that household.
Hierarchical data can be read in two different formats:
- read_ipums_micro() reads data into a tibble where each row represents a single record, regardless of record type. Variables that do not apply to a particular record type will be filled with NA in rows of that record type. For instance, a person-specific variable will be missing in all rows associated with household records.
- read_ipums_micro_list() reads data into a list of tibble objects, where each list element contains only one record type. Each list element is named with its corresponding record type.
Downloading IPUMS files
You must download both the DDI codebook and the data file from the IPUMS
extract system to load the data into R. read_ipums_micro_*()
functions
assume that the data file and codebook share a common base file name and
are present in the same directory. If this is not the case, provide a
separate path to the data file with the data_file
argument.
If using the IPUMS extract interface:
Download the data file by clicking Download .dat under Download Data.
Download the DDI codebook by right-clicking on the DDI link in the Codebook column of the extract interface and selecting Save as... (on Safari, you may have to select Download Linked File As...). Be sure that the codebook is downloaded in .xml format.
If using the IPUMS API:
For supported collections, use
download_extract()
to download a completed extract via the IPUMS API. This automatically downloads both the DDI codebook and the data file from the extract and returns the path to the codebook file.
See Also
read_ipums_micro_chunked()
and
read_ipums_micro_yield()
to read data from large IPUMS
microdata extracts in chunks.
read_ipums_ddi()
to read metadata associated with an IPUMS microdata
extract.
read_ipums_sf()
to read spatial data from an IPUMS extract.
ipums_list_files()
to list files in an IPUMS extract.
Examples
# Codebook for rectangular example file
cps_rect_ddi_file <- ipums_example("cps_00157.xml")
# Load data based on codebook file info
cps <- read_ipums_micro(cps_rect_ddi_file)
head(cps)
# Can also load data from a pre-existing `ipums_ddi` object
# (This may be useful to retain codebook metadata even if lost from data
# during processing)
ddi <- read_ipums_ddi(cps_rect_ddi_file)
cps <- read_ipums_micro(ddi, verbose = FALSE)
# Codebook for hierarchical example file
cps_hier_ddi_file <- ipums_example("cps_00159.xml")
# Read in "long" format to get a single data frame
read_ipums_micro(cps_hier_ddi_file, verbose = FALSE)
# Read in "list" format to get a list of data frames, one per record type
cps_list <- read_ipums_micro_list(cps_hier_ddi_file)
head(cps_list$PERSON)
head(cps_list$HOUSEHOLD)
# Use the `%<-%` operator from zeallot to unpack into separate objects
c(household, person) %<-% read_ipums_micro_list(cps_hier_ddi_file)
head(person)
head(household)
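As described under Downloading IPUMS files, the data_file argument is needed when the data file does not sit next to the codebook or does not share its base name. The sketch below simulates that situation by copying the example data file to a temporary location under a hypothetical name (my_cps_extract.dat.gz), and also shows reading a subset with vars and n_max. It assumes the package ships cps_00157.dat.gz alongside the example codebook.

```r
library(ipumsr)

ddi_file <- ipums_example("cps_00157.xml")

# Simulate a data file that was moved and renamed after download
data_copy <- file.path(tempdir(), "my_cps_extract.dat.gz")
file.copy(ipums_example("cps_00157.dat.gz"), data_copy, overwrite = TRUE)

# Point the reader at the relocated data file explicitly
cps <- read_ipums_micro(ddi_file, data_file = data_copy, verbose = FALSE)

# Read only two variables from the first 100 records
cps_small <- read_ipums_micro(
  ddi_file,
  vars = c(STATEFIP, INCTOT),
  n_max = 100,
  data_file = data_copy,
  verbose = FALSE
)
dim(cps_small)
```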
Read data from an IPUMS microdata extract by chunk
Description
Read a microdata dataset downloaded from the IPUMS extract system in chunks.
Use these functions to read a file that is too large to hold in memory at once. The file is processed in chunks of a given size, with a provided callback function applied to each chunk.
Two files are required to load IPUMS microdata extracts:
A DDI codebook file (.xml) used to parse the extract's data file
A data file (either .dat.gz or .csv.gz)
See Downloading IPUMS files below for more information about downloading these files.
read_ipums_micro_chunked()
and read_ipums_micro_list_chunked()
differ
in their handling of extracts that contain multiple record types.
See Data structures below.
Note that Stata, SAS, and SPSS file formats are not supported by ipumsr readers. Convert your extract to fixed-width or CSV format, or see haven for help loading these files.
Usage
read_ipums_micro_chunked(
ddi,
callback,
chunk_size = 10000,
vars = NULL,
data_file = NULL,
verbose = TRUE,
var_attrs = c("val_labels", "var_label", "var_desc"),
lower_vars = FALSE
)
read_ipums_micro_list_chunked(
ddi,
callback,
chunk_size = 10000,
vars = NULL,
data_file = NULL,
verbose = TRUE,
var_attrs = c("val_labels", "var_label", "var_desc"),
lower_vars = FALSE
)
Arguments
ddi |
Either a path to a DDI .xml file downloaded from
IPUMS, or an
ipums_ddi object parsed by |
callback |
An ipums_callback object, or a function
that will be converted to an |
chunk_size |
Integer number of observations to read per chunk. Higher values use more RAM, but typically result in faster processing. Defaults to 10,000. |
vars |
Names of variables to include in the output. Accepts a
vector of names or a tidyselect selection.
If For hierarchical data, the |
data_file |
Path to the data (.gz) file associated with
the provided |
verbose |
Logical indicating whether to display IPUMS conditions and progress information. |
var_attrs |
Variable attributes from the DDI to add to the columns of
the output data. Defaults to all available attributes.
See |
lower_vars |
If reading a DDI from a file,
a logical indicating whether to convert variable names to lowercase.
Defaults to This argument will be ignored if argument Note that if reading in chunks from a .csv or .csv.gz file, the callback function will be called before variable names are converted to lowercase, and thus should reference uppercase variable names. |
Value
Depends on the provided callback object. See ipums_callback.
Data structures
Files from IPUMS projects that contain data for multiple types of records (e.g. household records and person records) may be either rectangular or hierarchical.
Rectangular data are transformed such that each row of data represents only one type of record. For instance, each row will represent a person record, and all household-level information for that person will be included in the same row.
Hierarchical data have records of different types interspersed in a single file. For instance, a household record will be included in its own row followed by the person records associated with that household.
Hierarchical data can be read in two different formats:
- read_ipums_micro_chunked() reads each chunk of data into a tibble where each row represents a single record, regardless of record type. Variables that do not apply to a particular record type will be filled with NA in rows of that record type. For instance, a person-specific variable will be missing in all rows associated with household records. The provided callback function should therefore operate on a tibble object.
- read_ipums_micro_list_chunked() reads each chunk of data into a list of tibble objects, where each list element contains only one record type. Each list element is named with its corresponding record type. The provided callback function should therefore operate on a list object. In this case, the chunk size references the total number of rows across record types, rather than in each record type.
Downloading IPUMS files
You must download both the DDI codebook and the data file from the IPUMS
extract system to load the data into R. read_ipums_micro_*()
functions
assume that the data file and codebook share a common base file name and
are present in the same directory. If this is not the case, provide a
separate path to the data file with the data_file
argument.
If using the IPUMS extract interface:
Download the data file by clicking Download .dat under Download Data.
Download the DDI codebook by right-clicking on the DDI link in the Codebook column of the extract interface and selecting Save as... (on Safari, you may have to select Download Linked File As...). Be sure that the codebook is downloaded in .xml format.
If using the IPUMS API:
For supported collections, use
download_extract()
to download a completed extract via the IPUMS API. This automatically downloads both the DDI codebook and the data file from the extract and returns the path to the codebook file.
See Also
read_ipums_micro_yield()
for more flexible handling of large
IPUMS microdata files.
read_ipums_micro()
to read data from an IPUMS microdata extract.
read_ipums_ddi()
to read metadata associated with an IPUMS microdata
extract.
read_ipums_sf()
to read spatial data from an IPUMS extract.
ipums_list_files()
to list files in an IPUMS extract.
Examples
suppressMessages(library(dplyr))
# Example codebook file
cps_rect_ddi_file <- ipums_example("cps_00157.xml")
# Function to extract Minnesota cases from CPS example
# (This can also be accomplished by including case selections
# in an extract definition)
#
# Function must take `x` and `pos` to refer to data and row position,
# respectively.
filter_mn <- function(x, pos) {
x[x$STATEFIP == 27, ]
}
# Initialize callback
filter_mn_callback <- IpumsDataFrameCallback$new(filter_mn)
# Process data in chunks, filtering to MN cases in each chunk
read_ipums_micro_chunked(
  cps_rect_ddi_file,
  callback = filter_mn_callback,
  chunk_size = 1000,
  verbose = FALSE
)
# Tabulate INCTOT average by state without storing full dataset in memory
read_ipums_micro_chunked(
  cps_rect_ddi_file,
  callback = IpumsDataFrameCallback$new(
    function(x, pos) {
      x %>%
        mutate(
          INCTOT = lbl_na_if(
            INCTOT,
            ~ grepl("Missing|N.I.U.", .lbl)
          )
        ) %>%
        filter(!is.na(INCTOT)) %>%
        group_by(STATEFIP = as_factor(STATEFIP)) %>%
        summarize(INCTOT_SUM = sum(INCTOT), n = n(), .groups = "drop")
    }
  ),
  chunk_size = 1000,
  verbose = FALSE
) %>%
  group_by(STATEFIP) %>%
  summarize(avg_inc = sum(INCTOT_SUM) / sum(n))
# `x` will be a list when using `read_ipums_micro_list_chunked()`
read_ipums_micro_list_chunked(
  ipums_example("cps_00159.xml"),
  callback = IpumsSideEffectCallback$new(function(x, pos) {
    print(
      paste0(
        nrow(x$PERSON), " persons and ",
        nrow(x$HOUSEHOLD), " households in this chunk."
      )
    )
  }),
  chunk_size = 1000,
  verbose = FALSE
)
# Using the biglm package, you can even run a regression without storing
# the full dataset in memory
if (requireNamespace("biglm")) {
  lm_results <- read_ipums_micro_chunked(
    ipums_example("cps_00160.xml"),
    IpumsBiglmCallback$new(
      INCTOT ~ AGE + HEALTH, # Model formula
      function(x, pos) {
        x %>%
          mutate(
            INCTOT = lbl_na_if(
              INCTOT,
              ~ grepl("Missing|N.I.U.", .lbl)
            ),
            HEALTH = as_factor(HEALTH)
          )
      }
    ),
    chunk_size = 1000,
    verbose = FALSE
  )
  summary(lm_results)
}
Read data from an IPUMS microdata extract in yields
Description
Read a microdata dataset downloaded from the IPUMS extract system into an
object that can read and operate on a group ("yield") of lines at a time.
Use these functions to read a file that is too large to store in memory at
a single time. They represent a more flexible implementation of
read_ipums_micro_chunked() using R6.
Two files are required to load IPUMS microdata extracts:
A DDI codebook file (.xml) used to parse the extract's data file
A data file (either .dat.gz or .csv.gz)
See Downloading IPUMS files below for more information about downloading these files.
read_ipums_micro_yield() and read_ipums_micro_list_yield() differ
in their handling of extracts that contain multiple record types.
See Data structures below.
Note that these functions only support fixed-width (.dat) data files.
Usage
read_ipums_micro_yield(
ddi,
vars = NULL,
data_file = NULL,
verbose = TRUE,
var_attrs = c("val_labels", "var_label", "var_desc"),
lower_vars = FALSE
)
read_ipums_micro_list_yield(
ddi,
vars = NULL,
data_file = NULL,
verbose = TRUE,
var_attrs = c("val_labels", "var_label", "var_desc"),
lower_vars = FALSE
)
Arguments
ddi |
Either a path to a DDI .xml file downloaded from IPUMS, or an
ipums_ddi object parsed by read_ipums_ddi(). |
vars |
Names of variables to include in the output. Accepts a
vector of names or a tidyselect selection.
If For hierarchical data, the |
data_file |
Path to the data (.gz) file associated with
the provided |
verbose |
Logical indicating whether to display IPUMS conditions and progress information. |
var_attrs |
Variable attributes from the DDI to add to the columns of
the output data. Defaults to all available attributes.
See |
lower_vars |
If reading a DDI from a file,
a logical indicating whether to convert variable names to lowercase.
Defaults to This argument will be ignored if argument If |
Value
A HipYield R6 object (see Details section)
Methods summary:
These functions return a HipYield R6 object with the following methods:
- yield(n = 10000) reads the next "yield" from the data.
  For read_ipums_micro_yield(), returns a tibble with up to n rows.
  For read_ipums_micro_list_yield(), returns a list of tibbles with a total of up to n rows across list elements.
  If fewer than n rows are left in the data, returns all remaining rows. If no rows are left in the data, returns NULL.
- reset() resets the data so that the next yield will read data from the start.
- is_done() returns a logical indicating whether all rows in the file have been read.
- cur_pos contains the next row number that will be read (1-indexed).
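To make the contract of these methods concrete, here is a toy stand-in written in base R. It is a sketch under stated assumptions, not the HipYield implementation: make_toy_yield() is a hypothetical helper that walks an in-memory data frame, and cur_pos is exposed as a function here rather than as a field.

```r
# Hypothetical helper that mimics the yield/reset/is_done/cur_pos contract
# on an in-memory data frame (not part of ipumsr)
make_toy_yield <- function(data) {
  pos <- 1L  # next row to read, 1-indexed
  list(
    yield = function(n = 10000) {
      # Nothing left to read: signal the end with NULL
      if (pos > nrow(data)) return(NULL)
      # Return up to n rows, or all remaining rows if fewer are left
      end <- min(pos + n - 1L, nrow(data))
      out <- data[pos:end, , drop = FALSE]
      pos <<- end + 1L
      out
    },
    reset = function() pos <<- 1L,
    is_done = function() pos > nrow(data),
    cur_pos = function() pos
  )
}

toy <- make_toy_yield(data.frame(id = 1:25))
first <- toy$yield(10)   # rows 1 to 10
second <- toy$yield(20)  # only 15 rows remain, so all 15 are returned
third <- toy$yield(5)    # no rows remain
done <- toy$is_done()
toy$reset()
pos_after_reset <- toy$cur_pos()
```

The real object reads from the data file on disk rather than from memory, which is what allows files larger than available memory to be processed.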
Data structures
Files from IPUMS projects that contain data for multiple types of records (e.g. household records and person records) may be either rectangular or hierarchical.
Rectangular data are transformed such that each row of data represents only one type of record. For instance, each row will represent a person record, and all household-level information for that person will be included in the same row.
Hierarchical data have records of different types interspersed in a single file. For instance, a household record will be included in its own row followed by the person records associated with that household.
Hierarchical data can be read in two different formats:
- read_ipums_micro_yield() produces an object that yields data as a tibble whose rows represent single records, regardless of record type. Variables that do not apply to a particular record type will be filled with NA in rows of that record type. For instance, a person-specific variable will be missing in all rows associated with household records.
- read_ipums_micro_list_yield() produces an object that yields data as a list of tibble objects, where each list element contains only one record type. Each list element is named with its corresponding record type. In this case, when using yield(), n refers to the total number of rows across record types, rather than in each record type.
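The two layouts can be pictured with a small base R sketch. The records and column names below are made up for illustration (one household followed by its two persons) and do not come from a real extract.

```r
# "Long" layout: every record is a row, regardless of record type
long <- data.frame(
  RECTYPE  = c("H", "P", "P"),
  SERIAL   = c(1, 1, 1),
  HHINCOME = c(52000, NA, NA),  # household-only variable: NA on person rows
  AGE      = c(NA, 34, 6)       # person-only variable: NA on household rows
)

# "List" layout: one element per record type, keeping only the columns
# that apply to that record type
as_list <- list(
  HOUSEHOLD = long[long$RECTYPE == "H", c("RECTYPE", "SERIAL", "HHINCOME")],
  PERSON    = long[long$RECTYPE == "P", c("RECTYPE", "SERIAL", "AGE")]
)

# Row counts are tallied across list elements, matching the long layout
total_rows <- sum(vapply(as_list, nrow, integer(1)))
total_rows == nrow(long)
```

This is why a yield of n rows in the list layout can contain fewer than n rows in any single list element.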
Downloading IPUMS files
You must download both the DDI codebook and the data file from the IPUMS
extract system to load the data into R. read_ipums_micro_*() functions
assume that the data file and codebook share a common base file name and
are present in the same directory. If this is not the case, provide a
separate path to the data file with the data_file argument.
If using the IPUMS extract interface:
Download the data file by clicking Download .dat under Download Data.
Download the DDI codebook by right clicking on the DDI link in the Codebook column of the extract interface and selecting Save as... (on Safari, you may have to select Download Linked File as...). Be sure that the codebook is downloaded in .xml format.
If using the IPUMS API:
For supported collections, use download_extract() to download a completed extract via the IPUMS API. This automatically downloads both the DDI codebook and the data file from the extract and returns the path to the codebook file.
See Also
read_ipums_micro_chunked() to read data from large IPUMS microdata extracts in chunks.
read_ipums_micro() to read data from an IPUMS microdata extract.
read_ipums_ddi() to read metadata associated with an IPUMS microdata extract.
read_ipums_sf() to read spatial data from an IPUMS extract.
ipums_list_files() to list files in an IPUMS extract.
Examples
# Create an IpumsLongYield object
long_yield <- read_ipums_micro_yield(ipums_example("cps_00157.xml"))
# Yield the first 10 rows of the data
long_yield$yield(10)
# Yield the next 20 rows of the data
long_yield$yield(20)
# Check the current position after yielding 30 rows
long_yield$cur_pos
# Reset to the beginning of the file
long_yield$reset()
# Use a loop to flexibly process the data in pieces. Count all Minnesotans:
total_mn <- 0
while (!long_yield$is_done()) {
  cur_data <- long_yield$yield(1000)
  total_mn <- total_mn + sum(as_factor(cur_data$STATEFIP) == "Minnesota")
}
total_mn
# Can also read hierarchical data as list:
list_yield <- read_ipums_micro_list_yield(ipums_example("cps_00159.xml"))
# Yield size is based on total rows for all list elements
list_yield$yield(10)
Read spatial data from an IPUMS extract
Description
Read a spatial data file (also referred to as a GIS file or shapefile) from
an IPUMS extract into an sf object from the sf package.
Usage
read_ipums_sf(
shape_file,
file_select = NULL,
vars = NULL,
encoding = NULL,
bind_multiple = FALSE,
add_layer_var = NULL,
verbose = FALSE
)
Arguments
shape_file |
Path to a single .shp file or a .zip archive containing at least one .shp file. See Details section. |
file_select |
If |
vars |
Names of variables to include in the output. Accepts a
character vector of names or a tidyselect selection.
If |
encoding |
Encoding to use when reading the shape file. If |
bind_multiple |
If |
add_layer_var |
If The column name will always be prefixed with |
verbose |
If |
Details
Some IPUMS products provide shapefiles in a "nested" .zip archive. That is, each shapefile (including a .shp as well as accompanying files) is compressed in its own archive, and the collection of all shapefiles provided in an extract is also compressed into a single .zip archive.
read_ipums_sf() is designed to handle this structure. However, if any files
are altered such that an internal .zip archive contains multiple
shapefiles, this function will throw an error. If this is the case, you may
need to manually unzip the downloaded file before loading it into R.
Value
An sf object
See Also
read_ipums_micro() or read_ipums_agg() to read tabular data from an IPUMS extract.
ipums_list_files() to list files in an IPUMS extract.
Examples
# Example shapefile from NHGIS
shape_ex1 <- ipums_example("nhgis0972_shape_small.zip")
data_ex1 <- read_ipums_agg(ipums_example("nhgis0972_csv.zip"), verbose = FALSE)
sf_data <- read_ipums_sf(shape_ex1)
sf_data
# To combine spatial data with tabular data without losing the attributes
# included in the tabular data, use an ipums shape join:
ipums_shape_full_join(data_ex1, sf_data, by = "GISJOIN")
shape_ex2 <- ipums_example("nhgis0712_shape_small.zip")
# Shapefiles are provided in .zip archives that may contain multiple
# files. Select a single file with `file_select`:
read_ipums_sf(shape_ex2, file_select = matches("us_pmsa_1990"))
# Or row-bind files with `bind_multiple`. This may be useful for files of
# the same geographic level that cover different extents
read_ipums_sf(
  shape_ex2,
  file_select = matches("us_pmsa"),
  bind_multiple = TRUE
)
Read tabular data from an NHGIS extract
Description
Read a .csv or fixed-width (.dat) file downloaded from the NHGIS extract system.
This function has been deprecated in favor of read_ipums_agg(), which
can read .csv files from both IPUMS aggregate data collections
(IPUMS NHGIS and IPUMS IHGIS). Please use that function instead.
Note that fixed-width file reading is not supported in read_ipums_agg() and
will likely be retired with read_nhgis(). We therefore encourage you to
create NHGIS extracts in .csv format going forward. For previously-submitted
fixed-width extracts, we suggest regenerating them in .csv format and
loading them with read_ipums_agg(). Use the data_format argument of
define_extract_agg() to create a .csv extract for submission via the IPUMS API.
To read spatial data from an NHGIS extract, use read_ipums_sf().
Usage
read_nhgis(
data_file,
file_select = NULL,
vars = NULL,
col_types = NULL,
n_max = Inf,
guess_max = min(n_max, 1000),
do_file = NULL,
var_attrs = c("val_labels", "var_label", "var_desc"),
remove_extra_header = TRUE,
verbose = TRUE
)
Arguments
data_file |
Path to a .zip archive containing an NHGIS extract or a single file from an NHGIS extract. |
file_select |
If |
vars |
Names of variables to include in the output. Accepts a
vector of names or a tidyselect selection.
If |
col_types |
One of
See |
n_max |
Maximum number of lines to read. |
guess_max |
For .csv files, maximum number of lines to use for guessing column types. Will never use more than the number of lines read. |
do_file |
For fixed-width files, path to the .do file associated with
the provided By default, looks in the same path as |
var_attrs |
Variable attributes to add from the codebook (.txt) file included in the extract. Defaults to all available attributes. See |
remove_extra_header |
If This header row is not
usually needed as it contains similar information to that
included in the |
verbose |
Logical controlling whether to display output when loading
data. If Will be overridden by |
Details
The .do file that is included when downloading an NHGIS fixed-width
extract contains the necessary metadata (e.g. column positions and implicit
decimals) to correctly parse the data file. read_nhgis() uses this
information to parse and recode the fixed-width data appropriately.
If you no longer have access to the .do file, consider resubmitting the extract that produced the data. You can also change the desired data format to produce a .csv file, which does not require additional metadata files to be loaded.
For more about resubmitting an existing extract via the IPUMS API, see
vignette("ipums-api", package = "ipumsr").
Value
A tibble containing the data found in data_file
See Also
read_ipums_sf() to read spatial data from an IPUMS extract.
read_nhgis_codebook() to read metadata about an IPUMS NHGIS extract.
ipums_list_files() to list files in an IPUMS extract.
Examples
# Example files
csv_file <- ipums_example("nhgis0972_csv.zip")
fw_file <- ipums_example("nhgis0730_fixed.zip")
# Previously:
read_nhgis(csv_file)
# For CSV files, please update to use the following:
read_ipums_agg(csv_file)
# Fixed-width files are parsed with the correct column positions
# and column types automatically:
read_nhgis(fw_file, file_select = contains("ts"), verbose = FALSE)
Read metadata from an NHGIS codebook (.txt) file
Description
Read the variable metadata contained in the .txt codebook file included with
NHGIS extracts into an ipums_ddi object.
Because NHGIS variable metadata do not adhere to all the standards of
microdata DDI files, some of the ipums_ddi fields will not be populated.
This function is marked as experimental while we determine whether there may be a more robust way to standardize codebook reading across IPUMS aggregate data collections.
Usage
read_nhgis_codebook(cb_file, file_select = NULL, raw = FALSE)
Arguments
cb_file |
Path to a .zip archive containing an NHGIS extract or to an NHGIS codebook (.txt) file. |
file_select |
If |
raw |
If |
Value
If raw = FALSE, an ipums_ddi object with metadata about the
variables contained in the data for the extract associated with the given
cb_file.
If raw = TRUE, a character vector with one element for each
line of the given cb_file.
See Also
read_ipums_agg() to read tabular data from an IPUMS NHGIS extract.
read_ipums_sf() to read spatial data from an IPUMS extract.
ipums_list_files() to list files in an IPUMS extract.
Examples
# Example file
nhgis_file <- ipums_example("nhgis0972_csv.zip")
# Read codebook as an `ipums_ddi` object:
codebook <- read_nhgis_codebook(nhgis_file)
# Variable-level metadata about the contents of the data file:
ipums_var_info(codebook)
ipums_var_label(codebook, "PMSA")
# If variable metadata have been lost from a data source, reattach from
# the corresponding `ipums_ddi` object:
nhgis_data <- read_ipums_agg(nhgis_file, verbose = FALSE)
nhgis_data <- zap_ipums_attributes(nhgis_data)
ipums_var_label(nhgis_data$PMSA)
nhgis_data <- set_ipums_var_attributes(nhgis_data, codebook)
ipums_var_label(nhgis_data$PMSA)
# You can also load the codebook in raw format to display in the console
codebook_raw <- read_nhgis_codebook(nhgis_file, raw = TRUE)
# Use `cat` for human-readable output
cat(codebook_raw[1:20], sep = "\n")
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- haven
- lifecycle
- readr
- tidyselect: all_of, any_of, contains, ends_with, everything, last_col, matches, num_range, one_of, starts_with
- zeallot
Remove values from an existing IPUMS extract definition
Description
Remove values for specific fields in an existing ipums_extract object.
This function is an S3 generic whose behavior will depend on the
subclass (i.e. collection) of the extract being modified.
To remove from an IPUMS microdata extract definition, see the documentation for the micro_extract method of remove_from_extract(). This includes:
IPUMS USA
IPUMS CPS
IPUMS International
IPUMS Time Use (ATUS, AHTUS, MTUS)
IPUMS Health Surveys (NHIS, MEPS)
To remove from an IPUMS aggregate data extract definition, see the documentation for the agg_extract method of remove_from_extract(). This includes:
IPUMS NHGIS
IPUMS IHGIS
This function is marked as experimental because it is typically not the best
option for maintaining reproducible extract definitions and may be retired
in the future. For reproducibility, users should strive to build extract
definitions with define_extract_micro() or define_extract_agg().
If you have a complicated extract definition to revise, but do not have
the original extract definition code that created it, we suggest that you
save the revised extract as a JSON file with save_extract_as_json(). This
will create a stable version of the extract definition that
can be used in the future as needed.
To add new values to an extract, see add_to_extract().
Learn more about the IPUMS API in vignette("ipums-api").
Usage
remove_from_extract(extract, ...)
Arguments
extract |
An |
... |
Additional arguments specifying the extract fields and values to remove from the extract definition. |
Value
An object of the same class as extract containing the modified
extract definition
See Also
add_to_extract() to add values to an extract definition.
define_extract_micro() or define_extract_agg() to define an extract request manually
submit_extract() to submit an extract request for processing.
Examples
# Microdata extracts
usa_extract <- define_extract_micro(
  collection = "usa",
  description = "USA example",
  samples = c("us2013a", "us2014a"),
  variables = list(
    var_spec("AGE"),
    var_spec("SEX", case_selections = "2"),
    var_spec("YEAR")
  )
)
# Remove variables from an extract definition
remove_from_extract(
  usa_extract,
  samples = "us2014a",
  variables = c("AGE", "SEX")
)
# Remove detailed specifications for an existing variable
remove_from_extract(
  usa_extract,
  variables = var_spec("SEX", case_selections = "2")
)
# NHGIS extracts
nhgis_extract <- define_extract_agg(
  "nhgis",
  datasets = ds_spec(
    "1990_STF1",
    data_tables = c("NP1", "NP2", "NP3"),
    geog_levels = "county"
  ),
  time_series_tables = tst_spec("A00", geog_levels = "county")
)
# Remove an existing dataset or time series table
remove_from_extract(nhgis_extract, datasets = "1990_STF1")
# Remove detailed specifications from an existing dataset or
# time series table
remove_from_extract(
  nhgis_extract,
  datasets = ds_spec("1990_STF1", data_tables = "NP1")
)
Remove values from an existing NHGIS extract definition
Description
Remove existing values from an IPUMS aggregate data extract definition. All fields are optional, and if omitted, will be unchanged.
This function is marked as experimental because it is typically not the best
option for maintaining reproducible extract definitions and may be retired
in the future. For reproducibility, users should strive to build extract
definitions with define_extract_agg().
If you have a complicated extract definition to revise, but do not have
the original extract definition code that created it, we suggest that you
save the revised extract as a JSON file with save_extract_as_json(). This
will create a stable version of the extract definition that
can be used in the future as needed.
To add new values to an IPUMS NHGIS extract definition, use
add_to_extract().
Learn more about the IPUMS API in vignette("ipums-api").
Usage
## S3 method for class 'agg_extract'
remove_from_extract(
extract,
datasets = NULL,
time_series_tables = NULL,
geographic_extents = NULL,
shapefiles = NULL,
...
)
Arguments
extract |
An |
datasets |
Dataset specifications to remove from the extract definition.
All |
time_series_tables |
Names of the time series tables
to remove from the extract definition. All |
geographic_extents |
Geographic extents to remove from the extract definition. |
shapefiles |
Shapefiles to remove from the extract definition. |
... |
Ignored |
Details
Any extract fields that are rendered irrelevant after modifying the extract
will be automatically removed. (For instance, if all time_series_tables
are removed from an extract, tst_layout will also be
removed.) Thus, it is not necessary to explicitly remove these values.
If the supplied extract definition comes from a previously submitted extract request, this function will reset the definition to an unsubmitted state.
Value
A modified agg_extract object
See Also
add_to_extract() to add values to an extract definition.
submit_extract() to submit an extract request.
download_extract() to download extract data files.
define_extract_agg() to create a new extract definition.
Examples
extract <- define_extract_agg(
  "nhgis",
  datasets = ds_spec(
    "1990_STF1",
    data_tables = c("NP1", "NP2", "NP3"),
    geog_levels = "county"
  ),
  time_series_tables = list(
    tst_spec("CW3", c("state", "county")),
    tst_spec("CW5", c("state", "county"))
  )
)
# Providing names of datasets or time series tables will remove them and
# all of their associated specifications from the extract:
remove_from_extract(
  extract,
  time_series_tables = c("CW3", "CW5")
)
# To remove detailed specifications from a dataset or time series table,
# use `ds_spec()` or `tst_spec()`. The named dataset or time series table
# will be retained in the extract, but modified by removing the indicated
# specifications:
remove_from_extract(
  extract,
  datasets = ds_spec("1990_STF1", data_tables = c("NP2", "NP3"))
)
# To make multiple modifications, use a list of `ds_spec()` or `tst_spec()`
# objects:
remove_from_extract(
  extract,
  time_series_tables = list(
    tst_spec("CW3", geog_levels = "county"),
    tst_spec("CW5", geog_levels = "state")
  )
)
Remove values from an existing extract definition for an IPUMS microdata project
Description
Remove existing values from an IPUMS microdata extract definition. All fields are optional, and if omitted, will be unchanged.
This function is marked as experimental because it is typically not the best
option for maintaining reproducible extract definitions and may be retired
in the future. For reproducibility, users should strive to build extract
definitions with define_extract_micro().
If you have a complicated extract definition to revise, but do not have
the original extract definition code that created it, we suggest that you
save the revised extract as a JSON file with save_extract_as_json(). This
will create a stable version of the extract definition that
can be used in the future as needed.
To add new values to an IPUMS microdata extract definition, see
add_to_extract().
Learn more about the IPUMS API in vignette("ipums-api").
Usage
## S3 method for class 'micro_extract'
remove_from_extract(
extract,
samples = NULL,
variables = NULL,
time_use_variables = NULL,
sample_members = NULL,
...
)
Arguments
extract |
An |
samples |
Character vector of sample names to remove from the extract definition. |
variables |
Names of the variables to remove from the extract definition. All variable-specific fields for the indicated variables will also be removed. For removing values from variable-specific fields while retaining the variable, see examples. |
time_use_variables |
Names of the time use variables to remove from the extract definition. All time use variable-specific fields for the indicated time use variables will also be removed. For removing time use variable-specific fields while retaining the time use variable, see examples. |
sample_members |
Sample members to remove from the extract definition. |
... |
Ignored |
Details
If the supplied extract definition comes from a previously submitted extract request, this function will reset the definition to an unsubmitted state.
Value
A modified micro_extract object
See Also
add_to_extract() to add values to an extract definition.
submit_extract() to submit an extract request.
download_extract() to download extract data files.
define_extract_micro() to create a new extract definition from scratch.
Examples
usa_extract <- define_extract_micro(
  collection = "usa",
  description = "USA example",
  samples = c("us2013a", "us2014a"),
  variables = list(
    var_spec("AGE", data_quality_flags = TRUE),
    var_spec("SEX", case_selections = "1"),
    "RACE"
  )
)
# Providing names of samples or variables will remove them and
# all of their associated specifications from the extract:
remove_from_extract(
  usa_extract,
  samples = "us2014a",
  variables = c("AGE", "RACE")
)
# To remove detailed specifications from a variable or time use variable,
# indicate the specifications to remove within `var_spec()` or
# `tu_var_spec()`. The named variable will be retained in the extract, but
# modified by removing the indicated specifications.
remove_from_extract(
  usa_extract,
  variables = var_spec("SEX", case_selections = "1")
)
# To make multiple modifications, use a list of `var_spec()` objects.
remove_from_extract(
  usa_extract,
  variables = list(
    var_spec("SEX", case_selections = "1"),
    var_spec("AGE")
  )
)
Store an extract definition in JSON format
Description
Write an ipums_extract object to a JSON file, or
read an extract definition from such a file.
Use these functions to store a copy of an extract definition outside of your R environment and/or share an extract definition with another registered IPUMS user.
Learn more about the IPUMS API in vignette("ipums-api").
Usage
save_extract_as_json(extract, file, overwrite = FALSE)
define_extract_from_json(extract_json)
Arguments
extract |
An |
file |
File path to which to write the JSON-formatted extract definition. |
overwrite |
If |
extract_json |
Path to a file containing a JSON-formatted extract definition. |
Value
An ipums_extract object.
API Version Compatibility
As of v0.6.0, ipumsr only supports IPUMS API version 2. If you have stored
an extract definition made using version beta or version 1 of the IPUMS
API, you will not be able to load it using define_extract_from_json(). The
API version for the request should be stored in the saved JSON file. (If
there is no "api_version" or "version" field in the JSON file, the
request was likely made under version beta or version 1.)
If the extract definition was originally made under your user account and
you know its corresponding extract number, use get_extract_info() to obtain
a definition compliant with IPUMS API version 2. You can then save this
definition to JSON with save_extract_as_json().
Otherwise, you will need to update the JSON file to be compliant with
IPUMS API version 2. In general, this should only require renaming
all JSON fields written in snake_case to camelCase. For instance,
"data_tables" would become "dataTables", "data_format" would become
"dataFormat", and so on. You will also need to change the "api_version"
field to "version" and set it equal to 2. If you are unable to create
a valid extract by modifying the file, you may have to recreate the
definition manually using define_extract_micro() or define_extract_agg().
See the IPUMS developer documentation for more details on API versioning and breaking changes introduced in version 2.
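The field renaming described in this section can be sketched in base R. This is an illustration under stated assumptions, not an ipumsr feature: snake_to_camel() and rename_fields() are hypothetical helpers, and old_def is a made-up list standing in for a parsed JSON file, not a real version 1 definition. Reading and writing the JSON itself is omitted.

```r
# Hypothetical helper: convert snake_case names to camelCase
snake_to_camel <- function(x) {
  gsub("_([a-z])", "\\U\\1", x, perl = TRUE)
}

# Hypothetical helper: rename the fields of a nested list recursively.
# Only names are changed; values are left untouched.
rename_fields <- function(x) {
  if (is.list(x)) {
    if (!is.null(names(x))) {
      names(x) <- snake_to_camel(names(x))
    }
    x <- lapply(x, rename_fields)
  }
  x
}

# Made-up stand-in for a parsed definition saved under an older API version
old_def <- list(
  api_version = "v1",
  data_format = "csv_no_header",
  datasets = list(
    list(name = "1990_STF1", data_tables = c("NP1", "NP2"))
  )
)

new_def <- rename_fields(old_def)

# Replace the old version field with "version" set to 2
new_def$apiVersion <- NULL
new_def$version <- 2

names(new_def)
```

A blanket rename like this also rewrites any field name that happens to contain an underscore followed by a lowercase letter, so review the result before loading it with define_extract_from_json().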
See Also
define_extract_micro() or define_extract_agg() to define an extract request manually
get_extract_info() to obtain a past extract to save.
submit_extract() to submit an extract request for processing.
add_to_extract() and remove_from_extract() to revise an extract definition.
Examples
my_extract <- define_extract_micro(
  collection = "usa",
  description = "2013-2014 ACS Data",
  samples = c("us2013a", "us2014a"),
  variables = c("SEX", "AGE", "YEAR")
)
extract_json_path <- file.path(tempdir(), "usa_extract.json")
save_extract_as_json(my_extract, file = extract_json_path)
copy_of_my_extract <- define_extract_from_json(extract_json_path)
identical(my_extract, copy_of_my_extract)
file.remove(extract_json_path)
tidyselect selection language in ipumsr
Description
Slightly modified implementation of tidyselect selection language in ipumsr.
Syntax
In general, the selection language in ipumsr operates the same as in tidyselect.
Where applicable, variables can be selected with:
- A character vector of variable names (c("var1", "var2"))
- A bare vector of variable names (c(var1, var2))
- A selection helper from tidyselect (starts_with("var")). See below for a list of helpers.
Primary differences
tidyselect selection is generally intended for use with column variables in data.frame-like objects. In contrast, ipumsr allows selection language syntax in other cases as well (for instance, when selecting files from within a .zip archive). ipumsr functions will indicate whether they support the selection language.
Selection with where() is not consistently supported.
Selection helpers (from tidyselect)
- var1:var10: variables lying between var1 on the left and var10 on the right.
- starts_with("a"): names that start with "a"
- ends_with("z"): names that end with "z"
- contains("b"): names that contain "b"
- matches("x.y"): names that match regular expression x.y
- num_range(x, 1:4): names following the pattern x1, x2, ..., x4
- all_of(vars)/any_of(vars): matches names stored in the character vector vars. all_of(vars) will error if the variables aren't present; any_of(vars) will match just the variables that exist.
- everything(): all variables
- last_col(): furthest column to the right
Operators for combining those selections:
- !selection: only variables that don't match selection
- selection1 & selection2: only variables included in both selection1 and selection2
- selection1 | selection2: all variables that match either selection1 or selection2
Examples
cps_file <- ipums_example("cps_00157.xml")
# Load 3 variables by name
read_ipums_micro(
  cps_file,
  vars = c("YEAR", "MONTH", "PERNUM"),
  verbose = FALSE
)
# "Bare" variables are supported
read_ipums_micro(
  cps_file,
  vars = c(YEAR, MONTH, PERNUM),
  verbose = FALSE
)
# Standard tidyselect selectors are also supported
read_ipums_micro(cps_file, vars = starts_with("ASEC"), verbose = FALSE)
# Selection methods can be combined
read_ipums_micro(
  cps_file,
  vars = c(YEAR, MONTH, contains("INC")),
  verbose = FALSE
)
read_ipums_micro(
  cps_file,
  vars = starts_with("S") & ends_with("P"),
  verbose = FALSE
)
# Other selection arguments also support this syntax.
# For instance, load a particular file based on a tidyselect match:
read_ipums_agg(
  ipums_example("nhgis0731_csv.zip"),
  file_select = contains("nominal_state"),
  verbose = FALSE
)
Set your IPUMS API key
Description
Set your IPUMS API key as the value associated with the IPUMS_API_KEY
environment variable.
The key can be stored for the duration of your session or for future
sessions. If saved for future sessions, it is added to the .Renviron
file in your home directory. If you choose to save your key to .Renviron,
this function will create a backup copy of the file before modifying.
This function is modeled after the census_api_key() function
from tidycensus.
Learn more about the IPUMS API in vignette("ipums-api").
Usage
set_ipums_api_key(api_key, save = overwrite, overwrite = FALSE, unset = FALSE)
Arguments
api_key |
API key associated with your user account. |
save |
If |
overwrite |
If |
unset |
If |
Value
The value of api_key, invisibly.
See Also
set_ipums_default_collection()
to set a default collection.
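For orientation, the session-level effect can be pictured with base R's environment variable functions. This is a sketch of the mechanism only, under the assumption that session storage behaves like Sys.setenv(). It is not the function's implementation, and it does not cover saving to .Renviron or the backup step. The key value shown is a placeholder.

```r
# Remember any existing value so this sketch leaves the session unchanged
old_key <- Sys.getenv("IPUMS_API_KEY", unset = NA)

# Session-level storage: the value lasts until R exits
# ("paste-your-key-here" is a placeholder, not a real key)
Sys.setenv(IPUMS_API_KEY = "paste-your-key-here")
key_seen <- Sys.getenv("IPUMS_API_KEY")

# Restore the previous state
if (is.na(old_key)) {
  Sys.unsetenv("IPUMS_API_KEY")
} else {
  Sys.setenv(IPUMS_API_KEY = old_key)
}
```

Checking Sys.getenv("IPUMS_API_KEY") in a fresh session is a quick way to confirm whether a key saved for future sessions was picked up.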
Set your default IPUMS collection
Description
Set the default IPUMS collection as the value associated with the
IPUMS_DEFAULT_COLLECTION environment variable. If this environment variable
exists, IPUMS API functions that require a collection specification will use
the value of IPUMS_DEFAULT_COLLECTION, unless another collection is
indicated.
The default collection can be stored for the duration of your session or
for future sessions. If saved for future sessions, it is added to the
.Renviron file in your home directory. If you choose to save your default
collection to .Renviron, this function will create a backup copy of the file
before modifying it.
This function is modeled after the census_api_key() function
from tidycensus.
Learn more about the IPUMS API in vignette("ipums-api").
Usage
set_ipums_default_collection(
collection = NULL,
save = overwrite,
overwrite = FALSE,
unset = FALSE
)
Arguments
collection |
Character string of the collection to set as your default collection. The collection must currently be supported by the IPUMS API. For a list of codes used to refer to each collection, see
|
save |
If |
overwrite |
If |
unset |
if |
Value
The value of collection, invisibly.
See Also
set_ipums_api_key() to set an API key.
Examples
set_ipums_default_collection("nhgis")
## Not run:
# Extract info will now be retrieved for the default collection:
get_last_extract_info()
get_extract_history()
is_extract_ready(1)
get_extract_info(1)
# Equivalent to:
get_extract_info("nhgis:1")
get_extract_info(c("nhgis", 1))
# Other collections can be specified explicitly
# Doing so does not alter the default collection
is_extract_ready("usa:2")
## End(Not run)
# Remove the variable from the environment and .Renviron, if saved
set_ipums_default_collection(unset = TRUE)
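The fallback to the default collection can be sketched in base R. This is an illustration under stated assumptions, not ipumsr's own code: resolve_extract_id() is a hypothetical helper, and it only handles the three identifier forms used in the examples above ("collection:number", c(collection, number), and a bare number).

```r
# Hypothetical helper: turn an extract identifier into collection + number,
# falling back to IPUMS_DEFAULT_COLLECTION when no collection is given
resolve_extract_id <- function(extract) {
  if (length(extract) == 2) {
    # Form: c("nhgis", 1)
    return(list(
      collection = as.character(extract[1]),
      number = as.integer(extract[2])
    ))
  }
  if (is.character(extract) && grepl(":", extract, fixed = TRUE)) {
    # Form: "nhgis:1"
    parts <- strsplit(extract, ":", fixed = TRUE)[[1]]
    return(list(collection = parts[1], number = as.integer(parts[2])))
  }
  # Form: 1 (bare extract number), so a default collection is required
  default <- Sys.getenv("IPUMS_DEFAULT_COLLECTION")
  if (!nzchar(default)) {
    stop("No collection given and IPUMS_DEFAULT_COLLECTION is not set.")
  }
  list(collection = default, number = as.integer(extract))
}

# Demonstrate with a temporary default, then restore the previous state
old_default <- Sys.getenv("IPUMS_DEFAULT_COLLECTION", unset = NA)
Sys.setenv(IPUMS_DEFAULT_COLLECTION = "nhgis")

from_default <- resolve_extract_id(1)
from_string <- resolve_extract_id("usa:2") # explicit collection wins
from_vector <- resolve_extract_id(c("nhgis", 1))

if (is.na(old_default)) {
  Sys.unsetenv("IPUMS_DEFAULT_COLLECTION")
} else {
  Sys.setenv(IPUMS_DEFAULT_COLLECTION = old_default)
}
```

An explicitly named collection is used as given and leaves the environment variable untouched, matching the note in the examples that specifying another collection does not alter the default.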
Add IPUMS variable attributes to a data frame
Description
Add variable attributes from an ipums_ddi object to a data frame. These provide contextual information about the variables and values contained in the data columns.
Most ipumsr data-reading functions automatically add these attributes. However, some data processing operations may remove attributes, or you may wish to store data in an external database that does not support these attributes. In these cases, use this function to manually attach this information.
Usage
set_ipums_var_attributes(
  data,
  var_info,
  var_attrs = c("val_labels", "var_label", "var_desc")
)
Arguments
data |
|
var_info |
An ipums_ddi object or a data frame containing
variable information. Variable information can be obtained by calling
|
var_attrs |
Variable attributes from the DDI to add to the columns of the output data. Defaults to all available attributes. |
Details
Attribute val_labels adds the haven_labelled class and the corresponding value labels for applicable variables. For more about the haven_labelled class, see vignette("semantics", package = "haven").
Attribute var_label adds a short summary of the variable's contents to the "label" attribute. This label is viewable in the RStudio Viewer.
Attribute var_desc adds a longer description of the variable's contents to the "var_desc" attribute, when available.
Variable information is attached to the data by column name. If column names in data do not match those found in var_info, attributes will not be added.
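The mechanics of losing and reattaching attributes by column name can be sketched in base R, without ipumsr or haven. This is an illustration only: reattach_by_name() is a hypothetical helper, the column names var_name, var_label, and var_desc in the var_info data frame are assumptions, and value labels (which involve the haven_labelled class) are left out.

```r
# A column carrying the "label" and "var_desc" attributes described above
data <- data.frame(INCTOT = c(100, 250, NA))
attr(data$INCTOT, "label") <- "Total personal income"
attr(data$INCTOT, "var_desc") <- "A longer description of the variable."

# ifelse() builds its result from the logical test vector, so attributes
# carried by the yes/no values are lost even though the values are unchanged
stripped <- data
stripped$INCTOT <- ifelse(is.na(stripped$INCTOT), NA, stripped$INCTOT)

# Variable information kept separately (assumed column names)
var_info <- data.frame(
  var_name = c("INCTOT", "AGE"),
  var_label = c("Total personal income", "Age"),
  var_desc = c("A longer description of the variable.", "Age in years."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: attach attributes only where column names match
reattach_by_name <- function(data, var_info) {
  for (i in seq_len(nrow(var_info))) {
    nm <- var_info$var_name[i]
    if (nm %in% names(data)) {
      attr(data[[nm]], "label") <- var_info$var_label[i]
      attr(data[[nm]], "var_desc") <- var_info$var_desc[i]
    }
  }
  data
}

restored <- reattach_by_name(stripped, var_info)
attributes(restored$INCTOT)
```

The AGE row in var_info has no matching column, so it is skipped, which mirrors the name-matching rule stated above.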
Value
data, with variable attributes attached
Examples
ddi_file <- ipums_example("cps_00157.xml")
# Load metadata into `ipums_ddi` object
ddi <- read_ipums_ddi(ddi_file)
# Load data
cps <- read_ipums_micro(ddi)
# Data includes variable metadata:
ipums_var_desc(cps$INCTOT)
# Some operations remove attributes, even if they do not alter the data:
cps$INCTOT <- ifelse(is.na(cps$INCTOT), NA, cps$INCTOT)
ipums_var_desc(cps$INCTOT)
# We can reattach metadata from the separate `ipums_ddi` object:
cps <- set_ipums_var_attributes(cps, ddi)
ipums_var_desc(cps$INCTOT)
Submit an extract request via the IPUMS API
Description
Submit an extract request via the IPUMS API and return an ipums_extract object containing the extract definition with a newly-assigned extract request number.
Learn more about the IPUMS API in vignette("ipums-api").
Usage
submit_extract(extract, api_key = Sys.getenv("IPUMS_API_KEY"))
Arguments
extract |
An |
api_key |
API key associated with your user account. Defaults to the
value of the |
Value
An ipums_extract object containing the extract definition and newly-assigned extract number of the submitted extract.
Note that some unspecified extract fields may be populated with default values and therefore change slightly upon submission.
See Also
wait_for_extract() to wait for an extract to finish processing.
get_extract_info() and is_extract_ready() to check the status of an extract request.
download_extract() to download an extract's data files.
Examples
my_extract <- define_extract_micro(
  collection = "cps",
  description = "2018-2019 CPS Data",
  samples = c("cps2018_05s", "cps2019_05s"),
  variables = c("SEX", "AGE", "YEAR")
)
## Not run:
# Store your submitted extract request to obtain the extract number
submitted_extract <- submit_extract(my_extract)
submitted_extract$number
# This is useful for checking the extract request status
get_extract_info(submitted_extract)
# You can always get the latest status, even if you forget to store the
# submitted extract request object
submitted_extract <- get_last_extract_info("cps")
# You can also check if submitted extract is ready
is_extract_ready(submitted_extract)
# Or have R check periodically and download when ready
downloadable_extract <- wait_for_extract(submitted_extract)
## End(Not run)
Create variable and sample specifications for IPUMS microdata extract requests
Description
Provide specifications for individual variables and time use variables when defining an IPUMS microdata extract request.
Currently, no additional specifications are available for IPUMS samples.
Note that not all variable-level options are available across all IPUMS data collections. For a summary of supported features by collection, see the IPUMS API documentation.
Learn more about microdata extract definitions in vignette("ipums-api-micro").
Usage
var_spec(
  name,
  case_selections = NULL,
  case_selection_type = NULL,
  attached_characteristics = NULL,
  data_quality_flags = NULL,
  adjust_monetary_values = NULL,
  preselected = NULL
)
tu_var_spec(name, owner = NULL)
samp_spec(name)
Arguments
name |
Name of the sample, variable, or time use variable. |
case_selections |
A character vector of values of the given variable that should be used to select cases. Values should be specified exactly as they appear in the "CODES" tab for the given variable in the web-based extract builder, including zero-padding (e.g. see the "CODES" tab for IPUMS CPS variable EDUC). |
case_selection_type |
One of Defaults to |
attached_characteristics |
Whose characteristics should be attached, if
any? Accepted values are For data collections with information on same-sex couples, specifying
|
data_quality_flags |
Logical indicating whether to include data quality flags for the given variable. By default, data quality flags are not included. |
adjust_monetary_values |
Logical indicating whether to include the variable's inflation-adjusted equivalent, if available. |
preselected |
Logical indicating whether the variable is preselected. This is not needed for external use. |
owner |
For user-defined time use variables, the email of the user account associated with the time use variable. Currently, only the email of the user submitting the extract request is supported. |
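Because case_selections is a character vector that must match the zero-padded codes, codes held as numbers elsewhere need their leading zeros restored first. A minimal base R sketch, assuming a hypothetical variable whose codes are three digits wide (the codes and the width are illustrative and not taken from any IPUMS codebook):

```r
# Hypothetical numeric codes, e.g. after reading them from a spreadsheet
numeric_codes <- c(2L, 10L, 111L)

# Restore zero-padding to a fixed width of 3 (an assumed width; check the
# "CODES" tab of the variable you are actually using)
padded_codes <- sprintf("%03d", numeric_codes)
padded_codes
```

The resulting character vector has the form expected by the case_selections argument of var_spec().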
Value
A var_spec, tu_var_spec, or samp_spec object.
Examples
var1 <- var_spec(
  "SCHOOL",
  case_selections = c("1", "2"),
  data_quality_flags = TRUE
)
var2 <- var_spec(
  "RACE",
  case_selections = c("140", "150"),
  case_selection_type = "detailed",
  attached_characteristics = c("mother", "spouse")
)
# Use variable specifications in a microdata extract definition:
extract <- define_extract_micro(
  collection = "usa",
  description = "Example extract",
  samples = "us2017b",
  variables = list(var1, var2)
)
extract$variables$SCHOOL
extract$variables$RACE
# For IPUMS Time Use collections, use `tu_var_spec()` to include user-defined
# time use variables
my_time_use_variable <- tu_var_spec(
  "MYTIMEUSEVAR",
  owner = "example@example.com"
)
# IPUMS-defined time use variables can be included either as `tu_var_spec`
# objects or with just the variable name:
define_extract_micro(
  collection = "atus",
  description = "Requesting user- and IPUMS-defined time use variables",
  samples = "at2007",
  time_use_variables = list(
    my_time_use_variable,
    tu_var_spec("ACT_PCARE"),
    "ACT_SOCIAL"
  )
)
Wait for an extract request to finish processing
Description
Wait for an extract request to finish by periodically checking its status via the IPUMS API until it is complete.
is_extract_ready() is a convenience function to check if an extract is ready to download without committing your R session to waiting for extract completion.
Learn more about the IPUMS API in vignette("ipums-api").
Usage
wait_for_extract(
  extract,
  initial_delay_seconds = 0,
  max_delay_seconds = 300,
  timeout_seconds = 10800,
  verbose = TRUE,
  api_key = Sys.getenv("IPUMS_API_KEY")
)
is_extract_ready(extract, api_key = Sys.getenv("IPUMS_API_KEY"))
Arguments
extract |
One of:
For a list of codes used to refer to each collection, see
|
initial_delay_seconds |
Seconds to wait before first status check. The wait time will automatically increase by 10 seconds between each successive check. |
max_delay_seconds |
Maximum interval to wait between status checks.
When the wait interval reaches this value, checks will continue to
occur at |
timeout_seconds |
Maximum total number of seconds to continue waiting for the extract before throwing an error. Defaults to 10,800 seconds (3 hours). |
verbose |
If |
api_key |
API key associated with your user account. Defaults to the
value of the |
Details
The status of a submitted extract will be one of "queued", "started", "produced", "canceled", "failed", or "completed".
To be ready to download, an extract must have a "completed" status. However, some requests that are "completed" may still be unavailable for download, as extracts expire and are removed from IPUMS servers after a set period of time (72 hours for microdata collections, 2 weeks for IPUMS NHGIS).
Therefore, these functions also check the download_links field of the extract request to determine if data are available for download. If an extract has expired (that is, it has completed but its download links are no longer available), these functions will warn that the extract request must be resubmitted.
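The readiness rule and the waiting schedule can be sketched in base R. This is a simplified illustration, not ipumsr's implementation: extract_is_ready() and poll_until_ready() are hypothetical helpers, extract records are stood in for by plain lists, and the timeout is tracked by summing the delays rather than by measuring elapsed clock time.

```r
# Ready means: status is "completed" AND download links are still present
extract_is_ready <- function(info) {
  identical(info$status, "completed") && length(info$download_links) > 0
}

# Poll `check()` until it returns TRUE. The delay starts at
# `initial_delay_seconds`, grows by 10 seconds after each check, and is
# capped at `max_delay_seconds`. Returns the total seconds waited.
poll_until_ready <- function(check,
                             initial_delay_seconds = 0,
                             max_delay_seconds = 300,
                             timeout_seconds = 10800,
                             sleep = Sys.sleep) {
  delay <- initial_delay_seconds
  waited <- 0
  repeat {
    sleep(delay)
    waited <- waited + delay
    if (check()) {
      return(waited)
    }
    if (waited >= timeout_seconds) {
      stop("Timed out before the extract was ready.")
    }
    delay <- min(delay + 10, max_delay_seconds)
  }
}

# Stand-in extract records (placeholder link, not a real URL)
completed <- list(status = "completed", download_links = list(data = "placeholder"))
expired <- list(status = "completed", download_links = list())

# Simulate three successive status checks without actually sleeping
statuses <- list(
  list(status = "queued", download_links = list()),
  list(status = "started", download_links = list()),
  completed
)
i <- 0
next_check <- function() {
  i <<- i + 1
  extract_is_ready(statuses[[i]])
}
total_wait <- poll_until_ready(next_check, sleep = function(seconds) invisible(NULL))
total_wait
```

Here the checks happen after waits of 0, 10, and 20 seconds, so the simulated total is 30 seconds. The expired record is "completed" but not ready, which is the case the warning about resubmission addresses.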
Value
For wait_for_extract(), an ipums_extract object containing the extract definition and the URLs from which to download extract files.
For is_extract_ready(), a logical value indicating whether the extract is ready to download.
See Also
download_extract() to download an extract's data files.
get_extract_info() to obtain the definition of a submitted extract request.
Examples
my_extract <- define_extract_micro(
  collection = "ipumsi",
  description = "Botswana data",
  samples = c("bw2001a", "bw2011a"),
  variables = c("SEX", "AGE", "YEAR")
)
## Not run:
submitted_extract <- submit_extract(my_extract)
# Wait for a particular extract request to complete by providing its
# associated `ipums_extract` object:
downloadable_extract <- wait_for_extract(submitted_extract)
# Or by specifying the collection and number for the extract request:
downloadable_extract <- wait_for_extract("ipumsi:1")
# If you have a default collection, you can use the extract number alone:
set_ipums_default_collection("ipumsi")
downloadable_extract <- wait_for_extract(1)
# Use `download_extract()` to download the completed extract:
files <- download_extract(downloadable_extract)
# Use `is_extract_ready()` if you don't want to tie up your R session by
# waiting for completion
is_extract_ready("usa:1")
## End(Not run)
Remove label attributes from a data frame or labelled vector
Description
Remove all label attributes (value labels, variable labels, and variable descriptions) from a data frame or vector.
Usage
zap_ipums_attributes(x)
Arguments
x |
A data frame or labelled vector (for instance, from a data frame column) |
Value
An object of the same type as x without "val_labels", "var_label", and "var_desc" attributes.
See Also
Other lbl_helpers: lbl(), lbl_add(), lbl_clean(), lbl_define(), lbl_na_if(), lbl_relabel()
Examples
cps <- read_ipums_micro(ipums_example("cps_00157.xml"))
attributes(cps$YEAR)
attributes(zap_ipums_attributes(cps$YEAR))
cps <- zap_ipums_attributes(cps)
attributes(cps$YEAR)
attributes(cps$INCTOT)