--- title: "Introduction to bivariateLeaflet" author: - Michael Duprey, Chris Inkpen; RTI International output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to bivariateLeaflet} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 7, warning = FALSE, message = FALSE ) ``` ## Introduction The `bivariateLeaflet` package provides tools for creating interactive bivariate choropleth maps using Leaflet. Bivariate choropleth maps allow you to visualize the relationship between two variables simultaneously across geographic regions. ## Statement of need The `bivariateLeaflet` package addresses a need in geospatial data visualization by providing a tool to create interactive bivariate choropleth maps using Leaflet. These maps, which enable the simultaneous visualization of relationships between two variables across geographic regions, are powerful tools for analyzing complex datasets. Despite their potential, creating bivariate maps has historically been a challenging task, requiring both technical expertise and substantial time investment. This package simplifies the process, making bivariate mapping more accessible to users of R. It is particularly valuable for researchers and analysts working in fields such as demography, justice, environmental studies, and public health. For instance, `bivariateLeaflet` allows users to explore the relationship between income and education, analyze correlations between temperature and precipitation, or visualize connections between healthcare access and health outcomes. By integrating seamlessly with the spatial data format, `sf`, and offering an intuitive interface, the package enables users to generate interactive maps that are both visually compelling and informative. The package’s functionality includes the ability to create bivariate maps with customizable color schemes, handle data challenges such as missing values and outliers, and provide clear, interpretable legends for effective communication. Through its integration with Leaflet, it ensures that the resulting visualizations are interactive and web-ready, enabling users to share insights widely. ## Installation You can install the package from CRAN: ```{r eval=FALSE} install.packages("bivariateLeaflet") ``` Or install the development version from GitHub: ```{r eval=FALSE} # install.packages("devtools") devtools::install_github("maduprey/bivariateLeaflet") ``` ## Data Preparation The package works with spatial data from various sources. We'll demonstrate using census data at different geographic levels. ### Census Tract Level Data ```{r eval=FALSE} library(tidycensus) library(tidyr) library(dplyr) library(sf) # Get census API key if you haven't already # census_api_key("YOUR_KEY_HERE") # Get ACS data for DC census tracts tract_data <- get_acs( geography = "tract", variables = c( "B01003_001", # Total population "B19013_001" # Median household income ), state = "DC", year = 2020, geometry = TRUE ) # Pivot data to wide format tract_data_wide <- tract_data %>% select(-moe) %>% pivot_wider( names_from = variable, values_from = estimate ) ``` ### County Level Data ```{r eval=FALSE} # Get county-level data for entire US county_data <- get_acs( geography = "county", variables = c( "B01003_001", # Total population "B19013_001" # Median household income ), year = 2020, geometry = TRUE ) county_data_wide <- county_data %>% select(-moe) %>% pivot_wider( names_from = variable, values_from = estimate ) ``` ### Block Group Level Data ```{r eval=FALSE} # Get block group data for DC blockgroup_data <- get_acs( geography = "block group", variables = c( "B01003_001", # Total population "B19013_001" # Median household income ), state = "DC", year = 2020, geometry = TRUE ) blockgroup_data_wide <- blockgroup_data %>% select(-moe) %>% pivot_wider( names_from = variable, values_from = estimate ) ``` ## Basic Usage Let's start with creating a basic bivariate choropleth map using the DC census tract data: ```{r} library(bivariateLeaflet) ``` ```{r eval=FALSE} # Create basic map tract_map <- create_bivariate_map( data = tract_data_wide, var_1 = "B01003_001", # Total population var_2 = "B19013_001" # Median household income ) # Display the map tract_map ``` ### Understanding the Default Color Scheme The default color scheme uses a 3x3 matrix where: - Darker blues indicate higher values in both variables - Lighter colors indicate lower values - The diagonal represents areas where both variables have similar relative values ```{r} # Display the default color matrix create_default_color_matrix() ``` ## Advanced Usage ### Custom Color Schemes ```{r eval=FALSE} sequential_colors <- matrix(c( "#49006a", "#2d004b", "#1a0027", "#8c96c6", "#8856a7", "#810f7c", "#edf8fb", "#bfd3e6", "#9ebcda" ), nrow = 3, byrow = TRUE) sequential_map <- create_bivariate_map( data = tract_data_wide, var_1 = "B01003_001", var_2 = "B19013_001", color_matrix = sequential_colors ) sequential_map ``` ### Handling Missing Data Census data often includes missing values (NA) for various reasons. Here's how to handle them: ```{r eval=FALSE} # Identify tracts with missing data missing_data <- tract_data_wide %>% mutate( missing_pop = is.na(B01003_001), missing_income = is.na(B19013_001) ) # Create a map excluding missing data clean_map <- create_bivariate_map( data = missing_data %>% filter(!missing_pop & !missing_income), var_1 = "B01003_001", var_2 = "B19013_001" ) clean_map ``` ### Working with Different Geographic Levels #### County Level Map ```{r eval=FALSE} # Create a map of US counties county_map <- create_bivariate_map( data = county_data_wide, var_1 = "B01003_001", var_2 = "B19013_001" ) county_map ``` #### Block Group Level Map ```{r eval=FALSE} # Create a detailed block group map blockgroup_map <- create_bivariate_map( data = blockgroup_data_wide, var_1 = "B01003_001", var_2 = "B19013_001" ) blockgroup_map ``` ### Custom Labels You can create custom tooltips for your map by providing your own labels: ```{r eval=FALSE} # Create custom labels with tract names and formatted values custom_labels <- sprintf( "Census Tract: %s
Population: %s
Median Income: $%s", tract_data_wide$NAME, format(tract_data_wide$B01003_001, big.mark = ","), format(tract_data_wide$B19013_001, big.mark = ",") ) # Create map with custom labels map_custom_labels <- create_bivariate_map( data = tract_data_wide, var_1 = "B01003_001", var_2 = "B19013_001", custom_labels = custom_labels ) map_custom_labels ``` This will create tooltips that show: - The census tract name - Population with thousands separator - Formatted median income with dollar sign and thousands separator ## Best Practices When creating bivariate choropleth maps: 1. **Variable Selection** - Choose variables that have a meaningful relationship - Consider whether the variables are independent - Think about the story you want to tell 2. **Data Distribution** - Check the distribution of your variables before mapping - Consider transforming highly skewed data - Be aware of outliers that might affect the visualization 3. **Color Schemes** - Use colorblind-friendly palettes - Ensure sufficient contrast between categories - Consider your audience when choosing colors 4. **Scale and Geography** - Choose an appropriate geographic level for your analysis - Consider the modifiable areal unit problem (MAUP) - Be consistent with projections 5. **Legend and Labels** - Provide clear, descriptive labels - Include units of measurement - Explain how to interpret the bivariate color scheme ## Error Handling The package includes various checks and warnings: ```{r error=TRUE} # Missing variable try(calculate_tertiles(data.frame(x = 1:5), "nonexistent", "also_nonexistent")) # Too few unique values test_data <- data.frame( var1 = c(1,1,1,2), var2 = c(1,2,3,4) ) calculate_tertiles(test_data, "var1", "var2") ``` ## Common Issues and Solutions ### Handling Extreme Outliers ```{r eval=FALSE} # Identify outliers using IQR method outlier_data <- tract_data_wide %>% mutate( pop_outlier = B01003_001 > quantile(B01003_001, 0.75, na.rm = TRUE) + 1.5 * IQR(B01003_001, na.rm = TRUE), income_outlier = B19013_001 > quantile(B19013_001, 0.75, na.rm = TRUE) + 1.5 * IQR(B19013_001, na.rm = TRUE) ) # Create map excluding outliers no_outliers_map <- create_bivariate_map( data = outlier_data %>% filter(!pop_outlier & !income_outlier), var_1 = "B01003_001", var_2 = "B19013_001" ) ``` ### Working with Different Projections ```{r eval=FALSE} # Reproject data if needed reprojected_data <- tract_data_wide %>% st_transform(4326) # WGS 84 # Create map with reprojected data projected_map <- create_bivariate_map( data = reprojected_data, var_1 = "B01003_001", var_2 = "B19013_001" ) ``` ## Further Reading For more information about bivariate choropleth maps: - Joshua Stevens' guide: [Bivariate Choropleth Maps](https://www.joshuastevens.net/cartography/make-a-bivariate-choropleth-map/) - Timo Grossenbacher's blog post: [Bivariate Maps with ggplot2 and sf](https://timogrossenbacher.ch/2019/04/bivariate-maps-with-ggplot2-and-sf/) - Census Bureau resources: [Working with Census Data](https://www.census.gov/programs-surveys/acs/guidance.html) ## Acknowledgements The development of this package was funded by Grant 2020-R2-CX-0027 from the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice. The opinions, findings, and conclusions or recommendations expressed are those of the authors and do not necessarily reflect those of the U.S. Department of Justice.