---
title: "Get Data Matchready"
output: rmarkdown::html_vignette
description: >
  This vignette shows you how to use the package for preparing datasets to be linked
vignette: >
  %\VignetteIndexEntry{get_data_matchready}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

Want to link up your data from different sources? Awesome! Just a heads-up, you'll probably need to do some cleaning first. Let's dive in and see how our package makes getting your SGIC data ready super easy. We'll start by loading trustmebro:

```{r setup, message=FALSE}
library(trustmebro)
```

# Data

Our key data set `trustmebro::sailor_keys` is a longitudinal data set in long format. It is a tibble with `r nrow(sailor_keys)` rows and `r ncol(sailor_keys)` columns. It is meant to be linked with our survey data `trustmebro::sailor_students`, a tibble with `r nrow(sailor_students)` rows and `r ncol(sailor_students)` columns. Let's take a quick look at the survey data:

```{r}
print(trustmebro::sailor_students)
```

# Replace non-alphanumeric characters you don't want to deal with

Yep, this data needs cleaning. There's a lot of unnecessary stuff, like whitespace. You see this all the time with strings in survey data. We can replace all non-alphanumeric characters in the string variables of our data set `trustmebro::sailor_students` using `trustmebro::purge_string`:

```{r}
purge_string(sailor_students, replacement = "#")
```

Note that, because we are dealing with data collected in Germany, umlauts are left unchanged.

# Recode variables

A few variables need recoding for further analysis. For that, we can provide a recode map:

```{r}
recode_map <- c(MALE = "M", FEMALE = "F")
```

`recode_map` is a named vector: the names represent the categories (here, `MALE` and `FEMALE`), and the values (`"M"` and `"F"`) are the corresponding codes used for those categories.
It is used to map full category labels to shorter, standardized values. We can pass it to `trustmebro::recode_valinvec` to recode the values accordingly. A new variable containing the recoded values is added to the data:

```{r}
recode_valinvec(
  purge_string(sailor_students, replacement = "#"),
  gender,
  recode_map,
  gender_recode
)
```
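
If you're curious how a named vector can act as a lookup table, here is a minimal base R sketch of the idea. It is only an illustration: the `gender` vector below is made-up toy data, and this is not necessarily how `recode_valinvec` works internally or how it treats values that are missing from the map.

```{r}
# Same recode map as above: names are the labels, values are the codes
recode_map <- c(MALE = "M", FEMALE = "F")

# Toy input (hypothetical, not taken from sailor_students)
gender <- c("MALE", "FEMALE", "MALE", NA, "OTHER")

# Index the map by name; in this base R lookup, NA and labels
# that are not in the map both come back as NA
gender_recode <- unname(recode_map[gender])

data.frame(gender, gender_recode)
```

In this plain lookup, anything that has no entry in the map ends up as `NA`, so it is worth checking the recoded variable for unexpected missing values before linking.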