Get started

Core functionality

The package is small: it contains the core function sanityTracker::add_sanity_check and a few convenience functions (prefixed with sc_) that essentially call sanityTracker::add_sanity_check. Some convenience functions, like sanityTracker::sc_left_join, perform more than one check; others, like sanityTracker::sc_cols_non_NA, can check multiple columns at once.

The most helpful feature is that, no matter how deeply a sanity check is buried in your source code, sanityTracker collects all results in one central place and, if a check fails (and the data are passed along), stores a few examples of the failing rows for investigation.

Since the functions are largely self-explanatory, this vignette focuses on the stored results and examples.

We start with a very simple check.

library(sanityTracker)
sc <- sanityTracker::add_sanity_check(
  fail_vec = mtcars$mpg > 30,
  description = "mpg should be below 30"
)
get_sanity_checks()
#>               description additional_desc data_name  n n_fail n_na counter_meas
#> 1: mpg should be below 30               -           32      4    0            -
#>       fail_vec_str param_name                      call
#> 1: mtcars$mpg > 30          - eval(expr, envir, enclos)
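
The three count columns can be reproduced by hand. The base R sketch below assumes that n is the length of fail_vec, n_fail the number of TRUE values and n_na the number of NA values; this is consistent with the table above but is our reading, not documented package behavior.

```r
# Sketch, assuming: n = length, n_fail = number of TRUE, n_na = number of NA.
fail_vec <- mtcars$mpg > 30              # TRUE marks a failing row
summary_counts <- c(
  n = length(fail_vec),                  # all checked elements
  n_fail = sum(fail_vec, na.rm = TRUE),  # rows violating the check
  n_na = sum(is.na(fail_vec))            # rows where the check is undefined
)
summary_counts
#>      n n_fail   n_na 
#>     32      4      0
```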

Of the 32 observations in mtcars, 4 have an mpg above 30 (n_fail), and none of the check results is NA (n_na). The column fail_vec_str records the expression that was used for the check. When failures occur, the usual next step is to investigate those cases. For this purpose the package offers the parameter data:

sc <- sanityTracker::add_sanity_check(
  fail_vec = mtcars$mpg > 30,
  description = "mpg should be below 30. extract example ",
  data = mtcars,
  param_name = "mpg"
)
get_sanity_checks()
#>                                 description additional_desc data_name  n n_fail
#> 1:                   mpg should be below 30               -           32      4
#> 2: mpg should be below 30. extract example                -    mtcars 32      4
#>    n_na counter_meas    fail_vec_str param_name                      call
#> 1:    0            - mtcars$mpg > 30          - eval(expr, envir, enclos)
#> 2:    0            - mtcars$mpg > 30        mpg eval(expr, envir, enclos)
#>         example
#> 1:             
#> 2: <data.frame>

The table now has two rows, the first of which stems from our initial sanity check. In the second row, the column example is no longer empty:

get_sanity_checks()[["example"]][[2]]
#>                 mpg cyl disp hp drat    wt  qsec vs am gear carb
#> Fiat 128       32.4   4 78.7 66 4.08 2.200 19.47  1  1    4    1
#> Honda Civic    30.4   4 75.7 52 4.93 1.615 18.52  1  1    4    2
#> Toyota Corolla 33.9   4 71.1 65 4.22 1.835 19.90  1  1    4    1
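
Only three of the four failing cars appear (Lotus Europa, also at 30.4 mpg, is missing), which suggests that the number of stored rows is capped; the parameter example_size mentioned further below presumably controls this. Assuming the first failing rows are kept, the stored data frame can be approximated in base R. Whether sanityTracker takes the first rows or samples them is an assumption here.

```r
# Sketch, assuming the first `example_size` failing rows are kept.
example_size <- 3                          # assumed cap, mirrors the output above
fail_vec <- mtcars$mpg > 30
failing_rows <- mtcars[which(fail_vec), ]  # which() drops NA results
example <- head(failing_rows, example_size)
rownames(example)
#> [1] "Fiat 128"       "Honda Civic"    "Toyota Corolla"
```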

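If a sanity check is called from within a function, its result still ends up in the global list of all sanity checks, and the column call shows the function call in which the check happened. Note also that data_name and fail_vec_str now refer to the function argument x rather than to mtcars.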

g <- function(x) {
  sanityTracker::add_sanity_check(
    fail_vec = x$mpg > 30,
    description = "mpg should be below 30. check in function",
    data = x,
    param_name = "mpg"
  )
}
f <- function(x) {g(x = x)}
dummy <- f(x = mtcars)
get_sanity_checks()
#>                                  description additional_desc data_name  n
#> 1:                    mpg should be below 30               -           32
#> 2:  mpg should be below 30. extract example                -    mtcars 32
#> 3: mpg should be below 30. check in function               -         x 32
#>    n_fail n_na counter_meas    fail_vec_str param_name
#> 1:      4    0            - mtcars$mpg > 30          -
#> 2:      4    0            - mtcars$mpg > 30        mpg
#> 3:      4    0            -      x$mpg > 30        mpg
#>                         call      example
#> 1: eval(expr, envir, enclos)             
#> 2: eval(expr, envir, enclos) <data.frame>
#> 3:                  g(x = x) <data.frame>
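
The third row records x$mpg > 30 as fail_vec_str, x as data_name and g(x = x) as call. One way to obtain such metadata in R is non-standard evaluation: substitute() captures the unevaluated argument expressions, and sys.call(-1) returns the call of the calling function. The helper track_check below is hypothetical and only illustrates the mechanism; it is not sanityTracker's code.

```r
# Hypothetical helper (not part of sanityTracker) that captures metadata
# similar to the columns fail_vec_str, data_name and call.
track_check <- function(fail_vec, data) {
  caller <- sys.call(-1)                   # call of the enclosing function
  list(
    fail_vec_str = deparse(substitute(fail_vec)),  # expression as the caller wrote it
    data_name = deparse(substitute(data)),         # name of the data argument
    call = deparse(caller),
    n_fail = sum(fail_vec, na.rm = TRUE)           # forces evaluation of fail_vec
  )
}

g <- function(x) track_check(fail_vec = x$mpg > 30, data = x)
f <- function(x) g(x = x)
meta <- f(x = mtcars)
str(meta)
```

At the top level of a knitr document the enclosing call is knitr's own eval(), which is presumably why the first two rows above show eval(expr, envir, enclos).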

The function sanityTracker::clear_sanity_checks discards all sanity checks that are currently stored.

sanityTracker::clear_sanity_checks()
sanityTracker::get_sanity_checks()
#> NULL
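
Centralizing results regardless of where a check runs requires shared state. A common R pattern for this is a private environment that lives for the whole session. The sketch below uses hypothetical names (.check_store, store_check, get_checks, clear_checks) and only illustrates the idea; it does not reproduce sanityTracker's internals.

```r
# Hypothetical central store: an environment is mutable, so functions can
# append to it from any depth of the call stack.
.check_store <- new.env()
.check_store$checks <- NULL

store_check <- function(description, n_fail) {
  row <- data.frame(description = description, n_fail = n_fail)
  # rbind(NULL, df) simply returns df, so the first check needs no special case
  .check_store$checks <- rbind(.check_store$checks, row)
  invisible(row)
}
get_checks <- function() .check_store$checks
clear_checks <- function() {
  .check_store$checks <- NULL
  invisible(NULL)
}

h <- function() store_check("nested check", n_fail = 0)
store_check("top-level check", n_fail = 4)
h()
nrow(get_checks())     # both checks are stored
clear_checks()
is.null(get_checks())  # mirrors get_sanity_checks() returning NULL
```

Using an environment instead of a global variable avoids `<<-` assignments into the user's workspace.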

Convenience functions

Performing every check with sanityTracker::add_sanity_check directly would be cumbersome. Therefore, the package provides convenience functions for standard checks, such as whether a set of columns is positive, whether they contain missing values, or whether their combination is unique. All convenience functions start with the prefix sc_ and describe the check(s) they perform in the column additional_desc. For example, checking that all columns of mtcars are positive takes a single call:

sc <- sanityTracker::sc_cols_positive(
  object = mtcars,
  cols = names(mtcars),
  description = "Exemplary sanity checks"
)
get_sanity_checks()
#>                 description                           additional_desc data_name
#>  1: Exemplary sanity checks  Elements in 'mpg' should be in [0, Inf).    mtcars
#>  2: Exemplary sanity checks  Elements in 'cyl' should be in [0, Inf).    mtcars
#>  3: Exemplary sanity checks Elements in 'disp' should be in [0, Inf).    mtcars
#>  4: Exemplary sanity checks   Elements in 'hp' should be in [0, Inf).    mtcars
#>  5: Exemplary sanity checks Elements in 'drat' should be in [0, Inf).    mtcars
#>  6: Exemplary sanity checks   Elements in 'wt' should be in [0, Inf).    mtcars
#>  7: Exemplary sanity checks Elements in 'qsec' should be in [0, Inf).    mtcars
#>  8: Exemplary sanity checks   Elements in 'vs' should be in [0, Inf).    mtcars
#>  9: Exemplary sanity checks   Elements in 'am' should be in [0, Inf).    mtcars
#> 10: Exemplary sanity checks Elements in 'gear' should be in [0, Inf).    mtcars
#> 11: Exemplary sanity checks Elements in 'carb' should be in [0, Inf).    mtcars
#>      n n_fail n_na counter_meas
#>  1: 32      0    0            -
#>  2: 32      0    0            -
#>  3: 32      0    0            -
#>  4: 32      0    0            -
#>  5: 32      0    0            -
#>  6: 32      0    0            -
#>  7: 32      0    0            -
#>  8: 32      0    0            -
#>  9: 32      0    0            -
#> 10: 32      0    0            -
#> 11: 32      0    0            -
#>                                                                  fail_vec_str
#>  1: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>  2: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>  3: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>  4: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>  5: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>  6: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>  7: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>  8: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>  9: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#> 10: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#> 11: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>     param_name                      call
#>  1:        mpg eval(expr, envir, enclos)
#>  2:        cyl eval(expr, envir, enclos)
#>  3:       disp eval(expr, envir, enclos)
#>  4:         hp eval(expr, envir, enclos)
#>  5:       drat eval(expr, envir, enclos)
#>  6:         wt eval(expr, envir, enclos)
#>  7:       qsec eval(expr, envir, enclos)
#>  8:         vs eval(expr, envir, enclos)
#>  9:         am eval(expr, envir, enclos)
#> 10:       gear eval(expr, envir, enclos)
#> 11:       carb eval(expr, envir, enclos)
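
Judging from the output, sc_cols_positive runs one check per column, records the column in param_name, and interprets "positive" inclusively as [0, Inf), so the zeros in vs and am pass. The sketch below mimics that per-column pattern in base R with a plain comparison instead of checkmate::qtest; the function name check_cols_nonneg and its NA handling are assumptions.

```r
# Hypothetical per-column check (not sanityTracker's implementation).
check_cols_nonneg <- function(object, cols) {
  rows <- lapply(cols, function(col) {
    fail_vec <- !(object[[col]] >= 0)  # NA input yields NA, counted in n_na
    data.frame(
      param_name = col,
      additional_desc = sprintf("Elements in '%s' should be in [0, Inf).", col),
      n = length(fail_vec),
      n_fail = sum(fail_vec, na.rm = TRUE),
      n_na = sum(is.na(fail_vec))
    )
  })
  do.call(rbind, rows)  # one row per column, like the table above
}

res <- check_cols_nonneg(mtcars, names(mtcars))
res[, c("param_name", "n", "n_fail", "n_na")]

# A column with a negative value and a missing value:
check_cols_nonneg(data.frame(a = c(1, -1, NA)), "a")
```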

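Although the convenience functions do not list them explicitly, the parameters description, counter_meas, data_name, example_size, param_name, call and fail_callback can be passed to them via the '...' argument. The following example uses fail_callback to raise a warning when the check fails and call to set the recorded call manually: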

clear_sanity_checks()
sc <- sanityTracker::sc_col_elements(
  object = mtcars,
  col = "carb",
  feasible_elements = 1:4,
  description = "Only usual number of carburetors",
  fail_callback = warning,
  call = "directly from vignette"
)
#> Warning in (function (fail_vec, description, counter_meas, data, data_name, :
#> Only usual number of carburetors/Elements in 'carb' should contain only '1',
#> '2', '3', '4'.: FAILED
get_sanity_checks()
#>                         description
#> 1: Only usual number of carburetors
#>                                               additional_desc data_name  n
#> 1: Elements in 'carb' should contain only '1', '2', '3', '4'.    mtcars 32
#>    n_fail n_na counter_meas                            fail_vec_str param_name
#> 1:      2    0            - !(object[[col]] %in% feasible_elements)       carb
#>                      call      example
#> 1: directly from vignette <data.frame>
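
Passing options such as fail_callback and call through '...' is ordinary R argument forwarding: the wrapper computes fail_vec and hands every extra named argument to the core function. The two functions below are hypothetical stand-ins for sanityTracker::sc_col_elements and sanityTracker::add_sanity_check, reduced to a few columns; the message format of the callback is an assumption.

```r
# Hypothetical core function standing in for add_sanity_check().
core_check <- function(fail_vec, description, call = "unknown",
                       fail_callback = NULL) {
  n_fail <- sum(fail_vec, na.rm = TRUE)
  if (n_fail > 0 && is.function(fail_callback)) {
    fail_callback(paste0(description, ": FAILED"))  # e.g. warning or stop
  }
  data.frame(description = description, n = length(fail_vec),
             n_fail = n_fail, call = call)
}

# Hypothetical wrapper: builds fail_vec, forwards everything else via '...'.
col_elements_check <- function(object, col, feasible_elements, ...) {
  core_check(fail_vec = !(object[[col]] %in% feasible_elements), ...)
}

res <- suppressWarnings(col_elements_check(
  mtcars, "carb", feasible_elements = 1:4,
  description = "Only usual number of carburetors",
  fail_callback = warning,
  call = "directly from vignette"
))
res
```

In this sketch, passing fail_callback = stop instead would abort the pipeline rather than only warn.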