---
title: "chk Families"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{chk Families}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(chk)
```
## Introduction
The `vld_` functions are used within the `chk_` functions.
The `chk_` functions (and their `vld_` equivalents) can be divided into the following families.
In the code in this examples, we will use `vld_*` functions
If you want to learn more about the logic behind some of the functions explained here, we recommend reading the book [Advanced R](https://adv-r.hadley.nz/) (Wickham, 2019).
For reasons of space, the `x_name = NULL` argument is not shown.
For a more simplified list of the `chk` functions, you can see the [Reference](https://poissonconsulting.github.io/chk/reference/index.html) section.
## `chk_` Functions
### Overview
```{r chk_, echo = FALSE, out.width= "100%", fig.align='center', fig.alt = "Classification of the chk functions by family"}
knitr::include_graphics("chk_diagram_II.png")
```
### Missing Input Checker
Check if the function input is missing or not
`chk_missing` function uses `missing()`
to check if an argument has been left out when the function is called.
Function | Code
:- | :---
`chk_missing()` | `missing()`
`chk_not_missing()` | `!missing()`
### `...` Checker
Check if the function input comes from `...` (`dot-dot-dot`) or not
The functions `chk_used(...)` and `chk_unused(...)` check
if any arguments have been provided through `...` (called `dot-dot-dot` or ellipsis), which is commonly used in R to allow a variable number of arguments.
Function | Code
:- | :---
`chk_used(...)` | `length(list(...)) != 0L`
`chk_unused(...)` | `length(list(...)) == 0L`
### External Data Source Checkers
Check if the function input is a valid external data source.
These `chk` functions check the existence of a file, the validity of its extension, and the existence of a directory.
Function | Code
:- | :---
`chk_file(x)` | `vld_string(x) && file.exists(x) && !dir.exists(x)`
`chk_ext(x, ext)` | `vld_string(x) && vld_subset(tools::file_ext(x), ext)`
`chk_dir(x)` | `vld_string(x) && dir.exists(x)`
### NULL checker
Check if the function input is NULL or not
Function | Code
:- | :---
`chk_null(x)` | `is.null(x)`
`chk_not_null(x)` | `!is.null(x)`
### Scalar Checkers
Check if the function input is a scalar.
In R, scalars are vectors of length 1.
Function | Code
:- | :------
`chk_scalar(x)` | `length(x) == 1L`
The following functions check if the functions inputs are vectors of length 1 of a particular data type.
Each data type has a special syntax to create an individual value or "scalar".
Function | Code
:- | :------
`chk_string(x)` | `is.character(x) && length(x) == 1L && !anyNA(x)`
`chk_number(x)` | `is.numeric(x) && length(x) == 1L && !anyNA(x)`
For logical data types, you can check flags using `chk_flag()`, which considers `TRUE` or `FALSE` as possible values, or use `chk_lgl()` to verify if a scalar is of type logical, including NA as element.
Function | Code
:- | :-
`chk_flag(x)` | `is.logical(x) && length(x) == 1L && !anyNA(x)`
`chk_lgl(x)` | `is.logical(x) && length(x) == 1L`
It is also possible to check if the user-provided argument is only `TRUE` or only `FALSE`:
Function | Code
:- | :-
`chk_true(x)` | `is.logical(x) && length(x) == 1L && !anyNA(x) && x`
`chk_false(x)` | `is.logical(x) && length(x) == 1L && !anyNA(x) && !x`
### Date or DateTime Checkers
Check if the function input is of class Date or DateTime
Date and datetime classes can be checked with `chk_date` and `chk_datetime`.
Function | Code
:- | :------
`chk_date(x)` | `inherits(x, "Date") && length(x) == 1L && !anyNA(x)`
`chk_date_time(x)` | `inherits(x, "POSIXct") && length(x) == 1L && !anyNA(x)`
### Time Zone Checker
Also you can check the time zone with `chk_tz()`.
The available time zones can be retrieved using the function `OlsonNames()`.
Function | Code
:- | :------
`chk_tz(x)` | `is.character(x) && length(x) == 1L && !anyNA(x) && x %in% OlsonNames()`
#### Data Structure Checker
Check if the function input has a specific data structure.
Vectors are a family of data types that come in two forms: atomic vectors and lists.
When vectors consist of elements of the same data type, they can be considered atomic, matrices, or arrays.
The elements in a list, however, can be of different types.
To check if a function argument is a vector you can use `chk_vector()`.
Function | Code
:- | :---
`chk_vector(x)` | `is.atomic(x) && !is.matrix(x) && !is.array(x)) || is.list(x)`
Pay attention that `chk_vector()` and `vld_vector()`
are different from `is.vector()`, that will return FALSE if the vector has any attributes except names.
```{r}
vector <- c(1, 2, 3)
is.vector(vector) # TRUE
vld_vector(vector) # TRUE
attributes(vector) <- list("a" = 10, "b" = 20, "c" = 30)
is.vector(vector) # FALSE
vld_vector(vector) # TRUE
```
Function | Code
:- | :---
`chk_atomic(x)` | `is.atomic(x)`
Notice that `is.atomic` is true for the types logical, integer, numeric, complex, character and raw. Also, it is TRUE for NULL.
```{r}
vector <- c(1, 2, 3)
is.atomic(vector) # TRUE
vld_vector(vector) # TRUE
is.atomic(NULL) # TRUE
vld_vector(NULL) # TRUE
```
The dimension attribute converts vectors into matrices and arrays.
Function | Code
:- | :---
`chk_array(x)` | `is.array(x)`
`chk_matrix(x)` | `is.matrix(x)`
When a vector is composed by heterogeneous data types, can be a list.
Data frames are among the most important S3 vectors, constructed on top of lists.
Function | Code
:- | :---
`chk_list(x)` | `is.list()`
`chk_data(x)` | `inherits(x, "data.frame")`
Be careful not to confuse the function `chk_data` with `check_data`.
Please read the `check_` functions section below and the function documentation.
### Data Type Checkers
Check if the function input has a data type.
You can use the function `typeof()` to confirm the data type.
Function | Code
:- | :---
`chk_environment(x)` | `is.environment(x)`
`chk_logical(x)` | `is.logical(x)`
`chk_character(x)` | `is.character(x)`
For numbers there are four functions.
R differentiates between doubles (`chk_double()`) and integers (`chk_integer()`).
You can also use the generic function `chk_numeric()`, which will detect both.
The third type of number is complex (`chk_complex()`).
Function | Code
:- | :---
`chk_numeric(x)` | `is.numeric(x)`
`chk_double(x)` | `is.double(x)`
`chk_integer(x)` | `is.integer(x)`
`chk_complex(x)` | `is.complex(x)`
Consider that to explicitly create an integer in R, you need to use the suffix `L`.
```{r}
vld_numeric(33) # TRUE
vld_double(33) # TRUE
vld_integer(33) # FALSE
vld_integer(33L) # TRUE
```
### Whole Number Checkers
These functions accept whole numbers, whether they are explicitly integers or double types without fractional parts.
Function | Code
:- | :---
`chk_whole_numeric` | `is.integer(x) || (is.double(x) && vld_true(all.equal(x[!is.na(x)], trunc(x[!is.na(x)]))))`
`chk_whole_number` | `vld_number(x) && (is.integer(x) || vld_true(all.equal(x, trunc(x))))`
`chk_count` | `vld_whole_number(x) && x >= 0`
If you want to consider both 3.0 and 3L as integers, it is safer to use the function `chk_whole_numeric`.
Here, `x` is valid if it's an integer or a double that can be converted to an integer without changing its value.
```{r}
# Integer vector
vld_whole_numeric(c(1L, 2L, 3L)) # TRUE
# Double vector representing whole numbers
vld_whole_numeric(c(1.0, 2.0, 3.0)) # TRUE
# Double vector with fractional numbers
vld_whole_numeric(c(1.0, 2.2, 3.0)) # FALSE
```
The function `chk_whole_number` is similar to `chk_whole_numeric`.
`chk_whole_number` checks if the number is of `length(x) == 1L`
```{r}
# Integer vector
vld_whole_numeric(c(1L, 2L, 3L)) # TRUE
vld_whole_number(c(1L, 2L, 3L)) # FALSE
vld_whole_number(c(1L)) # TRUE
```
`chk_count()` is a special case of `chk_whole_number`, differing in that it ensures values are non-negative whole numbers.
```{r}
# Positive integer
vld_count(1) #TRUE
# Zero
vld_count(0) # TRUE
# Negative number
vld_count(-1) # FALSE
# Non-whole number
vld_count(2.5) # FALSE
```
### Factor Checker
Check if the function input is a factor
Function | Code
:- | :------
`chk_factor` | `is.factor(x)`
`chk_character_or_factor` | `is.character(x) || is.factor(x)`
Factors can be specially confusing for users, because despite they are displayed as characters are built in top of integer vectors.
`chk` provides the function `chk_character_or_factor()` that allows detecting if the argument that the user is providing contains strings.
```{r}
# Factor with specified levels
vector_fruits <- c("apple", "banana", "apple", "orange", "banana", "apple")
factor_fruits <- factor(c("apple", "banana", "apple", "orange", "banana", "apple"),
levels = c("apple", "banana", "orange"))
is.factor(factor_fruits) # TRUE
vld_factor(factor_fruits) # TRUE
is.character(factor_fruits) # FALSE
vld_character(factor_fruits) # FALSE
vld_character_or_factor(factor_fruits) # TRUE
```
### All Elements Checkers
Check if the function input has a characteristic shared by all its elements.
If you want to apply any of the previously defined functions for `length(x) == 1L` to the elements of a vector, you can use `chk_all()`.
Function | Code
:- | :---
`chk_all(x, chk_fun, ...)` | `all(vapply(x, chk_fun, TRUE, ...))`
```{r}
vld_all(c(TRUE, TRUE, FALSE), chk_lgl) # FALSE
```
### Function Checker
Check if the function input is another function
`formals` refers to the count of the number of formal arguments
Function | Code
:- | :------
`chk_function` | `is.function(x) && (is.null(formals) || length(formals(x)) == formals)`
```{r}
vld_function(function(x) x, formals = 1) # TRUE
vld_function(function(x, y) x + y, formals = 1) # FALSE
vld_function(function(x, y) x + y, formals = 2) # TRUE
```
### Name Checkers
Check if the function input has names and are valid `chk_named` function works with vectors, lists, data frames, and matrices that have named columns or rows.
Do not confuse with `check_names`.
`chk_valid_name` function specifically designed to check if the elements of a character vector are valid R names.
If you want to know what is considered a valid name,
please refer to the documentation for the `make.names` function.
Function | Code
:- | :--
`chk_named(x)` | `!is.null(names(x))`
`chk_valid_name(x)` | `identical(make.names(x[!is.na(x)]), as.character(x[!is.na(x)]))`
```{r}
vld_valid_name(c("name1", NA, "name_2", "validName")) # TRUE
vld_valid_name(c(1, 2, 3)) # FALSE
vld_named(data.frame(a = 1:5, b = 6:10)) # TRUE
vld_named(list(a = 1, b = 2)) # TRUE
vld_named(c(a = 1, b = 2)) # TRUE
vld_named(c(1, 2, 3)) # FALSE
```
### Range Checkers
Check if the function input is part of a range of values.
The function input should be numeric.
Function | Code
:- | :---
`chk_range(x, range = c(0, 1))` | `all(x[!is.na(x)] >= range[1] & x[!is.na(x)] <= range[2])`
`chk_lt(x, value = 0)` | `all(x[!is.na(x)] < value)`
`chk_lte(x, value = 0)` | `all(x[!is.na(x)] <= value)`
`chk_gt(x, value = 0)` | `all(x[!is.na(x)] > value)`
`chk_gte(x, value = 0)` | `all(x[!is.na(x)] >= value)`
### Equal Checkers
Check if the function input is equal or similar to a predefined object.
The functions `chk_identical()`, `chk_equal()`, and `chk_equivalent()` are used to compare two objects, but they differ in how strict the comparison is.
`chk_equal` and `chk_equivalent`checks if x and y are numerically equivalent within a specified tolerance, but `chk_equivalent` ignores differences in attributes.
Function | Code
:-- | :-
`chk_identical(x, y)` | `identical(x, y)`
`chk_equal(x, y, tolerance = sqrt(.Machine$double.eps))` | `vld_true(all.equal(x, y, tolerance))`
`chk_equivalent(x, y, tolerance = sqrt(.Machine$double.eps))` | `vld_true(all.equal(x, y, tolerance, check.attributes = FALSE))`
In the case you want to compare the elements of a vector, you can use the `check_all_*` functions.
Function | Code
:-- | :--
`chk_all_identical(x)` | `length(x) < 2L || all(vapply(x, vld_identical, TRUE, y = x[[1]]))`
`chk_all_equal(x, tolerance = sqrt(.Machine$double.eps))` | `length(x) < 2L || all(vapply(x, vld_equal, TRUE, y = x[[1]], tolerance = tolerance))`
`chk_all_equivalent(x, tolerance = sqrt(.Machine$double.eps))` | `length(x) < 2L || all(vapply(x, vld_equivalent, TRUE, y = x[[1]], tolerance = tolerance))`
```{r}
vld_all_identical(c(1, 2, 3)) # FALSE
vld_all_identical(c(1, 1, 1)) # TRUE
vld_identical(c(1, 2, 3), c(1, 2, 3)) # TRUE
vld_all_equal(c(0.1, 0.12, 0.13))
vld_all_equal(c(0.1, 0.12, 0.13), tolerance = 0.2)
vld_equal(c(0.1, 0.12, 0.13), c(0.1, 0.12, 0.13)) # TRUE
vld_equal(c(0.1, 0.12, 0.13), c(0.1, 0.12, 0.4), tolerance = 0.5) # TRUE
x <- c(0.1, 0.1, 0.1)
y <- c(0.1, 0.12, 0.13)
attr(y, "label") <- "Numbers"
vld_equal(x, y, tolerance = 0.5) # FALSE
vld_equivalent(x, y, tolerance = 0.5) # TRUE
```
### Order Checker
Check if the function input are numbers in increasing order
`chk_sorted` function checks if `x` is sorted in non-decreasing order, ignoring any NA values.
Function | Code
:- | :--
`chk_sorted(x)` | `!is.unsorted(x, na.rm = TRUE)`
```{r}
# Checking if sorted
vld_sorted(c(1, 2, 3, NA, 4)) # TRUE
vld_sorted(c(3, 1, 2, NA, 4)) # FALSE
```
### Set Checkers
Check if the function input is composed by certain elements
The `setequal` function in R is used to check if two vectors contain exactly the same elements, regardless of the order or number of repetitions.
Function | Code
:- | :---
`chk_setequal(x, values)` | `setequal(x, values)`
```{r}
vld_setequal(c(1, 2, 3), c(3, 2, 1)) # TRUE
vld_setequal(c(1, 2, 3), c(3, 2, 1, 4)) # FALSE
vld_setequal(c(1, 2, 3, 4), c(3, 2, 1)) # FALSE
vld_setequal(c(1, 2), c(1, 1, 1, 1, 1, 1, 2, 1)) # TRUE
```
First, the `%in%` function is used to check whether
the elements of a vector `x` are present in a specified set of values.
This returns a logical vector, which is then simplified by `all()`. The `all()` function checks if all values in the vector are TRUE.
If the result is TRUE, it indicates that for `vld_` and `chk_subset()`, all elements in the `x` vector are present in `values`.
Similarly, for `vld_` and `chk_superset()`, it indicates that all elements of `values` are present in `x`.
Function | Code
:-- | :--
`chk_subset(x, values)` | `all(x %in% values)`
`chk_not_subset(x, values)` | `!any(x %in% values) || !length(x)`
`chk_superset(x, values)` | `all(values %in% x)`
```{r}
# When both function inputs have the same elements,
# all functions return TRUE
vld_setequal(c(1, 2, 3), c(3, 2, 1)) # TRUE
vld_subset(c(1, 2, 3), c(3, 2, 1)) # TRUE
vld_superset(c(1, 2, 3), c(3, 2, 1)) # TRUE
vld_setequal(c(1, 2), c(1, 1, 1, 1, 1, 1, 2, 1)) # TRUE
vld_subset(c(1, 2), c(1, 1, 1, 1, 1, 1, 2, 1)) # TRUE
vld_superset(c(1, 2), c(1, 1, 1, 1, 1, 1, 2, 1)) # TRUE
# When there are elements present in one vector but not the other,
# `vld_setequal()` will return FALSE
vld_setequal(c(1, 2, 3), c(3, 2, 1, 4)) # FALSE
vld_setequal(c(1, 2, 3, 4), c(3, 2, 1)) # FALSE
# When some elements of the `x` input are not present in `values`,
# `vld_subset()` returns FALSE
vld_subset(c(1, 2, 3, 4), c(3, 2, 1)) # FALSE
vld_superset(c(1, 2, 3, 4), c(3, 2, 1)) # TRUE
# When some elements of the `values` input are not present in `x`,
# `vld_superset()` returns FALSE
vld_subset(c(1, 2, 3), c(3, 2, 1, 4)) # TRUE
vld_superset(c(1, 2, 3), c(3, 2, 1, 4)) # FALSE
# An empty set is considered a subset of any set, and any set is a superset of an empty set.
vld_subset(c(), c("apple", "banana")) # TRUE
vld_superset(c("apple", "banana"), c()) # TRUE
```
`chk_orderset()` validate whether a given set of `values` in a vector x matches a specified set of allowed `values` (represented by `values`) while preserving the order of those values.
Function | Code
:-- | :--
`chk_orderset` | `vld_equivalent(unique(x[x %in% values]), values[values %in% x])`
```{r}
vld_orderset(c("A", "B", "C"), c("A", "B", "C", "D")) # TRUE
vld_orderset(c("C", "B", "A"), c("A", "B", "C", "D")) # FALSE
vld_orderset(c("A", "C"), c("A", "B", "C", "D")) # TRUE
```
### Class Checkers
Check if the function input belongs to a class or type.
These functions check if `x` is an S3 or S4 object of the specified class.
Function | Code
:- | :---
`chk_s3_class(x, class)` | `!isS4(x) && inherits(x, class)`
`chk_s4_class(x, class)` | `isS4(x) && methods::is(x, class)`
`chk_is()` checks if x inherits from a specified class, regardless of whether it is an S3 or S4 object.
Function | Code
:- | :---
`chk_is(x, class)` | `inherits(x, class)`
### REGEX Checker
Check if the function input matches a regular expression (REGEX).
`chk_match(x, regexp = ".+")` checks if the regular expression pattern specified by `regexp` matches all the non-missing values in the vector `x`.
If `regexp` it is not specified by the user, `chk_match` checks whether all non-missing values in `x` contain at least one character (regexp = ".+")
Function | Code
:- | :--
`chk_match(x, regexp = ".+")` | `all(grepl(regexp, x[!is.na(x)]))`
### Quality Checkers (Miscellaneous)
Check if the function input meet some user defined quality criteria.
`chk_not_empty` function checks if the length of the object is not zero.
For a data frame or matrix, the length corresponds to the number of elements (not rows or columns),
while for a vector or list, it corresponds to the number of elements.
`chk_not_any_na` function checks if there are no NA values present in the entire object.
Function | Code
:- | :--
`chk_not_empty(x)` | `length(x) != 0L`
`chk_not_any_na(x)` | `!anyNA(x)`
```{r}
vld_not_empty(c()) # FALSE
vld_not_empty(list()) # FALSE
vld_not_empty(data.frame()) # FALSE
vld_not_empty(data.frame(a = 1:3, b = 4:6)) # TRUE
vld_not_any_na(data.frame(a = 1:3, b = 4:6)) # TRUE
vld_not_any_na(data.frame(a = c(1, NA, 3), b = c(4, 5, 6))) # FALSE
```
The `chk_unique()` function is designed to verify that there are no duplicates elements in a vector.
Function | Code
:- | :--
`chk_unique(x, incomparables = FALSE)` | `!anyDuplicated(x, incomparables = incomparables)`
```{r}
vld_unique(c(1, 2, 3, 4)) # TRUE
vld_unique(c(1, 2, 2, 4)) # FALSE
```
The function `chk_length` checks whether the length of `x` is within a specified range.
It ensures that the length is at least equal to `length` and no more than `upper`.
It can be used with vectors, lists and data frames.
Function | Code
:- | :--
`chk_length(x, length = 1L, upper = length)` | `length(x) >= length && length(x) <= upper`
```{r}
vld_length(c(1, 2, 3), length = 2, upper = 5) # TRUE
vld_length(c("a", "b"), length = 3) # FALSE
vld_length(list(a = 1, b = 2, c = 3), length = 2, upper = 4) # TRUE
vld_length(list(a = 1, b = 2, c = 3), length = 4) # FALSE
# 2 columns
vld_length(data.frame(x = 1:3, y = 4:6), length = 1, upper = 3) # TRUE
vld_length(data.frame(x = 1:3, y = 4:6), length = 3) # FALSE
# length of NULL is 0
vld_length(NULL, length = 0) # TRUE
vld_length(NULL, length = 1) # FALSE
```
Another useful function is `chk_compatible_lenghts()`.
This function helps to check vectors could be 'strictly recycled'.
```{r}
a <- integer(0)
b <- numeric(0)
vld_compatible_lengths(a, b) # TRUE
a <- 1
b <- 2
vld_compatible_lengths(a, b) # TRUE
a <- 1:3
b <- 1:3
vld_compatible_lengths(a, b) # TRUE
b <- 1
vld_compatible_lengths(a, b) # TRUE
b <- 1:2
vld_compatible_lengths(a, b) # FALSE
b <- 1:6
vld_compatible_lengths(a, b) # FALSE
```
The `chk_join()` function is designed to validate whether the number of rows in the resulting data frame from merging two data frames (`x` and `y`) is equal to the number of rows in the first data frame (`x`).
This is useful when you want to ensure that a join operation does not change the number of rows in your main data frame.
Function | Code
:- | :--
`chk_join(x, y, by)` | `identical(nrow(x), nrow(merge(x, unique(y[if (is.null(names(by))) by else names(by)]), by = by)))`
```{r}
x <- data.frame(id = c(1, 2, 3), value_x = c("A", "B", "C"))
y <- data.frame(id = c(1, 2, 3), value_y = c("D", "E", "F"))
vld_join(x, y, by = "id") # TRUE
# Perform a join that reduces the number of rows
y <- data.frame(id = c(1, 2, 1), value_y = c("D", "E", "F"))
vld_join(x, y, by = "id") # FALSE
```
## `check_` functions
The `check_` functions combine several `chk_` functions internally.
Read the documentation for each function to learn more about its specific use.
Function | Description
:- | :--
`check_values(x, values)` | Checks values and S3 class of an atomic object.
`check_key(x, key = character(0), na_distinct = FALSE)` | Checks if columns have unique rows.
`check_data(x, values, exclusive, order, nrow, key)` | Checks column names, values, number of rows and key for a data.frame.
`check_dim(x, dim, values, dim_name)` | Checks dimension of an object.
`check_dirs(x, exists)` | Checks if all directories exist (or if exists = FALSE do not exist as directories or files).
`check_files(x, exists)` | Checks if all files exist (or if exists = FALSE do not exist as files or directories).
`check_names(x, names, exclusive, order)` | Checks the names of an object.
## References
Wickham, H. (2019). Advanced R, Second Edition (2nd ed.). Chapman and Hall/CRC.