runner
an R package for running operations.
The package contains standard running (also known as rolling) functions
with additional options such as varying window size, lagging, handling of
missing values, and windows depending on date. runner also provides
rolling streak and rolling which, which extend beyond the range of
functions already implemented in R packages. The package can be used to
manipulate and aggregate time series or longitudinal data.
Install the package from GitHub or from CRAN.
# devtools::install_github("gogonzo/runner")
install.packages("runner")
The runner package provides functions applied on running windows. The
most universal function is runner::runner, which lets the user apply any
R function f in a running window. In the example below, a 4-month
correlation is calculated, lagged by 1 month.
library(runner)
x <- data.frame(
  date = seq.Date(Sys.Date(), Sys.Date() + 365, length.out = 20),
  a = rnorm(20),
  b = rnorm(20)
)
runner(
  x,
  lag = "1 months",
  k = "4 months",
  idx = x$date,
  f = function(x) {
    cor(x$a, x$b)
  }
)
There are different kinds of running windows, and all of them are
implemented in runner.
The following diagram illustrates what running windows are, in this case
running windows of length k = 4. For each of the 15 elements of a vector,
the window contains the 4 most recent elements up to and including the
current one.

k denotes the number of elements in a window. If k is a single value,
the window size is constant for all elements of x. For a varying window
size, specify k as an integer vector with length(k) == length(x), where
each element of k defines the length of the corresponding window. If k is
empty, the window is cumulative (like base::cumsum). The example below
illustrates a window of k = 4 for the 10th element of vector x.
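To make the three cases concrete, here is a minimal base R sketch of which elements fall into a window. It only mimics the semantics described above and is not the package's implementation; the helper name window_at is hypothetical and introduced purely for illustration.

```r
# Hypothetical helper (not part of runner): elements of x that fall into
# the window of length k_i ending at position i.
window_at <- function(x, i, k_i) {
  start <- max(1, i - k_i + 1)  # clip the window at the start of the vector
  x[start:i]
}

x <- 1:15

# Constant window: k = 4 for the 10th element.
window_at(x, 10, 4)  # 7 8 9 10

# Varying window: one k per element (here alternating 2 and 4).
k <- rep(c(2, 4), length.out = length(x))
varying <- lapply(seq_along(x), function(i) window_at(x, i, k[i]))
varying[[9]]   # 8 9       (k[9] = 2)
varying[[10]]  # 7 8 9 10  (k[10] = 4)

# Cumulative window: every window starts at the first element,
# so a running sum reproduces base::cumsum.
cumulative <- sapply(seq_along(x), function(i) sum(window_at(x, i, i)))
```

In the package itself the same three cases correspond to passing a single k, a vector k, or no k at all.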

runner(1:15, k = 4)
lag denotes how many observations the windows are lagged by. If lag is a
single value, then it is constant for all elements of x. For a varying
lag, specify lag as an integer vector with length(lag) == length(x),
where each element of lag defines the lag of the corresponding window.
The default is lag = 0. The example below illustrates a window of k = 4
lagged by lag = 2 for the 10th element of vector x. The lag can also be
negative, which shifts the window forward instead of backward.
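The effect of a positive and a negative lag can be sketched in base R as follows. This is an illustration of the description above under the assumption that k_i is at least 1 and that positions outside the vector are simply dropped; window_at is a hypothetical helper, not a runner function.

```r
# Hypothetical helper (not part of runner): window of length k_i for
# position i, shifted back by lag_i (forward when lag_i is negative).
# Assumes k_i >= 1.
window_at <- function(x, i, k_i, lag_i = 0) {
  end <- i - lag_i
  start <- end - k_i + 1
  pos <- seq(start, end)
  pos <- pos[pos >= 1 & pos <= length(x)]  # keep only positions inside x
  x[pos]
}

x <- 1:15
window_at(x, 10, k_i = 4, lag_i = 2)   # 5 6 7 8
window_at(x, 10, k_i = 4, lag_i = -2)  # 9 10 11 12
window_at(x, 1, k_i = 4, lag_i = 2)    # integer(0): nothing before element 1
window_at(x, 15, k_i = 4, lag_i = -2)  # 14 15: window truncated at the end
```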

runner(
  1:15,
  k = 4,
  lag = 2
)
Sometimes data points in a dataset are not equally spaced (missing
weekends, holidays, other gaps), so the window size should vary to keep
the expected time frame. If the idx argument is specified, then running
functions are applied on windows depending on date. idx should have the
same length as x and be of class Date or integer. idx can be combined
with a varying window size, in which case k denotes the number of periods
in the window, different for each data point. The example below
illustrates a window of k = 5 days lagged by lag = 1 day. The range of
each window is given in parentheses.
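The following base R sketch shows why the number of observations per window changes when windows depend on an index. It uses integer day offsets instead of Date values so that the result is reproducible, and it assumes that the window of an observation covers the k periods ending lag periods before its index; window_by_idx is a hypothetical helper, not part of runner.

```r
# Day offsets of 15 unequally spaced observations (integers stand in for dates).
idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
x <- 1:15

# Hypothetical helper (not part of runner): elements whose index lies in
# the k periods that end lag periods before idx[i].
window_by_idx <- function(x, idx, i, k, lag = 0) {
  upper <- idx[i] - lag
  lower <- upper - k + 1
  x[idx >= lower & idx <= upper]
}

window_by_idx(x, idx, 3, k = 5, lag = 1)  # 1 2    (idx 4 and 6 lie in [2, 6])
window_by_idx(x, idx, 8, k = 5, lag = 1)  # 5 6 7  (idx 17, 18, 18 lie in [16, 20])
window_by_idx(x, idx, 4, k = 5, lag = 1)  # integer(0): no observation in [8, 12]

# The number of observations per window varies although k is fixed.
sizes <- sapply(seq_along(x), function(i) length(window_by_idx(x, idx, i, 5, 1)))
```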

idx <- Sys.Date() + c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
runner(
  x = 1:15,
  k = "5 days",
  lag = "1 days",
  idx = idx
)
By default, runner returns a vector of the same length as x, unless a
vector of any length is passed to the at argument. Each element of at is
an index at which runner evaluates the function. The example below
illustrates the output of runner for at = c(18, 27, 48, 31), which gives
windows in the ranges enclosed in square brackets. The range for
at = 27 is [22, 26], which contains none of the available indices.
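As a rough illustration of how at decouples the output length from length(x), the base R sketch below computes one window per element of at. It assumes the same window definition as the [22, 26] range quoted above (k periods ending lag periods before the evaluation point) and does not call the package.

```r
idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
x <- idx
at <- c(18, 27, 48, 31)
k <- 5
lag <- 1

# One window per element of at, regardless of length(x).
windows <- lapply(at, function(a) {
  upper <- a - lag        # 17, 26, 47, 30
  lower <- upper - k + 1  # 13, 22, 43, 26
  x[idx >= lower & idx <= upper]
})
names(windows) <- paste0("at=", at)

lengths(windows)  # 2 0 2 1: the window for at = 27 is empty
```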

idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
runner(
  x = idx,
  k = 5,
  lag = 1,
  idx = idx,
  at = c(18, 27, 48, 31)
)
NA padding
With runner one can also specify na_pad = TRUE, which returns NA for any
window that is partially out of range, meaning that there are not enough
observations to fill the window. By default na_pad = FALSE, which means
that incomplete windows are calculated anyway. na_pad applies both to
normal cumulative windows and to windows depending on date. In the
example below, two windows exceed the range given by idx, so with
na_pad = TRUE these two windows yield NA. With na_pad = FALSE, the first
window is still empty (there is no element within [-2, 3]) and the last
window returns the elements that fall within the matching idx range.
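The difference between the two settings can be sketched in base R with a running sum. This is only an approximation of the behaviour described above, under two labelled assumptions: a window counts as incomplete when its bounds reach beyond range(idx), and an empty window yields NA. run_sum_at is a hypothetical helper, not a runner function.

```r
idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
x <- idx

# Hypothetical helper (not part of runner): running sum evaluated at `at`.
run_sum_at <- function(x, idx, at, k, lag, na_pad) {
  sapply(at, function(a) {
    upper <- a - lag
    lower <- upper - k + 1
    # Assumption: a window is incomplete when it reaches beyond range(idx).
    incomplete <- lower < min(idx) || upper > max(idx)
    if (na_pad && incomplete) return(NA_real_)
    w <- x[idx >= lower & idx <= upper]
    if (length(w) == 0) NA_real_ else sum(w)  # assumption: empty window gives NA
  })
}

at <- c(4, 18, 48, 51)
padded   <- run_sum_at(x, idx, at, k = 5, lag = 1, na_pad = TRUE)   # NA 30 91 NA
unpadded <- run_sum_at(x, idx, at, k = 5, lag = 1, na_pad = FALSE)  # NA 30 91 95
```

With padding, the first and last evaluation points are dropped to NA; without it, the last window still sums the observations it can reach (47 and 48).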

idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
runner(
  x = idx,
  k = 5,
  lag = 1,
  idx = idx,
  at = c(4, 18, 48, 51),
  na_pad = TRUE
)
data.frame
The user can also pass a data.frame to the x argument and apply
functions that involve multiple columns. In the example below, we
calculate the beta parameter of an lm model on 1, 2, …, n observations
respectively. The plot shows how the lm parameter adapts as the number
of observations increases.
date <- Sys.Date() + cumsum(sample(1:3, 40, replace = TRUE)) # unequally spaced time series
x <- cumsum(rnorm(40))
y <- 30 * x + rnorm(40)
df <- data.frame(date, y, x)
slope <- runner(
  df,
  k = 10,
  idx = "date",
  f = function(x) {
    coefficients(lm(y ~ x, data = x))[2]
  }
)
plot(slope)
abline(h = 30, col = "blue")
The runner function can also compute windows in parallel mode. The
function doesn't initialize the parallel cluster automatically; the user
has to create it beforehand and pass it to runner through the cl
argument.
library(parallel)
# create the cluster outside of runner and pass it through cl
numCores <- detectCores()
cl <- makeForkCluster(numCores) # fork clusters are not available on Windows
runner(
  x = df,
  k = 10,
  idx = "date",
  f = function(x) sum(x$x),
  cl = cl
)
stopCluster(cl)
With runner one can use any R function, but some functions are optimized
for speed. These functions are:
- aggregating functions - length_run, min_run, max_run, minmax_run,
  sum_run, mean_run, streak_run
- utility functions - fill_run, lag_run, which_run
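A short usage sketch of a few of the functions listed above, assuming the runner package is installed and that the calls are made with default arguments apart from k. The input vectors are made up for illustration.

```r
library(runner)

x <- c(1, 2, 3, 4, 5)

# Aggregating functions on a window of 2 elements.
s <- sum_run(x, k = 2)   # 1 3 5 7 9
m <- mean_run(x, k = 2)  # 1.0 1.5 2.5 3.5 4.5

# Length of the current run of identical consecutive values.
st <- streak_run(c("A", "A", "B", "B", "B", "A"))  # 1 2 1 2 3 1

# Replace NA with the last non-missing value; a leading NA stays NA.
fl <- fill_run(c(NA, 1, NA, NA, 2, NA))  # NA 1 1 1 2 2
```

The same results can be obtained with runner(x, k = 2, f = sum) and similar calls; the dedicated functions exist for speed.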