--- title: "Introduction to dRiftDM" output: rmarkdown::html_vignette bibliography: REFERENCES.bib vignette: > %\VignetteIndexEntry{Introduction to dRiftDM} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo = FALSE, message=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(dRiftDM) set.seed(1014) ``` When working with drift-diffusion models (DDMs) you probably want to: * Build an new or select an existing model * Explore the model behavior and validate that model parameters are estimated reliably * Fit the model to data * Explore the model fit The dRiftDM package helps you to do this: * With dedicated functions and workflows * Options to customize models * With efficient algorithms to derive the model predictions [see @Richteretal.2023] Three pre-built models are currently available: * The Ratcliff Diffusion Model [see `ratcliff_dm()`, @Ratcliff1978] * The Diffusion Model for Conflict Tasks [see `dmc_dm()`, @Ulrichetal.2015; @Janczyketal.2024] * The Shrinking Spotlight Model [see `ssp_dm()`, @Whiteetal.2011] This document introduces you to dRiftDM, focusing on first steps in exploring and fitting a DDM. ## An Examplary Model To explore some of the basic functions of dRiftDM, we'll use the Diffusion Model for Conflict Tasks (DMC). It is a diffusion model commonly employed in the context of cognitive psychology. To create the model, we call the pre-built function `dmc_fun()` and assign its output to a variable: ```{r} ddm <- dmc_dm() ``` ## Basic Properties of a Model When printing a model to the console, we obtain detailed information about it: ```{r} print(ddm) ``` Here we get a glimpse on the underlying structure of any model created with dRiftDM. For DMC this is: - The model is of type `dmc_dm` - The model has the parameters `muc`, `b`, ..., `alpha`, and the current parameter values for each conditions are shown under `Current Parameter Matrix`. - Below this, under `Unique Parameters`, we obtain how each parameter behaves across conditions. If a number is the same for a parameter across conditions, this means that this parameter is equated across conditions. For example, the parameter `muc` is assumed to be identical for the conditions `comp` and `incomp`. If a number is zero, this means that this parameter is assumed to be "fixed" and thus is not a "free" parameter that can be estimated. If an entry shows a "d", this means there is a special dependency, as listed under `Special Dependencies` (see `vignette("customize_ddms", "dRiftDM")` for more information). - When fitting or exploring a model, we will have to derive the model predictions in terms of the first-passage-times (i.e., the duration of central response selection in the context of psychology). The settings for this are shown under `Deriving PDFs`. Currently, predictions are derived by a numerical discretization of the Kolmogorov-Forward-Equation (`kfe`). The diffusion constant `sigma` is 1, the maximum time space is 3 seconds, and the discretization in time and space is done in steps of 0.001. # Exploring a Model To explore a model, dRiftDM provides: - `simulate_traces()` simulates realizations of the diffusion process - `calc_stats()` calculates summary statistics of model predictions ## `simulate_traces()` Realizations of a diffusion process, that is, single evidence accumulation *traces* for central response selection, are common ways to visualize a diffusion model. The first argument requires the model object. The second argument requires the number of realizations to simulate. For example, we can simulate 5 traces for DMC per condition: ```{r} five_traces <- simulate_traces(object = ddm, k = 5) five_traces ``` Per default, traces are simulated by assuming a fixed starting value of zero. To simulate traces with a starting point, we can set the argument `add_x = TRUE`: ```{r} five_traces <- simulate_traces(object = ddm, k = 5, add_x = TRUE) five_traces ``` In the context of DMC, starting values of the traces are drawn from a symmetric beta distribution [see @Ulrichetal.2015]. We can easily visualize these traces by calling the generic `plot()` method: ```{r} plot(five_traces, col = c("green", "red")) ``` When visualizing the basic model behavior, one often wants to display the expected time-course of the diffusion process. We can do so by eliminating the stochastic noise with setting the argument `sigma = 0`. ```{r} exp_behavior <- simulate_traces(object = ddm, k = 1, sigma = 0) plot(exp_behavior, col = c("green", "red")) ``` ## `calc_stats()` A DDM predicts response choices and response times, with the latter being the sum of the first-passage-time (i.e., the duration of central response selection) and the non-decision time. We can request summary statistics of this prediction with `calc_stats()`. The first argument requires the model object. The second argument a character vector, specifying the `type` of summary statistic. In the context of cognitive psychology, quantiles and so-called Conditional Accuracy Functions (CAFs) are common ways to summarize the model predictions: ```{r} sum_stats <- calc_stats(object = ddm, type = c("cafs", "quantiles")) sum_stats ``` We can visualize summary statistics with the `plot()` method: ```{r} plot(sum_stats, col = c("green", "red")) ``` # Changing Model Properties To get or set properties of the model, dRiftDM provides accessor/replacement methods for: - `coef()` accesses/replaces parameter values - `prms_solve()` accesses/replaces settings for deriving model predictions (this also includes changing the diffusion constant) - `solver()` accesses/replaces the method for deriving model predictions - `b_coding()` accesses/replaces the coding of the upper and lower boundary - `obs_data()` accesses/replaces the data set (of a single participant) attached to the model - `flex_prms()` accesses/replaces the object that controls how each parameter relates across conditions - `conds()` accesses/replaces the conditions of a model - `comp_funs()` accesses/replaces the underlying component functions for the drift rate, boundary, etc. `comp_funs()`, `flex_prms()`, and `conds()` are covered in `vignette("customize_ddms", "dRiftDM")`. ## `coef()` ```{r} coef(ddm) ``` This returns a unique representation of the parameters and their associated values. Note that this drops parameters that are not estimable. We can combine `coef()` with the `[]` operator to change the values of the parameters: ```{r} coef(ddm)["muc"] <- 5 coef(ddm) ``` To request the entire parameter matrix with all parameter values across conditions, we can set the argument `select_unique = FALSE`: ```{r} coef(ddm, select_unique = FALSE) ``` In this case, we can not combine `coef()` with the `[]` operator. To change a parameter value for a specific condition, you can use the function `modify_flex_prms()`. ## `prms_solve()` ```{r} prms_solve(ddm) ``` This shows the diffusion constant and the discretization settings. We can again use a combination with `[]` to modify these values. ```{r} prms_solve(ddm)["dt"] <- .0025 prms_solve(ddm) ``` ## `solver()` ```{r} solver(ddm) ``` This shows the currently set method for deriving the model's predicted probability density functions of response time and choice. Currently supported options are `"kfe"` and `"im_zero"`. While the `"kfe"` method can be applied to all models in dRiftDM, `"im_zero"` can only be used when the starting point is fixed to zero. ## `b_coding()` ```{r} b_coding(ddm) ``` This returns a list that specifies how the boundaries of a DDM are coded. We can change the boundary coding by modifying the returned list: ```{r} copy <- ddm # to not change the original model object b_coding(copy)$column <- "Response" b_coding(copy)$u_name_value <- c("left" = -1) b_coding(copy)$l_name_value <- c("right" = 1) b_coding(copy) ``` ## `obs_data()` We can set observed data of a single individual to a model (or access it) with `obs_data()`. When setting observed data, we have to make sure that the supplied `data.frame` provides columns matching with the boundary coding and the conditions of the model. ```{r} data <- dRiftDM::dmc_synth_data # some synthetic data suitable for DMC that ships with dRiftDM # the Cond column matches with conds(ddm). # The Error column matches b_coding(ddm) # the RT column is in seconds ;) head(data) obs_data(ddm) <- data ``` Note that the supplied data set is not stored "as is" within the model object. Thus, when accessing a data set of a model, the data is re-assembled, and this might change the order of rows or column with respect to the original data set. ## The `summary()` Function We can request a detailed summary of the model, providing information about it's core properties with the generic `summary()` function: ```{r} summary(ddm) ``` # Fitting a Model To fit a model to observed data you can use: - `estimate_model()` fits a model to the data of one participant - `estimate_model_ids()` a wrapper around `estimate_model()` to fit a model to multiple participants (participant-wise). ## `estimate_model()` Given a data set, the parameters of a model in dRiftDM are estimated via Differential Evolution and/or (bounded) Nelder-Mead. The cost function is based on the log-likelihood. The first argument requires the model. The second and third arguments are the lower and upper boundaries of the search space. Per default, Differential Evolution is used, as it is more robust. Yet, to ensure that this vignette runs quickly, we will use Nelder-Mead. A tricky choice regards the discretization settings. Per default, dRiftDM discretizes the time and evidence space in steps of 0.001. This is a conservative setting, ensuring high numerical accuracy. Yet, high numerical accuracy comes at the expense of a high computational burden that increases non-linearly. It is thus recommended to make the discretization more coarse to reduce the time waiting in front of the computer. As a rule of thumb, we currently recommend setting the discretization steps between 0.001 and 0.005. Yet, the impact of the respective settings depend on the model and its parameterization. As a consequence, users will have to make sure that their discretization settings lead to reasonable model predictions. Another choice is the maximum time space. It should be large enough to easily cover the longest response time in the observed data set. The following code adjusts the discretization settings, and subsequently fits DMC to a single data set. ```{r} # get some data (here we use again the synthetic data that ships with dRiftDM) data <- dRiftDM::dmc_synth_data head(data) # increase the discretization steps to 0.005 and set the maximum time space to 1.5 seconds prms_solve(ddm)["dt"] <- .005 prms_solve(ddm)["dx"] <- .005 prms_solve(ddm)["t_max"] <- 1.5 max(data$RT) # maximum time space easily covers the maximum RT # attach the data to the model obs_data(ddm) <- data # now call the estimation routine ddm <- estimate_model( drift_dm_obj = ddm, lower = c(muc = 1, b = .3, non_dec = .1, sd_non_dec = .005, tau = .03, A = .01, alpha = 2), upper = c(muc = 6, b = .9, non_dec = .5, sd_non_dec = .050, tau = .12, A = .15, alpha = 9), use_de_optim = FALSE, # overrule the default Differential Evolution setting use_nmkb = TRUE ) coef(ddm) ``` We can also extract the fit statistic and information criteria: ```{r} logLik(ddm) AIC(ddm) BIC(ddm) ``` ## Checking Model Fit (1) To qualitatively check if a model fits the data, we can use `calc_stats()` in combination with the `plot()` method. Circles indicate observed data while lines indicate the model's predictions. ```{r} check_fit <- calc_stats(object = ddm, type = c("cafs", "quantiles")) plot(check_fit, col = c("green", "red")) ``` ## `estimate_model_ids()` Oftentimes, we don't want to fit a model to a single individual, but to multiple individuals. For this, you can use `estimate_model_ids()`, which is a wrapper around `estimate_model()`. In the following code chunk, we fit DMC to the first two individuals of the flanker data set provided by @Ulrichetal.2015 (included in `dRiftDM`). ```{r} flanker_data <- dRiftDM::ulrich_flanker_data head(flanker_data) flanker_data <- flanker_data[flanker_data$ID %in% c(1, 2), ] obs_data(ddm) <- NULL # detach data (from the previous sections) to avoid a warning estimate_model_ids( drift_dm_obj = ddm, obs_data_ids = flanker_data, lower = c(muc = 1, b = .3, non_dec = .1, sd_non_dec = .005, tau = .03, A = .01, alpha = 2), upper = c(muc = 6, b = .9, non_dec = .5, sd_non_dec = .050, tau = .12, A = .15, alpha = 9), fit_procedure_name = "flanker_test_run", # a label to identify the fits fit_path = tempdir(), # to save fits in the working directory use getwd() use_de_optim = FALSE, # overrule the default Differential Evolution setting use_nmkb = TRUE ) ``` We can then load all saved fits using `load_fits_ids()`: ```{r} all_fits <- load_fits_ids(path = tempdir(), fit_procedure_name = "flanker_test_run") all_fits ``` A summary of the parameter estimates across all conditions can be requested via the generic `summary()` function: ```{r} summary(all_fits) ``` Note that most of DMC's parameters are per default equal across conditions. This is why you obtain exactly the same mean and standard errors for each condition. If you want to have the individual parameter estimates across conditions, you can call: ```{r} coef_fits <- coef(all_fits) coef_fits ``` Fit statistics and information criteria across individuals can be extracted as well: ```{r} logLik(all_fits) AIC(all_fits) BIC(all_fits) ``` ## Checking Model Fit (2) As before, we can check the model fit using a combination of `calc_stats()` and `plot()`: ```{r} check_fit <- calc_stats(object = all_fits, type = c("cafs", "quantiles")) plot(check_fit, col = c("green", "red")) ``` # Simulating Data To simulate data based on a model, we can use the `simulate_data()` function. The first argument takes the model object. The second argument is a numeric (vector), defining the number of trials per condition: ```{r} ddm <- ratcliff_dm() # a model for demonstration purpose sim_1 <- simulate_data(object = ddm, n = 200) head(sim_1) ``` This returns a single synthetic data set for one "participant". Note that the simulated response times depend on the time discretization. For example, if the time discretization is `dt` = .005, then the response times can only differ in steps of .005. By specifying the argument `k`, we can simulate multiple synthetic data sets simultaneously. In this case, however, we must also specify lower and upper bounds to define the simulation space when drawing random parameter combinations per data set (see the `simulate_data` documentation for more details): ```{r} sim_2 <- simulate_data( object = ddm, n = 200, k = 2, lower = c(muc = 1, b = .4, non_dec = .2), upper = c(muc = 6, b = .8, non_dec = .4) ) ``` This returns a list with the synthetic data sets and the corresponding parameter values: ```{r} head(sim_2$synth_data) head(sim_2$prms) ``` The returned data sets are in a format that can be passed directly to `estimate_model_ids()`, making parameter recovery exercises very easy. # Remark: Stripping Away "Unnecesary" Attributes and Class Labels Many functions in `dRiftDM` return summary statistics or data.frames. Often these returned objects are S3 objects with a custom class label. In principle, this is great because it allows you to reuse generic functions like `plot()`, `print()`, or `summary()` as we saw above. However, sometimes this also hides the true structure of the underlying data type. Therefore, we provide the `unpack_obj()` function, which allows you to strip away unnecessary attributes and class labels. This can be useful, for example, if you want to create your own plot or wrangle the data into a particular format. ```{r} ddm <- dmc_dm(dt = .005, dx = .005) traces <- simulate_traces(ddm, k = 2) # although this object is essentially a list of matrices, the class label ... class(traces) print(traces) # ... leads to nicely formatted output; but hides the underlying structure raw <- unpack_obj(traces) # provides the plain list of matrices head(t(raw$comp)) ``` ## References