AdhereR: Interactive plotting (and more) with Shiny

Dan Dediu ddediu@gmail.com

2022-06-23

Introduction

AdhereR is an R package that implements, in an open and standardized manner, various methods linked to the estimation of adherence to treatment from a variety of data sources and formats (please see the other vignettes in the package, by invoking, for example, browseVignettes(package="AdhereR"), or by visiting the package’s site on CRAN). One of the main aims of the package is to allow users to produce high-quality, publication-ready and highly customizable graphical representations of both the patterns in the raw data and the various estimates of adherence. This is normally achieved from an R session or script using the plot() function applied to an estimated CMA object (the raw patterns are plotted by creating a basic CMA0 object), as detailed in the AdhereR: Adherence to Medications vignette. However, while this allows very fine-grained control over the resulting plots, it requires, on the one hand, a certain level of familiarity with R (loading the source data, creating the appropriate CMA object, invoking the plot() function with the desired parameters, and exporting the resulting plot in the desired format, at the desired quality and with the other desired characteristics); on the other hand, the process is rather cumbersome when the user wants to explore and understand the data, or to try various types of plots in search of the optimal visualization.

These reasons prompted us to develop a fully interactive user interface that hides the “gory details” of data loading, CMA computation and plot() invocation behind an intuitive, easy-to-use point-and-click interface, while allowing fast exploration and customization of the plots. However, because this interactive user interface covers a rather particular set of use cases, tends to be rather “heavy” in terms of dependencies, and may not install or run properly in some environments (e.g., headless servers or older systems), we decided to implement it in a separate package, AdhereRViz, that extends AdhereR (i.e., AdhereRViz requires AdhereR, but AdhereR can happily run without AdhereRViz).

Overview

We use Shiny, which allows us to build a self-contained app that can be run locally or remotely inside a standard web browser (such as Firefox, Google Chrome, Safari, Internet Explorer, Edge or Opera) on multiple operating systems (such as Microsoft Windows; Apple’s macOS and iOS; Google’s Android; several flavors of Linux, e.g., Debian, Ubuntu, Fedora, RedHat, CentOS, Arch…; and BSD, e.g., FreeBSD) and on devices ranging from desktop and laptop computers to mobile phones and tablets. The app’s interface uses standard controls and paradigms, ensuring a similar user experience across browsers, platforms and devices.

Launching the app

Locally, the app can be launched from a normal R session (including from within RStudio) or script with a single command. Of course, the latest versions of AdhereR and AdhereRViz must be installed on the system (using, for example, install.packages("AdhereRViz", dep=TRUE) or RStudio’s Tools > Install Packages… menu; if they are already installed, they can be updated using update.packages() or RStudio’s Tools > Check for Package Updates… menu) and loaded in the current session (using, for example, library(AdhereRViz) or require(AdhereRViz)). With these prerequisites in order, the app can be launched without any parameters with

plot_interactive_cma()

or, if so desired, by specifying a data source and the important column names and optionally the desired CMA, as in the following example, where we use CMA0 (i.e., the raw data) from the sample dataset med.events (see its structure in the Table below) included with the AdhereR package:

plot_interactive_cma(data=med.events, # included sample dataset
                     cma.class="simple", # simple cma, defaults to CMA0
                     # The important column names:
                     ID.colname="PATIENT_ID",
                     event.date.colname="DATE",
                     event.duration.colname="DURATION",
                     event.daily.dose.colname="PERDAY",
                     medication.class.colname="CATEGORY",
                     # The format of dates in the "DATE" column:
                     date.format="%m/%d/%Y");
The structure of the sample dataset med.events included in the AdhereR package. We show rows 23 to 33 (out of a total of 1080 rows), each row representing one event, which is characterized at a minimum by: the patient it refers to (identified by the patient’s unique ID in column PATIENT_ID), the date it happened (in column DATE, recorded in a uniform format, here MM/DD/YYYY), and its duration in days (in column DURATION); optionally, we can also have info concerning the prescribed daily quantity or dose (column PERDAY) and the class or type of treatment (column CATEGORY). We will use this dataset throughout this vignette.
PATIENT_ID DATE PERDAY CATEGORY DURATION
1 03/22/2035 2 medB 30
1 03/31/2035 2 medB 30
2 01/20/2036 4 medA 50
2 03/10/2036 4 medA 50
2 08/01/2036 4 medA 50
2 08/01/2036 4 medB 60
2 09/21/2036 4 medB 60
2 01/24/2037 4 medB 60
2 04/16/2037 4 medB 60
2 05/08/2037 4 medB 60
3 04/13/2042 4 medA 50
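The date.format argument uses the conversion codes of R’s strptime() (as for as.Date()). As a minimal base-R sketch (not AdhereR code), the rows shown in the table can be rebuilt by hand under the hypothetical name med.events.excerpt and their DATE column parsed with the same "%m/%d/%Y" string; the second call shows what happens with a mismatched format.

```r
# Hypothetical excerpt: the rows of med.events shown in the table above,
# typed in by hand (the full dataset ships with AdhereR).
med.events.excerpt <- data.frame(
  PATIENT_ID = c(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3),
  DATE = c("03/22/2035", "03/31/2035", "01/20/2036", "03/10/2036",
           "08/01/2036", "08/01/2036", "09/21/2036", "01/24/2037",
           "04/16/2037", "05/08/2037", "04/13/2042"),
  PERDAY = c(2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4),
  CATEGORY = c("medB", "medB", "medA", "medA", "medA", "medB",
               "medB", "medB", "medB", "medB", "medA"),
  DURATION = c(30, 30, 50, 50, 50, 60, 60, 60, 60, 60, 50),
  stringsAsFactors = FALSE
)

# "%m/%d/%Y": two-digit month, two-digit day and four-digit year,
# separated by slashes.
parsed.dates <- as.Date(med.events.excerpt$DATE, format = "%m/%d/%Y")

# A mismatched format does not raise an error: impossible dates become NA,
# and some others become valid but wrong dates (e.g., "03/10/2036" is read
# as 3 October 2036 instead of 10 March 2036).
wrong <- as.Date(med.events.excerpt$DATE, format = "%d/%m/%Y")
```

Because a wrong format can silently produce NA or plausible-looking but incorrect dates, it is worth inspecting a few parsed values before computing any CMA.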

The app can also be launched in the “standard way” using runApp() or RStudio’s ▶︎ Run app button.

Alternatively, the app may be made available on a remote server, such as on https://www.shinyapps.io/, in which case it can be accessed simply by pointing the web browser to the app’s internet address.

Please note that launching the App with no parameters opens a different initial screen (see the Selecting/changing the data source section for details).

Also note that we provide a “stub” function plot_interactive_cma() in the package AdhereR, but this simply checks whether AdhereRViz is installed and functional, and then tries to invoke plot_interactive_cma() from AdhereRViz.
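The general pattern behind such a stub can be sketched in base R. call_if_available() below is a hypothetical helper written for illustration, not the actual code of AdhereR’s stub; it relies only on the base functions requireNamespace() and getExportedValue().

```r
# Hypothetical helper illustrating the "stub" idea: check that a package
# is installed and loadable, then call one of its exported functions.
call_if_available <- function(pkg, fun, ...) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf("Package '%s' is not installed or cannot be loaded.", pkg),
         call. = FALSE)
  }
  f <- getExportedValue(pkg, fun)  # look up pkg::fun without attaching pkg
  f(...)
}

# For AdhereRViz this would amount to something like:
# call_if_available("AdhereRViz", "plot_interactive_cma", data = med.events, ...)
```

Looking the function up with getExportedValue() avoids attaching the optional package to the search path just to make one call.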

The App’s User Interface (UI)

The App’s UI has several main elements which can be seen below. Most UI elements have tooltips that show up on hovering the mouse over the element and that offer specific information (but please note that these tooltips might need some time before showing up).

Overview of the User Interface. Screenshot (App running in Firefox on macOS 10.13) of some of the main UI elements (dotted black ellipses or rectangles identified with black numbers). 1 is a button that opens a box giving information about the App. 2 is a button that exits the App cleanly (i.e., stops and disconnects the session). 3 is the main plotting area, which displays the current plot. 4 shows some of the UI elements controlling the size of the plot (element 3). 5 is the area where various parameters can be modified. 6 opens UI elements specific to saving the current plot to file. 7 shows the R code that can be used to generate the current plot. 8 gives access to UI controls that allow the computation of the current CMA for many more patients and saving the results to file. 9 displays messages, warnings or errors that might have been generated while constructing the current plot.

About the App

Clicking on the About button (element 1 in the overview figure) opens a box with info about the App, such as the version of the AdhereR package, an overview of the package, and links to where more help (such as vignettes) can be found.

Cleanly exiting the App

It is recommended to cleanly exit the App by clicking the Exit… button (element 2 in the overview figure), as simply closing the browser will not normally also stop the R process running in the background. Please note that currently, exiting the App will not also close the browser window or tab in which the App was running…

The plotting area (and the messages)

The current plot is displayed in UI element 3 (in the overview figure), a canvas that can be resized using UI elements 4 in the overview figure (see also below) and which, when too big, can be scrolled horizontally and vertically at will. This canvas is currently passive, in the sense that it simply displays a plot with which interaction is possible only through the other elements of the UI; however, almost all aspects of this plot can be tweaked using the controls in the left-hand vertical panel (element 5 in the overview figure; see details below).

While the interpretation of these plots should be relatively intuitive, it is nevertheless detailed in the AdhereR: Adherence to Medications vignette.

Element 9 in the overview figure displays most of the information messages, warnings and errors generated during the plotting process (please note that currently some messages, warnings and errors might not be captured and only shown in the R console). For example, here, the informational message Plotting patient ID ‘1’ with CMA ‘CMA9’ Plotting patient ID ‘2’ with CMA ‘CMA9’ means that the computation and plotting of CMA9 for patients with IDs 1 and 2 was successful.

Elements 4 in the overview figure control the horizontal and vertical dimensions of the plot (element 3), either coupled (i.e., keeping the current width-to-height ratio) when the Keep ratio switch is ON, or independently of each other when the switch is OFF (in which case a new slider controlling the plot height appears). The slider(s) can be operated either with the mouse or with the arrow keys.

Please note that there is a minimum size requirement for a plot to be displayed, otherwise an error of the type

Plotting area is too small (it must be at least 10 x 0.5 characters per event, but now it is only 31.1 x 0.5)!

is thrown, in which case either the plotting area needs to be enlarged using the Plot width (and, if visible, Plot height) slider(s), or the number of patients or the duration to be shown needs to be reduced. Alternatively, the Advanced section (see Section Setting parameters for details) can be used to lower these minimum requirements (but this is not recommended in most cases).

Saving the current plot to file

The current plot can be exported to a variety of formats by turning the Save plot! switch ON:

Saving the current plot to file. A zoom-in of the new controls (element 10) revealed by turning the Save plot! switch (UI element 6) ON. Also visible is an explanatory tooltip.

These new UI elements (10) allow:

By pressing the Save plot button, the user can select the location and file name (relative to the local machine) under which the plot will be exported.
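Outside the App, the equivalent of the Save plot button is to open a graphics file device, draw the plot, and close the device. The sketch below uses base R’s pdf() device with a placeholder plot(); with AdhereR one would draw plot(cma, ...) instead. The helper name save_plot_to_pdf() and the sizes are illustrative assumptions.

```r
# Hypothetical helper: draw a plot into a PDF file (width/height in inches).
# Other devices such as png() or svg() follow the same open/draw/close
# pattern but may require a suitable graphics backend on the system.
save_plot_to_pdf <- function(file, width = 7, height = 5, plot_fun) {
  grDevices::pdf(file, width = width, height = height)
  on.exit(grDevices::dev.off())  # close the device even if plotting fails
  plot_fun()
  invisible(file)
}

out.file <- tempfile(fileext = ".pdf")
save_plot_to_pdf(out.file, plot_fun = function() plot(1:10, main = "placeholder"))
```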

Viewing, copying and using the R code that would produce the current plot

While the main use scenarios for this App are built around interactivity, the user may want to generate the same (or a similar) plot as the one currently displayed (in element 3 in the overview figure) outside the App, for example from an R script. To allow this, we provide the Show R code… button (element 7 in the overview figure), which opens a box with clearly commented R code:

Viewing the R code that generates the current plot.

Clicking the Copy to clipboard button copies the R code to the clipboard, from where it can be pasted into an editor of choice (such as RStudio). In particular, for the plot shown in the overview figure, the R code displayed is:

# The R code corresponding to the currently displayed Shiny plot:
# 
# Extract the data for the selected 2 patient(s) with ID(s):
# "1", "2"
# 
# We denote here by DATA the data you are using in the Shiny plot.
# This was manually defined as an object of class data.frame
# (or derived from it, such as a data.table) that was already in
# memory under the name 'med.events'.
# Assuming this object still exists with the same name, then:

DATA <- med.events;

# These data have 5 columns, and contain info for 100 patients.
# 
# To allow using data from other sources than a "data.frame"
# and other similar structures (for example, from a remote SQL
# database), we use a mechanism to request the data for the
# selected patients that uses a function called
# "get.data.for.patients.fnc()" which you may have redefined
# to better suit your case (chances are, however, that you are
# using its default version appropriate to the data source);
# in any case, the following is its definition:
get.data.for.patients.fnc <- function(patientid, d, idcol, cols=NA, maxrows=NA) d[ d[[idcol]] %in% patientid, ]
# Try to extract the data only for the selected patient ID(s):
.data.for.selected.patients. <- get.data.for.patients.fnc(
    c("1", "2"),
    DATA, ### don't forget to put here your REAL DATA! ###
    "PATIENT_ID"
);
# Compute the appropriate CMA:
cma <- CMA9(data=.data.for.selected.patients.,
            # (please note that even if some parameters are
            # not relevant for a particular CMA type, we
            # nevertheless pass them as they will be ignored)
            ID.colname="PATIENT_ID",
            event.date.colname="DATE",
            event.duration.colname="DURATION",
            event.daily.dose.colname="PERDAY",
            medication.class.colname="CATEGORY",
            carry.only.for.same.medication=FALSE,
            consider.dosage.change=FALSE,
            followup.window.start=0,
            followup.window.start.unit="days",
            followup.window.duration=730,
            followup.window.duration.unit="days",
            observation.window.start=0,
            observation.window.start.unit="days",
            observation.window.duration=730,
            observation.window.duration.unit="days",
            date.format="%m/%d/%Y"
           );

if( !is.null(cma) ) # if the CMA was computed ok
{
    # Try to plot it:
    plot(cma,
         # (same idea as for CMA: we send arguments even if
         # they aren't used in a particular case)
         align.all.patients=FALSE,
         align.first.event.at.zero=FALSE,
         show.legend=TRUE,
         legend.x="right",
         legend.y="bottom",
         legend.bkg.opacity=0.5,
         legend.cex=0.75,
         legend.cex.title=1,
         duration=NA,
         show.period="days",
         period.in.days=90,
         bw.plot=FALSE,
         col.na="#D3D3D3",
         unspecified.category.label="drug",
         col.cats=rainbow,
         lty.event="solid",
         lwd.event=2,
         pch.start.event=15,
         pch.end.event=16,
         col.continuation="#000000",
         lty.continuation="dotted",
         lwd.continuation=1,
         cex=1,
         cex.axis=1,
         cex.lab=1.25,
         highlight.followup.window=TRUE,
         followup.window.col="#00FF00",
         highlight.observation.window=TRUE,
         observation.window.col="#FFFF00",
         observation.window.density=35,
         observation.window.angle=-30,
         observation.window.opacity=0.3,
         show.real.obs.window.start=TRUE,
         real.obs.window.density=35,
         real.obs.window.angle=30,
         print.CMA=TRUE,
         CMA.cex=0.5,
         plot.CMA=TRUE,
         CMA.plot.ratio=0.1,
         CMA.plot.col="#90EE90",
         CMA.plot.border="#006400",
         CMA.plot.bkg="#7FFFD4",
         CMA.plot.text="#006400",
         plot.CMA.as.histogram=TRUE,
         show.event.intervals=TRUE,
         print.dose=TRUE,
         print.dose.outline.col="#FFFFFF",
         print.dose.centered=FALSE,
         plot.dose=FALSE,
         lwd.event.max.dose=8,
         plot.dose.lwd.across.medication.classes=FALSE,
         min.plot.size.in.characters.horiz=10,
         min.plot.size.in.characters.vert=0.5
    );
}
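To see what the default get.data.for.patients.fnc() does, the sketch below applies the same one-line definition to a small hand-made stand-in for DATA (a few rows of the med.events table shown earlier). The IDs are passed as character strings ("1", "2") while PATIENT_ID is numeric; %in% coerces both sides to a common type, so the match still works.

```r
# Same definition as in the generated code above
# (the cols and maxrows arguments are unused by this default version):
get.data.for.patients.fnc <- function(patientid, d, idcol, cols = NA, maxrows = NA)
  d[d[[idcol]] %in% patientid, ]

# Hypothetical stand-in for DATA (a few rows of med.events):
DATA <- data.frame(PATIENT_ID = c(1, 1, 2, 2, 3),
                   DATE = c("03/22/2035", "03/31/2035", "01/20/2036",
                            "03/10/2036", "04/13/2042"),
                   DURATION = c(30, 30, 50, 50, 50))

# Keep only the rows of patients "1" and "2":
selected <- get.data.for.patients.fnc(c("1", "2"), DATA, "PATIENT_ID")
```

A redefined version of this function (for example, one that queries an SQL database) only needs to return the same kind of data.frame for the requested IDs.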

This code is pretty much ready to run, except for some issues that might surround accessing the actual data used for plotting: the user is reminded of these through the yellow-on-red bold italic highlighting of DATA (not shown in the code listing above). In a nutshell (for details, see below), if (a) the user interactively uses the App to load or connect to a data source (such as an external file or an SQL database), then the identity of this data source is known (the file name or the database location), but if (b) the data source was passed to the plot_interactive_cma() function as the data argument, the App cannot know how this data source was named (and this “name” might not even exist if, for example, the data was created on the fly while calling plot_interactive_cma()). Moreover, even in case (a), it is generally unsafe to assume that the data source will stay the same (or will be accessible in the same way) in the future. Thus, while we provide as much info as possible about the data source used to produce the current plot, we also warn the user to be careful when running this code!

Computing the CMA for several patients

By switching the UI element 8 (in the overview figure) Compute CMA for several patients… to ON, the user unlocks a new set of UI elements that allow the computation of the currently defined CMA for more patients and the export of the results to an external file.

First, it is important to highlight that the App is not intended for heavy computations, which is why we currently limit this CMA computation to at most 100 patients and at most 5000 events across all patients (if more patients or events are selected, the computation is done only for the first 100 and 5000, respectively), and to at most 5 minutes of running time (after which the computation is automatically stopped). If seriously heavy computation is needed, we recommend using the appropriate CMA() functions from an R session or script, which allow many types of parallel processing and the use of several types of data sources with very fine-grained control, as described in the vignettes AdhereR: Adherence to Medications and Using AdhereR with various database technologies for processing very large datasets. The R code needed to compute the current CMA can be accessed through the Show R code button (UI element 7).
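If a script needs to reproduce the App’s patient and event limits, they can be mimicked as below. cap_selection() is a hypothetical helper, not part of AdhereR or AdhereRViz, and it reads “the first 100 and 5000” as the first 100 selected IDs followed by the first 5000 of their rows, which is only one plausible interpretation.

```r
# Hypothetical helper: keep at most max.patients of the selected IDs,
# then at most max.events of their rows (events).
cap_selection <- function(d, idcol, ids, max.patients = 100, max.events = 5000) {
  ids <- head(ids, max.patients)                  # first max.patients IDs
  rows <- d[d[[idcol]] %in% ids, , drop = FALSE]  # their events
  head(rows, max.events)                          # first max.events rows
}

# Toy data: 300 patients with 30 events each (9000 rows in total).
toy <- data.frame(PATIENT_ID = rep(1:300, each = 30), DURATION = 30)
capped <- cap_selection(toy, "PATIENT_ID", ids = 1:300)
```

Here the patient limit binds first (100 patients × 30 events = 3000 rows, below 5000); with fewer, longer event histories the event limit would bind instead.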

The patients for whom the CMA is to be computed can be selected in two main ways:

  1. by individually selecting patients by their IDs, through a multiple selection dropdown list (element 11 in the figure below):

Selecting patients individually for the computation of CMA.

  2. by selecting a continuous range of patients using two sliders (element 15 in the figure below), which define a set of positions in a list (element 14 in the figure below); this list can contain the patient IDs in their original order in the data source, or can have them sorted in ascending or descending order by ID (element 13 in the figure below).

Selecting patients by range of positions in a list for the computation of CMA. In this example, we sorted the patients in decreasing order by ID (using the combobox 13), resulting in the mapping of positions (#) to IDs shown in list 14, where the patient with ID “100” is in position 1, patient “99” in position 2, etc. The sliders 15 define the range of positions (#) 3 to 24, which means that we selected the patients with IDs “98”, “97”, “96” … “78” and “77”.
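The mapping from slider positions to IDs in this example can be reproduced in base R; the IDs 1 to 100 are those of the med.events example, and the variable names are illustrative.

```r
patient.ids <- 1:100  # IDs "1" to "100", as in med.events

# Sort decreasingly by ID: position 1 then holds the largest ID
sorted.ids <- sort(patient.ids, decreasing = TRUE)

# The two sliders select positions 3 to 24 (inclusive)
selected.ids <- sorted.ids[3:24]

# If IDs are stored as character strings, sort them numerically:
# as text, "99" would come before "100" in decreasing order.
char.ids <- as.character(patient.ids)
sorted.char <- char.ids[order(as.numeric(char.ids), decreasing = TRUE)]
```

Whether the App itself sorts character IDs numerically or alphabetically is not assumed here; the last two lines only show the pitfall to watch for in a script.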

These two ways of selecting patients should be flexible enough to cover most cases of (semi-)interactive use; for more patients and/or for selecting patients based on more complex criteria, we suggest using the R code in a script.

After patients have been selected, the user can press the Compute CMA button (UI element 12) to access a specialized dialog box (see figure below) where the CMA computation can be started, its progress monitored, or stopped, and from where the results can be exported to file.

Starting, stopping and watching the progress of the CMA computation for several patients. The list of patients is given in UI element 16, and the progress of the computation is tracked by the progress bar and the individual success reports (UI elements 17). The button Save results (as TSV) allows the user to select a file where the results will be exported as a TAB-separated values (TSV) file.
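In a script, a results table can be exported in the same TAB-separated format with base R’s write.table(). The values below are made-up placeholders, not real CMA estimates; with AdhereR, such a table would typically come from getCMA() applied to a computed CMA object.

```r
# Made-up placeholder results (NOT real CMA estimates), only to show the format
results <- data.frame(PATIENT_ID = c("1", "2"), CMA = c(0.85, 0.62))

tsv.file <- tempfile(fileext = ".tsv")
write.table(results, tsv.file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = TRUE)

# Reading the file back (read.delim() defaults to TAB separators and a header):
back <- read.delim(tsv.file, colClasses = c("character", "numeric"))
```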

Setting parameters

The left-hand panel has two tabs: Params and Data, and we are focusing here on Params, which contains various parameters customizing the computed CMA and the plotting of the results. UI element 5 in the overview figure shows part of this panel, but the following principles apply:

Basic information about the current dataset (here, med.events), also showing the five important columns.

Folding and unfolding the contents of a section. The section Follow-up window (FUW) with content folded (hidden) on the left, and unfolded (visible) on the right.

We will now go through all sections one by one.

General settings

The General settings section is always visible and allows the selection of:

  • CMA type: the type of CMA to compute, which can be (please see Dima & Dediu, 2017, and the vignette AdhereR: Adherence to Medications for more details):

    • simple: one of the “simple” CMAs, currently CMA0 to CMA9,
    • per episode: computes one of the “simple” CMAs repeatedly for each treatment episode,
    • sliding window: computes one of the “simple” CMAs repeatedly for a set of sliding windows
  • CMA to compute: the “simple” CMA to compute, either by itself (for CMA type == simple) or iteratively (for the other two “complex” types); please note that by definition CMA0 cannot be used with “complex” CMAs (which explains why it cannot be selected in these cases)

  • Patient(s) to plot: the list of patient IDs, selected from a drop-down list (which allows multiple selections) containing all the patient IDs in the current data source (at least one patient must be selected, otherwise an error is generated)

Depending on these selections the plot may change or various types of errors or warnings may be thrown.

Follow-up window (FUW) and Observation window (OW)

These two sections are very similar and allow the definition of the follow-up (FUW) and observation (OW) windows by specifying:

  • their start: this can be either:

    • the number of units (days, weeks, months or years) relative to the first event (for FUW) and to the start of the FUW (for OW), or
    • an absolute calendar date
  • and their duration as a number of units (days, weeks, months or years).
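The relative definition of the two windows can be illustrated with plain date arithmetic. window_bounds() below is a hypothetical helper restricted to days and weeks (how AdhereR converts months and years into days is deliberately not assumed); it uses patient 2’s first event from the med.events table and the 730-day durations that appear in the generated R code above.

```r
# Hypothetical helper: a window given as an offset from a reference date
# plus a duration, both in days or weeks.
window_bounds <- function(ref.date, start, duration, unit = c("days", "weeks")) {
  unit <- match.arg(unit)
  mult <- if (unit == "weeks") 7 else 1
  from <- ref.date + start * mult
  list(start = from, end = from + duration * mult)
}

first.event <- as.Date("01/20/2036", format = "%m/%d/%Y")  # patient 2

# FUW: starts 0 days after the first event and lasts 730 days.
fuw <- window_bounds(first.event, start = 0, duration = 730)

# OW: defined relative to the FUW start (here also 0 days, 730 days long).
ow <- window_bounds(fuw$start, start = 0, duration = 730)

# An absolute calendar date is simply used directly as the start:
ow.abs <- window_bounds(as.Date("2036-03-01"), start = 0, duration = 730)
```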

Carry over

This section is shown only for CMA5 to CMA9 and concerns the way carry over is considered:

  • For same treat. only: should carry over be considered only within the same treatment class, or across classes?
  • Consider dosage changes: should dosage changes be considered when computing the carry over?
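As a simplified illustration of carry over (not AdhereR’s exact algorithm, which also accounts for the observation window, treatment classes and doses), consider patient 1’s two medB events from the med.events table: the 30-day supply obtained on 03/22/2035 lasts until 04/21/2035, while the next event is on 03/31/2035, leaving 21 days of supply that could be carried over.

```r
dates <- as.Date(c("03/22/2035", "03/31/2035"), format = "%m/%d/%Y")
durations <- c(30, 30)

supply.end <- dates + durations  # day on which each supply runs out

# Surplus still available at the start of the next event (0 if none)
carry.over <- pmax(0, as.numeric(supply.end[-length(dates)] - dates[-1]))
```

Since both events in this toy calculation are medB with the same daily dose, neither switch in this section would change the 21-day surplus; the switches matter when consecutive events differ in class or dose.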

Define episodes

This section is shown only for per episode CMAs and concerns the way treatment episodes are defined:

  • Treat. change starts new episode?: does changing the treatment class trigger a new episode?
  • Dose change starts new episode?: does changing the dose trigger a new episode?
  • Max. gap duration: the duration of a gap above which a new episode is triggered; the gap can be in units (Max. gap duration unit) of days, weeks, months or years, or as a percent of the last prescription
  • Append gap?: should the maximum permissible gap be added to the episodes with a gap larger than this maximum? If “no” (the default), nothing is added, if “yes” then the maximum permissible gap is appended at the end of the episodes
  • Plot CMA as histogram?: should the distribution of CMA estimates across episodes for a given participant be plotted as a histogram or as a barplot
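The role of Max. gap duration can be sketched on patient 2’s three medA events from the med.events table. This simplified version is an assumption for illustration: it ignores carry over of overlapping supply as well as treatment and dose changes, measures each gap from the end of the previous supply, and starts a new episode whenever the gap exceeds a 90-day maximum.

```r
# Patient 2's medA events from the table above
dates <- as.Date(c("01/20/2036", "03/10/2036", "08/01/2036"), format = "%m/%d/%Y")
durations <- c(50, 50, 50)
max.gap <- 90  # Max. gap duration, in days

# Gap between the end of one supply and the start of the next event
n <- length(dates)
gaps <- as.numeric(dates[-1] - (dates[-n] + durations[-n]))

# A new episode starts whenever the gap exceeds max.gap
episode <- cumsum(c(TRUE, gaps > max.gap))
```

Here the second supply ends on 04/29/2036 and the next medA event is on 08/01/2036, a 94-day gap, so with a 90-day maximum the third event opens a second episode.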

Define sliding windows (SW)

This section is shown only for sliding windows and concerns the way this sequence of regularly spaced and uniform sliding windows is defined:

  • SW start: when the first sliding window starts (relative to the start of the OW), in units (SW start unit) that can be days, weeks, months or years

  • SW duration: how long one sliding window lasts, in SW duration units (days, weeks, months or years)

  • the step between two consecutive sliding windows can be defined either in terms of:

    • SW number of steps: the total number of steps (i.e., sliding windows of the given duration covering the OW), or
    • the duration of a step (i.e., the lag between the starts of two consecutive sliding windows) in terms of how many (SW step duration) units (SW step unit; days, weeks, months or years) it lasts
  • Plot CMA as histogram?: should the distribution of CMA estimates across sliding windows for a given participant be plotted as a histogram or as a barplot

Defining sliding windows. Here we show 90-day sliding windows lagging by 60 days and starting right at the beginning of the observation window. The regularly spaced bars at the top of the plot represent the sliding windows, each with its own CMA estimate (here, CMA9). The histogram on the left (which can be toggled to a barplot) shows the distribution of the CMA estimates across the sliding windows.
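The window layout in the figure (90-day windows, a new one every 60 days from the OW start) can be reproduced with seq(). The 730-day OW length and the rule that a window must fit entirely inside the OW are assumptions of this sketch; AdhereR may handle a final, partial window differently.

```r
ow.duration <- 730  # assumed observation window length, in days
sw.duration <- 90   # each sliding window lasts 90 days
sw.step     <- 60   # a new window starts every 60 days
sw.start    <- 0    # first window starts at the OW start

# Assumption: keep only windows that fit entirely inside the OW
starts <- seq(sw.start, ow.duration - sw.duration, by = sw.step)
windows <- data.frame(start = starts, end = starts + sw.duration)
```

With these values consecutive windows overlap by 30 days (90 − 60), which is what makes the sequence of per-window CMA estimates vary smoothly.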

Align patients

This section is shown only if there’s more than one patient selected, and controls the way the plots of several patients are displayed vertically:

  • Align patients?: should all the patients be vertically aligned relative to their first event?
  • Align 1st event at 0?: should the first event (across patients) be considered as the origin of time?

Vertically aligning multiple patients. The top panel shows two patients (with IDs ‘1’ and ‘15’) plotted using the actual dates of their events (i.e., relative to the earliest event across both patients), while the bottom panel shows the same two patients aligned vertically relative to each other.
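The difference between the two modes can be sketched as two ways of measuring time, here for the first two events of patients 1 and 2 from the med.events table (the column names days.global and days.aligned are illustrative).

```r
ev <- data.frame(
  PATIENT_ID = c(1, 1, 2, 2),
  DATE = as.Date(c("03/22/2035", "03/31/2035", "01/20/2036", "03/10/2036"),
                 format = "%m/%d/%Y")
)
day <- as.numeric(ev$DATE)  # days since 1970-01-01

# Not aligned: days since the earliest event across all patients
ev$days.global <- day - min(day)

# Aligned: days since each patient's own first event
ev$days.aligned <- day - ave(day, ev$PATIENT_ID, FUN = min)
```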

Duration & period

This section controls the amount of temporal information displayed (on the horizontal axis):

  • Duration (in days): the period to show (in days); this is independent of the length of the FUW, OW or the events actually displayed and can be used to zoom in or out; if 0, it is automatically computed so that all the events in the plot are shown
  • Show period as: if “days”, the horizontal axis displays the number of days since the first plotted event; if “dates”, it displays the actual calendar dates
  • Period (in days): the interval (in days) at which info is shown on the horizontal axis and vertical dashed lines are drawn on the plot
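How Duration, Show period as and Period (in days) relate can be sketched as tick positions along the horizontal axis. The first plotted date is hypothetical, and ticks are assumed to start at the first plotted event.

```r
first.date <- as.Date("2036-01-20")  # hypothetical first plotted event
duration   <- 730                    # Duration (in days)
period     <- 90                     # Period (in days)

ticks.days  <- seq(0, duration, by = period)                # "days" labels
ticks.dates <- format(first.date + ticks.days, "%m/%d/%Y")  # "dates" labels
```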

CMA estimates

This section is shown for all CMAs except CMA0 and controls how the CMA estimates are to be shown on the plot:

  • Print CMA?: this is visible only for the “simple” CMAs and controls whether the CMA estimates should be shown next to the participant’s ID
  • Plot CMA?: should the CMA estimate be plotted next to the participant’s ID?

Show dose

This section is shown only if a daily dose column is defined for the current data source, and only for CMA0, CMA5 to CMA9, per episode and sliding windows (CMA1 to CMA4 are, by definition, unaware of dose and treatment categories); it controls how the dose is visually shown (if at all):

  • Print it?: print the dosage (i.e., the actual numeric values) next to each event; if so:

    • Font size: which font size1 to use
    • Outline color: which outline color to use
    • Centered?: should the text be centered or not?
  • As line width?: show the dose as the event line width; if so:

    • Max dose width: what is the line width of the maximum given dose?
    • Global max?: should this maximum dose be computed across all treatment classes or per class?
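The effect of Global max? can be sketched with a simple proportional mapping from daily dose to line width. Both this mapping and the dose values are assumptions for illustration only; AdhereR’s exact scaling may differ.

```r
ev <- data.frame(CATEGORY = c("medA", "medA", "medB", "medB"),
                 PERDAY   = c(2, 4, 1, 8))  # made-up doses
max.dose.lwd <- 8                           # "Max dose width"

# Global max? ON: a single maximum across all treatment classes
ev$lwd.global <- max.dose.lwd * ev$PERDAY / max(ev$PERDAY)

# Global max? OFF: a separate maximum within each treatment class
ev$lwd.per.class <- max.dose.lwd * ev$PERDAY / ave(ev$PERDAY, ev$CATEGORY, FUN = max)
```

With a global maximum, the highest medA dose (4) is drawn at half the width of the highest medB dose (8); per class, each class’s highest dose gets the full width.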

Please see the overview figure for an example where the dose is printed.

Legend

This section controls the visual appearance of the legend:

  • Show legend?: should the legend be shown at all; if so:

    • Legend x: the legend’s horizontal position (“left” or “right”)
    • Legend y: the legend’s vertical position (“bottom” or “top”)
    • Title font size: the legend title’s font size
    • Text font size: the legend text and symbols’ font size
    • Legend bkg. opacity: the legend’s background opacity, between 0.0 (fully transparent) and 1.0 (fully opaque)

Aesthetics

This section controls many aspects of the visual presentation of the plots, including colors, font sizes and line styles; some of these depend on other factors, so may or may not be visible:

  • Grayscale?: when ON, it makes the plots use only shades of gray (and hides many other controls in this section), but when OFF, it allows various colors to be used
  • Missing data color: the color used for missing data
  • Unspec. cat. label: the label to use for an unspecified treatment class (free text)
  • Treatment palette: shown only if the treatment class column was defined, allows the selection of a color palette from which particular colors for the actual treatment classes will be picked2; currently the available palettes are: rainbow, heat.colors, terrain.colors, topo.colors, cm.colors, magma, inferno, plasma, viridis, cividis (the last two are color-blind-friendly3)
  • Event line style: controls the line style for plotting the events, and can be:

The possible line styles used for the event lines.

  • Event line width: the width of the event lines
  • Event start and Event end: the symbols used to mark the start and end of an event, and can be:

The possible point symbols used to mark the start and the end of an event.

  • attributes of the continuation lines between events (only for CMA0, per episode, and sliding windows): Cont. line color, Cont. line style and Cont. line width
  • Show event interv.?: should the event intervals be shown? (only for simple CMAs except CMA0)
  • the relative font size of various elements: General font size, Axis font size and Axis labels font size
  • follow-up window visual attributes: Show FUW? (should it be shown or not) and, if so, FUW color (which color to use)
  • observation window visual attributes: Show OW? (should we show it or not), and if yes, with what color (OW color), line density (OW hash dens.) and angle (OW hash angle) and opacity (OW opacity)
  • CMA8 uses a “real observation window”, which can be shown or not (Show real OW?) and whose attributes are the line density (Real OW hash dens) and angle (Real OW hash angle)
  • for all CMAs (except CMA0), we can control various attributes of the CMA estimate: the relative font size (CMA font size), the percent of the plotting area dedicated to plotting it (CMA plot area %), its color (CMA plot color), border color (CMA border color), background color (CMA bkg. color) and text color (CMA text color)
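The idea behind Treatment palette (one color per treatment class, picked from a palette function such as rainbow, which the generated R code above passes as col.cats=rainbow) and Grayscale? can be sketched in base R. The order in which AdhereR actually assigns colors to classes is not assumed to match this sketch.

```r
categories <- c("medB", "medB", "medA", "medA", "medB")  # e.g., CATEGORY
classes <- sort(unique(categories))

# Color mode: one color per treatment class from the chosen palette
cols <- setNames(grDevices::rainbow(length(classes)), classes)

# Grayscale mode: shades of gray instead
greys <- setNames(grDevices::gray.colors(length(classes)), classes)

event.cols <- cols[categories]  # color for each individual event
```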

Advanced

This section controls several advanced settings:

  • Min plot size (horiz.) and Min plot size (vert.): the minimum plotting size (in characters) required for the whole duration to be plotted (horizontally) and for each event/episode/sliding window (vertically)

Selecting/changing the data source

If the interactive App was started with a data source passed through the arguments of plot_interactive_cma(...), this data source (if valid and well-defined) is used automatically, but it can be changed at any time (as described below). However, if the App was started without any data source (i.e., plot_interactive_cma()), the user must select a valid data source before being able to plot anything. The processes of selecting an initial data source and of changing it later are identical, so we discuss here the case of no initial data source: when plot_interactive_cma() is invoked without arguments, the App opens without any plotting and messaging area, the Data tab in the left-hand panel is automatically selected, and the Params tab contains only a warning message:

Starting the App with no data source. The highlighted panel on the left (UI element 18) is now open at the Data tab, which allows the interactive selection of various types of data sources.

Starting the App with no data source. The Params tab on the left now contains only a warning message (and no settings).

The Data panel allows us to interactively select and change on-the-fly the data set to be used; currently, this can be:

The type of data source can be selected using the Datasource type list at the top of the Data panel. We will now go through each of these types of data sources in turn.

Data already in memory

This is, in some respects, the simplest type of data source. When selected, the tab looks like:

Selecting an in-memory data source.

The In-memory dataset UI element contains the list of all objects in the current global environment derived from data.frame that have at least 1 row and 3 columns, and they can be selected simply by clicking on their name (here, we select the med.events example dataset):

Selecting the med.events example dataset. Please note that the selection can be done by looking through the list, or by pressing the delete/backspace key (⌫) and then starting to type the name of the dataset.
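The selection rule behind the In-memory dataset list can be mimicked in plain R, for instance to check beforehand whether a given object will show up. The helper list_candidate_datasets() below is hypothetical (it is not part of AdhereRViz) and simply applies the criteria stated above to an environment (by default, the global one).

```r
# Hypothetical helper: names of the objects in `envir` that are derived from
# data.frame and have at least 1 row and at least 3 columns
list_candidate_datasets <- function(envir = globalenv()) {
  obj.names <- ls(envir = envir)
  is.candidate <- vapply(obj.names, function(nm) {
    x <- get(nm, envir = envir)
    inherits(x, "data.frame") && nrow(x) >= 1 && ncol(x) >= 3
  }, logical(1))
  obj.names[is.candidate]
}

# Usage on a scratch environment holding three toy objects
demo.env <- new.env()
assign("ok", data.frame(id = 1:2, date = c("01/02/2020", "03/04/2020"), dur = c(30, 60)),
       envir = demo.env)
assign("too.narrow", data.frame(id = 1:2, dur = c(30, 60)),            # only 2 columns
       envir = demo.env)
assign("empty", data.frame(id = integer(0), date = character(0), dur = numeric(0)),  # no rows
       envir = demo.env)
print(list_candidate_datasets(demo.env))
```

Since data.table objects also inherit from data.frame, they would pass this check as well.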

After clicking on it, the dataset is selected and optionally available for inspection using the Peek at dataset button:

Peeking at (i.e., inspecting) the in-memory dataset med.events. This box shows basic info about the dataset, including the number of rows and columns and, for each column, its name and type, plus the first few rows.

If the dataset is not the desired one, it can be replaced with another one using the interface; if it is, we can continue by selecting the important columns and the format of the dates.

Selecting the “important” columns (here, the one containing the Patient IDs) from the in-memory dataset med.events. Please note that the list gives extra summary info about the columns in the dataset (namely, their name, type, and first few values). The App automatically maps the first three columns in the dataset to the three required columns (Patient ID column, Event date column and Event duration column), but this mapping is quite probably wrong, and no type checks are done at this stage. The “optional” columns (Daily dose column and Treatment class column) may be left undefined by selecting the [not defined] value. Date format is different, being a free-text field where the format of the dates in the date column must be given (please see, for example here, for what this format looks like).
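Since Date format is free text, it is worth checking the format string before validating. Assuming it follows the usual R conversion specifications (as used by as.Date() and strptime(); the dates in med.events, for example, use %m/%d/%Y), a quick test in the console shows why a wrong format is dangerous: some dates fail to parse, while others are silently misread. The two example dates are made up for illustration.

```r
dates <- c("04/26/2033", "07/04/2033")

# Correct format: month/day/year
good <- as.Date(dates, format = "%m/%d/%Y")
print(good)

# Wrong format (day/month/year): "04/26/2033" has no month 26 and becomes NA,
# while "07/04/2033" is silently read as 7 April instead of 4 July
bad <- as.Date(dates, format = "%d/%m/%Y")
print(bad)
```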

Please note that, at this point, the dataset is not yet selected for plotting: this requires an explicit action, namely pressing the Validate & use! button at the bottom, which performs various checks (such as that each column is used at most once and that the types more or less fit the expected type and format, among others) and, if everything is OK, makes this dataset the one used for plotting.
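To make the kind of checks mentioned above more concrete, here is a hypothetical sketch of such a validation in base R. The function validate_mapping() and its role names (id, date, duration, dose, class) are our own illustration, not the App's internal code; it returns the detected problems (an empty vector if everything looks fine).

```r
# Hypothetical sketch: `mapping` is a named character vector assigning dataset
# columns to roles; NA means "[not defined]" (acceptable only for dose and class)
validate_mapping <- function(data, mapping, date.format) {
  problems <- character(0)

  # the three required roles must be defined
  for (role in c("id", "date", "duration")) {
    if (is.na(mapping[role])) {
      problems <- c(problems, paste("required column not defined:", role))
    }
  }

  # each column may be used at most once and must exist in the dataset
  used <- mapping[!is.na(mapping)]
  if (anyDuplicated(used) > 0) {
    problems <- c(problems, "a column is used more than once")
  }
  absent <- setdiff(used, names(data))
  if (length(absent) > 0) {
    problems <- c(problems, paste("no such column:", absent))
  }
  if (length(problems) > 0) return(problems)  # the type checks below need valid columns

  # rough type checks: numeric durations, dates matching the given format
  if (!is.numeric(data[[mapping[["duration"]]]])) {
    problems <- c(problems, "event duration column is not numeric")
  }
  parsed <- as.Date(as.character(data[[mapping[["date"]]]]), format = date.format)
  if (anyNA(parsed)) {
    problems <- c(problems, "some dates do not match the date format")
  }
  problems
}

toy <- data.frame(PATIENT_ID = c(1, 1, 2),
                  DATE       = c("04/26/2033", "07/04/2033", "01/15/2034"),
                  DURATION   = c(50, 30, 60))
roles <- c(id = "PATIENT_ID", date = "DATE", duration = "DURATION", dose = NA, class = NA)
print(validate_mapping(toy, roles, "%m/%d/%Y"))  # character(0): no problems found
print(validate_mapping(toy, roles, "%d/%m/%Y"))  # the dates do not fit this format
```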

Load from file

This is a very useful case, where the data is stored in an external file. When selecting load from file in the Datasource type list, the panel becomes:

Loading a dataset from file.

The App supports loading data from several file formats:

  • Comma/TAB-separated (.csv; .tsv; .txt): this is the default format and refers to a class of open and flexible file formats where tabular data is stored as rows of values separated by a pre-defined delimiter; the best known are Comma-Separated Values (CSV) and TAB-Separated Values (TSV) formats, but the App allows a lot of flexibility by defining:

    • the Field separator can be: the [TAB] character (\t), the comma (,), one or more whitespaces, the semicolon (;) or the colon (:)
    • the individual values can be quoted (or not) using the Quote character: [none] (no quoting of values), single quotes (') or double quotes (")
    • the Decimal point character: dot (.) or comma (,)
    • the Missing data symbols: a free text listing (within double quotes and separated by commas) the symbol(s) to be interpreted as missing data (by default, “NA”)
    • if the 1st row of the file represents the header (containing the column names) or not (Header 1st row)
  • Serialized R object (.rds): this loads data (such as objects derived from data.frame) previously exported from R using saveRDS() (usually with the “.rds” extension)

  • Open Document Spreadsheet (.ods) and Microsoft Excel (.xls; .xlsx): these load data from these widespread formats, used by office suites such as LibreOffice/OpenOffice Calc and Microsoft Office Excel, among others; for both formats, the user can specify the particular sheet to load for files containing more than one (Which sheet to load?)

  • SPSS (.sav; .por), SAS Transport data file (.xpt), SAS sas7bdat data file (.sas7bdat) and Stata (.dta): these are file formats exported by the popular statistical platforms IBM SPSS, SAS and Stata

Please note that while Comma/TAB-separated (.csv; .tsv; .txt), Serialized R object (.rds) and Open Document Spreadsheet (.ods) files should be imported without issues, for the other formats there might be limitations and edge cases.
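In plain R, the options for Comma/TAB-separated files correspond quite closely to the arguments of base R's read.table(), and the .rds round trip uses saveRDS() and readRDS(). The snippet below only illustrates this correspondence (the App's own file-reading code may well differ); it reads a small, made-up semicolon-separated text with decimal commas and "-" as an extra missing-data symbol.

```r
# Illustration only: the App's text-file options expressed as read.table() arguments
csv.lines <- c(
  'PATIENT_ID;DATE;DURATION;PERDAY',
  '1;"04/26/2033";50;4',
  '1;"07/04/2033";30;-',
  '2;"01/15/2034";60;2,5'
)

d <- read.table(text             = csv.lines,
                header           = TRUE,         # Header 1st row
                sep              = ";",          # Field separator ("\t", ",", "" = whitespace, ";", ":")
                quote            = "\"",         # Quote character ("" = none, "'" = single, "\"" = double)
                dec              = ",",          # Decimal point character
                na.strings       = c("NA", "-"), # Missing data symbols
                stringsAsFactors = FALSE)        # keep the dates as plain text
str(d)

# Serialized R object: export with saveRDS(), load back with readRDS()
rds.file <- tempfile(fileext = ".rds")
saveRDS(d, rds.file)
d2 <- readRDS(rds.file)
```

Note that in read.table() an empty sep ("") means "one or more whitespace characters", matching the corresponding Field separator choice.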

After the file format has been selected, the user can use the Load from file control (its Select button) to browse for the desired file and upload it. Basic checks might be performed and the file might be rejected but, if loading was successful, a new set of UI elements becomes visible. These elements are virtually identical to those used for in-memory datasets.

Use an SQL database

This allows access to data stored in standard Relational Database Management Systems (RDBMSs) that use the Structured Query Language (SQL); for more info about the facilities offered by AdhereR, please see the Using AdhereR with various database technologies for processing very large datasets vignette.

Currently, the App supports SQLite, a small engine designed to be embedded in larger applications that stores its data in normal files, and MySQL/MariaDB, which are widely used, full-featured, free and open-source RDBMSs.

While SQLite is intended only as a demo of the App’s capabilities and uses an in-memory database with a single table containing a verbatim copy of the med.events example dataset, MySQL/MariaDB allows the use of actual databases, local or over the internet. Except for the selection of the database, the UI is identical for the two, so we will only discuss here the MySQL/MariaDB case.
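A demo database of this kind can be reproduced in R with the DBI and RSQLite packages, which we assume here to be installed. In this sketch, a small toy data frame stands in for med.events, and the table name med_events is our own choice, not necessarily the one used by the App.

```r
library(DBI)

# Toy stand-in for med.events (three events, two patients)
events <- data.frame(PATIENT_ID = c(1L, 1L, 2L),
                     DATE       = c("04/26/2033", "07/04/2033", "01/15/2034"),
                     DURATION   = c(50, 30, 60))

# ":memory:" creates a temporary database that lives only as long as the connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "med_events", events)  # a verbatim copy stored as a single table

tables <- dbListTables(con)
n.rows <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM med_events")$n
print(tables)
print(n.rows)

dbDisconnect(con)  # the in-memory data is discarded here
```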

We can connect to a local or remote server, and we can (optionally) define the following:

  • Host name/address: the fully qualified host name or IP address of the server (if [none] or localhost, the server runs on the local machine)
  • TCP/IP port number: the port number (0 for default)
  • Database name: the name of the database
  • Username and Password: the username and password for accessing the database (the password is hidden using *)

Inputting the info needed to connect to a remote MySQL server. We use here a MySQL database containing the med.events sample dataset hosted on the internet.

When clicking the Connect! button, the App attempts to connect to the server, authenticate and access the desired database: if everything is OK, it fetches only basic information about all the tables/views in the database (thus avoiding unnecessary traffic) and displays it:

Basic information about an SQL database. This shows, for each table/view in the database, the columns with their name, type and first few values. The same info is accessible by clicking on the Peek at database… button.

New UI elements become visible, most of them virtually identical to those used for in-memory datasets and for loading from file, the specific ones being:

  • the Disconnect! button: this disconnects cleanly from the database (not required, but nice to do)
  • Which table/view lists the tables and views in the database, also showing for each the number of rows and columns

UI elements for using an SQL database.
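Fetching only this basic information, rather than whole tables, takes just a few small queries per table. The hypothetical helper summarise_db() below sketches the idea with DBI on an in-memory SQLite database with two made-up tables (assuming DBI and RSQLite are installed); for each table or view it returns the column names, the number of rows and the first few rows. It illustrates the principle and is not the App's actual code.

```r
library(DBI)

# A small in-memory SQLite database with two toy tables
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "events",   data.frame(id = 1:5, duration = c(30, 60, 30, 90, 60)))
dbWriteTable(con, "patients", data.frame(id = 1:2, sex = c("F", "M")))

# Hypothetical helper: for each table/view, its columns, row count and a short
# preview, obtained with small queries instead of downloading whole tables
summarise_db <- function(con, n.preview = 3) {
  tbls <- dbListTables(con)
  lapply(setNames(tbls, tbls), function(tbl) {
    q <- dbQuoteIdentifier(con, tbl)  # safely quoted table name
    list(columns = dbListFields(con, tbl),
         n.rows  = dbGetQuery(con, paste("SELECT COUNT(*) AS n FROM", q))$n,
         preview = dbGetQuery(con, paste("SELECT * FROM", q, "LIMIT", n.preview)))
  })
}

info <- summarise_db(con)
dbDisconnect(con)
str(info$events)
```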

For example, when using the MySQL database described in the vignette Using AdhereR with various database technologies for processing very large datasets in package AdhereR, we can create a view (named testview) that brings together these data in the needed format, using the following SQL commands in MySQL Workbench:

-- switch to the med_events database
USE med_events;

-- a view joining event dates, event info and patients into a single table
CREATE VIEW `testview` AS
SELECT patients.`id`, `date`, `category`, `duration`, `perday`
FROM event_date
JOIN event_info
ON event_info.`id` = event_date.`id`
JOIN event_patients
ON event_patients.`id` = event_info.`id`
JOIN patients
ON patients.`id` = event_patients.`patient_id`;

Defining and using medication groups

Since AdhereRViz version 0.2/AdhereR version 0.7, medication groups can be defined and used for interactive plotting. For more details about medication groups, please see the vignette “AdhereR: Adherence to Medications” in package AdhereR but, fundamentally, they are named character vectors: the names are the unique names of the medication groups, and the values are R-like expressions describing which of the events in the dataset are covered by each group. Alternatively, a column in the data itself can be used to define the medication groups.
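The idea behind such definitions can be sketched in a few lines of base R. The group names and expressions below are made up for illustration, and the evaluation strategy (running each expression against the dataset's columns with eval(parse(...))) is our own simplification, not necessarily how AdhereR does it; events matched by no explicit group are collected in an implicit __ALL_OTHERS__ group.

```r
# Toy events and hypothetical group definitions (named character vector of expressions)
events <- data.frame(PATIENT_ID = c(1, 1, 2, 2),
                     CATEGORY   = c("A", "B", "A", "C"),
                     PERDAY     = c(2, 4, 8, 1))
groups <- c("OnlyA"     = "CATEGORY == 'A'",
            "HighDoseB" = "CATEGORY == 'B' & PERDAY >= 4")

# One logical column per group: TRUE where the event belongs to that group
membership <- vapply(groups,
                     function(expr) eval(parse(text = expr), envir = events),
                     logical(nrow(events)))

# Events not covered by any explicit group form the implicit "__ALL_OTHERS__" group
membership <- cbind(membership, "__ALL_OTHERS__" = !apply(membership, 1, any))
print(membership)
```

Note that an event may belong to several groups at once, since each expression is evaluated independently.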

Medication groups can be passed with the medication.groups argument to the plot_interactive_cma() function, or can be interactively loaded through a new panel Groups of the user interface (please note that a valid dataset must have already been loaded):

Loading medication groups. The Groups tab allows the interactive selection of medication groups from already defined named character (or factor) vectors in memory (here, the pre-defined example med.groups from the AdhereR package). The switch allows us to instantly ignore or apply any medication groups that may have been defined.

Inspecting medication groups. Clicking on the “Check it!” button in the Groups tab allows the inspection of the currently selected vector containing medication group definitions (here, the pre-defined example med.groups from the AdhereR package).
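Passing the groups directly through the medication.groups argument, instead of loading them through the Groups tab, might look like this. It is a minimal sketch assuming the med.events.ATC and med.groups examples from AdhereR, and assuming that med.events.ATC uses the same column names and date format as med.events; mg.args is a helper list introduced only for illustration, and the App is launched only in an interactive session.

```r
# Hypothetical helper list: column mapping assumed to fit med.events.ATC
mg.args <- list(
  ID.colname               = "PATIENT_ID",
  event.date.colname       = "DATE",
  event.duration.colname   = "DURATION",
  event.daily.dose.colname = "PERDAY",
  medication.class.colname = "CATEGORY",
  date.format              = "%m/%d/%Y"
)

if (interactive() &&
    requireNamespace("AdhereR", quietly = TRUE) &&
    requireNamespace("AdhereRViz", quietly = TRUE)) {
  library(AdhereR)
  library(AdhereRViz)
  print(med.groups)  # the example group definitions (a named character vector)
  do.call(plot_interactive_cma,
          c(list(data = med.events.ATC, medication.groups = med.groups), mg.args))
}
```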

Pressing the “Use it!” button loads the selected medication group definitions; please make sure they are correct and fit the currently loaded dataset! If loading goes well, the plot is automatically updated to reflect the medication groups:

Plotting medication groups. CMA1 for patients “1” and “2” in the example dataset med.events.ATC using the example medication groups med.groups from the AdhereR package.
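A rough check that a set of definitions fits the loaded dataset can also be done in the console before pressing “Use it!”. The hypothetical check_groups() below is our own sketch, assuming the groups are plain R expressions over the dataset's columns; it reports, for each group, whether its expression evaluates to exactly one logical value per event. The toy data and group names are made up.

```r
# Hypothetical sketch: TRUE for each group whose expression yields one logical per event
check_groups <- function(groups, data) {
  vapply(groups, function(expr) {
    # Evaluate with the dataset's columns as variables; any error
    # (e.g., a misspelled column name) counts as a failed check
    res <- tryCatch(eval(parse(text = expr), envir = data),
                    error = function(e) NULL)
    is.logical(res) && length(res) == nrow(data)
  }, logical(1))
}

events <- data.frame(CATEGORY = c("A", "B", "C"), PERDAY = c(2, 4, 8))
groups <- c("OnlyA"   = "CATEGORY == 'A'",
            "Typo"    = "CATEGROY == 'B'",  # misspelled column name
            "NotBool" = "PERDAY * 2")       # numeric, not logical
print(check_groups(groups, events))
```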

The main differences to a plot without medication groups are:

These apply not only to simple CMAs, but also to sliding windows and per episode (not shown).

Various options related to plotting the medication groups can be changed using the new “Medication groups” user interface, probably the most important being which medication groups to include in the plot:

Selecting which of the defined medication groups to plot. We may plot only a subset of all the defined medication groups, including the implicitly defined __ALL_OTHERS__ (here shown as “* (all others) *”), which includes all events not selected by any of the explicitly defined groups. Don’t forget to press the “Show them!” button to apply your changes.

References

Dima A.L., Dediu D. (2017) Computation of adherence to medication and visualization of medication histories in R with AdhereR: Towards transparent and reproducible use of electronic healthcare data. PLoS ONE 12(4): e0174426. doi:10.1371/journal.pone.0174426.

Notes


  1. Please note that all font sizes are relative. Thus a font size of 1.0 means the default font size used for the plot (depending on resolution, etc.), while a value of 0.50 means half that and 1.25 means 25% bigger.↩︎

  2. We have decided against directly mapping each class to a particular color and, instead, automatically mapping them using a palette, because this accommodates more flexibly a varying number (or grouping) of classes; the mapping classes → colors is based on the alphabetic order of the class names.↩︎

  3. See for example here for a comparison and discussion.↩︎