Introduction

A heatmap is a graphical representation of data that uses color-coding to represent different values. Heatmaps are used in many forms of analytics; this R package focuses on providing an efficient way to create interactive heatmaps for categorical data, or for continuous data that can be grouped into categories.

This package was originally developed for Verkehrsbetriebe Zürich (VBZ), the public transport operator in the Swiss city of Zurich, to illustrate the utilization of different routes and vehicles at different times of the day. To do so, it groups utilization data (e.g. persons per m^2) into categories (e.g. low, medium, high utilization) and shows them for individual stops over time in a heatmap.
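To make the grouping step concrete, here is a minimal base R sketch that bins a continuous utilization measure into three categories with cut(). The occupancy values and the thresholds (1 and 3 persons per m^2) are made up for illustration and are not the ones used by VBZ.

```r
# Hypothetical occupancy values (persons per m^2); not taken from the vbz data.
occupancy <- c(0.4, 1.2, 2.7, 3.5, 0.9)

# Assumed thresholds: below 1 is "low", 1 to below 3 is "medium", 3 and above is "high".
occ_cat_name <- cut(
  occupancy,
  breaks = c(-Inf, 1, 3, Inf),
  labels = c("low", "medium", "high"),
  right = FALSE # intervals are closed on the left, e.g. [1, 3)
)

# Integer codes (1, 2, 3) that follow the order of the factor levels.
occ_category <- as.integer(occ_cat_name)

data.frame(occupancy, occ_category, occ_cat_name)
```

The resulting data.frame has one continuous column and two categorical columns, which is the shape of data the heatmaps in this document are built from.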

This package can easily be integrated into a Shiny dashboard, where plotly events support additional interactions with other plots (e.g. boxplot, histogram, forecast). A mini-demo app is provided in a separate GitHub repository named catmaply_shiny.

This work is based on the plotly.js engine.

Please submit feature requests

This package is still under active development. If you would like to see a feature added, please submit your suggestions (and bug reports) at: https://github.com/VerkehrsbetriebeZuerich/catmaply/issues/

News

You can see the most recent changes of the package in NEWS.md.

Installation

To install the latest (“cutting-edge”) GitHub version run:

# make sure that you have the correct RTools installed,
# as you might need to build some packages from source.
# if you don't have RTools installed, you can install it with:
# install.packages('installr'); installr::install.Rtools() # not tested on windows
# or download it from here:
# https://cran.r-project.org/bin/windows/Rtools/
# in any case, make sure that you select the correct version, 
# otherwise the installation will fail.
# then you'll need devtools
# if (!require('devtools'))
  # install.packages('devtools')
# finally install the package
# devtools::install_github('VerkehrsbetriebeZuerich/catmaply')

To install the latest version from CRAN, run:

#install.packages("catmaply")

Thereafter, you can start using the package as usual:

library(catmaply)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

Usage

Catmaply ships with VBZ data so that you can start experimenting with the package right away. For simplicity, we will use it throughout this notebook. You can access the data as follows.

data("vbz")

df <- na.omit(vbz[[1]]) %>% 
  filter(.data$vehicle == "PO")

str(df)
#> tibble [1,707 × 13] (S3: tbl_df/tbl/data.frame)
#>  $ trip_seq              : int [1:1707] 6 24 22 4 14 42 44 50 28 30 ...
#>  $ stop_seq              : int [1:1707] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ stop_name             : chr [1:1707] "Zuerich, Bahnhof Altstetten N" "Zuerich, Bahnhof Altstetten N" "Zuerich, Bahnhof Altstetten N" "Zuerich, Bahnhof Altstetten N" ...
#>  $ trip_id               : int [1:1707] 44044 33329 22130 33325 33327 22134 33333 12702 82990 12698 ...
#>  $ circulation_name      : int [1:1707] 8 6 4 6 6 4 6 2 10 2 ...
#>  $ line_name             : int [1:1707] 4 4 4 4 4 4 4 4 4 4 ...
#>  $ vehicle               : Factor w/ 3 levels "","CO","PO": 3 3 3 3 3 3 3 3 3 3 ...
#>  $ occupancy             : num [1:1707] 18.38 67.77 86.8 5.98 21.04 ...
#>  $ occ_category          : int [1:1707] 1 2 3 1 1 1 1 1 2 1 ...
#>  $ departure_time        : Factor w/ 3276 levels "","04:58:12",..: 74 435 393 49 223 839 885 1023 519 563 ...
#>  $ number_of_measurements: int [1:1707] 58 47 45 47 47 45 49 51 42 51 ...
#>  $ occ_cat_name          : Factor w/ 6 levels "","high","low",..: 3 4 5 3 3 3 3 3 4 3 ...
#>  $ direction             : int [1:1707] 1 1 1 1 1 1 1 1 1 1 ...
#>  - attr(*, "na.action")= 'omit' Named int [1:281] 145 146 147 148 149 294 295 296 297 298 ...
#>   ..- attr(*, "names")= chr [1:281] "145" "146" "147" "148" ...

The str() output above lists all columns of the vbz data.frame. The examples below mainly use trip_seq, stop_seq, stop_name, occupancy, occ_category and occ_cat_name.

Default behaviour

Catmaply expects at least arguments for both axes (x, y) and for the fields (z). To visualize the occupancy category for all stops and trips, we can put stop_seq on y, trip_seq on x and occ_category on z as follows:

catmaply(
    df,
    x = trip_seq,
    y = stop_seq,
    z = occ_category
  ) 

By default, catmaply produces an interactive heatmap with a rangeslider, a legend whose items show or hide the data of a specific occupancy category when clicked, and a hover label that shows the values of x, y and z.

Also, please note that you can reference columns (e.g. for x, y, z) by name either with or without quotes. For example, to show the stop names on the y axis, put stop_name on y and order it using stop_seq (the x axis offers the same functionality), as shown in the following:

catmaply(
    df,
    x = trip_seq,
    y = stop_name,
    y_order = stop_seq,
    z = occ_category
  ) 

What about more expressive labels for the legend? You can use any column for the legend entries, as long as it matches the category in the fields. So, if you would like to show occ_cat_name (“low”, “medium”, “high”) in the legend instead of occ_category (1,2,3,..), you can set the parameter legend_col and overwrite the legend labels as follows:

catmaply(
    df,
    x = trip_seq,
    y = stop_name,
    y_order = stop_seq,
    z = occ_category,
    legend_col = occ_cat_name
  ) 
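One way to read "matches the category in the fields" is that every category value should map to exactly one legend label. The following base R sketch checks this on a small made-up data.frame (toy is hypothetical, it is not the vbz data); whether catmaply itself enforces such a check is not stated here.

```r
# Hypothetical data: each occ_category should correspond to one occ_cat_name.
toy <- data.frame(
  occ_category = c(1, 2, 3, 1, 2),
  occ_cat_name = c("low", "medium", "high", "low", "medium")
)

# Count the distinct labels found for each category value.
labels_per_category <- tapply(
  toy$occ_cat_name,
  toy$occ_category,
  function(v) length(unique(v))
)

# TRUE if every category has exactly one label.
legend_is_consistent <- all(labels_per_category == 1)
legend_is_consistent
```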

Not happy with the color palette? Then let’s change it :-). To change the color palette you can either pass a vector of colors or a function that returns one.

Note: the color palette function needs to take n as its first argument, where n is the number of colors to be produced.

catmaply(
    df,
    x = trip_seq,
    y = stop_name,
    y_order = stop_seq,
    z = occ_category,
    color_palette = viridis::magma,
    legend_col = occ_cat_name
  )
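Any function that takes n as its first argument and returns n colors should fit the description above. As a sketch, a custom palette can be built with grDevices::colorRampPalette(); the name my_palette and the two hex colors are arbitrary choices for illustration.

```r
# Hypothetical custom palette: interpolates n colors between two endpoints.
my_palette <- function(n) {
  grDevices::colorRampPalette(c("#FEE8C8", "#E34A33"))(n)
}

# Returns a character vector of 3 hex colors.
my_palette(3)
```

Under this assumption, my_palette could be passed in place of viridis::magma, i.e. color_palette = my_palette.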

You don’t like all this interactivity of the legend or you need more performance for large plots? Let’s turn the interactive legend off by setting legend_interactive = FALSE.

catmaply(
    df,
    x = trip_seq,
    y = stop_name,
    y_order = stop_seq,
    z = occ_category,
    color_palette = viridis::magma,
    legend_interactive = FALSE,
    legend_col = occ_cat_name
  )

Color ranges per category

How about illustrating differences in one category? You can illustrate values that are particularly low (high) within one category by specifying one color range per category.

To show one color range per category, we have to put a continuous number in the fields and categorize it with a categorical column, so in our example:

  • occupancy in the fields
  • occ_category is the categorization over these fields.
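To illustrate the idea (this is a conceptual sketch, not catmaply's internal implementation), the continuous value can be rescaled to the interval from 0 to 1 within each category, so that the lowest value of a category gets the lightest shade of that category's color range and the highest value the darkest. The data.frame toy and the column shade are hypothetical.

```r
# Hypothetical data: continuous occupancy with its category.
toy <- data.frame(
  occupancy = c(5, 18, 30, 45, 60, 70, 85, 95),
  occ_category = c(1, 1, 1, 2, 2, 2, 3, 3)
)

# Rescale occupancy to [0, 1] separately within each category.
# Note: a category with a single distinct value would yield NaN here.
toy$shade <- ave(
  toy$occupancy,
  toy$occ_category,
  FUN = function(v) (v - min(v)) / (max(v) - min(v))
)

toy
```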

Also, let's add a more expressive x_label to the plot, essentially concatenating the columns trip_seq, vehicle and circulation_name.

df <- df %>%
  mutate(x_label = paste(
    formatC(trip_seq, width=3, flag="0"), 
    vehicle, 
    formatC(circulation_name, width=2, flag = "0"),
    sep="-")
  )
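For a single row, the label construction above works as in the following standalone sketch. The values 6, "PO" and 8 correspond to the first row shown in the str() output; zero-padding with formatC() keeps the labels at a fixed width.

```r
# Build one label by hand: zero-padded trip, vehicle type, zero-padded circulation.
label <- paste(
  formatC(6, width = 3, flag = "0"),
  "PO",
  formatC(8, width = 2, flag = "0"),
  sep = "-"
)

label
#> [1] "006-PO-08"
```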

catmaply(
    df,
    x = x_label,
    x_order = trip_seq,
    y = stop_name,
    y_order = stop_seq,
    z = occupancy,
    categorical_color_range = TRUE,
    categorical_col = occ_category,
    color_palette = viridis::magma,
    legend_col = occ_cat_name
  )

This also works for non-interactive legends.

catmaply(
    df,
    x = x_label,
    x_order = trip_seq,
    y = stop_name,
    y_order = stop_seq,
    z = occupancy,
    categorical_color_range = TRUE,
    categorical_col = occ_category,
    color_palette = viridis::magma,
    legend_interactive = FALSE,
    legend_col = occ_cat_name
  )

Axis formatting

Now, let's play around with axis formatting; let’s change

  • font_color to the “purplish” color used in the logo (#6D65AB)
  • font_size to 10 pt
  • font_family to “verdana”
  • x_tickangle to 80 and, just for fun,
  • y_tickangle to -10

catmaply(
    df,
    x='x_label',
    x_order = 'trip_seq',
    x_tickangle = 80,
    y = "stop_name",
    y_order = "stop_seq",
    y_tickangle = -10,
    z = "occupancy",
    categorical_color_range = TRUE,
    categorical_col = 'occ_category',
    color_palette = viridis::magma,
    font_size = 10,
    font_color = '#6D65AB',
    font_family = "verdana",
    legend_col = occ_cat_name
    )

Hover

What about a custom hover label? Catmaply lets you define custom hover templates with the parameter hover_template, which can contain HTML tags to make the label more appealing. Here is an example that shows the column names in bold next to the respective values; it also rounds the occupancy to 2 decimal places.

catmaply(
  df,
  x=x_label,
  x_order = trip_seq,
  x_tickangle = 80,
  y = stop_name,
  y_order = stop_seq,
  z = occupancy,
  categorical_color_range = TRUE,
  categorical_col = occ_category,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<b>Trip</b>:', trip_seq,
    '<br><b>Stop Name</b>:', stop_name,
    '<br><b>Occupancy category</b>:', occ_cat_name,
    '<br><b>Occupancy</b>:', round(occupancy, 2),
    '<extra></extra>'
  ),
  legend_col = occ_cat_name
)

Note: usually it is a good idea to add the <extra></extra> tag at the end to hide trace information.
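Because paste() is vectorized, a template like the ones in this section yields one string per row of the data. The sketch below shows this on a small made-up data.frame (toy and its occupancy values are hypothetical), evaluated with base R's with() instead of inside catmaply.

```r
# Hypothetical two-row data.frame.
toy <- data.frame(
  trip_seq = c(6, 24),
  stop_name = c("Zuerich, Bahnhof Altstetten N", "Zuerich, Bahnhof Altstetten N"),
  occupancy = c(18.3849, 67.7712)
)

# paste() recycles the literal pieces and returns one label per row.
hover_text <- with(toy, paste(
  '<b>Trip</b>:', trip_seq,
  '<br><b>Stop Name</b>:', stop_name,
  '<br><b>Occupancy</b>:', round(occupancy, 2),
  '<extra></extra>'
))

length(hover_text) # one string per row
hover_text[1]
```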

Are the legend and a fancy hover template too much information? If you prefer simplicity, you can hide the legend altogether. Go ahead.. :-)

catmaply(
  df,
  x=x_label,
  x_order = trip_seq,
  x_tickangle = 80,
  y = stop_name,
  y_order = stop_seq,
  z = occupancy,
  categorical_color_range = TRUE,
  categorical_col = occ_category,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<br><b>Trip</b>:', trip_seq,
    '<br><b>Stop Name</b>:', stop_name,
    '<br><b>Occupancy category</b>:', occ_cat_name,
    '<br><b>Occupancy</b>:', round(occupancy, 2),
    '<extra></extra>'
  ),
  legend_col = occ_cat_name,
  legend = FALSE
)

Minimalist or visual person who likes neither hover nor legend…?

catmaply(
  df,
  x=x_label,
  x_order = trip_seq,
  x_tickangle = 80,
  y = stop_name,
  y_order = stop_seq,
  z = occupancy,
  categorical_color_range = TRUE,
  categorical_col = occ_category,
  color_palette = viridis::inferno,
  hover_hide = TRUE,
  legend_col = occ_cat_name,
  legend = FALSE
)

Ok, no hover but legend it is… (for completeness).

catmaply(
  df,
  x=x_label,
  x_order = trip_seq,
  x_tickangle = 80,
  y = stop_name,
  y_order = stop_seq,
  z = occupancy,
  categorical_color_range = TRUE,
  categorical_col = occ_category,
  color_palette = viridis::inferno,
  hover_hide = TRUE,
  legend_col = occ_cat_name,
  legend = TRUE
)

Time Axis

Hmm, didn’t we say that we want to show the development over time? Wouldn’t it make sense then, if we could illustrate time dynamically on the x axis?

A dynamic x axis is created if you put a column of type POSIXct or POSIXt on the x axis. Let's check it out by calculating the departure datetime of each trip.

df <- df %>%
  na.omit() %>%
  dplyr::group_by(
    trip_seq
  ) %>%
  dplyr::mutate(
    departure_date_time = min(na.omit(lubridate::ymd_hms(paste("2020-08-01", departure_time))))
  ) %>%
  dplyr::ungroup()

catmaply(
  df,
  x=departure_date_time,
  y = stop_name,
  y_order = stop_seq,
  z = occupancy,
  categorical_color_range = TRUE,
  categorical_col = occ_category,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<br><b>Trip</b>:', trip_seq,
    '<br><b>Stop Name</b>:', stop_name,
    '<br><b>Occupancy category</b>:', occ_cat_name,
    '<br><b>Occupancy</b>:', round(occupancy, 2),
    '<extra></extra>'
  )
)
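The date-time column above is built with lubridate::ymd_hms(). As a sketch of the same idea in base R, time-of-day strings can be combined with an arbitrary date and parsed with as.POSIXct(); the input vector below is made up, and the empty string stands in for the empty factor level of departure_time seen in the str() output.

```r
# Hypothetical time-of-day strings; "" mimics a missing departure time.
departure_time <- c("04:58:12", "05:10:40", "")

# Attach an arbitrary date and parse; unparseable entries become NA.
departure_date_time <- as.POSIXct(
  paste("2020-08-01", departure_time),
  format = "%Y-%m-%d %H:%M:%S",
  tz = "UTC"
)

class(departure_date_time)

# Earliest valid departure, ignoring missing values.
min(departure_date_time, na.rm = TRUE)
```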

Currently, formatting of the time axis is optimised for analysing daily data, e.g. if you sample the utilization throughout the year and then summarise it to get the utilization of a typical day. Thus, the formatting at the maximum zoom-out level is still hours and not years. However, you can change the formatting of each zoom level by setting the tickformatstops parameter. So, if you want to e.g. remove the h, m, s and ms that indicate the unit of time in the plot above, you could achieve this as follows (more information can be found in the tick formatting example of plotly):


catmaply(
  df,
  x=departure_date_time,
  y = stop_name,
  y_order = stop_seq,
  z = occupancy,
  categorical_color_range = TRUE,
  categorical_col = occ_category,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<br><b>Trip</b>:', trip_seq,
    '<br><b>Stop Name</b>:', stop_name,
    '<br><b>Occupancy category</b>:', occ_cat_name,
    '<br><b>Occupancy</b>:', round(occupancy, 2),
    '<extra></extra>'
  ),
  tickformatstops=list(
    list(dtickrange = list(NULL, 1000), value = "%H:%M:%S.%L"),
    list(dtickrange = list(1000, 60000), value = "%H:%M:%S"),
    list(dtickrange = list(60000, 3600000), value = "%H:%M"),
    list(dtickrange = list(3600000, 86400000), value = "%H:%M"),
    list(dtickrange = list(86400000, 604800000), value = "%H:%M"),
    list(dtickrange = list(604800000, "M1"), value = "%H:%M"),
    list(dtickrange = list("M1", "M12"), value = "%H:%M"),
    list(dtickrange = list("M12", NULL), value = "%H:%M")
  )
)
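The numeric dtickrange bounds used above are durations in milliseconds, which is how plotly.js expresses tick intervals on date axes; strings such as "M1" and "M12" denote months. The arithmetic behind the numbers can be checked as follows (the variable names are just for illustration).

```r
# Durations in milliseconds behind the dtickrange bounds.
ms_per_second <- 1000
ms_per_minute <- 60 * ms_per_second
ms_per_hour <- 60 * ms_per_minute
ms_per_day <- 24 * ms_per_hour
ms_per_week <- 7 * ms_per_day

c(
  second = ms_per_second,
  minute = ms_per_minute,
  hour = ms_per_hour,
  day = ms_per_day,
  week = ms_per_week
)
```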

Slider

Besides the rangeslider, it is also possible to use a simple slider/scrollbar. This might be especially useful for the annotated heatmaps shown in the next section.

You can switch to a slider as easily as follows:


catmaply(
  df,
  x=x_label,
  x_order = trip_seq,
  y = stop_name,
  y_order = stop_seq,
  z = occupancy,
  categorical_color_range = TRUE,
  categorical_col = occ_category,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<br><b>Trip</b>:', trip_seq,
    '<br><b>Stop Name</b>:', stop_name,
    '<br><b>Occupancy category</b>:', occ_cat_name,
    '<br><b>Occupancy</b>:', round(occupancy, 2),
    '<extra></extra>'
  ),
  rangeslider = FALSE, # to prevent warning
  legend_interactive = FALSE, # to prevent warning
  slider = TRUE # activate slider
)

To add a prefix to the current value (the one shown above the slider), set slider_currentvalue_prefix as follows:

catmaply(
  df,
  x=x_label,
  x_order = trip_seq,
  y = stop_name,
  y_order = stop_seq,
  z = occupancy,
  categorical_color_range = TRUE,
  categorical_col = occ_category,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<br><b>Trip</b>:', trip_seq,
    '<br><b>Stop Name</b>:', stop_name,
    '<br><b>Occupancy category</b>:', occ_cat_name,
    '<br><b>Occupancy</b>:', round(occupancy, 2),
    '<extra></extra>'
  ),
  rangeslider = FALSE, # to prevent warning
  legend_interactive = FALSE, # to prevent warning
  slider = TRUE, # activate slider,
  slider_currentvalue_prefix = "Trip: "
)

Also, you can show/hide various elements of the slider, more specifically the steps, ticks and current value. This gets you almost a scrollbar feeling:

catmaply(
  df,
  x=x_label,
  x_order = trip_seq,
  y = stop_name,
  y_order = stop_seq,
  z = occupancy,
  categorical_color_range = TRUE,
  categorical_col = occ_category,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<br><b>Trip</b>:', trip_seq,
    '<br><b>Stop Name</b>:', stop_name,
    '<br><b>Occupancy category</b>:', occ_cat_name,
    '<br><b>Occupancy</b>:', round(occupancy, 2),
    '<extra></extra>'
  ),
  rangeslider = FALSE, # to prevent warning
  legend_interactive = FALSE, # to prevent warning
  slider = TRUE, # activate slider,
  slider_currentvalue_visible = FALSE,
  slider_step_visible = FALSE,
  slider_tick_visible = FALSE
)

The slider steps are created automatically for you, however, you can also define them yourself. Catmaply provides two modes to alter the way the slider steps are created: auto, and custom list.

Mode auto alters the way the steps are automatically created for you. To use it, pass a list with the following elements to the parameter slider_steps:

  • slider_start (numeric): the starting-point of the slider on the x axis.
  • slider_range (numeric): the size of the window of the slider.
  • slider_shift (numeric): how many units the window should be moved to the right for each step.
  • slider_step_name (column): the name of the step; which must match the occurrence of x.

For example, if you want to create a slider that starts at trip 1 with a window size of 15 trips and a shift of 10 trips, you can do this as follows:

catmaply(
  df,
  x=x_label,
  x_order = trip_seq,
  y = stop_name,
  y_order = stop_seq,
  z = occupancy,
  categorical_color_range = TRUE,
  categorical_col = occ_category,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<br><b>Trip</b>:', trip_seq,
    '<br><b>Stop Name</b>:', stop_name,
    '<br><b>Occupancy category</b>:', occ_cat_name,
    '<br><b>Occupancy</b>:', round(occupancy, 2),
    '<extra></extra>'
  ),
  rangeslider = FALSE, # to prevent warning
  legend_interactive = FALSE, # to prevent warning
  slider = TRUE, # activate slider,
  slider_currentvalue_visible = FALSE,
  slider_steps=list(
    slider_start=1,
    slider_range=15,
    slider_shift=10,
    slider_step_name="x" # same name as x axis (must be character)
  )
)
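To get a feeling for how start, range and shift interact, the following sketch enumerates the windows such a setting would describe. The helper slider_windows() is hypothetical and not part of catmaply; in particular, how catmaply treats the window boundaries and the last (shorter) window is an assumption here.

```r
# Hypothetical helper: list the x ranges covered by each slider step.
# n is the number of positions on the x axis.
slider_windows <- function(n, start, range, shift) {
  starts <- seq(from = start, to = n, by = shift)
  data.frame(
    from = starts,
    to = pmin(starts + range - 1, n) # clip the last window at the axis end
  )
}

windows <- slider_windows(n = 40, start = 1, range = 15, shift = 10)
windows
```

With a shift smaller than the range, as in this case, consecutive windows overlap, so neighbouring trips stay visible when moving from one step to the next.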

Besides mode auto, you can also get full control of the slider steps by providing a list with the following elements:

  • name: name of the step
  • range: range to be covered by the step

A simple example is shown below:

catmaply(
  df,
  x=x_label,
  x_order = trip_seq,
  y = stop_name,
  y_order = stop_seq,
  z = occupancy,
  categorical_color_range = TRUE,
  categorical_col = occ_category,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<br><b>Trip</b>:', trip_seq,
    '<br><b>Stop Name</b>:', stop_name,
    '<br><b>Occupancy category</b>:', occ_cat_name,
    '<br><b>Occupancy</b>:', round(occupancy, 2),
    '<extra></extra>'
  ),
  rangeslider = FALSE, # to prevent warning
  legend_interactive = FALSE, # to prevent warning
  slider = TRUE, # activate slider,
  slider_currentvalue_visible = FALSE,
  slider_steps = list(
    list(name="Very important step one", range=c(12, 37)), 
    list(name="Very important step two", range=c(87, 111))
  )
)
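Such a list of steps does not have to be typed by hand. As a sketch, it can be generated with lapply(); the window starts, the window size and the step names below are arbitrary choices for illustration.

```r
# Hypothetical windows: three steps of 25 trips each.
window_starts <- c(1, 26, 51)
window_size <- 25

# Build one list(name = ..., range = ...) element per window.
custom_steps <- lapply(window_starts, function(s) {
  list(
    name = paste("Trips", s, "to", s + window_size - 1),
    range = c(s, s + window_size - 1)
  )
})

str(custom_steps)
```

Under this assumption, custom_steps has the same structure as the hand-written list passed to slider_steps above.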

Annotations

Sometimes it makes sense to add annotations to a heatmap. With catmaply, you can add and format annotations relatively easily with the following parameters:

  • text: column name holding the values of the text to be displayed.
  • text_color: the color of the text (similar to font_color).
  • text_size: the size to be used for the text (similar to font_size).
  • text_font_family: the font family to be used for the text (similar to font_family).

Adding annotations to the previous slider example can be achieved as follows:

catmaply(
  df,
  x=x_label,
  x_order = trip_seq,
  y = stop_name,
  y_order = stop_seq,
  z = occupancy,
  text = occ_category,
  text_color="#000",
  text_size=12,
  text_font_family="Open Sans",
  categorical_color_range = TRUE,
  categorical_col = occ_category,
  color_palette = viridis::plasma,
  hover_template = paste(
    '<br><b>Trip</b>:', trip_seq,
    '<br><b>Stop Name</b>:', stop_name,
    '<br><b>Occupancy category</b>:', occ_cat_name,
    '<br><b>Occupancy</b>:', round(occupancy, 2),
    '<extra></extra>'
  ),
  rangeslider = FALSE, # to prevent warning
  legend_interactive = FALSE, # to prevent warning
  slider = TRUE, # activate slider,
  slider_currentvalue_visible = FALSE,
  slider_steps=list(
    slider_start=1,
    slider_range=15,
    slider_shift=10,
    slider_step_name="x" # same name as x axis (must be character)
  )
)

Note: annotations can also be used with rangeslider or no slider at all. However, lots of annotations might have a negative influence on the performance. Only text values that are not NA are used for annotations.
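Since NA text values are skipped, one way to keep the number of annotations small is to set the text to NA wherever no annotation is wanted. The sketch below does this on a made-up data.frame (toy and the column label are hypothetical) so that only cells of the highest category carry text.

```r
# Hypothetical data: annotate only cells in category 3.
toy <- data.frame(occ_category = c(1, 2, 3, 1, 3))

# Text for category 3, NA (no annotation) everywhere else.
toy$label <- ifelse(toy$occ_category >= 3, "high", NA_character_)

toy
```

A column prepared this way could then be used as text = label.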

Credits

This package only exists thanks to the amazing work done by many people in the open source community. Beyond the many people working on the pipeline of R, thanks should go to the plotly team, and especially to Carson Sievert and others working on the R package of plotly. Also, a special thanks to VBZ for providing advice on the functionality of the package as well as the vbz dataset; I would also like to thank VBZ for investing time to test the package and for using it in their awesome Shiny VBZ dashboard.

Session info

sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=de_CH.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=de_CH.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=de_CH.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.0.9    catmaply_0.9.4
#> 
#> loaded via a namespace (and not attached):
#>  [1] pillar_1.7.0      bslib_0.4.0       compiler_4.2.1    jquerylib_0.1.4  
#>  [5] viridis_0.6.2     tools_4.2.1       digest_0.6.29     lubridate_1.8.0  
#>  [9] viridisLite_0.4.1 gtable_0.3.0      jsonlite_1.8.0    evaluate_0.16    
#> [13] lifecycle_1.0.1   tibble_3.1.7      pkgconfig_2.0.3   rlang_1.0.4      
#> [17] cli_3.3.0         DBI_1.1.3         rstudioapi_0.14   crosstalk_1.2.0  
#> [21] yaml_2.3.5        xfun_0.32         fastmap_1.1.0     gridExtra_2.3    
#> [25] httr_1.4.3        stringr_1.4.1     knitr_1.39        htmlwidgets_1.5.4
#> [29] generics_0.1.3    vctrs_0.4.1       sass_0.4.2        grid_4.2.1       
#> [33] tidyselect_1.1.2  data.table_1.14.2 glue_1.6.2        R6_2.5.1         
#> [37] plotly_4.10.0     fansi_1.0.3       rmarkdown_2.14    farver_2.1.1     
#> [41] tidyr_1.2.0       purrr_0.3.4       ggplot2_3.3.6     magrittr_2.0.3   
#> [45] scales_1.2.0      htmltools_0.5.3   ellipsis_0.3.2    assertthat_0.2.1 
#> [49] colorspace_2.0-3  utf8_1.2.2        stringi_1.7.8     lazyeval_0.2.2   
#> [53] munsell_0.5.0     cachem_1.0.6      crayon_1.5.1