CRAN top 20s and interactive package and collaboration networks

Ioannis Kosmidis

2022-08-26

cranly

cranly provides core visualizations and summaries for the CRAN package database. It is aimed mainly as an analytics tool for developers to keep track of their CRAN packages and profiles, as well as those of others, which, at least for me, is proving harder and harder as the CRAN ecosystem grows.

The package provides comprehensive methods for cleaning up and organizing the information in the CRAN package database, for building package directives networks (depends, imports, suggests, enhances) and collaboration networks, and for computing summaries and producing interactive visualizations from the resulting networks. Network visualization is through the visNetwork package. The package also provides functions to coerce the networks to igraph https://CRAN.R-project.org/package=igraph objects for further analyses and modelling.

This vignette is a tour to the current capabilities in cranly.

Preparing today’s CRAN package database

Let’s attach cranly

library("cranly")

and use an instance of the cleaned CRAN package database

cran_db <- readRDS(url("https://raw.githubusercontent.com/ikosmidis/cranly/develop/inst/extdata/cran_db.rds"))

as of 2022-08-26 14:43:43 BST.

Alternatively, today’s package directives and author collaboration networks can be constructed by doing

p_db <- tools::CRAN_package_db()

and then we need to clean and organize author names, depends, imports, suggests, enhances

cran_db <- clean_CRAN_db(p_db)

The resulting dataset carries the timestamp of when it was put together, which helps keeping track of when the data import has taken place and will be helpful in future versions when dynamic analyses and visualization methods are implemented.

attr(cran_db, "timestamp")
#> [1] "2022-08-26 14:43:43 BST"

Network of package directives

We can now extract edges and nodes for the CRAN package directives network by simply doing

package_network <- build_network(cran_db)

and compute various statistics for the package network

## Global package network statistics
package_summaries <- summary(package_network)

The package_summaries object can now be used for finding the top-20 packages according to various statistics

plot(package_summaries, according_to = "n_authors", top = 20)

plot(package_summaries, according_to = "n_imports", top = 20)

plot(package_summaries, according_to = "n_imported_by", top = 20)

The names of the available statistics are

names(package_summaries)
#>  [1] "package"          "n_authors"        "n_imports"        "n_imported_by"   
#>  [5] "n_suggests"       "n_suggested_by"   "n_depends"        "n_depended_by"   
#>  [9] "n_enhances"       "n_enhanced_by"    "n_linking_to"     "n_linked_by"     
#> [13] "betweenness"      "closeness"        "page_rank"        "degree"          
#> [17] "eigen_centrality"

The sub-network for my packages can be found using the extractor function package_of which use exact matching by default

my_packages <- package_by(package_network, "Ioannis Kosmidis")
my_packages
#> [1] "PlackettLuce"     "betareg"          "brglm"            "brglm2"          
#> [5] "detectseparation" "enrichwith"       "profileModel"     "trackeR"         
#> [9] "trackeRapp"

We can now get an interactive visualization of the sub-network for my packages using

plot(package_network, package = my_packages, title = TRUE, legend = TRUE)

You can hover over the nodes and the edges to get package-specific information and links to the package pages.

In order to focus only on optional packages (i.e. exclude base and recommended packages), we do

optional_packages <- subset(package_network, recommended = FALSE, base = FALSE)
optional_summary <- summary(optional_packages)
plot(optional_summary, top = 30, according_to = "n_imported_by")

CRAN collaboration network

Next let’s build the CRAN collaboration network

author_network <- build_network(object = cran_db, perspective = "author")

Statistics for the collaboration network can be computed using the summary method as we did for package directives.

author_summaries <- summary(author_network)

The top-20 collaborators according to various network statistics are

plot(author_summaries, according_to = "n_packages", top = 20)

plot(author_summaries, according_to = "page_rank", top = 20)

plot(author_summaries, according_to = "betweenness", top = 20)

The R Core’s collaboration sub-network is

plot(author_network, author = "R Core")