
Why graph

Network visualisation is non-trivial; indeed it is very important for at least two reasons.

First, visualisation is a crucial part of the process of data analysis. As a first step, network visualisation – or graphing – offers us a way to vet our data for anything strange that might be going on, both revealing and informing our assumptions and intuitions. The following image relates to the famous Anscombe’s quartet, which shows how different datasets can have identical statistical properties that are only revealed to be very different when graphed.
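As a minimal sketch of this point, the quartet ships with base R as the `anscombe` data frame, so the near-identical summary statistics and the very different shapes can be checked directly. The object names `stats` and `op` are chosen here for illustration only.

```r
# Summary statistics for the four x/y pairs in R's built-in `anscombe` data
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    var_x = var(x), cor_xy = cor(x, y))
})
colnames(stats) <- paste0("set", 1:4)
round(stats, 2)  # the four columns are almost indistinguishable

# Graphing the same data reveals four very different shapes
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i), pch = 19)
}
par(op)  # restore the previous plotting parameters
```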

As Tufte (1983: 9) said:

“At their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers – even a very large set – is to look at pictures of those numbers”

All of this is crucial with networks. Drawing network graphs is key to exploring and understanding both the global structure of a network as well as smaller-scale structures such as nodal positions or communities within it.

Second, visualisation is a crucial part of the communication of the lessons that we have learned through investigation with others. As Brandes et al. (1999) argue, visualisation involves thinking about the substance of what you are trying to communicate, how to design it so that it is ergonomic and (ideally) aesthetic, and which algorithm is most appropriate to lay out the graph informatively. The aim is to offer a concise and precise delivery of insights.

There may be some dead-ends and time-sinks involved in visualising your data, but it is worth taking the time to explore your data and experiment with ways to make what you have learned over a longer period of time evident to others in a shorter period of time.

Different graphics approaches in R

To understand graph and network metric visualisation with {autograph}, it is useful to review the different approaches taken already in R. There are several main packages for plotting in R, as well as several for plotting networks in R. Plotting in R is typically based around two main approaches:

  • the ‘base’ approach in R by default, and
  • the ‘grid’ approach made popular by the famous and very flexible {ggplot2} package.1

In the case of base R graphics, plots are essentially written straight to the plotting device. This means that they are not easily modified after the fact. You would need to replot the whole thing to change something. Moreover, while there is an admirably clean aesthetic to base R graphics, it can be difficult to modify or extend them to your needs.

In the case of grid graphics, plots are built up in layers, and thus can be modified after the fact. That is, you can initialise a plot using ggplot2::ggplot(), specifying the data and mapping variables to various aesthetic features, and then add layers to it using + to add further points and lines, but also titles, legends, etc.

The following figure illustrates the difference between these two approaches.

library(ggplot2)

# Base R: the plot is drawn straight to the device in a single call
plot(mtcars$hp, mtcars$mpg,
     main = "Base R: MPG vs Horsepower",
     xlab = "Horsepower",
     ylab = "Miles per Gallon",
     pch = 19,
     col = "blue")

# ggplot2: the plot is built up in layers joined with +
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "blue") +
  labs(title = "ggplot2: MPG vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")
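The claim that grid-based plots can be modified after the fact can be sketched as follows. This assumes only {ggplot2} and the built-in `mtcars` data. The object names `p` and `p2` are illustrative.

```r
library(ggplot2)

# Initialise a plot and store it as an object rather than only drawing it
p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(colour = "blue")

# Later, add a trend line and a title without rebuilding the plot
p2 <- p +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "MPG vs Horsepower, with a linear trend")

length(p$layers)   # layers in the original plot
length(p2$layers)  # one more layer after the addition
p2
```

The original object `p` is left unchanged, so several variants can be derived from the same starting point.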

  1. Perhaps of interest, gg stands for the Grammar of Graphics (https://doi.org/10.1007/0-387-28695-0).↩︎

Different graphing approaches in R

Approaches to plotting graphs or networks in R can be similarly divided:

  • two classic packages, {igraph} and {sna}, both build upon the ‘base’ R graphics engine,
  • newer packages {ggnetwork} and {ggraph} build upon a ‘grid’ approach.2

In this tutorial, we’re going to use the fict_lotr dataset from the {manynet} package. Let’s see how this would be plotted using {igraph} and {ggraph}, adding a title to each to facilitate comparison, but otherwise relying on default behaviour.

plot(as_igraph(fict_lotr),
     main = "igraph: fict_lotr")
ggraph::ggraph(as_tidygraph(fict_lotr)) +
  ggraph::geom_edge_link() +
  ggraph::geom_node_point() +
  ggtitle("ggraph: fict_lotr")

We can see here that {igraph} plots the network in a fairly basic way, straight to the plotting device (window). By default, it uses a force-directed layout (see Layouts below),3 colors the nodes orange, and prints node labels if they have them. However, the layout is not optimised for the size of the plotting window, the node labels are regularly overlapping, and the orange color with black borders is not particularly appealing or helpful for label legibility. It only works with ‘igraph’ objects.

In contrast, {ggraph} offers the trademark flexibility of the grammar of graphics approach. However, it requires the user to build up a plot from the ground up, which can be daunting for new users and fiddly even for experienced ones. Four lines are required to get even a basic plot, with an additional line required if a grey background is not desired. No labels or other information are added by default, and would also require additional lines. It works with ‘tidygraph’ objects, which are an additional layer on top of ‘igraph’ objects.


  2. Others include: ‘Networkly’ for creating 2-D and 3-D interactive networks that can be rendered with plotly and can be easily integrated into shiny apps or markdown documents; ‘visNetwork’ interacts with javascript (vis.js) to make interactive networks (http://datastorm-open.github.io/visNetwork/); and ‘networkD3’ interacts with javascript (D3) to make interactive networks (https://www.r-bloggers.com/2016/10/network-visualization-part-6-d3-and-r-networkd3/).↩︎

  3. Which incidentally returns a different layout each time it is run.↩︎

Who draws first

{autograph} builds upon these packages, but takes a somewhat different approach. Because it builds upon the ‘grid’ approach of {ggplot2} and {ggraph}, it lends itself to the additional layering and flexibility of those packages. Because it depends on the coercion routines available in {manynet}, it can be used with network-related objects from most common network analysis packages. However, unlike those packages, it offers concise and easy-to-use functions for drawing graphs, with sensible defaults for most common use cases that draw on the information available in the network object.

The first thing you will want to do when you import or create a new network dataset is draw it. Let’s say that we’re still interested in the fict_lotr dataset from the {manynet} package. Compared to the {igraph} and {ggraph} examples before, we can see that autograph::graphr() offers a much more concise way to draw the network.

graphr(fict_lotr)

The package also offers methods for plotting statistics related to networks (e.g. degree distributions) and models of them (e.g. goodness-of-fit plots). But we’ll get to that later, and most of these are demonstrated in other vignettes, tutorials, or packages. {autograph} also offers consistent theming across graphs and plots, so that you do not need to keep specifying the same options over and over again.

In the following pages, we’re going to go through a number of different ways of taking control of the graphing process.

Illustrating graphs

Once we have an initial graph of our network, we can start to explore features of the network and its structure in more detail. There are a number of different dimensions network researchers can play with to illustrate different aspects of the network. On her excellent and helpful website, Katya Ognyanova outlines some of these dimensions:

Node feature  Node argument(s)            Tie feature                       Tie argument(s)
Position      layout=                     Arrows (e.g. capped, head shape)
Labels        labels=, node_group=        Type (e.g. solid, dashed)
Shape         node_shape=                 Shape (e.g. straight, bent)
Size          node_size=                  Size                              edge_size=
Color         node_color=, node_colour=   Color                             edge_color=, edge_colour=

Currently, only those options with named parameters in the table above can be customised in {autograph}. Tie arrows and shapes are used to indicate directionality and reciprocity, where present in the data. Let’s go through some of these options in more detail.

Shaping nodes

One of the first things we might be interested in doing is understanding better the distribution of some categorical variable. Our fict_lotr dataset contains a variable called Race, so let’s try and change the shape of the nodes by this variable. Following the syntax shown in the table above, we just need to reference the variable name in the node_shape argument.

fict_lotr
graphr(fict_lotr, node_shape = "Race")

We can see that there are six different races present here.4 Unfortunately, this is a few too many different categories to be effectively distinguished by shape.


  4. Though the keen-eyed and well-read among you will have noticed that there are some racial assignments that are debatable.↩︎

Colouring nodes

Let’s try instead colouring the nodes by this “Race” variable. It is very similar to the shape example above. Can you try and complete the code yourself?

graphr(fict_lotr, node_color = "Race")

That’s much easier to read. Since the same colours seem to be clustered together, with the humans and hobbits each clustered together in the centre of the graph, and the elves clustered towards the left, we might infer that there is some ‘homophily’ going on here – a topic for another tutorial.

An alternative to coloring the nodes is to use the ‘node_group’ argument to highlight groups in a network. This puts a shaded area around nodes of the same group. For rather spatially clustered distributions, this can be a very effective way to show groupings, but can be sensitive to the layout used. If nodes of the same group are not close together, the shaded areas can overlap and make the graph harder to read.

graphr(fict_lotr, node_group = "Race")

Note that node_color and node_group can be used together, either to highlight different groups or to emphasise group assignment where there is the kind of interpenetration or overlap described above as a challenge.

Sizing nodes

What about if we’re interested in a continuous variable instead of a categorical variable? While the fict_lotr dataset does not contain any continuous nodal variables, we can create one rather easily from the network itself. Let’s use each node’s degree, which is the number of ties incident/connecting to the node.

fict_lotr %>% 
  mutate(Degree = node_deg(fict_lotr)) %>% 
  graphr(node_size = "Degree")
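To make the definition of degree concrete, here is a minimal base R sketch that computes it by hand. The four-node network and its node names are hypothetical, not taken from fict_lotr.

```r
# A toy undirected network of four nodes (names are illustrative only)
adj <- matrix(c(0, 1, 1, 1,
                1, 0, 1, 0,
                1, 1, 0, 0,
                1, 0, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "B", "C", "D"),
                              c("A", "B", "C", "D")))

# In an undirected network, a node's degree is the sum of its row
# in the adjacency matrix, i.e. the number of ties incident to it
degree <- rowSums(adj)
degree
```

Here node A is tied to all three other nodes, while D is tied only to A, so sizing nodes by this variable would draw A largest and D smallest.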

Tying up loose ends

All this works similarly with ties/edges. Just replace node_ with edge_ in the arguments above, and you can control edges’ size and color. Have a try yourself by adding some additional variables to the data. Try adding in a binary variable to each tie called ‘is_tri’ that indicates whether the tie is a part of a triangle or not. If you add a continuous variable to each tie called ‘weight’, and a categorical variable to the ties called ‘type’, then graphr() will even try to use this information automatically.

fict_lotr %>% 
  mutate_ties(weight = tie_closeness(fict_lotr),
              is_tri = tie_is_triangular(fict_lotr)) %>% 
  graphr(edge_color = "is_tri")

Theming

Setting a theme

Perhaps you are preparing a presentation, representing your institution, department, or research centre at home or abroad. In this case, you may wish to theme the whole network with institutional colors and fonts. Indeed, you may even want to set a theme that is then reused across all your graphs and plots. {autograph} offers a number of themes that can be set using the stocnet_theme() function.

stocnet_theme("default")
graphr(fict_lotr, node_color = "Race")
stocnet_theme("iheid")
graphr(fict_lotr, node_color = "Race")

More institutional scales and themes are available, and more can be implemented upon pull request.

Who’s hue?

By default, graphr() will use a color palette that offers fairly good contrast and better accessibility. However, a different hue might offer a better aesthetic or identifiability for some nodes. Because the graphr() function is based on the grammar of graphics, it’s easy to extend or alter aesthetic aspects. Here let’s try and change the colors assigned to the different races in the fict_lotr dataset.

graphr(fict_lotr,
       node_color = "Race")

graphr(fict_lotr,
       node_color = "Race") +
  ggplot2::scale_colour_hue()

Grayscale

Other times color may not be desired. Some publications require grayscale images. To use a grayscale color palette, replace _hue from above with _grey (note the ‘e’ spelling):

graphr(fict_lotr,
       node_color = "Race") +
  ggplot2::scale_colour_grey()

As you can see, grayscale is more effective for continuous variables, or for discrete variables with very few categories, than for the number of categories used here.

Manual override

Or we may want to choose particular colors for each category. This is pretty straightforward to do with ggplot2::scale_colour_manual(). Some common color names are available, but otherwise hex color codes can be used for more specific colors. Unspecified categories are coloured (dark) grey.

graphr(fict_lotr,
       node_color = "Race") +
  ggplot2::scale_colour_manual(
    values = c("Dwarf" = "red",
               "Hobbit" = "orange",
               "Maiar" = "#DEC20B",
               "Human" = "lightblue",
               "Elf" = "lightgreen",
               "Ent" = "darkgreen")) +
  labs(color = "Color")

Titles, labels, and legends

When it comes to communicating insights from network graphs to others, it is important to add in the contextual information that will help them understand what they are looking at. In this section, we will learn how to add titles, labels, and legends to graphs.

Labels

With our fict_lotr example above, because the network is itself labelled, graphr() automatically adds in the node labels because they are available. If you do not want these labels, you can remove them from the network before passing it on to graphr(), or you can use the argument labels = FALSE.

graphr(fict_lotr, labels = FALSE)

Without the labels, the structure of the network is clearer and easier to interpret, though we lose the information about which node is which character.

Titles

{autograph} works well with both {ggplot2} and {ggraph} functions that can be appended to create more tailored visualisations. Let’s try this by adding a title to a plot. Append (with a +) labs(title = ) to add a title to a plot, say “My visualisation”, and then add also a subtitle (an argument to that function), say “I did this”.

graphr(fict_lotr) + 
  labs(title = "My visualisation", 
       subtitle = "I did this")

Note that you can also use ggtitle() to do the same thing, but if you just remember labs() you can also use it to add labels for x and y axes, and legends (see below).

Legends

While {autograph} attempts to provide legends where necessary, in some cases the legends offer insufficient detail, such as in the following figure, or are absent.

fict_lotr %>% 
  mutate(maxbet = node_is_max(node_betweenness(fict_lotr))) %>% 
  graphr(node_color = "maxbet")

{autograph} supports the {ggplot2} way of adding legends after the main plot has been constructed, using guides() to add in the legends, and labs() for giving those legends particular titles. Note that we can use "\n" within the legend title to make the title span multiple lines.

fict_lotr %>% 
  mutate(maxbet = node_is_max(node_betweenness(fict_lotr))) %>% 
  graphr(node_color = "maxbet") +
  guides(color = "legend") + 
  labs(color = "Maximum\nBetweenness")

To change the position of the legend, add the theme() function from {ggplot2}. The legend can be positioned at the top, bottom, left, or right, or removed using “none”.
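A minimal sketch of repositioning the legend, assuming the {manynet}, {autograph}, and {ggplot2} packages are installed and that graphr() returns a gg-based object, as the earlier examples suggest. The object name `p` is illustrative.

```r
library(manynet)    # provides the fict_lotr data
library(autograph)  # provides graphr()
library(ggplot2)    # provides theme()

# Move the colour legend below the graph
p <- graphr(fict_lotr, node_color = "Race") +
  theme(legend.position = "bottom")
p

# Or drop the legend altogether
graphr(fict_lotr, node_color = "Race") +
  theme(legend.position = "none")
```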

Layouts

The aim of graph layouts is to position nodes in a (usually) two-dimensional space to maximise some analytic and aesthetically pleasing function. There is a lot to which one could potentially pay attention. Quality measures might include:

  • minimising the crossing number of edges/ties in the graph (planar graphs require no crossings)
  • minimising the slope number of distinct edge slopes in the graph (where vertices are represented as points on a Euclidean plane)
  • minimising the bend number in all edges in the graph (every graph has a right angle crossing (RAC) drawing with three bends per edge)
  • minimising the total edge length
  • minimising the maximum edge length
  • minimising the edge length variance
  • maximising the angular resolution or sharpest angle of edges meeting at a common vertex
  • minimising the bounding box of the plot
  • evening the aspect ratio of the plot
  • displaying symmetry groups (subgraph automorphisms)

Graph layouts available in the {igraph}, {ggraph}, {graphlayouts}, and {autograph} packages can be used in graphr(). These can be specified using the layout argument. In the following sections, we review some of the most common types of layouts.

Force-directed layouts

Force-directed layouts update some initial placement of vertices through the operation of some system of metaphorically-physical forces. These might include attractive and repulsive forces.

(graphr(ison_southern_women, layout = "kk") + ggtitle("Kamada-Kawai") |
   graphr(ison_southern_women, layout = "fr") + ggtitle("Fruchterman-Reingold") |
   graphr(ison_southern_women, layout = "stress") + ggtitle("Stress Minimisation"))

The Kamada-Kawai (KK) method inserts a spring between all pairs of vertices that is the length of the graph distance between them. This means that edges with a large weight will be longer. KK offers a good layout for lattice-like networks, because it will try to space the network out evenly.

The Fruchterman-Reingold (FR) method uses an attractive force between directly connected vertices, and a repulsive force between all vertex pairs. The attractive force is proportional to the edge’s weight, thus edges with a large weight will be shorter. FR offers a good baseline for most types of networks.

The Stress Minimisation (stress) method is related to the KK algorithm, but offers better runtime, quality, and stability and so is generally preferred. Indeed, {manynet} uses it as the default for most networks. It has the advantage of returning the same layout each time it is run on the same network.
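The difference in stability can be checked directly. The sketch below assumes the {igraph} and {graphlayouts} packages are installed and calls their layout functions rather than graphr(). The network used (Zachary’s karate club, shipped with {igraph}) is chosen only for illustration.

```r
library(igraph)
library(graphlayouts)

# A small example network: Zachary's karate club
g <- make_graph("Zachary")

# Fruchterman-Reingold starts from a random placement,
# so two runs will generally give different coordinates
fr1 <- layout_with_fr(g)
fr2 <- layout_with_fr(g)
isTRUE(all.equal(fr1, fr2))

# Stress minimisation is deterministic,
# so two runs on the same network give the same coordinates
st1 <- layout_with_stress(g)
st2 <- layout_with_stress(g)
isTRUE(all.equal(st1, st2))
```

If a randomised layout needs to be reproducible, calling set.seed() before drawing is the usual workaround.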

Other force-directed layouts available include:

  • Simulated annealing (Davidson and Harel 1993): "dh"
  • Graph embedder (Frick et al. 1995): "gem"
  • Graphopt (Schmuhl): "graphopt"
  • Distributed recursive graph layout (Martin et al. 2008): "drl"

Layered layouts

Layered layouts arrange nodes into horizontal (or vertical) layers, positioning them so that they reduce crossings. These layouts are best suited for directed acyclic graphs or similar.

graphr(ison_southern_women, layout = "bipartite") + ggtitle("Bipartite")
graphr(ison_southern_women, layout = "hierarchy") + ggtitle("Hierarchy")
graphr(ison_southern_women, layout = "railway") + ggtitle("Railway")

Note that "hierarchy" and "railway" use a different algorithm to {igraph}’s "bipartite", and generally perform better, especially where there are multiple layers. Whereas "hierarchy" tries to position nodes to minimise overlaps, "railway" sequences the nodes in each layer to a grid so that nodes are matched as far as possible. If you want to flip the horizontal and vertical, you could flip the coordinates, or use something like the following layout.

graphr(ison_southern_women, layout = "alluvial") + ggtitle("Alluvial")

Other layered layouts include:

  • Tree: "tree"
  • Dominance layouts

Circular layouts

Circular layouts arrange nodes around (potentially concentric) circles, such that crossings are minimised and adjacent nodes are located close together. In some cases, location or layer can be specified by attribute or mode.

graphr(ison_southern_women, layout = "concentric") + ggtitle("Concentric")

Other such layouts include:

  • circular: "circle"
  • sphere: "sphere"
  • star: "star"
  • arc or linear layouts: "linear"

Spectral layouts

Spectral layouts arrange nodes according to the eigenvectors of the Laplacian matrix of a graph. These layouts tend to exaggerate clustering of like-nodes and the separation of less similar nodes in two-dimensional space.

graphr(ison_southern_women, layout = "eigen") + ggtitle("Eigenvector")
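To illustrate the idea behind a spectral layout, here is a base R sketch on a hypothetical toy network (two triangles joined by a single tie). It shows the general technique only and is not necessarily the exact computation behind the "eigen" layout.

```r
# A toy undirected network: two triangles joined by a single tie
adj <- matrix(0, nrow = 6, ncol = 6)
ties <- rbind(c(1, 2), c(1, 3), c(2, 3),   # first triangle
              c(4, 5), c(4, 6), c(5, 6),   # second triangle
              c(3, 4))                     # bridge between them
adj[ties] <- 1
adj[ties[, 2:1]] <- 1  # mirror the ties to make the matrix symmetric

# Graph Laplacian: degree matrix minus adjacency matrix
lap <- diag(rowSums(adj)) - adj

# eigen() returns eigenvalues in decreasing order for symmetric matrices,
# so the smallest ones sit in the last columns
eig <- eigen(lap, symmetric = TRUE)
n <- nrow(adj)

# Use the eigenvectors of the two smallest non-zero eigenvalues as coordinates
coords <- eig$vectors[, c(n - 1, n - 2)]
plot(coords, pch = 19, xlab = "2nd eigenvector", ylab = "3rd eigenvector")
text(coords, labels = seq_len(n), pos = 3)
```

On the first axis the two triangles fall on opposite sides of zero, which is the exaggerated separation of clusters described above.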

Somewhat similar are multidimensional scaling techniques, which visualise the similarity between nodes in terms of their proximity in a two-dimensional (or higher-dimensional) space.

graphr(ison_southern_women, layout = "mds") + ggtitle("Multidimensional Scaling")

Other such layouts include:

  • Pivot multidimensional scaling: "pmds"

Grid layouts

Grid layouts arrange nodes based on some Cartesian coordinates. These can be useful for making sure all nodes’ labels are visible, but horizontal and vertical lines can overlap, making it difficult to distinguish whether some nodes are tied or not.

graphr(ison_southern_women, layout = "grid") + ggtitle("Grid")

Other grid layouts include:

  • orthogonal layouts for e.g. printed circuit boards
  • grid snapping for other layouts

Multiple graphs

Arrangements

{autograph} uses the {patchwork} package for arranging graphs together, e.g. side-by-side or above one another. The syntax is quite straightforward and is used throughout these vignettes/tutorials. Basically, you just use + to put graphs side-by-side, and / to put them above one another. Parentheses can be used to group graphs together.

graphr(fict_lotr) + graphr(ison_algebra)
graphr(fict_lotr) / graphr(ison_algebra)
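Grouping with parentheses can be sketched as follows. To keep the example self-contained it uses plain {ggplot2} scatterplots of the built-in `mtcars` data as stand-ins. The same operators should apply to any gg-based plot, including those returned by graphr(). The names `p1`, `p2`, `p3`, and `combined` are illustrative.

```r
library(ggplot2)
library(patchwork)

# Three simple gg-based plots to arrange
p1 <- ggplot(mtcars, aes(hp, mpg)) + geom_point() + ggtitle("A")
p2 <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + ggtitle("B")
p3 <- ggplot(mtcars, aes(disp, mpg)) + geom_point() + ggtitle("C")

# A and B side by side on top, with C spanning the full width underneath
combined <- (p1 + p2) / p3
combined
```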

Sets

graphr() is not the only graphing function included in {autograph}. To graph sets of networks together, graphs() makes sure that two or more networks are plotted together. This might be a set of ego networks, subgraphs, or waves of a longitudinal network.

graphs(to_subgraphs(fict_lotr, "Race"),
       waves = c(1,2,3,4))

What is happening here is that to_subgraphs() is creating a list of subgraphs and then graphs() is plotting them together at once with the same set of aesthetic parameters.

Dynamics

grapht() is another alternative to graphr(), this time rendering network changes as a gif. While the grapht() function is not as flexible as graphr(), it is very useful for visualising changes in networks over time.

fict_lotr %>%
  mutate_ties(year = sample(1:12, 66, replace = TRUE)) %>%
  to_waves(attribute = "year", cumulative = TRUE) %>%
  grapht()

More functionality will be added to this function in future releases.

Further flexibility

For more flexibility with visualizations, {autograph} users are encouraged to use the excellent {ggraph} package. {ggraph} is built upon the venerable {ggplot2} package and works with tbl_graph and igraph objects. As with {ggplot2}, {ggraph} users are expected to build a particular plot from the ground up, adding explicit layers to visualise the nodes and edges.

library(ggraph)
ggraph(fict_greys, layout = "fr") + 
  geom_edge_link(edge_colour = "dark grey", 
                  arrow = arrow(angle = 45,
                                length = unit(2, "mm"),
                                type = "closed"),
                  end_cap = circle(3, "mm")) +
  geom_node_point(size = 2.5, shape = 19, colour = "blue") +
  geom_node_text(aes(label=name), family = "serif", size = 2.5) +
  scale_edge_width(range = c(0.3,1.5)) +
  theme_graph() +
  theme(legend.position = "none")

As we can see in the code above, we can specify various aspects of the plot to tailor it to our network.

First, we can alter the layout of the network using the layout = argument to create a clearer visualisation of the ties between nodes. This is especially important for larger networks, where nodes and ties are more easily obscured or misrepresented. In {ggraph}, the default layout is the “stress” layout. The “stress” layout is a safe choice because it is deterministic and fits well with almost any graph, but it is also a good idea to explore and try out other layouts on your data. More layouts can be found in the {graphlayouts} and {igraph} R packages. To use a layout from the {igraph} package, enter only the last part of the layout algorithm name (e.g. layout = "mds" for “layout_with_mds”).

Second, geom_node_point() draws the nodes as geometric shapes (such as circles, squares, or triangles), and lets us specify their shape (shape=, using R’s point shape codes 0 to 25), size (size=), or colour (colour=). We can also use aes() to map these features to node attributes. To add labels, use geom_node_text() or geom_node_label() (which draws labels within a box). The font (family=), font size (size=), and colour (colour=) of the labels can be specified.

Third, we can also specify the presentation of edges in the network. To draw edges, we use geom_edge_link0() or geom_edge_link(). Using the latter function makes it possible to draw a straight line with a gradient. The following features can be tailored either globally or matched to specific edge attributes using aes():

  • colour: edge_colour=

  • width: edge_width=

  • linetype: edge_linetype=

  • opacity: edge_alpha=

For directed graphs, arrows can be drawn using the arrow= argument and the arrow() function from {ggplot2}. The angle, length, arrowhead type, and padding between the arrowhead and the node can also be specified.
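The difference between setting a feature globally and mapping it to an attribute with aes() can be sketched as follows. This assumes the {igraph} and {ggraph} packages. The six-node ring and its `group` and `weight` attributes are hypothetical and created only for this example.

```r
library(igraph)
library(ggraph)

# A small directed example network with made-up attributes
g <- make_ring(6, directed = TRUE)
V(g)$name <- LETTERS[1:6]
V(g)$group <- rep(c("a", "b"), times = 3)
E(g)$weight <- c(1, 2, 3, 1, 2, 3)

p <- ggraph(g, layout = "stress") +
  # edge width is mapped to an edge attribute; colour and arrows are set globally
  geom_edge_link(aes(edge_width = weight),
                 edge_colour = "grey50",
                 arrow = arrow(length = unit(2, "mm"), type = "closed"),
                 end_cap = circle(3, "mm")) +
  # node colour is mapped to a node attribute; size is set globally
  geom_node_point(aes(colour = group), size = 4) +
  geom_node_text(aes(label = name), nudge_y = 0.15) +
  scale_edge_width(range = c(0.3, 1.5)) +
  theme_graph()
p
```

Anything placed inside aes() varies with the data and gets a legend, whereas anything outside it applies uniformly to every node or edge.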

For more see David Schoch’s excellent resources on this.

Plotting

While researchers will probably want to start with using graphr() to visualise the network, {autograph} also offers plot() methods for a number of different network-related objects. These include measures of centrality, cohesion, and clustering, as well as goodness-of-fit plots for network models from packages such as {RSiena} and {MoNAn}. Usefully, all these plots use the same theming system as graphr(), so that you can set a theme once and have it apply to all your graphs and plots. Let’s try this now with a few examples.

stocnet_theme("default")
plot(node_degree(fict_lotr)) + 
plot(node_closeness(fict_lotr))
stocnet_theme("oxf")
plot(node_degree(fict_lotr)) + 
plot(node_closeness(fict_lotr))

This is a very simple example, but the same principle applies to all plots in {autograph}. One can set a theme once and have it apply to all plots. You can also always add additional {ggplot2} layers to any plot to further customise it. For example, it is straightforward to add titles and labels to these plots, but it is also possible to add trend lines, confidence intervals, and so on. The user is encouraged to explore the {ggplot2} package for more details.

Exporting plots to PDF

We can print the plots we have made to PDF by point-and-click by selecting ‘Save as PDF…’ from under the ‘Export’ dropdown menu in the plots panel tab of RStudio.

If you want to do this programmatically, say because you want to record how you have saved it so that you can e.g. make some changes to the parameters at some point, this is also not too difficult.

After running the (gg-based) plot you want to save, use the command ggsave("my_filename.pdf") to save your plot as a PDF to your working directory. If you want to save it somewhere else, you will need to specify the file path (or change the working directory, but that might be more cumbersome). If you want to save it as a different filetype, replace .pdf with e.g. .png or .jpeg. See ?ggsave for more.
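A minimal sketch of saving programmatically, using a plain {ggplot2} scatterplot as a stand-in for any gg-based plot. The file name and the choice of R’s temporary directory are illustrative, so substitute your own path.

```r
library(ggplot2)

# Any gg-based plot will do; a simple scatterplot stands in here
p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point()

# Build a path outside the working directory (here, R's temporary directory)
out_file <- file.path(tempdir(), "my_filename.pdf")

# Passing the plot explicitly avoids relying on the last plot displayed
ggsave(out_file, plot = p, width = 6, height = 4, units = "in")

file.exists(out_file)  # confirm that the file was written
```

Setting width and height explicitly makes the output reproducible, since ggsave() otherwise takes its dimensions from the current plotting device.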

Visualisation

by James Hollway