--- title: "4. Routing" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{4. Routing} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) knitr::opts_knit$set(global.par = TRUE) ``` ```{r plot, echo=FALSE, results='asis'} # plot margins oldpar = par(no.readonly = TRUE) par(mar = c(1, 1, 1, 1)) # crayon needs to be explicitly activated in Rmd oldoptions = options() options(crayon.enabled = TRUE) # Hooks needs to be set to deal with outputs # thanks to fansi logic old_hooks = fansi::set_knit_hooks( knitr::knit_hooks, which = c("output", "message", "error") ) ``` Calculating shortest paths between pairs of nodes is a core task in network analysis. The `sfnetworks` package offers wrappers around the path calculation functions of `igraph`, making it easier to use them when working with spatial data and tidyverse packages. This vignette demonstrates their functionality. In this regard it is important to remember that `sfnetworks` is a general-purpose package for spatial network analysis, not specifically optimized for a single task. If your *only* purpose is many-to-many routing in large networks, there might be other approaches that are faster and fit better to your needs. For example, the [dodgr](https://github.com/UrbanAnalyst/dodgr) package was designed for many-to-many routing on large dual-weighted graphs, with its main focus on OpenStreetMap road network data. The [cppRouting](https://github.com/vlarmet/cppRouting) package contains functions to calculate shortest paths and isochrones/isodistances on weighted graphs. When working with OpenStreetMap data, it is also possible to use the interfaces to external routing engines such as [graphhopper](https://github.com/crazycapivara/graphhopper-r), [osrm](https://github.com/riatelab/osrm) and [opentripplanner](https://github.com/ropensci/opentripplanner). The [stplanr](https://github.com/ropensci/stplanr) package for sustainable transport planning lets you use many routing engines from a single interface, through `stplanr::route()`, including routing using local R objects. Of course, all these packages can be happily used *alongside* `sfnetworks`. ```{r, message=FALSE} library(sfnetworks) library(sf) library(tidygraph) library(dplyr) library(purrr) library(TSP) ``` ## Calculating shortest paths The function `st_network_paths()` is a wrapper around the igraph function `igraph::shortest_paths()`. There are two main differences: - Besides node indices and node names, `st_network_paths()` gives the additional option to provide any (set of) geospatial point(s) as *from* and *to* location(s) of the shortest paths, either as sf or sfc object. Provided points that do not equal any node in the network will be snapped to their nearest node before calculating the paths. - To allow smooth integration with the tidyverse, the output of `st_network_paths()` is a tibble, with one row per returned path. The column *node_paths* contains the ordered list of node indices in the path, and the column *edge_paths* contains the ordered list of edge indices in the path. Just as `igraph::shortest_paths()`, the `st_network_paths()` function is meant for one-to-one and one-to-many routing. Hence, it is only possible to provide a single *from* location, while the *to* locations can be more than one. Lets start with the most basic example of providing node indices as *from* and *to* locations. Remember that a node index in a sfnetwork refers to the position of the node in the nodes table of the network (i.e. its row number). There is also the possibility to use character encoded node names instead of numeric node indices. This requires the nodes table to have a column *name* with a unique name for each node. We will use geographic edge lengths as the edge weights to be used for shortest paths calculation. In weighted networks, `igraph::shortest_paths()` applies the Dijkstra algorithm to find the shortest paths. In the case of unweighted networks, it uses breadth-first search instead. ```{r, fig.height=5, fig.width=5} net = as_sfnetwork(roxel, directed = FALSE) %>% st_transform(3035) %>% activate("edges") %>% mutate(weight = edge_length()) paths = st_network_paths(net, from = 495, to = c(458, 121), weights = "weight") paths paths %>% slice(1) %>% pull(node_paths) %>% unlist() paths %>% slice(1) %>% pull(edge_paths) %>% unlist() plot_path = function(node_path) { net %>% activate("nodes") %>% slice(node_path) %>% plot(cex = 1.5, lwd = 1.5, add = TRUE) } colors = sf.colors(3, categorical = TRUE) plot(net, col = "grey") paths %>% pull(node_paths) %>% walk(plot_path) net %>% activate("nodes") %>% st_as_sf() %>% slice(c(495, 121, 458)) %>% plot(col = colors, pch = 8, cex = 2, lwd = 2, add = TRUE) ``` Next we will create some geospatial points that do not intersect with any node in the network. Providing them to `st_network_paths()` will first find the nearest node to each of them, and then calculate the shortest paths accordingly. ```{r, fig.height=5, fig.width=5} p1 = st_geometry(net, "nodes")[495] + st_sfc(st_point(c(50, -50))) st_crs(p1) = st_crs(net) p2 = st_geometry(net, "nodes")[458] p3 = st_geometry(net, "nodes")[121] + st_sfc(st_point(c(-10, 100))) st_crs(p3) = st_crs(net) paths = st_network_paths(net, from = p1, to = c(p2, p3), weights = "weight") plot(net, col = "grey") paths %>% pull(node_paths) %>% walk(plot_path) plot(c(p1, p2, p3), col = colors, pch = 8, cex = 2, lwd = 2, add = TRUE) ``` Simply finding the nearest node to given points is not always the best way to go. For example: when a provided point is close to a location on an edge linestring, but relatively far from any node, it gives better results when the routing does not start at its nearest node, but at that nearby location on the edge linestring. To accommodate this, you can first *blend* your input points into the network, and then proceed as normal. Blending is the name we gave to the combined process of snapping a point to its nearest point on its nearest edge, splitting the edge there, and adding the snapped point as node to the network. It is implement in the function `st_network_blend()`. See the vignette on [spatial joins and filters](https://luukvdmeer.github.io/sfnetworks/articles/sfn03_join_filter.html#blending-points-into-a-network) for more details. Another issue may arise wen your network consists of multiple components that are not connected to each other. In that case, it is possible that the nearest node to a provided point is located in a tiny component and only a few other nodes can be reached from it. In such cases it might be good to first reduce the network to its largest (or *n* largest) component(s) before calculating shortest paths. The tidygraph function `tidygraph::group_components()` can help with this. It assigns an integer to each node identifying the component it is in, with `1` being the largest component in the network, `2` the second largest, and so on. ```{r, fig.height=5, fig.width=5} # Our network consists of several unconnected components. with_graph(net, graph_component_count()) connected_net = net %>% activate("nodes") %>% filter(group_components() == 1) plot(net, col = colors[2]) plot(connected_net, cex = 1.1, lwd = 1.1, add = TRUE) ``` Another way to calculate shortest paths, which fits nicely in the tidygraph style of working, is by using the `to_spatial_shortest_paths()` morpher function. This will subset the original network to only contain those nodes and edges that appear in a shortest path between two nodes. See the vignette on [Spatial morphers](https://luukvdmeer.github.io/sfnetworks/articles/sfn05_morphers.html#to_spatial_shortest_paths) for details. ## Retrieving an OD cost matrix The shortest paths calculation as described above is only supported for one-to-one and one-to-many routing. The alternative for many-to-many routing is the calculation of an origin-destination cost matrix. Instead of providing the individual paths, it returns a matrix in which entry $i,j$ is the total cost (i.e. sum of weights) of the shortest path from node $i$ to node $j$. The origin-destination cost matrix is usually an important starting point for further analysis. For example, it can serve as input to route optimization algorithms, spatial clustering algorithms and the calculation of statistical measures based on spatial proximity. The igraph function for this purpose is `igraph::distances()`, which in `sfnetworks` is wrapped by `st_network_cost()`, allowing again to provide sets of geospatial points as *from* and *to* locations. Note that the calculated costs refer to the paths between the *nearest nodes* of the input points. Their units are the same as the units of weights used in the calculation, in this case meters. ```{r} st_network_cost(net, from = c(p1, p2, p3), to = c(p1, p2, p3), weights = "weight") ``` If we don't provide any from and to points, `st_network_cost()` will by default calculate the cost matrix for the entire network. ```{r} # Our network has 701 nodes. with_graph(net, graph_order()) cost_matrix = st_network_cost(net, weights = "weight") dim(cost_matrix) ``` In directed networks, `st_network_cost()` also gives you the possibility to define if you want to allow travel only over outbound edges from the *from* points (by setting `direction = "out"`, the default), only over inbound edges (by setting `direction = "in"`), or both (by setting `direction = "all"`, i.e. assuming an undirected network). ```{r} net %>% convert(to_spatial_directed) %>% st_network_cost( from = c(p1, p2, p3), to = c(p1, p2, p3), direction = "in" ) net %>% convert(to_spatial_directed) %>% st_network_cost( from = c(p1, p2, p3), to = c(p1, p2, p3), direction = "out" ) net %>% convert(to_spatial_directed) %>% st_network_cost( from = c(p1, p2, p3), to = c(p1, p2, p3), direction = "all" ) ``` All `...` arguments are forwarded internally to `igraph::distances()`. Among other options, this means that you can define which algorithm is used for the paths calculation. By default, `igraph` will choose the most suitable algorithm based on characteristics of the request, such as the type of weights (e.g. only positive or also negative) and the number of given from locations. For details, see the [igraph documentation](https://igraph.org/r/doc/distances.html). ## Applications In this section, we will show a small set of applications of the routing related functions. It is definitely not meant to be an overview that covers everything! Also, remember again that `sfnetworks` is a general-purpose spatial network analysis package not optimized for a specific application. However, especially in combination with other packages it can address a wide variety of use-cases. ### Closest facility analysis The purpose of closest facility analysis is, given a set of destination locations (also referred to as the *facilities*) and origin locations (also referred to as the *sites*), to find the closest *n* facilities to each site. For example, you might want to find the nearest transit hub for each address in a city, or the nearest hospital to high-risk road intersections. To solve this problem, you can calculate the OD cost matrix with the sites as *from* points, and the facilities as *to* points. Then, for each row (i.e. each site) you find the column(s) with the lowest cost value. ```{r, fig.height=5, fig.width=5} # Select a random set of sites and facilities. # We select random locations within the bounding box of the nodes. set.seed(128) hull = net %>% activate("nodes") %>% st_geometry() %>% st_combine() %>% st_convex_hull() sites = st_sample(hull, 50, type = "random") facilities = st_sample(hull, 5, type = "random") # Blend the sites and facilities into the network to get better results. # Also select only the largest connected component. new_net = net %>% activate("nodes") %>% filter(group_components() == 1) %>% st_network_blend(c(sites, facilities)) # Calculate the cost matrix. cost_matrix = st_network_cost(new_net, from = sites, to = facilities, weights = "weight") # Find for each site which facility is closest. closest = facilities[apply(cost_matrix, 1, function(x) which(x == min(x))[1])] # Create a line between each site and its closest facility, for visualization. draw_lines = function(sources, targets) { lines = mapply( function(a, b) st_sfc(st_cast(c(a, b), "LINESTRING"), crs = st_crs(net)), sources, targets, SIMPLIFY = FALSE ) do.call("c", lines) } connections = draw_lines(sites, closest) # Plot the results. plot(new_net, col = "grey") plot(connections, col = colors[2], lwd = 2, add = TRUE) plot(facilities, pch = 8, cex = 2, lwd = 2, col = colors[3], add = TRUE) plot(sites, pch = 20, cex = 2, col = colors[2], add = TRUE) ``` ### Route optimization The traveling salesman problem aims to find the shortest tour that visits a set of locations exactly once and then returns to the starting location. In `sfnetworks`, there are no dedicated functions to solve this. However, we can bring in other packages here to assist us. Probably the best known R package for solving traveling salesman problems is [TSP](https://CRAN.R-project.org/package=TSP). To do so, it requires a matrix that specifies the travel cost between each pair of locations. As shown above, we can use `sfnetworks` to calculate such a cost matrix for our network. Lets first generate a set of random points within the bounding box of the network. These will serve as the locations the traveling salesman has to visit. ```{r} set.seed(403) rdm = net %>% st_bbox() %>% st_as_sfc() %>% st_sample(10, type = "random") ``` We can then compute the cost matrix using `st_network_cost()`. Internally, it will first snap the provided points to their nearest node in the network, and then use the *weight* column to calculate the travel costs between these nodes. Passing the matrix to `TSP::TSP()` will make it suitable for usage inside the `TSP` package. ```{r} net = activate(net, "nodes") cost_matrix = st_network_cost(net, from = rdm, to = rdm, weights = "weight") # Use nearest node indices as row and column names. rdm_idxs = st_nearest_feature(rdm, net) row.names(cost_matrix) = rdm_idxs colnames(cost_matrix) = rdm_idxs round(cost_matrix, 0) ``` By passing the cost matrix to the solver `TSP::solve_TSP()` we obtain an object of class `TOUR`. This object contains a named vector specifying the indices of the provided locations in the optimal order of visit. ```{r} tour = solve_TSP(TSP(units::drop_units(cost_matrix))) tour_idxs = as.numeric(names(tour)) tour_idxs # Approximate length of the route. # In meters, since that was the unit of our cost values. round(tour_length(tour), 0) ``` Knowing the optimal order to visit all provided locations, we move back to `sfnetworks` to match this route to the network. We do so by computing the shortest paths between each location and its subsequent one, until we reach the starting point again. For more details on solving travelling salesman problems in R, see the [TSP package documentation](https://CRAN.R-project.org/package=TSP). ```{r, fig.height=5, fig.width=5} # Define the nodes to calculate the shortest paths from. # Define the nodes to calculate the shortest paths to. # All based on the calculated order of visit. from_idxs = tour_idxs to_idxs = c(tour_idxs[2:length(tour_idxs)], tour_idxs[1]) # Calculate the specified paths. tsp_paths = mapply(st_network_paths, from = from_idxs, to = to_idxs, MoreArgs = list(x = net, weights = "weight") )["node_paths", ] %>% unlist(recursive = FALSE) # Plot the results. plot(net, col = "grey") plot(rdm, pch = 20, col = colors[2], add = TRUE) walk(tsp_paths, plot_path) # Reuse the plot_path function defined earlier. plot( st_as_sf(slice(net, rdm_idxs)), pch = 20, col = colors[3], add = TRUE ) plot( st_as_sf(slice(net, tour_idxs[1])), pch = 8, cex = 2, lwd = 2, col = colors[3], add = TRUE ) text( st_coordinates(st_as_sf(slice(net, tour_idxs[1]))) - c(200, 90), labels = "start/end\npoint" ) ``` ### Isochrones and isodistances With respect to a given point $p$ and a given travel time $t$, an isochrone is the line for which it holds that the travel time from any point on the line to or from $p$ is equal to $t$. When using distances instead of time, it is called an isodistance. In `sfnetworks` there are no dedicated, optimized functions for calculating isochrones or isodistances. However, we can roughly approximate them by using a combination of sf and tidygraph functions. Lets first calculate imaginary travel times for each edge, using randomly generated average walking speeds for each type of edge. ```{r, warning=FALSE} # How many edge types are there? types = net %>% activate("edges") %>% pull(type) %>% unique() types # Randomly define a walking speed in m/s for each type. # With values between 3 and 7 km/hr. set.seed(1) speeds = runif(length(types), 3 * 1000 / 60 / 60, 7 * 1000 / 60 / 60) # Assign a speed to each edge based on its type. # Calculate travel time for each edge based on that. net = net %>% activate("edges") %>% group_by(type) %>% mutate(speed = units::set_units(speeds[cur_group_id()], "m/s")) %>% mutate(time = weight / speed) %>% ungroup() net ``` Now, we can calculate the total travel time for each shortest path between the (nearest node of the) origin point and all other nodes in the network, using the node measure function `tidygraph::node_distance_from()` with the values in the *time* column as weights. Then, we filter the nodes reachable within a given travel time from the origin. By drawing a convex hull around these selected nodes we roughly approximate the isochrone. If we wanted isochrones for travel times *towards* the central point, we could have used the node measure function `tidygraph::node_distance_to()` instead. ```{r, fig.height=5, fig.width=5} net = activate(net, "nodes") p = net %>% st_geometry() %>% st_combine() %>% st_centroid() iso = net %>% filter(node_distance_from(st_nearest_feature(p, net), weights = time) <= 600) iso_poly = iso %>% st_geometry() %>% st_combine() %>% st_convex_hull() plot(net, col = "grey") plot(iso_poly, col = NA, border = "black", lwd = 3, add = TRUE) plot(iso, col = colors[2], add = TRUE) plot(p, col = colors[1], pch = 8, cex = 2, lwd = 2, add = TRUE) ``` The workflow presented above is generalized in a spatial morpher function `to_spatial_neighborhood()`, which can be used inside the `tidygraph::convert()` verb to filter only those nodes that can be reached within a given travel cost from the given origin node. ```{r, fig.width=5, fig.height=5} # Define the threshold values (in seconds). # Define also the colors to plot the neighborhoods in. thresholds = rev(seq(60, 600, 60)) palette = sf.colors(n = 10) # Plot the results. plot(net, col = "grey") for (i in c(1:10)) { nbh = convert(net, to_spatial_neighborhood, p, thresholds[i], weights = "time") plot(nbh, col = palette[i], add = TRUE) } plot(p, pch = 8, cex = 2, lwd = 2, add = TRUE) ``` ### Custom routing In many cases the shortest path based geographical distances or travel time is not necessarily the optimal path. The most appropriate route may depend on many factors, for example staying away from large and potentially unpleasant highways for motor traffic. All networks have different characteristics. In OpenStreetMap networks the 'highway' type is often a good (albeit crude) approximation of the network. The `roxel` demo dataset is derived from OpenStreetMap and represents a largely residential network: ```{r} table(roxel$type) ``` Building on the shortest paths calculated in a previous section we can try an alternative routing profile. Let's take a look at these paths to see what type of ways they travel on: ```{r, fig.height=5, fig.width=5} paths_sf = net %>% activate("edges") %>% slice(unlist(paths$edge_paths)) %>% st_as_sf() table(paths_sf$type) plot(paths_sf["type"], lwd = 4, key.pos = 4, key.width = lcm(4)) ``` As with the overall network, the paths we calculated are dominated by residential streets. For the purposes of illustration, lets imagine we're routing for a vehicle that we want to keep away from residential roads, and which has a much lower cost per unit distance on secondary roads: ```{r} weighting_profile = c( cycleway = Inf, footway = Inf, path = Inf, pedestrian = Inf, residential = 3, secondary = 1, service = 1, track = 10, unclassified = 1 ) weighted_net = net %>% activate("edges") %>% mutate(multiplier = weighting_profile[type]) %>% mutate(weight = edge_length() * multiplier) ``` We can now recalculate the routes. The result show routes that avoid residential networks. ```{r, fig.show='hold', out.width = '50%'} weighted_paths = st_network_paths(weighted_net, from = 495, to = c(458, 121), weights = "weight") weighted_paths_sf = weighted_net %>% activate("edges") %>% slice(unlist(weighted_paths$edge_paths)) %>% st_as_sf() table(weighted_paths_sf$type) plot(st_as_sf(net, "edges")["type"], lwd = 4, key.pos = NULL, reset = FALSE, main = "Distance weights") plot(st_geometry(paths_sf), add = TRUE) plot(st_as_sf(net, "edges")["type"], lwd = 4, key.pos = NULL, reset = FALSE, main = "Custom weights") plot(st_geometry(weighted_paths_sf), add = TRUE) ``` Note that developing more sophisticated routing profiles is beyond the scope of this package. If you need complex mode-specific routing profiles, we recommend looking at routing profiles associated with open source routing engines, such as [bike.lua](https://github.com/fossgis-routing-server/cbf-routing-profiles/blob/master/bike.lua) in the OSRM project. Another direction of travel could be to extend on the approach illustrated here, but this work could be well-suited to a separate package that builds on `sfnetworks` (remembering that sophisticated routing profiles account for nodes and edges). If you'd like to work on such a project to improve mode-specific routing in R by building on this package, please let us know in the [discussion room](https://github.com/luukvdmeer/sfnetworks/discussions)! ```{r, include = FALSE} par(oldpar) options(oldoptions) ```