hyperoverlap

Matilda Brown

2021-08-10

Hyperoverlap can be used to detect and visualise overlap in n-dimensional space.

Data: iris

To explore the functions in hyperoverlap, we’ll use the iris dataset. This dataset contains 150 observations of three species of iris (“setosa”, “versicolor” and “virginica”). These data are four-dimensional (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) and are documented in ?iris. We’ll set up five test datasets to explore the different functions:

1. test1: two entities (setosa, virginica); three dimensions (Sepal.Length, Sepal.Width, Petal.Length)
2. test2: two entities (versicolor, virginica); three dimensions (as above)
3. test3: two entities (setosa, virginica); four dimensions
4. test4: two entities (versicolor, virginica); four dimensions
5. test5: all entities, all dimensions

test1 <- iris[which(iris$Species!="versicolor"),c(1:3,5)]
test2 <- iris[which(iris$Species!="setosa"),c(1:3,5)]
test3 <- iris[which(iris$Species!="versicolor"),]
test4 <- iris[which(iris$Species!="setosa"),]
test5 <- iris

Note that entities may be species, genera, populations etc.
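As a quick sanity check on the subsets described above, the sketch below rebuilds two of them with base R only and inspects their size and grouping factor. The `check_*` object names are hypothetical and used only for illustration. One detail worth noticing is that subsetting a factor keeps its unused levels, so the dropped species still appears with a count of zero unless `droplevels()` is applied; whether this matters for any particular function is not stated in this document.

```r
# Rebuild two of the test datasets (same subsetting logic as above, without which())
check_test1 <- iris[iris$Species != "versicolor", c(1:3, 5)]
check_test4 <- iris[iris$Species != "setosa", ]

# Rows x columns: 100 observations, 3 variables + Species for test1
dim(check_test1)

# The unused level "versicolor" is still present in the factor, with zero observations
table(check_test1$Species)

# droplevels() removes unused levels if a two-level factor is wanted
table(droplevels(check_test1$Species))
```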

Examining overlap between two entities in 3D

hyperoverlap_plot can only plot the decision boundary for data with at most three dimensions. For high-dimensional visualisation, see hyperoverlap_lda.

library(hyperoverlap)
setosa_virginica3d <- hyperoverlap_detect(test1[,1:3], test1$Species)
versicolor_virginica3d <- hyperoverlap_detect(test2[,1:3], test2$Species)

To examine the result:

setosa_virginica3d@result             #gives us the result: overlap or non-overlap?
#> [1] "non-overlap"
versicolor_virginica3d@result
#> [1] "overlap"

setosa_virginica3d@shape              #for the non-overlapping pair, was the decision boundary linear or curvilinear? 
#> [1] "linear"


hyperoverlap_plot(setosa_virginica3d) #plot the data and the decision boundary in 3d
hyperoverlap_plot(versicolor_virginica3d) 

Note the points on the ‘wrong side’ of the boundary when comparing versicolor and virginica.
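For intuition about why the two pairs behave differently, the following base R sketch compares the range of a single variable (Petal.Length) between species. This is not how hyperoverlap detects overlap; it is a simplified one-dimensional check, and `ranges_overlap` is a hypothetical helper written for this illustration. If two entities have disjoint ranges in any one variable, a boundary perpendicular to that axis separates them, so they cannot overlap in the full space. The reverse does not hold: overlapping ranges in one variable say nothing definite about overlap in n dimensions.

```r
# Minimum and maximum Petal.Length per species (columns = species)
petal_range <- sapply(split(iris$Petal.Length, iris$Species), range)
rownames(petal_range) <- c("min", "max")
petal_range

# Hypothetical helper: TRUE if two closed intervals c(min, max) intersect
ranges_overlap <- function(a, b) a[1] <= b[2] && b[1] <= a[2]

ranges_overlap(petal_range[, "setosa"], petal_range[, "virginica"])
#> [1] FALSE
ranges_overlap(petal_range[, "versicolor"], petal_range[, "virginica"])
#> [1] TRUE
```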

Examining overlap between two entities in n-dimensions

To visualise overlap in n dimensions, we need to use ordination techniques. The function hyperoverlap_lda uses a combination of linear discriminant analysis (LDA) and principal components analysis (PCA) to choose the best two (or three) axes for visualisation. It also returns the point coordinates (here named transformed_data), so that they can be plotted using other methods (e.g. ggplot2).
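To make the idea of combining LDA and PCA more concrete, here is a minimal sketch of one possible way to do it, using `MASS::lda` and `stats::prcomp`. It is an assumption made for illustration and is not taken from the hyperoverlap source: the first axis is the linear discriminant, and the second is the first principal component of whatever variation remains after the discriminant direction is removed. The object names (`dat`, `coords` and so on) are hypothetical.

```r
library(MASS)  # provides lda(); MASS ships with standard R installations

# Two entities, four dimensions (same rows as test3 above)
dat <- iris[iris$Species != "versicolor", ]
dat$Species <- droplevels(dat$Species)

# Centre the variables so that projections are taken about the overall mean
X <- scale(as.matrix(dat[, 1:4]), center = TRUE, scale = FALSE)

# Axis 1: the single linear discriminant for a two-group problem
fit <- lda(X, grouping = dat$Species)
w <- fit$scaling[, 1]
w <- w / sqrt(sum(w^2))           # rescale to a unit vector
ld1 <- as.vector(X %*% w)

# Axis 2: first principal component of the data with the LD1 direction removed
residual <- X - outer(ld1, w)
pc1 <- prcomp(residual)$x[, 1]

# Coordinates that could be passed to plot() or ggplot2
coords <- data.frame(LD1 = ld1, PC1 = pc1, Species = dat$Species)
head(coords)
```

The sign and scale of each axis are arbitrary, so coordinates produced this way should not be expected to match the values returned by hyperoverlap_lda.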

setosa_virginica4d <- hyperoverlap_detect(test3[,1:4], test3$Species)
versicolor_virginica4d <- hyperoverlap_detect(test4[,1:4], test4$Species)

To examine the result:

setosa_virginica4d@result             #gives us the result: overlap or non-overlap?
#> [1] "non-overlap"
versicolor_virginica4d@result
#> [1] "overlap"

setosa_virginica4d@shape              #for the non-overlapping pair, was the decision boundary linear or curvilinear? 
#> [1] "linear"

transformed_data <- hyperoverlap_lda(setosa_virginica4d)  #plots the best two dimensions for visualising overlap
transformed_data <- hyperoverlap_lda(versicolor_virginica4d) 

In three dimensions:

rgl.close()  #close previous device
transformed_data <- hyperoverlap_lda(setosa_virginica4d, visualise3d=TRUE) 
rgl.close()  #close previous device
transformed_data <- hyperoverlap_lda(versicolor_virginica4d, visualise3d=TRUE) #plots the best three dimensions for visualising overlap

Examining patterns of overlap in groups of entities

We might want to know which species within an entire genus overlap in certain variables. To do this, we can use hyperoverlap_set and visualise the results using hyperoverlap_pairs_plot.

all_spp <- hyperoverlap_set(test5[,1:4],test5$Species)
all_spp_plot <- hyperoverlap_pairs_plot(all_spp)
all_spp_plot
#> Warning: Use of `x$result` is discouraged. Use `result` instead.
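To see which comparisons are involved when examining a group of entities pair by pair, the base R sketch below enumerates every pair of species and extracts the matching rows. Treating the group analysis as a set of pairwise comparisons is an assumption made for illustration, and the names `species_pairs` and `pair_sizes` are hypothetical. The commented line marks where a two-entity call of the form used earlier in this document could be placed.

```r
# All unordered pairs of entities: 3 species give 3 pairs
species_pairs <- combn(levels(iris$Species), 2, simplify = FALSE)

# For each pair, subset the data and record how many observations are compared
pair_sizes <- sapply(species_pairs, function(p) {
  pair_data <- iris[iris$Species %in% p, ]
  # A two-entity comparison could be run here, e.g.
  # hyperoverlap_detect(pair_data[, 1:4], pair_data$Species)
  nrow(pair_data)
})
names(pair_sizes) <- sapply(species_pairs, paste, collapse = " vs ")
pair_sizes
```

The number of pairs is n(n - 1)/2 for n entities, so it grows quickly as entities are added.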