Quick start guide

Martin Westgate & Dax Kellie

2024-04-09

galah is an R interface to biodiversity data hosted by the Global Biodiversity Information Facility (GBIF) and its subsidiary node organisations. GBIF and its partner nodes collate and store observations of individual life forms using the ‘Darwin Core’ data standard.

Installation

To install from CRAN:

install.packages("galah")

Or install the development version from GitHub:

install.packages("remotes")
remotes::install_github("AtlasOfLivingAustralia/galah")

Load the package

library(galah)

Configuration

By default, galah downloads information from the Atlas of Living Australia (ALA). To show the full list of organisations currently supported by galah, use show_all(atlases).

show_all(atlases)
## # A tibble: 11 × 4
##    region         institution                                                             acronym url                         
##    <chr>          <chr>                                                                   <chr>   <chr>                       
##  1 Australia      Atlas of Living Australia                                               ALA     https://www.ala.org.au      
##  2 Austria        Biodiversitäts-Atlas Österreich                                         BAO     https://biodiversityatlas.at
##  3 Brazil         Sistemas de Informações sobre a Biodiversidade Brasileira               SiBBr   https://sibbr.gov.br        
##  4 Estonia        eElurikkus                                                              <NA>    https://elurikkus.ee        
##  5 France         Portail français d'accès aux données d'observation sur les espèces      OpenObs https://openobs.mnhn.fr     
##  6 Global         Global Biodiversity Information Facility                                GBIF    https://gbif.org            
##  7 Guatemala      Sistema Nacional de Información sobre Diversidad Biológica de Guatemala SNIBgt  https://snib.conap.gob.gt   
##  8 Portugal       GBIF Portugal                                                           GBIF.pt https://www.gbif.pt         
##  9 Spain          GBIF Spain                                                              GBIF.es https://www.gbif.es         
## 10 Sweden         Swedish Biodiversity Data Infrastructure                                SBDI    https://biodiversitydata.se 
## 11 United Kingdom National Biodiversity Network                                           NBN     https://nbn.org.uk

Use galah_config() to set the node organisation using its region, name, or acronym. Once set, galah automatically populates the server configuration for the selected GBIF node. To download occurrence records from a node, you first need to register an account on that node's website, then provide your registration email to galah. To download from GBIF itself, you need to provide your email, username, and password.

galah_config(atlas = "GBIF",
             username = "user1",
             email = "email@email.com",
             password = "my_password")

You can find a full list of configuration options by running ?galah_config.
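Because the example above puts a password directly in the script, it may be safer to read credentials from environment variables. The following is a minimal sketch of that idea, not a galah requirement: the variable names GBIF_USER, GBIF_EMAIL and GBIF_PWD are hypothetical, and are assumed to be defined beforehand (for example in your ~/.Renviron file).

```r
library(galah)

# GBIF_USER, GBIF_EMAIL and GBIF_PWD are hypothetical environment variable
# names chosen for this sketch; Sys.getenv() returns "" if one is not set.
galah_config(atlas = "GBIF",
             username = Sys.getenv("GBIF_USER"),
             email = Sys.getenv("GBIF_EMAIL"),
             password = Sys.getenv("GBIF_PWD"))
```

This keeps credentials out of scripts that are shared or committed to version control.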

Basic syntax

The standard method to construct queries in {galah} is via piped functions. Pipes in galah start with the galah_call() function, and typically end with collect(), though collapse() and compute() are also supported. The development team use the base pipe by default (|>), but the {magrittr} pipe (%>%) should work too.

galah_config(atlas = "ALA",
             verbose = FALSE)
galah_call() |>
  count() |>
  collect()
## # A tibble: 1 × 1
##       count
##       <int>
## 1 133616691
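If the pipe is unfamiliar, the short base R sketch below (no galah functions involved) shows what it does: the value on the left of |> becomes the first argument of the call on the right. By the same rule, the query above is another way of writing collect(count(galah_call())). The base pipe requires R 4.1 or later.

```r
# Piped form: each step receives the previous result as its first argument
piped <- c(1, 4, 9) |>
  sqrt() |>
  sum()

# Equivalent nested form, read from the inside out
nested <- sum(sqrt(c(1, 4, 9)))

identical(piped, nested)
```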

To pass more complex queries, you can use additional {dplyr} functions such as filter(), select(), and group_by().

galah_call() |> 
  filter(year >= 2020) |> 
  count() |>
  collect()
## # A tibble: 1 × 1
##      count
##      <int>
## 1 28235670
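As a further illustration of the functions named above, the sketch below adds group_by() to the same query so that records are counted separately for each year. It assumes the ALA configuration set earlier and a working internet connection; the numbers returned will depend on when the query is run, so no output is shown.

```r
library(galah)

galah_config(atlas = "ALA",
             verbose = FALSE)

# Count records from 2020 onwards, split by year
records_by_year <- galah_call() |>
  filter(year >= 2020) |>
  group_by(year) |>
  count() |>
  collect()

records_by_year
```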

Each GBIF node allows you to query using its own set of in-built fields. You can investigate which fields are available using show_all() and search_all():

search_all(fields, "australian states")
## # A tibble: 2 × 3
##   id     description                            type  
##   <chr>  <chr>                                  <chr> 
## 1 cl2013 ASGS Australian States and Territories fields
## 2 cl22   Australian States and Territories      fields
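Once you know a field's id, it can help to see which values that field accepts before using it in filter(). The sketch below pipes a field search into show_values() for the cl22 field found above. It assumes the ALA configuration and an internet connection; if the search matches more than one field, check which field the returned values belong to.

```r
library(galah)

galah_config(atlas = "ALA",
             verbose = FALSE)

# Look up the field by its id, then list the values it can take
state_values <- search_all(fields, "cl22") |>
  show_values()

state_values
```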

Taxonomic searches

To narrow your search to a particular taxonomic group, use identify(). Note that this function only accepts scientific names and is not case sensitive. It’s good practice to first use search_taxa() to check that the names you provide return the correct taxonomic results.

search_taxa("reptilia") # Check whether taxonomic info is correct
## # A tibble: 1 × 9
##   search_term scientific_name taxon_concept_id                                                          rank  match_type kingdom  phylum   class    issues 
##   <chr>       <chr>           <chr>                                                                     <chr> <chr>      <chr>    <chr>    <chr>    <chr>  
## 1 reptilia    REPTILIA        https://biodiversity.org.au/afd/taxa/682e1228-5b3c-45ff-833b-550efd40c399 class exactMatch Animalia Chordata Reptilia noIssue
galah_call() |>
  identify("reptilia") |> 
  filter(year >= 2020) |> 
  count() |>
  collect()
## # A tibble: 1 × 1
##    count
##    <int>
## 1 252833

If you want to query something other than the number of records, modify the type argument in galah_call(). Here we’ll query the number of species:

galah_call(type = "species") |>
  identify("reptilia") |> 
  filter(year >= 2020) |> 
  count() |>
  collect()
## # A tibble: 1 × 1
##   count
##   <int>
## 1   866

Download

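To download records, rather than find how many records are available, remove the count() function from your pipe. Downloading occurrence records requires that you have already provided your registration email via galah_config().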

result <- galah_call() |>
  identify("Litoria") |>
  filter(year >= 2020, cl22 == "Tasmania") |>
  select(basisOfRecord, group = "basic") |>
  collect()

result |> head()
## # A tibble: 6 × 9
##   recordID                             scientificName    taxonConceptID decimalLatitude decimalLongitude eventDate           occurrenceStatus dataResourceName basisOfRecord
##   <chr>                                <chr>             <chr>                    <dbl>            <dbl> <dttm>              <chr>            <chr>            <chr>        
## 1 00168ca6-84d0-4af1-8fa8-875fd69d25da Litoria raniform… https://biodi…           -41.2             146. 2023-12-20 23:20:19 PRESENT          iNaturalist Aus… HUMAN_OBSERV…
## 2 00250163-ec50-4eda-a5d5-58ae98bc5834 Litoria raniform… https://biodi…           -41.2             147. 2023-08-23 01:49:28 PRESENT          iNaturalist Aus… HUMAN_OBSERV…
## 3 003e0f63-9f95-4af9-b272-10db6d7b6371 Litoria ewingii   https://biodi…           -42.9             148. 2022-12-23 19:27:00 PRESENT          iNaturalist Aus… HUMAN_OBSERV…
## 4 00410554-5289-416f-9848-74df4a814b93 Litoria ewingii   https://biodi…           -41.7             147. 2021-05-06 00:00:00 PRESENT          FrogID           OCCURRENCE   
## 5 0070521f-bb45-46fb-8385-1a542c3a81a5 Litoria ewingii   https://biodi…           -43.1             147. 2023-12-20 03:29:23 PRESENT          iNaturalist Aus… HUMAN_OBSERV…
## 6 0081e7ef-459b-42a9-8f0b-b3664ec94d0e Litoria ewingii   https://biodi…           -43.2             147. 2020-08-02 00:00:00 PRESENT          FrogID           OCCURRENCE

Check out our other vignettes for more detail on how to use these functions.