Each occurrence record contains taxonomic information and information about the observation itself, like its location and the date of observation. These pieces of information are recorded and categorised into respective fields. When you import data using galah, columns of the resulting tibble correspond to these fields.

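Data fields are important because they provide a means to narrow and refine queries to return only the information that you need, and no more. Consequently, much of the architecture of galah has been designed to make narrowing as simple as possible. The functions for doing so, each covered below, are galah_identify(), galah_filter(), galah_apply_profile(), galah_group_by(), galah_select() and galah_geolocate().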

The names galah_filter(), galah_select() and galah_group_by() have been chosen to echo comparable functions from {dplyr}, namely filter(), select() and group_by(). With the exception of galah_geolocate(), the galah functions also use {dplyr} tidy evaluation and syntax. This means that you can alternate between the {dplyr} and galah versions of these functions as you see fit. Below we use the galah_ prefix for consistency with earlier versions of this vignette.
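As a rough sketch of that interchangeability, the two helpers below build the same query twice, once with the galah_ verbs and once with their {dplyr} counterparts. The helper names are made up for illustration, and the queries sit inside functions so that nothing is sent to the atlas until one is called. The sketch assumes {dplyr} is installed and that your version of galah provides filter() and group_by() methods for objects created by galah_call().

```r
# Hypothetical helpers: each returns record counts per year after 2015.

# Version using the galah_ prefixed verbs
counts_with_galah_verbs <- function() {
  galah::galah_call() |>
    galah::galah_filter(year > 2015) |>
    galah::galah_group_by(year) |>
    galah::atlas_counts()
}

# Version using the equivalent {dplyr} verbs on the same kind of object
# (assumes galah registers methods for these generics)
counts_with_dplyr_verbs <- function() {
  galah::galah_call() |>
    dplyr::filter(year > 2015) |>
    dplyr::group_by(year) |>
    galah::atlas_counts()
}

# Usage (requires an internet connection):
# counts_with_galah_verbs()
# counts_with_dplyr_verbs()
```

If both styles are supported in your installation, the two calls should return the same counts.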

galah_identify & search_taxa

Perhaps unsurprisingly, search_taxa() searches for taxonomic information. It uses fuzzy matching, much like the search bar on the Atlas of Living Australia website, so you can use it to search for taxa by their scientific name.

Confirming your desired taxon with search_taxa() is an important step before using that taxonomic information to download data. For example, before querying reptiles, we first check that our search term matches the intended taxon:

search_taxa("Reptilia")
## # A tibble: 1 × 9
##   search_term scientific_name taxon_concept_id                                                          rank  match_type kingdom  phylum   class    issues 
##   <chr>       <chr>           <chr>                                                                     <chr> <chr>      <chr>    <chr>    <chr>    <chr>  
## 1 Reptilia    REPTILIA        https://biodiversity.org.au/afd/taxa/682e1228-5b3c-45ff-833b-550efd40c399 class exactMatch Animalia Chordata Reptilia noIssue

If we want to be more specific, we can pass a tibble (or data.frame) containing additional taxonomic information.

search_taxa(tibble(genus = "Eolophus", kingdom = "Aves"))
## # A tibble: 1 × 13
##   search_term   scientific_name scientific_name_authorship taxon_concept_id                                  rank  match_type kingdom phylum class order family genus issues
##   <chr>         <chr>           <chr>                      <chr>                                             <chr> <chr>      <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr> 
## 1 Eolophus_Aves Eolophus        Bonaparte, 1854            https://biodiversity.org.au/afd/taxa/009169a9-a9… genus exactMatch Animal… Chord… Aves  Psit… Cacat… Eolo… noIss…

Once we know that our search matches the correct taxon or taxa, we can use galah_identify() to narrow the results of our query.

galah_call() |>
  galah_identify("Reptilia") |>
  atlas_counts()
## # A tibble: 1 × 1
##     count
##     <int>
## 1 1732625
taxa <- search_taxa(tibble(genus = "Eolophus", kingdom = "Aves"))

galah_call() |>
  galah_identify(taxa) |>
  atlas_counts()
## # A tibble: 1 × 1
##     count
##     <int>
## 1 1094712

If you’re using an international atlas, search_taxa() will automatically switch to using the local name-matching service. For example, Portugal uses the GBIF taxonomic backbone, but integrates seamlessly with our standard workflow.

galah_config(atlas = "Portugal")
## Atlas selected: GBIF Portugal (GBIF.pt) [Portugal]
galah_call() |> 
  galah_identify("Lepus") |> 
  galah_group_by(species) |> 
  atlas_counts()
## # A tibble: 5 × 2
##   species           count
##   <chr>             <int>
## 1 Lepus granatensis  1378
## 2 Lepus microtis       64
## 3 Lepus europaeus      10
## 4 Lepus saxatilis       2
## 5 Lepus capensis        1

Conversely, the UK’s National Biodiversity Network (NBN) has its own taxonomic backbone, but is queried using the same functions.

galah_config(atlas = "United Kingdom")
## Atlas selected: National Biodiversity Network (NBN) [United Kingdom]
galah_call() |> 
  galah_filter(genus == "Bufo") |> 
  galah_group_by(species) |> 
  atlas_counts()
## # A tibble: 3 × 2
##   species       count
##   <chr>         <int>
## 1 Bufo bufo     95466
## 2 Bufo spinosus    91
## 3 Bufo marinus      1

galah_filter

Perhaps the most important function in galah is galah_filter(), which restricts a query to the records that match one or more conditions on its fields.

galah_config(atlas = "Australia")
## Atlas selected: Atlas of Living Australia (ALA) [Australia]
# Get total record count for years after 2000
galah_call() |>
  galah_filter(year > 2000) |>
  atlas_counts()
## # A tibble: 1 × 1
##      count
##      <int>
## 1 92053621
# Get total record count for iNaturalist Australia for years after 2000
galah_call() |>
  galah_filter(
    year > 2000,
    dataResourceName == "iNaturalist Australia") |>
  atlas_counts()
## # A tibble: 1 × 1
##     count
##     <int>
## 1 6589403

To find available fields and corresponding valid values, use the field lookup functions show_all(fields), search_all(fields) & show_values().
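A minimal sketch of how these lookups might be chained follows. The helper names find_fields() and field_values() are hypothetical, and the calls are placed inside functions so that no request is made until you run them. When a search matches several fields, show_values() is assumed here to report values for the first match only.

```r
# Hypothetical helper: list the fields whose name or description
# matches a search term
find_fields <- function(term) {
  galah::search_all(fields, term)
}

# Hypothetical helper: list the valid values of one field, which can
# then be used on the right-hand side of a galah_filter() condition
field_values <- function(field_id) {
  galah::search_all(fields, field_id) |>
    galah::show_values()
}

# Usage (requires an internet connection):
# find_fields("basis")
# field_values("basisOfRecord")
```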

galah_filter() can also be used to make more complex taxonomic queries than are possible using search_taxa(). By using the taxonConceptID field, it is possible to build queries that exclude certain taxa, for example. This can be useful to filter for paraphyletic concepts such as invertebrates.

galah_call() |>
  galah_filter(
     taxonConceptID == search_taxa("Animalia")$taxon_concept_id,
     taxonConceptID != search_taxa("Chordata")$taxon_concept_id
  ) |>
  galah_group_by(class) |>
  atlas_counts()
## # A tibble: 70 × 2
##    class          count
##    <chr>          <int>
##  1 Insecta      6228043
##  2 Gastropoda    970153
##  3 Arachnida     812946
##  4 Maxillopoda   700845
##  5 Malacostraca  657558
##  6 Polychaeta    276240
##  7 Bivalvia      231228
##  8 Anthozoa      221108
##  9 Cephalopoda   148054
## 10 Demospongiae  117913
## # ℹ 60 more rows

galah_apply_profile

When working with the ALA, a notable feature is the ability to specify a data profile—a set of data quality filters—to remove records that are suspect in some way.

galah_call() |>
  galah_filter(year > 2000) |>
  galah_apply_profile(ALA) |>
  atlas_counts()
## # A tibble: 1 × 1
##      count
##      <int>
## 1 82002698

To see a full list of data quality profiles, use show_all(profiles).
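To inspect what a profile actually removes before applying it, something like the following may help. The helper names are hypothetical, and the sketch assumes that search_all(profiles, ...) followed by show_values() lists the individual filters in the matched profile.

```r
# Hypothetical helper: table of all available data quality profiles
list_profiles <- function() {
  galah::show_all(profiles)
}

# Hypothetical helper: the individual quality filters that make up
# the profile matched by `profile_name`
profile_filters <- function(profile_name) {
  galah::search_all(profiles, profile_name) |>
    galah::show_values()
}

# Usage (requires an internet connection):
# list_profiles()
# profile_filters("ALA")
```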

galah_group_by

Use galah_group_by() to group and summarise record counts by specified fields.

# Get record counts for 2016 to 2020, grouped by year and basis of record
galah_call() |>
  galah_filter(year > 2015 & year <= 2020) |>
  galah_group_by(year, basisOfRecord) |>
  atlas_counts()
## # A tibble: 36 × 3
##    year  basisOfRecord         count
##    <chr> <chr>                 <int>
##  1 2020  HUMAN_OBSERVATION   6583810
##  2 2020  OCCURRENCE           419843
##  3 2020  PRESERVED_SPECIMEN    86211
##  4 2020  MACHINE_OBSERVATION   39643
##  5 2020  OBSERVATION           24801
##  6 2020  MATERIAL_SAMPLE        2034
##  7 2020  LIVING_SPECIMEN          62
##  8 2019  HUMAN_OBSERVATION   5753392
##  9 2019  OCCURRENCE           290610
## 10 2019  PRESERVED_SPECIMEN   167373
## # ℹ 26 more rows

galah_select

Use galah_select() to select which columns are returned when downloading records.

# Return columns 'kingdom', 'species' & 'eventDate' only
occurrences <- galah_call() |>
  galah_identify("reptilia") |>
  galah_filter(year == 1930) |>
  galah_select(kingdom, species, eventDate) |>
  atlas_occurrences()

occurrences |> head()

galah_select() also accepts the selection helpers that work with dplyr::select(), such as starts_with() and ends_with().

occurrences <- galah_call() |>
  galah_identify("reptilia") |>
  galah_filter(year == 1930) |>
  galah_select(starts_with("accepted") | ends_with("record")) |>
  atlas_occurrences()

occurrences |> head()
## # A tibble: 6 × 6
##   acceptedNameUsage acceptedNameUsageID basisOfRecord      raw_basisOfRecord OCCURRENCE_STATUS_INFERRED_FROM_BASIS_OF_RECORD userDuplicateRecord
##   <chr>             <lgl>               <chr>              <chr>             <lgl>                                           <lgl>              
## 1 <NA>              NA                  PRESERVED_SPECIMEN Museum specimen   FALSE                                           FALSE              
## 2 <NA>              NA                  PRESERVED_SPECIMEN PreservedSpecimen FALSE                                           FALSE              
## 3 <NA>              NA                  PRESERVED_SPECIMEN Museum specimen   FALSE                                           FALSE              
## 4 <NA>              NA                  HUMAN_OBSERVATION  HumanObservation  FALSE                                           FALSE              
## 5 <NA>              NA                  PRESERVED_SPECIMEN PreservedSpecimen FALSE                                           FALSE              
## 6 <NA>              NA                  PRESERVED_SPECIMEN PreservedSpecimen FALSE                                           FALSE

galah_geolocate

Use galah_geolocate() to specify a geographic area or region to limit your search.

# Get list of Perameles species in the area specified:
# (Note: This can also be specified by a shapefile)
wkt <- "POLYGON((131.36328125 -22.506468769126,135.23046875 -23.396716654542,134.17578125 -27.287832521411,127.40820312499 -26.661206402316,128.111328125 -21.037340349154,131.36328125 -22.506468769126))"

galah_call() |>
  galah_identify("perameles") |>
  galah_geolocate(wkt) |>
  atlas_species()
## # A tibble: 1 × 11
##   taxon_concept_id                                                    species_name scientific_name_auth…¹ taxon_rank kingdom phylum class order family genus vernacular_name
##   <chr>                                                               <chr>        <chr>                  <chr>      <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>          
## 1 https://biodiversity.org.au/afd/taxa/f234f91f-8e00-405d-a89b-eb8fb… Perameles e… Spencer, 1897          species    Animal… Chord… Mamm… Pera… Peram… Pera… Desert Bandico…
## # ℹ abbreviated name: ¹​scientific_name_authorship
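Writing WKT by hand is error-prone, particularly because the first and last points of the polygon must match. As an illustration only, the hypothetical helper bbox_to_wkt() below builds a closed rectangular polygon from bounding-box limits using base R; it is not part of galah.

```r
# Hypothetical helper: build a closed WKT polygon from bounding-box
# limits given in decimal degrees (x = longitude, y = latitude)
bbox_to_wkt <- function(xmin, ymin, xmax, ymax) {
  # Four corners, with the first repeated at the end to close the ring
  x <- c(xmin, xmax, xmax, xmin, xmin)
  y <- c(ymin, ymin, ymax, ymax, ymin)
  # Each point is "x y"; points are separated by commas
  coords <- paste(x, y, collapse = ",")
  paste0("POLYGON((", coords, "))")
}

wkt_box <- bbox_to_wkt(128, -27, 135, -21)
wkt_box
```

The resulting string can be passed to galah_geolocate() in the same way as the wkt object above.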

galah_geolocate() also accepts shapefiles. More complex shapefiles may need to be simplified first (e.g., using rmapshaper::ms_simplify()).
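A sketch of that workflow might look like the following. The function name, the path argument and the keep value are placeholders chosen for illustration, and the sketch assumes the shapefile is read with sf::st_read() and that galah_geolocate() accepts the resulting sf object.

```r
# Hypothetical helper: count Perameles records inside a region that is
# stored as a shapefile at `path`
count_in_shapefile <- function(path, keep = 0.1) {
  # Read the shapefile, then reduce its number of vertices;
  # `keep` is the proportion of points retained by ms_simplify()
  region <- sf::st_read(path, quiet = TRUE) |>
    rmapshaper::ms_simplify(keep = keep)

  galah::galah_call() |>
    galah::galah_identify("perameles") |>
    galah::galah_geolocate(region) |>
    galah::atlas_counts()
}

# Usage (placeholder path; requires an internet connection):
# count_in_shapefile("path/to/region.shp")
```

Lower values of keep produce simpler polygons at the cost of a coarser boundary, so it is worth plotting the simplified region before relying on the counts.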