Title: Resolving Plant Taxon Names Using the Australian Plant Census
Version: 1.1.3
Description: The process of resolving taxon names is necessary when working with biodiversity data. 'APCalign' uses the Australian Plant Census (APC) and the Australian Plant Name Index (APNI) to align and update plant taxon names to current, accepted standards. 'APCalign' also supplies information about the established status of plant taxa across different states/territories.
License: MIT + file LICENSE
Encoding: UTF-8
Language: en
LazyData: true
Depends: R (≥ 4.1.0)
Imports: readr, purrr, dplyr, stringr, stringi, stringdist, crayon, httr, jsonlite, curl, arrow, rlang
Suggests: janitor, tidyr, covr, knitr, rmarkdown, kableExtra, here, testthat (≥ 3.0.0)
RoxygenNote: 7.3.2
Config/testthat/edition: 3
VignetteBuilder: knitr
URL: https://traitecoevo.github.io/APCalign/, https://github.com/traitecoevo/APCalign
BugReports: https://github.com/traitecoevo/APCalign/issues
NeedsCompilation: no
Packaged: 2025-02-11 00:01:38 UTC; dfalster
Author: Daniel Falster [aut, cre, cph], Elizabeth Wenk [aut, ctb], Will Cornwell [aut, ctb], Fonti Kar [aut, ctb], Carl Boettiger [ctb]
Maintainer: Daniel Falster <daniel.falster@unsw.edu.au>
Repository: CRAN
Date/Publication: 2025-02-11 13:40:05 UTC

Standardising Taxonomic Names in Australian Plants

Description

The process of standardising taxon names is necessary when working with biodiversity data. 'APCalign' uses the Australian Plant Name Index (APNI) and the Australian Plant Census (APC) to align and update plant taxon names to current, accepted standards. 'APCalign' can also supply information about the established status of plant taxa across different states/territories.

Functions

Standardise taxon names

Established status by region

Author(s)

Maintainer: Daniel Falster daniel.falster@unsw.edu.au (ORCID) [copyright holder]

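Authors:

  • Elizabeth Wenk [contributor]

  • Will Cornwell [contributor]

  • Fonti Kar [contributor]

Other contributors:

  • Carl Boettiger [contributor]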

References

If you have any questions, comments or suggestions, please submit an issue at our GitHub repository: https://github.com/traitecoevo/APCalign/issues

See Also

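Useful links:

  • https://traitecoevo.github.io/APCalign/

  • https://github.com/traitecoevo/APCalign

  • Report bugs at https://github.com/traitecoevo/APCalign/issues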


Align Australian plant scientific names to the APC or APNI

Description

For a list of Australian plant names, find taxonomic or scientific name alignments to the APC or APNI by standardising formatting and fixing spelling errors.

Usage case: Run this function to see the details of the matching algorithm, including the many output columns that the matching function uses as it seeks the best alignment. It also lets you adjust the “fuzziness” level for fuzzy matches. This function is the first half of create_taxonomic_update_lookup.

Usage

align_taxa(
  original_name,
  output = NULL,
  full = FALSE,
  resources = load_taxonomic_resources(),
  quiet = FALSE,
  fuzzy_abs_dist = 3,
  fuzzy_rel_dist = 0.2,
  fuzzy_matches = TRUE,
  imprecise_fuzzy_matches = FALSE,
  APNI_matches = TRUE,
  identifier = NA_character_
)

Arguments

original_name

A list of names to query for taxonomic alignments.

output

(optional) The name of the file to save the results to.

full

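Logical indicating how many columns are output: if TRUE, the full set of columns is returned; if FALSE (the default), only the key columns are returned.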

resources

the taxonomic resources used to align the taxa names. Loading this can be slow, so call load_taxonomic_resources separately to greatly speed this function up and pass the resources in.

quiet

Logical to indicate whether to display messages while aligning taxa.

fuzzy_abs_dist

The number of characters allowed to be different for a fuzzy match.

fuzzy_rel_dist

The proportion of characters allowed to be different for a fuzzy match.

fuzzy_matches

Fuzzy matches are turned on by default. The relative and absolute distances allowed for fuzzy matches to species and infraspecific taxon names are defined by the parameters fuzzy_abs_dist and fuzzy_rel_dist.

imprecise_fuzzy_matches

Imprecise fuzzy matching uses the fuzzy matching function with lenient thresholds (absolute distance of 5 characters; relative distance of 0.25). It returns a wider range of possible names, including ones that correspond to very distant spelling mistakes. The default is FALSE; check all outputs carefully, as this option often makes erroneous matches.

APNI_matches

Name matches to the APNI (Australian Plant Name Index) are turned on by default.

identifier

A dataset, location or other identifier, which defaults to NA.
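The way fuzzy_abs_dist and fuzzy_rel_dist jointly limit a fuzzy match can be sketched in base R. This is only an illustration under stated assumptions: within_fuzzy_limits is a hypothetical helper, edit distance is taken from utils::adist, and the relative distance is assumed to be the edit distance divided by the length of the input name. The matching code inside APCalign may measure distance differently and applies further checks.

```r
# Hypothetical helper for illustration only; it is not part of APCalign.
# Assumption: edit distance comes from base R's utils::adist() (generalised
# Levenshtein), and relative distance = edits / number of input characters.
within_fuzzy_limits <- function(input, candidate,
                                fuzzy_abs_dist = 3, fuzzy_rel_dist = 0.2) {
  edits <- drop(utils::adist(input, candidate))
  # a candidate is accepted only if BOTH limits are respected
  edits <= fuzzy_abs_dist & (edits / nchar(input)) <= fuzzy_rel_dist
}

# 1 edit in 15 characters: inside both limits
within_fuzzy_limits("Banksia serrate", "Banksia serrata")
# 2 edits in 17 characters: inside both limits
within_fuzzy_limits("Banksia serrrrata", "Banksia serrata")
# many edits: outside the absolute limit
within_fuzzy_limits("Banksia big red flowers", "Banksia serrata")
```

Under these assumptions the relative limit matters most for short names: a single wrong character in a three-letter word already exceeds a relative distance of 0.2.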

Details

Notes:

Value

A tibble with columns that include original_name, aligned_name, taxonomic_dataset, taxon_rank, aligned_reason, alignment_code.

See Also

load_taxonomic_resources

Other taxonomic alignment functions: create_taxonomic_update_lookup(), update_taxonomy()

Examples



resources <- load_taxonomic_resources()

# example 1
align_taxa(c("Poa annua", "Abies alba"), resources = resources)

# example 2
input <- c("Banksia serrata", "Banksia serrate", "Banksia cerrata",
           "Banksia serrrrata", "Dryandra sp.", "Banksia big red flowers")

aligned_taxa <-
  APCalign::align_taxa(
    original_name = input,
    identifier = "APCalign test",
    full = TRUE,
    resources = resources
  )
  




State level native and introduced origin status

Description

This function uses the taxon distribution data from the APC to determine state-level native and introduced origin status.

It processes the geographic data available in the APC and returns, for all taxa, the origin status in each state: native, introduced, or a more complicated origin.

Usage

create_species_state_origin_matrix(resources = load_taxonomic_resources())

Arguments

resources

the taxonomic resources required to make the summary statistics. Loading this can be slow, so call load_taxonomic_resources separately to greatly speed this function up and pass the resources in.

Value

A tibble with columns representing each state and rows representing each species. The values in each cell represent the origin of the species in that state.

See Also

load_taxonomic_resources

Other diversity methods: native_anywhere_in_australia(), state_diversity_counts()

Examples

create_species_state_origin_matrix()




Create a table with the best-possible scientific name match for Australian plant names

Description

This function takes a list of Australian plant names that need to be reconciled with current taxonomy and generates a lookup table of the best-possible scientific name match for each input name.

Usage case: This is APCalign’s core function, merging together the alignment and updating of taxonomy.
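Because align_taxa is described as the first half of this function and update_taxonomy takes the columns that align_taxa creates, the combined workflow can be sketched as two explicit calls. The wrapper name two_step_lookup is hypothetical, and the returned columns may not be identical to those of create_taxonomic_update_lookup; treat this as a sketch of the idea rather than the package's internal code.

```r
# Sketch only: two_step_lookup() is a hypothetical wrapper, not an
# APCalign function. It chains the two documented steps explicitly.
two_step_lookup <- function(taxa, resources) {
  # step 1: align the input names to names in the APC or APNI
  aligned <- APCalign::align_taxa(original_name = taxa, resources = resources)
  # step 2: update the aligned names to currently accepted APC names
  APCalign::update_taxonomy(aligned_data = aligned, resources = resources)
}

# Usage (needs the APCalign package and downloaded taxonomic resources):
# resources <- APCalign::load_taxonomic_resources()
# two_step_lookup(c("Banksia serrata", "Banksia serrate"), resources)
```

Splitting the steps this way is mainly useful when the intermediate alignment table needs to be inspected or corrected before the taxonomy is updated.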

Usage

create_taxonomic_update_lookup(
  taxa,
  stable_or_current_data = "stable",
  version = default_version(),
  taxonomic_splits = "most_likely_species",
  full = FALSE,
  fuzzy_abs_dist = 3,
  fuzzy_rel_dist = 0.2,
  fuzzy_matches = TRUE,
  APNI_matches = TRUE,
  imprecise_fuzzy_matches = FALSE,
  identifier = NA_character_,
  resources = load_taxonomic_resources(quiet = quiet),
  quiet = FALSE,
  output = NULL
)

Arguments

taxa

A list of Australian plant species that needs to be reconciled with current taxonomy.

stable_or_current_data

either "stable" for a consistent version, or "current" for the leading edge version.

version

The version number of the dataset to use.

taxonomic_splits

How to handle one-to-many taxonomic matches. The default is "most_likely_species"; the other options are "return_all" and "collapse_to_higher_taxon". most_likely_species defaults to the original_name if that name is accepted by the APC. This will be right for certain species subsets but will introduce errors in other cases, so use it with caution.

full

logical for whether the full lookup table is returned or just key columns

fuzzy_abs_dist

The number of characters allowed to be different for a fuzzy match.

fuzzy_rel_dist

The proportion of characters allowed to be different for a fuzzy match.

fuzzy_matches

Fuzzy matches are turned on as a default. The relative and absolute distances allowed for fuzzy matches to species and infraspecific taxon names are defined by the parameters fuzzy_abs_dist and fuzzy_rel_dist.

APNI_matches

Name matches to the APNI (Australian Plant Name Index) are turned on by default.

imprecise_fuzzy_matches

Imprecise fuzzy matching uses the fuzzy matching function with lenient thresholds (absolute distance of 5 characters; relative distance of 0.25). It returns a wider range of possible names, including ones that correspond to very distant spelling mistakes. The default is FALSE; check all outputs carefully, as this option often makes erroneous matches.

identifier

A dataset, location or other identifier, which defaults to NA.

resources

The taxonomic resources used for cleaning. By default they are loaded from a local copy on your computer. If this function is called repeatedly, it is much faster to load the resources once with load_taxonomic_resources and pass them in.

quiet

Logical to indicate whether to display messages while loading data and aligning taxa.

output

File path to save the output. If this file already exists, the function checks whether it is a subset of the species passed in and tries to add to it. This can be useful for large and growing projects.

Details

Notes:

Value

A lookup table containing the accepted and suggested names for each original name input, and additional taxonomic information such as taxon rank, taxonomic status, taxon IDs and genera.

See Also

load_taxonomic_resources

Other taxonomic alignment functions: align_taxa(), update_taxonomy()

Examples


resources <- load_taxonomic_resources()

# example 1
create_taxonomic_update_lookup(c("Eucalyptus regnans",
                                 "Acacia melanoxylon",
                                 "Banksia integrifolia",
                                 "Not a species"),
                                 resources = resources)
                                 
# example 2
input <- c("Banksia serrata", "Banksia serrate", "Banksia cerrata", 
"Banksea serrata", "Banksia serrrrata", "Dryandra")

create_taxonomic_update_lookup(
    taxa = input,
    identifier = "APCalign test",
    full = TRUE,
    resources = resources
  )

# example 3
taxon_list <-
  readr::read_csv(
  system.file("extdata", "test_taxa.csv", package = "APCalign"),
  show_col_types = FALSE)

create_taxonomic_update_lookup(
    taxa = taxon_list$original_name,
    identifier = taxon_list$notes,
    full = TRUE,
    resources = resources
  )



Get the default version for stable data

Description

This function returns the default version for stable data, which is used when no version is specified.

Usage

default_version()

Value

A character string representing the default version for stable data.

Examples

default_version()


GBIF Australian Plant Data

Description

A subset of plant data from the Global Biodiversity Information Facility

Usage

gbif_lite

Format

gbif_lite A tibble with 129 rows and 7 columns:

species

The name of the first or species epithet of the scientificname

infraspecificepithet

The name of the lowest or terminal infraspecific epithet of the scientificname

taxonrank

The taxonomic rank of the most specific name

decimalLongitude

Longitude in decimal degrees

decimalLatitude

Latitude in decimal degrees

scientificname

Scientific Name

verbatimscientificname

Scientific name as it appeared in original record

Source

https://www.gbif.org/


Lookup Family by Genus from APC

Description

Retrieve the family name for a given genus using taxonomic data from the Australian Plant Census (APC).

Usage

get_apc_genus_family_lookup(genus, resources = load_taxonomic_resources())

Arguments

genus

A character vector of genus names for which to retrieve the corresponding family names.

resources

The taxonomic resources required to make the lookup. Loading this can be slow, so call load_taxonomic_resources separately to speed up this function and pass the resources in.

Value

A data frame with two columns: "genus", indicating the genus name, and "family", indicating the corresponding family name from the APC.

See Also

load_taxonomic_resources

Examples

get_apc_genus_family_lookup(genus = c("Acacia", "Eucalyptus"))

Which versions of taxonomic resources are available?

Description

Which versions of taxonomic resources are available?

Usage

get_versions()

Value

A tibble of dates when APC/APNI resources were downloaded as a GitHub Release.

Examples

get_versions()

Load taxonomic reference lists, APC & APNI

Description

This function loads two taxonomic datasets for Australia's vascular plants, the APC and APNI, into the global environment. It creates several data frames by filtering and selecting data from the loaded lists.

Usage

load_taxonomic_resources(
  stable_or_current_data = "stable",
  version = default_version(),
  quiet = FALSE
)

Arguments

stable_or_current_data

Type of dataset to access. The default is "stable", which loads the dataset from a GitHub archived file. If set to "current", the dataset is loaded from a URL that serves the cutting-edge version, which may change at any time without notice.

version

The version number of the dataset to use. Defaults to the default version.

quiet

A logical indicating whether to print status of loading to screen. Defaults to FALSE.

Details

Value

The taxonomic resources data loaded into the global environment.

Examples


load_taxonomic_resources(stable_or_current_data = "stable",
                         version = "2024-10-11")


Native anywhere in Australia

Description

This function checks which species from a list are thought to be native anywhere in Australia according to the APC.

Usage

native_anywhere_in_australia(species, resources = load_taxonomic_resources())

Arguments

species

A character vector of species names, typically binomials.

resources

An optional list of taxonomic resources to use for the lookup. If not provided, the function will load default taxonomic resources using the load_taxonomic_resources() function.

Details

Important caveats:

Value

A tibble with two columns: species, which is the same as the unique values of the input species, and native_anywhere_in_aus, a vector indicating whether each species is native anywhere in Australia, introduced by humans from elsewhere, or unknown with respect to the APC resource.

See Also

Other diversity methods: create_species_state_origin_matrix(), state_diversity_counts()

Examples

native_anywhere_in_australia(c("Eucalyptus globulus","Pinus radiata","Banksis notaspecies"))

Objects exported from other packages

Description

These objects are imported from other packages. Follow the links below to see their documentation.

dplyr

%>%


Standardise taxon names

Description

Standardises taxon names by performing a series of text substitutions to remove common inconsistencies in taxonomic nomenclature.

The function takes a character vector of taxon names as input and returns a character vector of taxon names using standardised taxonomic syntax as output.

Usage

standardise_names(taxon_names)

Arguments

taxon_names

A character vector of taxon names that need to be standardised.

Details

Value

A character vector of standardised taxon names.

Examples

standardise_names(c("Quercus suber",
                    "Eucalyptus sp.",
                    "Eucalyptus spp.",
                    "Agave americana var. marginata",
                    "Agave americana v marginata",
                    "Notelaea longifolia forma longifolia",
                    "Notelaea longifolia f longifolia"))

Standardise taxon ranks

Description

Standardise taxon ranks from Latin into English.

Usage

standardise_taxon_rank(taxon_rank)

Arguments

taxon_rank

A character vector of Latin taxon ranks.

Details

The function takes a character vector of Latin taxon ranks as input and returns a character vector of taxon ranks using standardised English terms.

Value

A character vector of English taxon ranks.
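The Latin-to-English translation described above can be pictured as a named lookup vector. The sketch below is illustrative only: translate_rank and latin_to_english are hypothetical names, and the handful of ranks listed is an assumption rather than the substitution table used by standardise_taxon_rank.

```r
# Illustrative lookup only; APCalign's own substitutions may cover more
# ranks or handle them differently.
latin_to_english <- c(
  regnum = "kingdom", classis = "class", ordo = "order",
  familia = "family", varietas = "variety", forma = "form"
)

# Hypothetical helper: translate known Latin ranks, leave other values as-is.
translate_rank <- function(taxon_rank) {
  hit <- taxon_rank %in% names(latin_to_english)
  taxon_rank[hit] <- unname(latin_to_english[taxon_rank[hit]])
  taxon_rank
}

translate_rank(c("regnum", "kingdom", "classis", "class"))
# returns "kingdom" "kingdom" "class" "class"
```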

Examples

standardise_taxon_rank(c("regnum", "kingdom", "classis", "class"))

State- and territory-level diversity

Description

For Australian states and territories, use geographic distribution data from the APC to calculate state-level diversity for native, introduced, and more complicated species origins.

Usage

state_diversity_counts(state, resources = load_taxonomic_resources())

Arguments

state

A character string indicating the Australian state or territory to calculate the diversity for. Possible values are "NSW", "NT", "Qld", "WA", "ChI", "SA", "Vic", "Tas", "ACT", "NI", "LHI", "MI", "HI", "MDI", "CoI", "CSI", and "AR".

resources

The taxonomic resources required to make the summary statistics. Loading this can be slow, so call load_taxonomic_resources separately to greatly speed this function up and pass the resources in.

Value

A tibble of diversity counts for the specified state or territory, including native, introduced, and more complicated species origins. The tibble has three columns: "origin" indicating the origin of the species, "state" indicating the Australian state or territory, and "num_species" indicating the number of species for that origin and state.

See Also

load_taxonomic_resources

Other diversity methods: create_species_state_origin_matrix(), native_anywhere_in_australia()

Examples

state_diversity_counts(state = "NSW")

Strip taxon names

Description

Strip taxonomic names of taxon rank abbreviations, qualifiers and special characters.

Usage

strip_names(taxon_names)

Arguments

taxon_names

A character vector of taxonomic names to be stripped.

Details

Given a vector of taxonomic names, this function removes:

The resulting vector of names is also converted to lowercase.

Value

A character vector of stripped taxonomic names, with subtaxa designations, special characters, and extra whitespace removed, and all letters converted to lowercase.
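The kind of clean-up described in this Value section can be approximated with a few base R substitutions. This is a rough sketch, not the code behind strip_names: strip_sketch is a hypothetical name, and the list of rank abbreviations it removes (subsp., ssp., var., f.) is an assumption.

```r
# Rough sketch of the idea; not APCalign's implementation.
strip_sketch <- function(taxon_names) {
  x <- tolower(taxon_names)
  # drop a few common infraspecific rank abbreviations (assumed list)
  x <- gsub("\\b(subsp|ssp|var|f)\\.", " ", x, perl = TRUE)
  # replace anything that is not a lowercase letter, digit or space
  x <- gsub("[^a-z0-9 ]", " ", x, perl = TRUE)
  # collapse repeated whitespace and trim both ends
  trimws(gsub("\\s+", " ", x, perl = TRUE))
}

strip_sketch(c("Abies lasiocarpa subsp. lasiocarpa",
               "Quercus kelloggii",
               "Pinus contorta var. latifolia"))
# returns "abies lasiocarpa lasiocarpa" "quercus kelloggii" "pinus contorta latifolia"
```

Reducing names to this bare form makes two differently punctuated spellings of the same taxon compare as equal, which is the reason for stripping them before matching.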

Examples

strip_names(c("Abies lasiocarpa subsp. lasiocarpa",
              "Quercus kelloggii",
              "Pinus contorta var. latifolia"))


Strip taxon names, extra

Description

Strip taxonomic names of sp. and hybrid symbols. This function assumes that the character vector has already been run through strip_names.

Usage

strip_names_extra(taxon_names)

Arguments

taxon_names

A character vector of taxonomic names to be stripped.

Details

Given a vector of taxonomic names, this function removes additional filler words (" x " for hybrid taxa, "sp.") not removed by the function strip_names.

Value

A character vector of stripped taxonomic names, with sp. and hybrid symbols removed.

Examples

strip_names_extra(c("Abies lasiocarpa subsp. lasiocarpa",
              "Quercus kelloggii",
              "Pinus contorta var. latifolia",
              "Acacia sp.",
              "Lepidium sp. Tanguin Hill (K.R.Newbey 10501)"))


Update to currently accepted APC name and add APC/APNI name metadata

Description

For a list of taxon names aligned to the APC, update the name to an accepted taxon concept per the APC and add scientific name and taxon concept metadata to names aligned to either the APC or APNI.

Usage

update_taxonomy(
  aligned_data,
  taxonomic_splits = "most_likely_species",
  quiet = TRUE,
  output = NULL,
  resources = load_taxonomic_resources()
)

Arguments

aligned_data

A tibble of plant names to update. This table must include 5 columns, original_name, aligned_name, taxon_rank, taxonomic_dataset, and aligned_reason. These columns are created by the function align_taxa. The columns original_name and aligned_name must be in the format of the scientific name, with genus and species, and may contain additional qualifiers such as subspecies or varieties. The names are case insensitive.

taxonomic_splits

Variable that determines what protocol to use to update taxon names that are ambiguous due to taxonomic splits. The three options are:

  • most_likely_species, which returns the species name in use before the split; alternative names are returned in a separate column

  • return_all, which returns all possible names

  • collapse_to_higher_taxon, which declares that an ambiguous name cannot be aligned to an accepted species/infraspecific name and the name is demoted to genus rank

quiet

Logical to indicate whether to display messages while updating taxa.

output

(optional) Name of the file where results are saved. The default is NULL and no file is created. If specified, the output will be saved in a CSV file with the given name.

resources

The taxonomic resources required to update the taxonomy. Loading this can be slow, so call load_taxonomic_resources separately to greatly speed this function up and pass the resources in.
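The three taxonomic_splits protocols can be contrasted with a toy example. Everything here is made up for illustration: resolve_split is a hypothetical helper, the names "Genusa alpha" and "Genusa beta" are invented, and the returned list does not mirror the columns that update_taxonomy produces. It also assumes that the name in use before the split is the original name itself.

```r
# Toy illustration with invented names; resolve_split() is hypothetical and
# does not reproduce APCalign's internal logic or output columns.
resolve_split <- function(original_name, possible_names,
                          taxonomic_splits = c("most_likely_species",
                                               "return_all",
                                               "collapse_to_higher_taxon")) {
  taxonomic_splits <- match.arg(taxonomic_splits)
  genus <- strsplit(original_name, " ", fixed = TRUE)[[1]][1]
  switch(taxonomic_splits,
    # keep the name in use before the split; list the others separately
    most_likely_species = list(
      name = original_name,
      alternatives = setdiff(possible_names, original_name)
    ),
    # report every candidate name
    return_all = list(name = possible_names, alternatives = character(0)),
    # give up on species rank and fall back to the genus
    collapse_to_higher_taxon = list(name = genus, alternatives = character(0))
  )
}

candidates <- c("Genusa alpha", "Genusa beta")
resolve_split("Genusa alpha", candidates, "most_likely_species")
resolve_split("Genusa alpha", candidates, "return_all")
resolve_split("Genusa alpha", candidates, "collapse_to_higher_taxon")
```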

Details

Notes:

Value

A tibble with updated taxonomy for the specified plant names. The tibble contains the following columns:

See Also

load_taxonomic_resources

Other taxonomic alignment functions: align_taxa(), create_taxonomic_update_lookup()

Examples

# Update taxonomy for two plant names and print the result

resources <- load_taxonomic_resources()

update_taxonomy(
 dplyr::tibble(
   original_name = c("Dryandra preissii", "Banksia acuminata"),
   aligned_name = c("Dryandra preissii", "Banksia acuminata"),
   taxon_rank = c("species", "species"),
   taxonomic_dataset = c("APC", "APC"),
   aligned_reason = c(NA_character_,
   NA_character_)
 ),
 resources = resources
)