The following table lists the rules for each of the separate algorithms (alignment codes) that are applied in sequence to align each submitted name to a taxon concept in APC or to a scientific name in APNI.
alignment_code | search algorithm | original name variant matched to | match type | taxonomic dataset aligned to | taxon_rank of alignment | notes about sequence |
---|---|---|---|---|---|---|
match_01a | Detect scientific names, including authorship | original_name | exact | APC accepted taxon concepts | species/infraspecific | Check if strings are full scientific names, including authorship. |
match_01b | Detect scientific names, including authorship | original_name | exact | other APC taxon concepts | species/infraspecific | NA |
match_01c | Detect canonical names, lacking authorship | cleaned_name | exact | APC accepted taxon concepts | species/infraspecific | Check if strings are taxon names, lacking authorship. |
match_01d | Detect canonical names, lacking authorship | cleaned_name | exact | other APC taxon concepts | species/infraspecific | NA |
match_02a | Detect `genus sp.`, `genus ssp.` and `genus spp.` | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | The first goal is to align two-word strings that indicate an unknown species within a genus (or family). |
match_02b | Detect `genus sp.`, `genus ssp.` and `genus spp.` | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
match_02c | Detect `genus sp.`, `genus ssp.` and `genus spp.` | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
match_02d | Detect `family sp.`, `family ssp.` and `family spp.` | first word (“genus”) | exact | APC accepted taxon concepts | family | NA |
match_03a | Detect `--` (intergrade taxa) and align to genus | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | Next, find strings indicating that a name reflects an intergrade between two taxa. These names can only be aligned to a genus. |
match_03b | Detect `--` (intergrade taxa) and align to genus | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
match_03c | Detect `--` (intergrade taxa) and align to genus | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
match_03d | Detect `--` (intergrade taxa) and align to genus | first word (“genus”) | fuzzy | APNI | genus | NA |
match_03e | Detect `--` (intergrade taxa), but fail to align to genus | NA | no match | NA | NA | NA |
match_04a | Detect `\` (indecision between taxa) and align to genus | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | Next, find strings indicating a data collector’s indecision about which of two (or more) taxa is the appropriate taxon. These names can only be aligned to a genus. |
match_04b | Detect `\` (indecision between taxa) and align to genus | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
match_04c | Detect `\` (indecision between taxa) and align to genus | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
match_04d | Detect `\` (indecision between taxa) and align to genus | first word (“genus”) | fuzzy | APNI | genus | NA |
match_04e | Detect `\` (indecision between taxa), but fail to align to genus | NA | no match | NA | NA | NA |
match_05a | Detect canonical names, lacking authorship | stripped_name | fuzzy | APC accepted taxon concepts | species/infraspecific | NA |
match_05b | Detect canonical names, lacking authorship | stripped_name | fuzzy | other APC taxon concepts | species/infraspecific | NA |
match_05c | Detect canonical names, lacking authorship | cleaned_name | exact | APNI | species/infraspecific | NA |
match_06a | Detect `aff`, `affinis` (affinity to) and align to genus | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | Next, find strings indicating an affinity to a specific taxon, where the name itself is not that taxon. Unless such names are documented in APC (and were therefore aligned by an earlier algorithm), they can only be aligned to genus. |
match_06b | Detect `aff`, `affinis` (affinity to) and align to genus | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
match_06c | Detect `aff`, `affinis` (affinity to) and align to genus | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
match_06d | Detect `aff`, `affinis` (affinity to) and align to genus | first word (“genus”) | fuzzy | APNI | genus | NA |
match_06e | Detect `aff`, `affinis` (affinity to), but fail to align to genus | NA | no match | NA | NA | NA |
match_07a | Detect canonical names, lacking authorship | stripped_name | imprecise fuzzy | APC accepted taxon concepts | species/infraspecific | Further check whether strings are taxon names lacking authorship, now using imprecise fuzzy matching. |
match_07b | Detect canonical names, lacking authorship | stripped_name | imprecise fuzzy | other APC taxon concepts | species/infraspecific | NA |
match_08a | Detect `x` (hybrid taxon) and align to genus | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | Next, find strings indicating a hybrid between two taxa. Unless such names are documented in APC (and were therefore aligned by an earlier algorithm), they can only be aligned to genus. |
match_08b | Detect `x` (hybrid taxon) and align to genus | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
match_08c | Detect `x` (hybrid taxon) and align to genus | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
match_08d | Detect `x` (hybrid taxon) and align to genus | first word (“genus”) | fuzzy | APNI | genus | NA |
match_08e | Detect `x` (hybrid taxon), but fail to align to genus | NA | no match | NA | NA | NA |
match_09a | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | exact | APC accepted taxon concepts | species/infraspecific | Check whether the first three words in the name string match a taxon name, allowing trailing notes to be discarded. Also useful for aligning phrase names. |
match_09b | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | exact | other APC taxon concepts | species/infraspecific | NA |
match_09c | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | fuzzy | APC accepted taxon concepts | species/infraspecific | NA |
match_09d | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | fuzzy | other APC taxon concepts | species/infraspecific | NA |
match_10a | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | exact | APC accepted taxon concepts | species/infraspecific | Check whether the first two words in the name string match a taxon name, allowing trailing notes and invalid infraspecific names to be discarded. Also useful for aligning phrase names. |
match_10b | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | exact | other APC taxon concepts | species/infraspecific | NA |
match_10c | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | fuzzy | APC accepted taxon concepts | species/infraspecific | NA |
match_10d | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | fuzzy | other APC taxon concepts | species/infraspecific | NA |
match_11a | Detect canonical names, lacking authorship | stripped_name | fuzzy | APNI | species/infraspecific | Further check whether strings are APNI taxon names lacking authorship, now using fuzzy matching or considering only the first three or two words in the string. |
match_11b | Detect canonical names, lacking authorship | stripped_name | imprecise fuzzy | APNI | species/infraspecific | NA |
match_11c | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | exact | APNI | species/infraspecific | NA |
match_11d | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | exact | APNI | species/infraspecific | NA |
match_12a | Detect genus, by checking the first word in the string | first word (“genus”) | exact | APC accepted taxon concepts | genus | Check whether the first word in the name string matches a genus or family name, allowing an alignment at the genus or family level. |
match_12b | Detect genus, by checking the first word in the string | first word (“genus”) | exact | other APC taxon concepts | genus | NA |
match_12c | Detect genus, by checking the first word in the string | first word (“genus”) | exact | APNI | genus | NA |
match_12d | Detect family, by checking the first word in the string | first word (“genus”) | exact | APC accepted taxon concepts | family | NA |
match_12e | Detect family, by checking the first word in the string | first word (“genus”) | exact | other APC taxon concepts | family | NA |
match_12f | Detect genus, by checking the first word in the string | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
match_12g | Detect genus, by checking the first word in the string | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
match_12h | Detect family, by checking the first word in the string | first word (“genus”) | fuzzy | APC accepted taxon concepts | family | NA |
match_12i | Detect family, by checking the first word in the string | first word (“genus”) | fuzzy | other APC taxon concepts | family | NA |
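To make the sequencing concrete, the sketch below shows one way an ordered cascade like this can work. Each algorithm pairs four things: a guard (for example, does the string contain `sp.` or ` x `?), a name variant (the cleaned string or its first word), a match type, and a reference list. The first algorithm that returns a hit determines the alignment_code. This is a hypothetical Python illustration, not the package's own implementation. The reference lists, helper names (`clean`, `genus_of`, `levenshtein`, `align`), regular expressions and the edit-distance threshold standing in for "fuzzy" are all assumptions. Only six algorithms are included, and the "fail to align" rows (e.g. match_08e) are not modelled.

```python
import re
from typing import Optional

# Hypothetical reference lists standing in for APC accepted taxon concepts.
APC_ACCEPTED_SPECIES = ["Acacia dealbata", "Eucalyptus globulus"]
APC_ACCEPTED_GENERA = ["Acacia", "Eucalyptus"]

SP_PATTERN = re.compile(r"^\S+ (sp|ssp|spp)\.?$")  # e.g. "Acacia sp." (assumed pattern)
HYBRID_PATTERN = re.compile(r"\sx\s")              # e.g. "A x B" (assumed pattern)


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def exact(query: str, names: list[str]) -> Optional[str]:
    return query if query in names else None


def fuzzy(query: str, names: list[str], max_dist: int = 2) -> Optional[str]:
    # Assumed rule: accept the closest name if it is within max_dist edits.
    if not names:
        return None
    dist, best = min((levenshtein(query, n), n) for n in names)
    return best if dist <= max_dist else None


def clean(name: str) -> str:
    return " ".join(name.split())  # collapse repeated whitespace


def genus_of(name: str) -> str:
    parts = name.split()
    return parts[0] if parts else ""


def always(_: str) -> bool:
    return True


# (alignment_code, guard, name variant, match type, reference list), in order.
STEPS = [
    ("match_01c", always, clean, exact, APC_ACCEPTED_SPECIES),
    ("match_02a", lambda s: bool(SP_PATTERN.match(s)), genus_of, exact, APC_ACCEPTED_GENERA),
    ("match_05a", always, clean, fuzzy, APC_ACCEPTED_SPECIES),
    ("match_08a", lambda s: bool(HYBRID_PATTERN.search(s)), genus_of, exact, APC_ACCEPTED_GENERA),
    ("match_12a", always, genus_of, exact, APC_ACCEPTED_GENERA),
    ("match_12f", always, genus_of, fuzzy, APC_ACCEPTED_GENERA),
]


def align(original_name: str) -> tuple[Optional[str], Optional[str]]:
    """Return (alignment_code, matched name) from the first step that succeeds."""
    name = clean(original_name)
    for code, applies, variant, match, reference in STEPS:
        if applies(name):
            hit = match(variant(name), reference)
            if hit is not None:
                return code, hit  # later algorithms are never tried
    return None, None


print(align("Acacia dealbatta"))  # ('match_05a', 'Acacia dealbata')
```

Because the loop returns on the first hit, ordering matters. With this reduced list, a misspelled `Eucalyptis sp.` falls through to match_12f. In the full table it would already be caught by match_02b.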
The following table indicates the separate functions used to update each aligned name to an accepted or suggested name. Different functions are used depending on the taxon rank of the aligned name and the taxonomic dataset to which the name was aligned (APC vs APNI).
function name | taxonomic dataset | taxon rank | updates to aligned name | format of suggested_name | accepted name (& taxon_ID) | genus (& taxon_ID_genus) | scientific_name_ID |
---|---|---|---|---|---|---|---|
update_taxonomy_APC_genus | APC | genus | to APC accepted genus | genus sp. [notes] * | no | yes | no |
update_taxonomy_APNI_genus | APNI | genus | none | genus sp. [notes] | no | no | no |
update_taxonomy_APC_family | APC | family | none | family sp. [notes] | no | no | no |
update_taxonomy_APC_species_and_infraspecific_taxa | APC | species & infraspecific | NA | APC accepted species** name | yes | yes | yes |
– taxonomic_splits = “most_likely_species” | NA | NA | to APC accepted taxon concept | most likely APC accepted species** name [alternative possible names] | yes | yes | yes |
– taxonomic_splits = “return_all” | NA | NA | to APC accepted taxon concept | all possible APC accepted species** name (extra rows added) | yes | yes | yes |
– taxonomic_splits = “collapse_to_higher_taxon” | NA | NA | collapsed to APC accepted genus | genus sp. [collapsed names] | no | yes | no |
update_taxonomy_APNI_species_and_infraspecific_taxa | APNI | species & infraspecific | none to species name; genus to APC accepted genus if possible | APNI listed species** name* | no | sometimes | yes |
(names not aligned) | (not aligned) | (not aligned) | none | original name | no | no | no |
\* genus updated to APC accepted genus if possible; \*\* species or infraspecific taxon name
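The dispatch in the table above can be sketched as a lookup keyed on (taxonomic dataset, taxon rank). A further branch on taxonomic_splits applies when an APC species-level name maps to more than one accepted concept. This is a hypothetical Python sketch, not the package's code. The helper names, the assumption that candidate names arrive ordered most likely first, and the ` | ` separator inside the square brackets are illustrative choices.

```python
from typing import Optional

# Genus- and family-level update functions from the table, keyed by (dataset, rank).
HIGHER_RANK_FUNCTIONS = {
    ("APC", "genus"): "update_taxonomy_APC_genus",
    ("APNI", "genus"): "update_taxonomy_APNI_genus",
    ("APC", "family"): "update_taxonomy_APC_family",
}


def choose_update_function(dataset: Optional[str], rank: Optional[str]) -> Optional[str]:
    """Pick the update function for an aligned name; None if the name was not aligned."""
    if dataset is None:
        return None  # names not aligned keep their original name
    if rank in ("genus", "family"):
        return HIGHER_RANK_FUNCTIONS.get((dataset, rank))
    # Every other rank is treated as species or infraspecific.
    return f"update_taxonomy_{dataset}_species_and_infraspecific_taxa"


def apply_taxonomic_splits(candidates: list[str], genus: str,
                           taxonomic_splits: str = "most_likely_species") -> list[str]:
    """Suggested name(s) for an APC name with one or more possible accepted concepts.

    Assumes `candidates` is ordered most likely first (hypothetical).
    """
    if not candidates:
        raise ValueError("expected at least one candidate accepted name")
    if len(candidates) == 1:
        return candidates[:]
    if taxonomic_splits == "most_likely_species":
        return [f"{candidates[0]} [{' | '.join(candidates[1:])}]"]
    if taxonomic_splits == "return_all":
        return candidates[:]  # one output row per possible name
    if taxonomic_splits == "collapse_to_higher_taxon":
        return [f"{genus} sp. [{' | '.join(candidates)}]"]
    raise ValueError(f"unknown taxonomic_splits value: {taxonomic_splits!r}")


names = ["Acacia alpha", "Acacia beta"]  # hypothetical split, for illustration only
print(apply_taxonomic_splits(names, "Acacia"))
# ['Acacia alpha [Acacia beta]']
print(apply_taxonomic_splits(names, "Acacia", "collapse_to_higher_taxon"))
# ['Acacia sp. [Acacia alpha | Acacia beta]']
```

A dictionary keeps the genus- and family-level cases explicit. Species and infraspecific names share a single function per dataset, mirroring the rows of the table.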
The following columns are output by the core function `create_taxonomic_update_lookup` and its two component functions, `align_taxa` and `update_taxonomy`.
variable | returned by | description |
---|---|---|
original_name | default | The original plant name. |
aligned_name | default | The input plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function. |
accepted_name | default | The APC-accepted plant name when available. |
suggested_name | default | The suggested plant name to use. Identical to the accepted_name when an accepted_name exists; otherwise the suggested_name is the aligned_name or the aligned name with an outdated genus updated. |
genus | default | The genus of the accepted (or suggested) name; only APC-accepted genus names are filled in. |
family | full | The family of the accepted (or suggested) name; only APC-accepted family names are filled in. |
taxon_rank | default | The taxonomic rank of the suggested (and accepted) name. |
taxonomic_dataset | default | The source of the suggested (and accepted) names (APC or APNI). |
taxonomic_status | full | The taxonomic status of the suggested (and accepted) name. |
aligned_reason | default | The explanation of a specific taxon name alignment (from an original name to an aligned name). |
update_reason | default | The explanation of a specific taxon name update (from an aligned name to an accepted or suggested name). |
subclass | full | The subclass of the accepted name. |
taxon_distribution | full | The distribution of the accepted name; only filled in if an APC accepted_name is available. |
scientific_name_authorship | default | The authorship information for the accepted (or synonymous) name; available for both APC and APNI names. |
taxon_ID | full | The unique taxon concept identifier for the accepted_name; only filled in if an APC accepted_name is available. |
taxon_ID_genus | full | An identifier for the genus; only filled in if an APC-accepted genus name is available. |
scientific_name_ID | full | An identifier for the nomenclatural (not taxonomic) details of a scientific name; available for both APC and APNI names. |
taxonomic_status_aligned | full | The taxonomic status of the aligned name before any taxonomic updates have been applied. |
row_number | full | The row number of a specific original_name in the input. |
number_of_collapsed_taxa | default | The number of possible taxon names that have been collapsed when taxonomic_splits == “collapse_to_higher_taxon”. |
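The sketch below illustrates the "returned by" column and the relationship between suggested_name and accepted_name. It keeps only the default columns unless a `full` flag is set. This is a hypothetical Python sketch: the `full` flag, the helper names and the dictionary-based record are assumptions made for illustration. The genus-updating step described for suggested_name is deliberately omitted.

```python
from typing import Any, Optional

# Output columns and their "returned by" value, in the order of the table above.
OUTPUT_COLUMNS = [
    ("original_name", "default"), ("aligned_name", "default"),
    ("accepted_name", "default"), ("suggested_name", "default"),
    ("genus", "default"), ("family", "full"),
    ("taxon_rank", "default"), ("taxonomic_dataset", "default"),
    ("taxonomic_status", "full"), ("aligned_reason", "default"),
    ("update_reason", "default"), ("subclass", "full"),
    ("taxon_distribution", "full"), ("scientific_name_authorship", "default"),
    ("taxon_ID", "full"), ("taxon_ID_genus", "full"),
    ("scientific_name_ID", "full"), ("taxonomic_status_aligned", "full"),
    ("row_number", "full"), ("number_of_collapsed_taxa", "default"),
]


def suggested_name(accepted_name: Optional[str], aligned_name: Optional[str]) -> Optional[str]:
    """Prefer the APC accepted name; otherwise fall back to the aligned name.

    The outdated-genus update described in the table is not modelled here.
    """
    return accepted_name if accepted_name is not None else aligned_name


def select_columns(record: dict[str, Any], full: bool = False) -> dict[str, Any]:
    """Keep default columns (or all columns when full=True); missing values become None."""
    keep = [name for name, returned_by in OUTPUT_COLUMNS
            if full or returned_by == "default"]
    return {name: record.get(name) for name in keep}


row = {"original_name": "Acacia sp.", "aligned_name": "Acacia sp."}
row["suggested_name"] = suggested_name(row.get("accepted_name"), row["aligned_name"])
print(select_columns(row)["suggested_name"])  # Acacia sp.
```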