Once you have rredlist
set
up, it’s very likely you’ll want to use it for some sort of research
workflow. This vignette gives examples of the kinds of data that users
may be interested in downloading from the IUCN API, and how they might
then use that data within a partial research workflow. I’ve given three
worked examples, but because the possibilities are pretty endless, I’ve
also cited some recent published works at the bottom of the vignette to
cover a wider variety of use cases.
Before we get started, we’ll just load some packages that we’ll need.
We’ll use {dplyr}
for handling and cleaning our data, and
we’ll use {ggplot2}
and {patchwork}
for
visualizing our data.
For our first research workflow, we are interested in the assessment
of molluscs through time. Specifically, we are interested in how many
cumulative assessments have been completed each year, and how many
cumulative species have then been assessed each year. First, we’ll need
to download an assessment summary for Mollusca. Note that while I’ve
used quiet = TRUE
here for the purposes of the vignette,
you would likely want to skip that to get a lovely progress bar that
keeps you apprised of the download’s progress since there are quite a
few pages of assessments to download.
Now that we have our assessment summary, we can summarize the data by
calculating the cumulative numbers of assessments through time. This is
a fairly basic {dplyr}
pipeline that counts the number of
assessments for each year, then orders those counts from oldest to most
recent, then calculates the cumulative sum of the counts.
moll_counts <- moll_assessments$assessments %>%
group_by(year_published) %>%
summarize(assess_cnt = n()) %>%
ungroup() %>%
mutate(year_published = as.numeric(year_published)) %>%
arrange(year_published) %>%
mutate(assess_cum = cumsum(assess_cnt))
Now that we have our cumulative sums, we can make a simple plot. We’ll save it for later to combine with our second plot.
gg1 <- ggplot(moll_counts) +
geom_line(aes(x = year_published, y = assess_cum)) +
labs(x = "Year", y = "Cumulative # of Assessments") +
theme_classic(base_size = 14)
Now let’s see how many cumulative species are assessed by
year. In order to do this, we’ll first need to filter for the first
assessment of each species using arrange()
and
slice()
. Then we can calculate the cumulative sum as we did
for the assessments above.
moll_sp_counts <- moll_assessments$assessments %>%
mutate(year_published = as.numeric(year_published)) %>%
group_by(sis_taxon_id) %>%
arrange(year_published) %>%
slice(1) %>%
ungroup() %>%
group_by(year_published) %>%
summarize(sp_cnt = n()) %>%
ungroup() %>%
mutate(year_published = as.numeric(year_published)) %>%
arrange(year_published) %>%
mutate(sp_cum = cumsum(sp_cnt))
Now that we have our species cumulative sums, we can visualize that and combine the two plots together.
gg2 <- ggplot(moll_sp_counts) +
geom_line(aes(x = year_published, y = sp_cum)) +
labs(x = "Year", y = "Cumulative # of Species Assessed") +
theme_classic(base_size = 14)
gg1 / gg2
Et voila! Now we have a fairly straightforward visualization of mollusc IUCN assessment through time, indicating that there were some booms and lulls over this interval. We also see one or two instances where the majority of the assessments were of previously assessed species instead of new species.
That first workflow only used an assessment summary, but there is so much more data available via the IUCN API if we retrieve entire assessments. Of course, these assessments include the threat status of each species, so let’s start with that. In this case, we’ll assess the overall threat status of differences orders of Australian reptiles.
First, we’ll need to figure out which Australian reptiles have been
assessed. Unfortunately, we can’t do this in a single query, so we’ll
need to get the intersection of all reptiles that have been assessed and
all Australian species that have been assessed. Let’s get the Australian
species assessment summary first. Hmm…what’s the code for Australia,
again? I can’t remember, so let’s look it up using the default behavior
of rl_countries()
which returns all country codes:
rl_countries()
#> $countries
#> en code
#> 1 Andorra AD
#> 2 United Arab Emirates AE
#> 3 Afghanistan AF
#> 4 Antigua and Barbuda AG
#> 5 Anguilla AI
#> 6 Albania AL
#> 7 Armenia AM
#> 8 Angola AO
#> 9 Antarctica AQ
#> 10 Argentina AR
#> 11 American Samoa AS
#> 12 Austria AT
#> 13 Australia AU
#> 14 Aruba AW
#> 15 Åland Islands AX
#> 16 Azerbaijan AZ
#> 17 Bosnia and Herzegovina BA
#> 18 Barbados BB
...
Ah, “AU”, of course! Now we can get an assessment summary for all
species that occur in Australia. As opposed to the first use case above,
we only care about which species have been assessed, so we’ll only
retrieve the latest assessments using latest = TRUE
.
au_assessments <- rl_countries(code = "AU", quiet = TRUE, latest = TRUE)
names(au_assessments)
#> [1] "country" "assessments" "filters"
You’ll notice that the returned assessment actually contains more than just the assessment, it includes information about the query we made (“metadata”, if you will). Let’s check that it matches what we meant to search:
au_assessments$country
#> $description
#> $description$en
#> [1] "Australia"
#>
#>
#> $code
#> [1] "AU"
au_assessments$filters
#> $latest
#> [1] "TRUE"
Yep, that looks good! Now let’s inspect the format of the assessment summary:
names(au_assessments$assessments)
#> [1] "year_published" "latest" "possibly_extinct"
#> [4] "possibly_extinct_in_the_wild" "sis_taxon_id" "url"
#> [7] "assessment_id" "code" "code_type"
#> [10] "scopes"
head(au_assessments$assessments)
#> year_published latest possibly_extinct possibly_extinct_in_the_wild sis_taxon_id
#> 1 2020 TRUE FALSE FALSE 10030
#> 2 2014 TRUE FALSE FALSE 11150
#> 3 2019 TRUE FALSE FALSE 11200
#> 4 2013 TRUE FALSE FALSE 137084
#> 5 2013 TRUE FALSE FALSE 137097
#> 6 2013 TRUE FALSE FALSE 137119
#> url assessment_id code code_type scopes
#> 1 https://www.iucnredlist.org/species/10030/495630 495630 AU country list(en ....
#> 2 https://www.iucnredlist.org/species/11150/500780 500780 AU country list(en ....
#> 3 https://www.iucnredlist.org/species/11200/500969 500969 AU country list(en ....
#> 4 https://www.iucnredlist.org/species/137084/519692 519692 AU country list(en ....
#> 5 https://www.iucnredlist.org/species/137097/519904 519904 AU country list(en ....
#> 6 https://www.iucnredlist.org/species/137119/520192 520192 AU country list(en ....
We could go ahead and download the full assessments for all Australian species now, but downloading full assessments is the slowest part of the workflow, so let’s get an assessment summary for reptiles first, then only download full assessments for Australian reptiles.
OK, now we’ve got our assessment summaries for Australian species and for reptiles. We can take those two lists of assessments and find the intersection of them to get the list of assessment IDs for Australian reptiles.
assessment_ids <- intersect(au_assessments$assessments$assessment_id,
reptilia_assessments$assessments$assessment_id)
Now that we have this list of IDs, we can download the full
assessment information. Before we download full assessments for all of
these IDs, let’s first take a look at what that looks like for the first
assessment. We do that using rl_assessment()
.
assessment <- rl_assessment(assessment_ids[1])
names(assessment)
#> [1] "assessment_date" "year_published" "latest"
#> [4] "possibly_extinct" "possibly_extinct_in_the_wild" "sis_taxon_id"
#> [7] "criteria" "url" "citation"
#> [10] "assessment_id" "assessment_points" "assessment_ranges"
#> [13] "taxon" "population_trend" "red_list_category"
#> [16] "supplementary_info" "documentation" "biogeographical_realms"
#> [19] "conservation_actions" "faos" "habitats"
#> [22] "locations" "researches" "use_and_trade"
#> [25] "threats" "credits" "errata"
#> [28] "references" "growth_forms" "lmes"
#> [31] "scopes" "stresses" "systems"
Now you can see why downloading this information will take a while; that’s a lot of data! The information available from the IUCN API spans ecology, threats, references, and more!
Now we’ll go ahead and download the full records for all of our
assessments. Note that performing this many large queries over a short
period of time may result in getting timed out by the API. If you do get
timed out, rredlist
will automatically retry the query
after a short wait time. By default, if you get timed out more than 3
times for the same query, the query will be stopped and throw an error,
forcing you to try the entire download again. However, you can customize
the settings related to these retries, including the wait time and the
number of retries. Since this is a pretty large download that we don’t
want to have to redo, we’ll increase the number of retries to 10 (from
the default of 3).
Alternatively, if we know we might get timed out because of the
download size, we can impose our own wait time between each query to
proactively prevent timeouts. IUCN recommends a wait time of 0.5 seconds
between queries to avoid timeouts. We can use Sys.sleep()
to impose such a wait before each query.
OK, now, after a bit of a wait, we have our full assessment records! Each item of this list is a single assessment record. And each assessment record is also a list, with each element representing a different kind of data in the record (e.g., habitats, taxonomy). For this workflow, we only really care about the taxonomic and threat category data, so let’s extract those elements from each list item. We’ll put all of this into a data.frame so it is easier to use.
assessment_df <- data.frame(
assessment_id = sapply(assessment_list, function(x) x$assessment_id),
species = sapply(assessment_list, function(x) x$taxon$scientific_name),
class = sapply(assessment_list, function(x) x$taxon$class),
order = sapply(assessment_list, function(x) x$taxon$order_name),
category = sapply(assessment_list, function(x) x$red_list_category$code)
)
We’ll do some very minor data cleaning and filtering. In this case, some of the assessments use old threat codes, so we’ll exclude them from further analyses. We’ll also convert the category column to an ordered factor for visualization purposes.
assessment_df <- assessment_df %>%
mutate(category = factor(category,
levels = c("DD", "LC", "NT", "VU", "EN", "CR",
"EW", "EX"),
ordered = TRUE)) %>%
filter(!is.na(category))
Now let’s make a plot! Here we are visualizing the number of reptiles
in each threat category, split out by their order (crocs vs. lizards
vs. turtles). Note that rredlist
includes a custom color
palette for use with {ggplot2}
that matches the official
IUCN threat category color scheme (scale_fill_iucn()
for
the fill aesthetic and scale_color_iucn()
for the color
aesthetic).
ggplot(assessment_df) +
geom_bar(aes(x = category, fill = category), show.legend = FALSE) +
scale_fill_iucn() +
facet_wrap(~order, ncol = 1, scales = "free_y") +
labs(x = "IUCN Category", y = "# of Species") +
theme_classic(base_size = 14)
We can do some quick data manipulation to calculate the proportions of species in each threat category instead of counts to make comparisons between the orders easier.
assessment_props <- assessment_df %>%
group_by(order) %>%
count(category, name = "cnt") %>%
mutate(prop = cnt/sum(cnt))
ggplot(assessment_props) +
geom_col(aes(y = order, x = prop, fill = category)) +
scale_fill_iucn(name = NULL, limits = rev) +
scale_x_continuous(expand = expansion()) +
labs(x = "Proportion of Species", y = "Order") +
theme_classic(base_size = 14) +
theme(legend.position = "top")
And that’s that! You’ll see that Australian turtles are all threatened, whereas Australian crocodiles and lizards tend to be not threatened.
Some taxonomic or functional groups are stored as “comprehensive groups” within the API. These often correspond to groups of taxa that groups of assessors care about. Many of these correspond to groups that are not otherwise query-able by taxonomy such as clades of plants. Let’s take a look at what comprehensive groups exist in the database:
rl_comp_groups()
#> $comprehensive_group
#> high comp name higher_name
#> 1 TRUE TRUE amphibians amphibians
#> 2 FALSE TRUE angelfishes selected_bony_fishes
#> 3 TRUE TRUE birds birds
#> 4 FALSE TRUE blennies selected_bony_fishes
#> 5 FALSE TRUE butterfly_fishes selected_bony_fishes
#> 6 FALSE TRUE cacti selected_dicots
#> 7 FALSE TRUE chameleons selected_reptiles
#> 8 FALSE TRUE cone_snails selected_gastropods
#> 9 FALSE TRUE conifers conifers
#> 10 FALSE TRUE crocodiles_and_alligators selected_reptiles
#> 11 FALSE TRUE cycads cycads
#> 12 FALSE TRUE fw_caridean_shrimps selected_crustaceans
#> 13 FALSE TRUE fw_crabs selected_crustaceans
#> 14 FALSE TRUE fw_crayfish selected_crustaceans
#> 15 FALSE TRUE groupers selected_bony_fishes
#> 16 FALSE TRUE hagfishes <NA>
#> 17 FALSE TRUE lobsters selected_crustaceans
#> 18 FALSE TRUE magnolias selected_dicots
#> 19 TRUE TRUE mammals mammals
#> 20 FALSE TRUE mangrove_plants <NA>
#> 21 FALSE TRUE marine_turtles selected_reptiles
#> 22 FALSE TRUE pufferfishes selected_bony_fishes
#> 23 FALSE TRUE reef_building_corals reef_forming_corals
#> 24 FALSE TRUE seabreams_porgies_picarels selected_bony_fishes
#> 25 FALSE TRUE seagrasses <NA>
#> 26 FALSE TRUE seasnakes selected_reptiles
#> 27 FALSE TRUE sharks_and_rays sharks_and_rays
#> 28 FALSE TRUE sturgeons selected_bony_fishes
#> 29 FALSE TRUE surgeonfishes selected_bony_fishes
#> 30 FALSE TRUE tarpons_and_ladyfishes selected_bony_fishes
#> 31 FALSE TRUE tunas_and_billfishes selected_bony_fishes
#> 32 FALSE TRUE wrasses_and_parrotfishes selected_bony_fishes
#> 33 TRUE FALSE arachnids <NA>
#> 34 TRUE FALSE brown_algae <NA>
#> 35 TRUE FALSE corals <NA>
#> 36 TRUE FALSE crustaceans <NA>
#> 37 TRUE FALSE fernes_and_allies <NA>
#> 38 TRUE FALSE fishes <NA>
#> 39 TRUE FALSE flowering_plants <NA>
#> 40 TRUE FALSE green_algae <NA>
#> 41 TRUE FALSE gymnosperms <NA>
#> 42 TRUE FALSE horseshoe_crabs <NA>
#> 43 TRUE FALSE insects <NA>
#> 44 TRUE FALSE lichens <NA>
#> 45 TRUE FALSE molluscs <NA>
#> 46 TRUE FALSE mosses <NA>
#> 47 TRUE FALSE mushrooms <NA>
#> 48 TRUE FALSE others <NA>
#> 49 TRUE FALSE red_algae <NA>
#> 50 TRUE FALSE reptiles <NA>
#> 51 TRUE FALSE velvet_worms <NA>
OK, since we have gymnosperms split into conifers and cycads, let’s compare these two groups. Let’s also introduce another type of data in assessments: habitat.
We’ll start with the cycads and download a summary of their latest
assessments. We’ll then download the full assessment records based on
that summary. Finally, we’ll extract the entire habitats
data.frame from each assessment record and add some taxonomic
information from the taxon
element of each record. Note
that for each assessment, there are one or more habitats assigned to the
species. There is also often habitat suitability data, although we’ll
ignore that for today.
cycad_assessments <- rl_comp_groups("cycads", quiet = TRUE, latest = TRUE)
cycad_data <- lapply(cycad_assessments$assessments$assessment_id,
function(x) {
Sys.sleep(0.5)
rl_assessment(x)
}
)
cycad_habitats <- lapply(cycad_data, function(x) {
x$habitats %>%
mutate(species = x$taxon$scientific_name,
class = x$taxon$class_name,
order = x$taxon$order_name)
}
) %>% bind_rows()
Now we can do the same thing for conifers:
# do the same thing for the conifers
conifer_assessments <- rl_comp_groups("conifers", quiet = TRUE, latest = TRUE)
conifer_data <- lapply(conifer_assessments$assessments$assessment_id,
function(x) {
Sys.sleep(0.5)
rl_assessment(x)
}
)
conifer_habitats <- lapply(conifer_data, function(x) {
x$habitats %>%
mutate(species = x$taxon$scientific_name,
class = x$taxon$class_name,
order = x$taxon$order_name)
}
) %>% bind_rows()
Now that we have habitat data for both clades, let’s combine them and
clean up the data a bit. We can also use rl_habitats()
without any arguments to get descriptions of the different habitats. For
visualization purposes, we’ll shorten the descriptions to anything
outside of parentheses. Also, we’ll collapse all of the habitats into
their major habitats (i.e., codes without underscores), since the
visualization would be pretty messy otherwise.
habitats <- rl_habitats()$habitats %>%
mutate(label = gsub("\\s\\(.*$", "", description$en))
gymno_habitats <- bind_rows(Cycads = cycad_habitats,
Conifers = conifer_habitats, .id = "clade") %>%
mutate(code_major = gsub("\\_[0-9]*$", "", code)) %>%
mutate(code_major = factor(as.numeric(code_major), ordered = TRUE)) %>%
mutate(habitat_major = habitats$label[match(code_major, habitats$code)])
We’ll now summarize the data, counting how many of each clade exist in each habitat. Like above, we’ll calculate proportions so we can compare between the two clades.
gymno_counts <- gymno_habitats %>%
group_by(clade) %>%
count(code_major, habitat_major, name = "cnt") %>%
mutate(prop = cnt/sum(cnt))
And now we can plot our proportions:
ggplot(gymno_counts) +
geom_col(aes(x = prop, y = code_major, fill = clade),
position = position_dodge(preserve = "single")) +
theme_classic(base_size = 14) +
scale_fill_brewer(name = NULL, palette = "Dark2", limits = rev) +
scale_x_continuous(expand = expansion()) +
scale_y_discrete(labels = function(breaks) {
habitats$label[match(breaks, habitats$code)]
}, limits = rev) +
labs(x = "Proportion of Species in Habitat", y = "Habitat") +
theme(legend.position = "top")
Tada! As you maybe expected, cycads and conifers live in fairly similar habitats, although conifers tend to live in forests slightly more than cycads, whereas some cycads live in rocky areas.
rredlist
Since I can’t cover all potential use cases of rredlist
,
here is a selection of publications that have recently used the
rredlist
as part of their research pipeline: