Ecological Metadata Language (EML) is a “comprehensive vocabulary and
a readable XML markup syntax for documenting research data” (Jones et al. 2019).
Supplying metadata in this format is a requirement of several
data-sharing institutions, notably including the Global Biodiversity
Information Facility (GBIF) and its
partner nodes. While the EML package already
exists to support building and manipulating EML within the workspace,
there is no comparable system for writing these documents in a more
human-readable format, such as R Markdown or Quarto markdown.
delma
provides this capability.
The default method for using delma
is to:
use_metadata_template()
read_md()
write_eml()
EML is quite stringent in the types of data that are allowed, as well
as their order and placement in the hierarchy. Getting these points
right can be challenging, and we suggest using
check_metadata()
in combination with the EML schema documentation to
get it right.
There are several points that are unusual about Rmarkdown documents
formatted using delma
, which we discuss below
Create a metadata statement template using
use_metadata_template()
Header levels in markdown-formatted metadata statements determine the nested structure of the resulting xml. For example, this text in markdown…
# EML
## Dataset
### Title
My title
…parses to this in xml.
<EML>
<dataset>
<title>My title</title>
</dataset>
<EML>
Attributes can be added to a particular EML tag by including them in
a list
within a code block, the label
of which
is used by delma
to link tags to their attributes. To add
attributes to the userId
field, for example, you would add
the following code under the ## userId
heading:
The include: false
tag is added so that this content
isn’t displayed when the document is knit.
Every EML document must open with the tag eml
, and the
attributes of that element must contain a unique
identifier in the packageId
field, as well as a link to the
system within which that key is unique. A logical example might be a
DOI:
```{r}
#| label: 'eml'
#| include: false
list(packageId = "https://doi.org/10.32614/CRAN.package.galah",
system = "https://doi.org")
```
A valid alternative might be a GitHub release:
```{r}
#| label: 'eml'
#| include: false
list(packageId = "https://github.com/AtlasOfLivingAustralia/galah-R/releases/download/v2.1.0/galah_2.1.0.tar.gz",
system = "https://github.com")
```
Note that the eml
tag is unusual in delma
in that it is added automatically if not supplied. Where this occurs,
all tag levels are also incremented by one to account for this
change.
delma
will call rmarkdown::render()
internally whenever read_md()
or
render_metadata()
is used, meaning that it is possible to
add dynamic content to your metadata statements. The boilerplate
statement that ships with delma uses this feature to automatically
populate the Title
and Pubdate
fields from the
YAML
section, for example:
```{r}
#| echo: false
#| results: 'asis'
# NOTE: This is set using the yaml above; do not edit by hand
cat(rmarkdown::metadata$date)
```
You could also implement dynamic content using inline code. For example:
Internally, delma
uses the lightparser
package to convert markdown files to tibble
s, and the xml2 package to
convert list
s to xml and write them to file. Between these
two packages, we have written functions to convert between tibble and
list versions of EML-formatted markdown.
Under the hood, read_md()
, which does a few things:
rmarkdown::render()
on your chosen file, meaning
any code blocks or inline code is executed properly.label
matches an EML tag to
the attributes
of that EML tag; this allows quite complex
attribute addition without affecting rendered text.This tibble can then be rendered to EML using
write_eml()
. The inverse operation is accomplished by
calling read_eml()
followed by write_md()
.