Title: | Edit and Validate Darwin Core Taxon Data |
---|---|
Description: | Edit and validate taxonomic data in compliance with Darwin Core standards (Darwin Core 'Taxon' class <https://dwc.tdwg.org/terms/#taxon>). |
Authors: | Joel H. Nitta [aut, cre, cph] , Wataru Iwasaki [ctb] , Collin Schwantes [rev] (Collin reviewed the package (v. 1.0.0.9000) for rOpenSci, see <https://github.com/ropensci/software-review/issues/574>), Stephen Formel [rev] (Stephen reviewed the package (v. 1.0.0.9000) for rOpenSci, see <https://github.com/ropensci/software-review/issues/574>) |
Maintainer: | Joel H. Nitta <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.0.3.9001 |
Built: | 2024-12-16 03:44:09 UTC |
Source: | https://github.com/ropensci/dwctaxon |
Add one or more rows to a taxonomic database in Darwin Core (DwC) format.
dct_add_row( tax_dat, taxonID = NULL, scientificName = NULL, taxonomicStatus = NULL, acceptedNameUsageID = NULL, acceptedNameUsage = NULL, new_dat = NULL, fill_taxon_id = dct_options()$fill_taxon_id, fill_usage_id = dct_options()$fill_usage_id, taxon_id_length = dct_options()$taxon_id_length, stamp_modified = dct_options()$stamp_modified, stamp_modified_by_id = dct_options()$stamp_modified_by_id, stamp_modified_by = dct_options()$stamp_modified_by, strict = dct_options()$strict, ... )
dct_add_row( tax_dat, taxonID = NULL, scientificName = NULL, taxonomicStatus = NULL, acceptedNameUsageID = NULL, acceptedNameUsage = NULL, new_dat = NULL, fill_taxon_id = dct_options()$fill_taxon_id, fill_usage_id = dct_options()$fill_usage_id, taxon_id_length = dct_options()$taxon_id_length, stamp_modified = dct_options()$stamp_modified, stamp_modified_by_id = dct_options()$stamp_modified_by_id, stamp_modified_by = dct_options()$stamp_modified_by, strict = dct_options()$strict, ... )
tax_dat |
Dataframe; taxonomic database in DwC format. |
taxonID |
Character or numeric vector; values to add to taxonID column.
Ignored if |
scientificName |
Character vector; values to add to scientificName
column. Ignored if |
taxonomicStatus |
Character vector; values to add to taxonomicStatus
column. Ignored if |
acceptedNameUsageID |
Character or numeric vector; values to add to
acceptedNameUsageID column. Ignored if |
acceptedNameUsage |
Character vector; values to add to acceptedNameUsage
column. Ignored if |
new_dat |
A dataframe including columns corresponding to one or more of
the above arguments, except for |
fill_taxon_id |
Logical vector of length 1; if |
fill_usage_id |
Logical vector of length 1; if |
taxon_id_length |
Numeric vector of length 1; how many characters should be included in automatically generated values of taxonID? Must be between 1 and 32, inclusive. Default |
stamp_modified |
Logical vector of length 1; should the |
stamp_modified_by_id |
Logical vector of length 1; should the |
stamp_modified_by |
Logical vector of length 1; should the |
strict |
Logical vector of length 1; should taxonomic checks be run on the updated taxonomic database? Default |
... |
Additional data to add, specified as sets of named
character or numeric vectors; e.g., |
fill_taxon_id
and fill_usage_id
only act on the newly added data (they
do not fill columns in tax_dat
).
If "taxonID" is not provided for the new row and fill_taxon_id
is TRUE
,
a value for taxonID will be automatically generated from the md5 hash digest
of the scientific name.
To modify settings used for validation if strict
is TRUE
,
use dct_options()
.
Dataframe; taxonomic database in DwC format.
tibble::tibble( taxonID = "123", scientificName = "Foogenus barspecies", acceptedNameUsageID = NA_character_, taxonomicStatus = "accepted" ) |> dct_add_row( scientificName = "Foogenus barspecies var. bla", parentNameUsageID = "123", nameAccordingTo = "me", strict = TRUE )
tibble::tibble( taxonID = "123", scientificName = "Foogenus barspecies", acceptedNameUsageID = NA_character_, taxonomicStatus = "accepted" ) |> dct_add_row( scientificName = "Foogenus barspecies var. bla", parentNameUsageID = "123", nameAccordingTo = "me", strict = TRUE )
Check that values of terms like 'acceptedUsageID' map properly to taxonID in Darwin Core (DwC) taxonomic data.
dct_check_mapping( tax_dat, on_fail = dct_options()$on_fail, on_success = dct_options()$on_success, col_select = "acceptedNameUsageID", quiet = dct_options()$quiet )
dct_check_mapping( tax_dat, on_fail = dct_options()$on_fail, on_success = dct_options()$on_success, col_select = "acceptedNameUsageID", quiet = dct_options()$quiet )
tax_dat |
Dataframe; taxonomic database in DwC format. |
on_fail |
Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default |
on_success |
Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default |
col_select |
Character vector of length 1; the name of the column
(DwC term) to check. Default |
quiet |
Logical vector of length 1; should warnings be silenced? Default |
The following rules are enforced:
Value of taxonID may not be identical to that of the selected column within a single row (in other words, a name cannot be its own accepted name, parent taxon, or basionym).
Every value in the selected column must have a corresponding taxonID.
col_select
can take one of the following values:
"acceptedNameUsageID"
: taxonID corresponding to the accepted name (of
a synonym).
"parentNameUsageID"
: taxonID corresponding to the immediate parent taxon
of a name (for example, for a species, this would be the genus).
"originalNameUsageID"
: taxonID corresponding to the basionym of a name.
Depends on the result of the check and on values of on_fail
and
on_success
:
If the check passes and on_success
is "logical", return TRUE
If the check passes and on_success
is "data", return the input dataframe
If the check fails and on_fail
is "error", return an error
If the check fails and on_fail
is "summary", issue a warning and
return a dataframe with a summary of the reasons for failure
# The bad data has an acceptedNameUsageID (third row, "4") that lacks a # corresponding taxonID bad_dat <- tibble::tribble( ~taxonID, ~acceptedNameUsageID, ~taxonomicStatus, ~scientificName, "1", NA, "accepted", "Species foo", "2", "1", "synonym", "Species bar", "3", "4", "synonym", "Species bat" ) dct_check_mapping(bad_dat, on_fail = "summary", quiet = TRUE)
# The bad data has an acceptedNameUsageID (third row, "4") that lacks a # corresponding taxonID bad_dat <- tibble::tribble( ~taxonID, ~acceptedNameUsageID, ~taxonomicStatus, ~scientificName, "1", NA, "accepted", "Species foo", "2", "1", "synonym", "Species bar", "3", "4", "synonym", "Species bat" ) dct_check_mapping(bad_dat, on_fail = "summary", quiet = TRUE)
Check for correctly formatted scientificName column in Darwin Core taxonomic data.
dct_check_sci_name( tax_dat, on_fail = dct_options()$on_fail, on_success = dct_options()$on_success, quiet = dct_options()$quiet )
dct_check_sci_name( tax_dat, on_fail = dct_options()$on_fail, on_success = dct_options()$on_success, quiet = dct_options()$quiet )
tax_dat |
Dataframe; taxonomic database in DwC format. |
on_fail |
Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default |
on_success |
Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default |
quiet |
Logical vector of length 1; should warnings be silenced? Default |
The following rules are enforced:
scientificName may not be missing (NA)
scientificName must be unique
Depends on the result of the check and on values of on_fail
and
on_success
:
If the check passes and on_success
is "logical", return TRUE
If the check passes and on_success
is "data", return the input dataframe
If the check fails and on_fail
is "error", return an error
If the check fails and on_fail
is "summary", issue a warning and
return a dataframe with a summary of the reasons for failure
dct_check_sci_name( data.frame(scientificName = NA_character_), on_fail = "summary", quiet = TRUE ) dct_check_sci_name(data.frame(scientificName = "a"))
dct_check_sci_name( data.frame(scientificName = NA_character_), on_fail = "summary", quiet = TRUE ) dct_check_sci_name(data.frame(scientificName = "a"))
Check that taxonomicStatus is within valid values in Darwin Core taxonomic data
dct_check_tax_status( tax_dat, on_fail = dct_options()$on_fail, on_success = dct_options()$on_success, valid_tax_status = dct_options()$valid_tax_status, quiet = dct_options()$quiet )
dct_check_tax_status( tax_dat, on_fail = dct_options()$on_fail, on_success = dct_options()$on_success, valid_tax_status = dct_options()$valid_tax_status, quiet = dct_options()$quiet )
tax_dat |
Dataframe; taxonomic database in DwC format. |
on_fail |
Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default |
on_success |
Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default |
valid_tax_status |
Character vector of length 1; valid values for |
quiet |
Logical vector of length 1; should warnings be silenced? Default |
Depends on the result of the check and on values of on_fail
and
on_success
:
If the check passes and on_success
is "logical", return TRUE
If the check passes and on_success
is "data", return the input dataframe
If the check fails and on_fail
is "error", return an error
If the check fails and on_fail
is "summary", issue a warning and
return a dataframe with a summary of the reasons for failure
https://dwc.tdwg.org/terms/#dwc:taxonomicStatus
# The bad data has an taxonomicStatus (third row, "foo") that is not # a valid value bad_dat <- tibble::tribble( ~taxonID, ~acceptedNameUsageID, ~taxonomicStatus, ~scientificName, "1", NA, "accepted", "Species foo", "2", "1", "synonym", "Species bar", "3", NA, "foo", "Species bat" ) dct_check_tax_status(bad_dat, on_fail = "summary", quiet = TRUE) # Example of setting valid values of taxonomicStatus via dct_options() # First store existing settings, including any changes made by the user old_settings <- dct_options() # Change options for valid_tax_status dct_options(valid_tax_status = "provisionally accepted, synonym, NA") tibble::tribble( ~taxonID, ~acceptedNameUsageID, ~taxonomicStatus, ~scientificName, "1", NA, "provisionally accepted", "Species foo", "2", "1", "synonym", "Species bar", "3", NA, NA, "Strange name" ) |> dct_check_tax_status() # Reset options to those before this example was run do.call(dct_options, old_settings)
# The bad data has an taxonomicStatus (third row, "foo") that is not # a valid value bad_dat <- tibble::tribble( ~taxonID, ~acceptedNameUsageID, ~taxonomicStatus, ~scientificName, "1", NA, "accepted", "Species foo", "2", "1", "synonym", "Species bar", "3", NA, "foo", "Species bat" ) dct_check_tax_status(bad_dat, on_fail = "summary", quiet = TRUE) # Example of setting valid values of taxonomicStatus via dct_options() # First store existing settings, including any changes made by the user old_settings <- dct_options() # Change options for valid_tax_status dct_options(valid_tax_status = "provisionally accepted, synonym, NA") tibble::tribble( ~taxonID, ~acceptedNameUsageID, ~taxonomicStatus, ~scientificName, "1", NA, "provisionally accepted", "Species foo", "2", "1", "synonym", "Species bar", "3", NA, NA, "Strange name" ) |> dct_check_tax_status() # Reset options to those before this example was run do.call(dct_options, old_settings)
Check for correctly formatted taxonID column in Darwin Core taxonomic data.
dct_check_taxon_id( tax_dat, on_fail = dct_options()$on_fail, on_success = dct_options()$on_success, quiet = dct_options()$quiet )
dct_check_taxon_id( tax_dat, on_fail = dct_options()$on_fail, on_success = dct_options()$on_success, quiet = dct_options()$quiet )
tax_dat |
Dataframe; taxonomic database in DwC format. |
on_fail |
Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default |
on_success |
Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default |
quiet |
Logical vector of length 1; should warnings be silenced? Default |
The following rules are enforced:
taxonID may not be missing (NA)
taxonID must be unique
Depends on the result of the check and on values of on_fail
and
on_success
:
If the check passes and on_success
is "logical", return TRUE
If the check passes and on_success
is "data", return the input dataframe
If the check fails and on_fail
is "error", return an error
If the check fails and on_fail
is "summary", issue a warning and
return a dataframe with a summary of the reasons for failure
dct_check_taxon_id( data.frame(taxonID = NA_character_), on_fail = "summary", quiet = TRUE ) dct_check_taxon_id(data.frame(taxonID = 1))
dct_check_taxon_id( data.frame(taxonID = NA_character_), on_fail = "summary", quiet = TRUE ) dct_check_taxon_id(data.frame(taxonID = 1))
Drop one or more rows from a taxonomic database in Darwin Core (DwC) format by taxonID or scientificName.
dct_drop_row(tax_dat, taxonID = NULL, scientificName = NULL)
dct_drop_row(tax_dat, taxonID = NULL, scientificName = NULL)
tax_dat |
Dataframe; taxonomic database in DwC format. |
taxonID |
Character or numeric vector; taxonID of the row(s) to be dropped. |
scientificName |
Character vector; scientificName of the row(s) to be dropped. |
Only works if values of taxonID or scientificName are unique and non-missing in the taxonomic database (tax_dat).
Either taxonID or scientificName should be provided, but not both.
Dataframe; taxonomic database in DwC format
# Can drop rows by scientificName or taxonID dct_filmies |> dct_drop_row(scientificName = "Cephalomanes atrovirens Presl") dct_filmies |> dct_drop_row(taxonID = "54133783") # Can drop multiple rows at once by providing multiple values for # scientificName or taxonID dct_filmies |> dct_drop_row( scientificName = c( "Cephalomanes atrovirens Presl", "Trichomanes crassum Copel." ) ) dct_filmies |> dct_drop_row( taxonID = c( "54133783", "54133783" ) )
# Can drop rows by scientificName or taxonID dct_filmies |> dct_drop_row(scientificName = "Cephalomanes atrovirens Presl") dct_filmies |> dct_drop_row(taxonID = "54133783") # Can drop multiple rows at once by providing multiple values for # scientificName or taxonID dct_filmies |> dct_drop_row( scientificName = c( "Cephalomanes atrovirens Presl", "Trichomanes crassum Copel." ) ) dct_filmies |> dct_drop_row( taxonID = c( "54133783", "54133783" ) )
Fill a column in a taxonomic database in Darwin Core (DwC) format.
dct_fill_col( tax_dat, fill_to = "acceptedNameUsage", fill_from = "scientificName", match_to = "taxonID", match_from = "acceptedNameUsageID", stamp_modified = dct_options()$stamp_modified )
dct_fill_col( tax_dat, fill_to = "acceptedNameUsage", fill_from = "scientificName", match_to = "taxonID", match_from = "acceptedNameUsageID", stamp_modified = dct_options()$stamp_modified )
tax_dat |
Dataframe; taxonomic database in DwC format. |
fill_to |
Character vector of length 1; name of column to fill. If the column does not yet exist it will be created. |
fill_from |
Character vector of length 1; name of column to copy values from when filling. |
match_to |
Character vector of length 1; name of column to match to. |
match_from |
Character vector of length 1; name of column to match from. |
stamp_modified |
Logical vector of length 1; should the |
Several terms (columns) in DwC format come in pairs of "term" and "termID"; for example, "acceptedNameUsage" and "acceptedNameUsageID", where the first is the value in a human-readable form (in this case, scientific name of the accepted taxon) and the second is the value used by a machine (in this case, taxonID of the accepted taxon). Other pairs include "parentNameUsage" and "parentNameUsageID", "scientificName" and "scientificNameID", etc. None are required to be used in a given DwC dataset.
Often when updating data, the user may only fill in one value or the other
(e.g., "acceptedNameUsage" or "acceptedNameUsageID"), but not both. The
purpose of dct_fill_col()
is to fill the missing column.
match_from
and match_to
are used to locate the values used for filling
each cell. The values in the match_to
column must be unique.
The default settings are to fill acceptedNameUsage with values from scientificName by matching acceptedNameUsageID to taxonID (see Example).
When adding timestamps with stamp_modified
, any row that differs from the
original data (tax_dat
) is considered modified. This includes when a new
column is added, in which case all rows will be considered modified.
Dataframe; taxonomic database in DwC format.
# Fill acceptedNameUsage with values from scientificName by # matching acceptedNameUsageID to taxonID (head(dct_filmies, 5)) |> dct_fill_col( fill_to = "acceptedNameUsage", fill_from = "scientificName", match_to = "taxonID", match_from = "acceptedNameUsageID" )
# Fill acceptedNameUsage with values from scientificName by # matching acceptedNameUsageID to taxonID (head(dct_filmies, 5)) |> dct_fill_col( fill_to = "acceptedNameUsage", fill_from = "scientificName", match_to = "taxonID", match_from = "acceptedNameUsageID" )
Taxonomic data of filmy ferns (family Hymenophyllaceae) in Darwin Core format. Non-ASCII characters have been converted to ASCII, so some author names may not be as expected. Meant for demonstration purposes only, not formal data analysis.
dct_filmies
dct_filmies
Dataframe (tibble), with 2451 rows and 5 columns. For details about data format, see https://dwc.tdwg.org/terms/#taxon.
Modified from data downloaded from the Catalog of Life under the Creative Commons Attribution (CC BY) 4.0 license.
https://www.catalogueoflife.org/
dct_filmies
dct_filmies
Modify one or more rows in a taxonomic database in Darwin Core (DwC) format.
dct_modify_row( tax_dat, taxonID = NULL, scientificName = NULL, taxonomicStatus = NULL, acceptedNameUsageID = NULL, acceptedNameUsage = NULL, clear_usage_id = dct_options()$clear_usage_id, clear_usage_name = dct_options()$clear_usage_name, fill_usage_name = dct_options()$fill_usage_name, remap_names = dct_options()$remap_names, remap_parent = dct_options()$remap_parent, remap_variant = dct_options()$remap_variant, stamp_modified = dct_options()$stamp_modified, stamp_modified_by_id = dct_options()$stamp_modified_by_id, stamp_modified_by = dct_options()$stamp_modified_by, strict = dct_options()$strict, quiet = dct_options()$quiet, args_tbl = NULL, ... )
dct_modify_row( tax_dat, taxonID = NULL, scientificName = NULL, taxonomicStatus = NULL, acceptedNameUsageID = NULL, acceptedNameUsage = NULL, clear_usage_id = dct_options()$clear_usage_id, clear_usage_name = dct_options()$clear_usage_name, fill_usage_name = dct_options()$fill_usage_name, remap_names = dct_options()$remap_names, remap_parent = dct_options()$remap_parent, remap_variant = dct_options()$remap_variant, stamp_modified = dct_options()$stamp_modified, stamp_modified_by_id = dct_options()$stamp_modified_by_id, stamp_modified_by = dct_options()$stamp_modified_by, strict = dct_options()$strict, quiet = dct_options()$quiet, args_tbl = NULL, ... )
tax_dat |
Dataframe; taxonomic database in DwC format. |
taxonID |
Character or numeric vector of length 1; taxonID of the row to be modified (the selected row). |
scientificName |
Character vector of length 1; scientificName of the row
to be modified if |
taxonomicStatus |
Character vector of length 1; taxonomicStatus to assign to the selected row. |
acceptedNameUsageID |
Character or numeric vector of length 1; acceptedNameUsageID to assign to the selected row. |
acceptedNameUsage |
Character vector of length 1; acceptedNameUsage to assign to the selected row. |
clear_usage_id |
Logical vector of length 1; should acceptedNameUsageID of the selected row be set to |
clear_usage_name |
Logical vector of length 1; should acceptedNameUsageID of the selected row be set to |
fill_usage_name |
Logical vector of length 1; should the acceptedNameUsage of the selected row be set to the scientificName corresponding to its acceptedNameUsageID? Default |
remap_names |
Logical vector of length 1; should the acceptedNameUsageID be updated (remapped) for rows with the same acceptedNameUsageID as the taxonID of the row to be modified? Default |
remap_parent |
Logical vector of length 1; should the parentNameUsageID be updated (remapped) to that of its accepted name if it is a synonym? Will also apply to any other rows with the same parentNameUsageID as the taxonID of the row to be modified. Default |
remap_variant |
Same as |
stamp_modified |
Logical vector of length 1; should the |
stamp_modified_by_id |
Logical vector of length 1; should the |
stamp_modified_by |
Logical vector of length 1; should the |
strict |
Logical vector of length 1; should taxonomic checks be run on the updated taxonomic database? Default |
quiet |
Logical vector of length 1; should warnings be silenced? Default |
args_tbl |
A dataframe including columns corresponding to one or more of
the above arguments, except for |
... |
other DwC terms to modify, specified as sets of named values. Each element of the vector must have a name corresponding to a valid DwC term; see dct_terms. |
taxonID
is only used to identify the row(s) to modify and is not itself
modified. scientificName
can be used in the same way if taxonID
is not
provided (as long as scientificName
matches a single row). If both
taxonID
and scientificName
are provided, scientificName
will be
assigned to the scientificName of the row identified by taxonID
, replacing
any value that already exists.
acceptedNameUsageID
and acceptedNameUsage
must match existing values of
acceptedNameUsageID and acceptedNameUsage in the input data (tax_dat
). On
default settings, either can be used and the other will be filled in
automatically (fill_usage_id
and fill_usage_name
are both TRUE
).
Any other arguments provided that are DwC terms will be assigned to the selected row (i.e., they will modify the row).
If remap_names
is TRUE
(default) and acceptedNameUsageID
is provided,
any names that have an acceptedNameUsageID matching the taxonID of the
selected row (i.e., synonyms of that row) will also have their
acceptedNameUsageID replaced with the new acceptedNameUsageID. This applies
to acceptedNameUsage
as well. This behavior
is not applied to names with taxonomicStatus of "variant" by default, but can
be turned on for such names with remap_variant
.
If remap_parent
is TRUE
(default) and the parentNameUsageID
of the
selected row is a synonym, the parentNameUsageID
will be changed to
that of the accepted name (of the parent taxon). This will also apply to any
other row with the same parentNameUsageID
as the selected row. This applies
to parentNameUsage
as well.
If clear_usage_id
or clear_usage_name
is TRUE
and taxonomicStatus
includes the word "accepted", acceptedNameUsageID
or acceptedNameUsage will be set to NA respectively, regardless of the
values of acceptedNameUsageID
, acceptedNameUsage
, or fill_usage_name
.
Can either modify a single row in the input taxonomic database if each
argument is supplied as a vector of length 1, or can apply a set of changes
to the taxonomic database if the input is supplied as a dataframe via
args_tbl
.
Dataframe; taxonomic database in DwC format
# Swap the accepted / synonym status of # Cephalomanes crassum (Copel.) M. G. Price # and Trichomanes crassum Copel. dct_filmies |> dct_modify_row( scientificName = "Cephalomanes crassum (Copel.) M. G. Price", taxonomicStatus = "synonym", acceptedNameUsage = "Trichomanes crassum Copel." ) |> dct_modify_row( scientificName = "Trichomanes crassum Copel.", taxonomicStatus = "accepted" ) |> dct_validate( check_tax_status = FALSE, check_mapping_accepted_status = FALSE, check_sci_name = FALSE ) # Sometimes changing one name will affect others, if they map # to the new synonym dct_modify_row( tax_dat = dct_filmies |> head(), scientificName = "Cephalomanes crassum (Copel.) M. G. Price", taxonomicStatus = "synonym", acceptedNameUsage = "Cephalomanes densinervium (Copel.) Copel." ) # Apply a set of changes library(tibble) updates <- tibble( scientificName = c( "Cephalomanes atrovirens Presl", "Cephalomanes crassum (Copel.) M. G. Price" ), taxonomicStatus = "synonym", acceptedNameUsage = "Trichomanes crassum Copel." ) dct_filmies |> dct_modify_row(args_tbl = updates) |> dct_modify_row( scientificName = "Trichomanes crassum Copel.", taxonomicStatus = "accepted" )
# Swap the accepted / synonym status of # Cephalomanes crassum (Copel.) M. G. Price # and Trichomanes crassum Copel. dct_filmies |> dct_modify_row( scientificName = "Cephalomanes crassum (Copel.) M. G. Price", taxonomicStatus = "synonym", acceptedNameUsage = "Trichomanes crassum Copel." ) |> dct_modify_row( scientificName = "Trichomanes crassum Copel.", taxonomicStatus = "accepted" ) |> dct_validate( check_tax_status = FALSE, check_mapping_accepted_status = FALSE, check_sci_name = FALSE ) # Sometimes changing one name will affect others, if they map # to the new synonym dct_modify_row( tax_dat = dct_filmies |> head(), scientificName = "Cephalomanes crassum (Copel.) M. G. Price", taxonomicStatus = "synonym", acceptedNameUsage = "Cephalomanes densinervium (Copel.) Copel." ) # Apply a set of changes library(tibble) updates <- tibble( scientificName = c( "Cephalomanes atrovirens Presl", "Cephalomanes crassum (Copel.) M. G. Price" ), taxonomicStatus = "synonym", acceptedNameUsage = "Trichomanes crassum Copel." ) dct_filmies |> dct_modify_row(args_tbl = updates) |> dct_modify_row( scientificName = "Trichomanes crassum Copel.", taxonomicStatus = "accepted" )
Changes the default values of function arguments.
dct_options(reset = FALSE, ...)
dct_options(reset = FALSE, ...)
reset |
Logical vector of length 1; if TRUE, reset all options to their default values. |
... |
Any number of |
Use this to change the default values of function arguments. That way, you don't have to type the same thing each time you call a function.
The arguments that can be set with this function are as follows:
check_col_names
: Logical vector of length 1; should all column names be required to be a valid DwC term? Default TRUE
.
check_mapping_accepted_status
: Logical vector of length 1; should rules about mapping of variants and synonyms be enforced? Default FALSE
.
(See dct_validate()
).
check_mapping_accepted
: Logical vector of length 1; should all values of acceptedNameUsageID
be required to map to the taxonID
of an existing name? Default TRUE
.
check_mapping_original
: Logical vector of length 1; should all values of originalNameUsageID
be required to map to the taxonID
of an existing name? Default TRUE
.
check_mapping_parent
: Logical vector of length 1; should all values of parentNameUsageID
be required to map to the taxonID
of an existing name? Default TRUE
.
check_mapping_parent_accepted
: Logical vector of length 1; should all values of parentNameUsageID
be required to map to the taxonID
of an accepted name? Default FALSE
.
check_sci_name
: Logical vector of length 1; should all instances of scientificName
be required to be non-missing and unique? Default TRUE
.
check_status_diff
: Logical vector of length 1; should each scientific name be allowed to have only one taxonomic status? Default FALSE
.
check_tax_status
: Logical vector of length 1; should all taxonomic names be required to have a valid value for taxonomic status (by default, "accepted", "synonym", or "variant")? Default TRUE
.
check_taxon_id
: Logical vector of length 1; should all instances of taxonID
be required to be non-missing and unique? Default TRUE
.
extra_cols
: Character vector; names of columns that should be allowed beyond
those defined by the DwC taxon standard. Default NULL. Providing column name(s) that are valid DwC taxon column(s) has no effect.
on_fail
: Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default "error"
.
on_success
: Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default "data"
.
skip_missing_cols
: Logical vector of length 1; should checks be silently skipped if any of the
columns they inspect are missing? Default FALSE
.
valid_tax_status
: Character vector of length 1; valid values for taxonomicStatus
. Each value must be separated by a comma. Default accepted, synonym, variant, NA
. "NA"
indicates that missing (NA) values are valid. Case-sensitive.
clear_usage_id
: Logical vector of length 1; should acceptedNameUsageID of the selected row be set to NA
if the word "accepted" is detected in tax_status (not case-sensitive)? Default TRUE
.
clear_usage_name
: Logical vector of length 1; should acceptedNameUsage of the selected row be set to NA
if the word "accepted" is detected in tax_status (not case-sensitive)? Default TRUE
.
fill_taxon_id
: Logical vector of length 1; if taxon_id
is not provided, should values in the taxonID column be filled in by generating them automatically from the scientificName? If the taxonID
column does not yet exist it will be created. Default TRUE
.
fill_usage_id
: Logical vector of length 1; if usage_id
is not provided, should values in the acceptedNameUsageID column be filled in by matching acceptedNameUsage to scientificName? If the acceptedNameUsageID
column does not yet exist it will be created. Default TRUE
.
fill_usage_name
: Logical vector of length 1; should the acceptedNameUsage of the selected row be set to the scientificName corresponding to its acceptedNameUsageID? Default TRUE
.
remap_names
: Logical vector of length 1; should the acceptedNameUsageID be updated (remapped) for rows with the same acceptedNameUsageID as the taxonID of the row to be modified? Default TRUE
.
remap_parent
: Logical vector of length 1; should the parentNameUsageID be updated (remapped) to that of its accepted name if it is a synonym? Will also apply to any other rows with the same parentNameUsageID as the taxonID of the row to be modified. Default TRUE
.
remap_variant
: Same as remap_names
, but applies specifically to rows with taxonomicStatus of "variant". Default FALSE
.
stamp_modified
: Logical vector of length 1; should the modified
column of any newly created or modified row include a timestamp with the date and time of its creation/modification? If the modified
column does not yet exist it will be created. Default TRUE
.
stamp_modified_by
: Logical vector of length 1; should the modifiedBy
column of any newly created or modified row include the name of the
current user? If the modifiedBy
column does not yet exist it will be created; note that
this is a non-DWC standard column, so "modifiedBy"
is required in
extra_cols
. The current user can be specified with the user_name
option. Default FALSE
.
stamp_modified_by_id
: Logical vector of length 1; should the modifiedByID
column of any newly created or modified row include the ID of the
current user? If the modifiedByID
column does not yet exist it will be created; note that
this is a non-DWC standard column, so "modifiedByID"
is required in
extra_cols
. The current user ID can be specified with the user_id
option. Default FALSE
.
taxon_id_length
: Numeric vector of length 1; how many characters should be included in automatically generated values of taxonID? Must be between 1 and 32, inclusive. Default 32
.
quiet
: Logical vector of length 1; should warnings be silenced? Default FALSE
.
strict
: Logical vector of length 1; should taxonomic checks be run on the updated taxonomic database? Default FALSE
.
user_name
: Character vector of length 1; the name of the current user. Default ""
.
user_id
: Character vector of length 1; the ID of the current user. Default ""
.
Nothing; used for its side-effect.
# Show all options dct_options() # Store existing settings, including any changes made by the user old_settings <- dct_options() # View one option dct_options()$valid_tax_status # Change one option dct_options(valid_tax_status = "accepted, weird, whatever") dct_options()$valid_tax_status # Reset to default values dct_options(reset = TRUE) dct_options()$valid_tax_status # Multiple options may also be set at once dct_options(check_taxon_id = FALSE, check_status_diff = TRUE) # Reset options to those before this example was run do.call(dct_options, old_settings)
# Show all options dct_options() # Store existing settings, including any changes made by the user old_settings <- dct_options() # View one option dct_options()$valid_tax_status # Change one option dct_options(valid_tax_status = "accepted, weird, whatever") dct_options()$valid_tax_status # Reset to default values dct_options(reset = TRUE) dct_options()$valid_tax_status # Multiple options may also be set at once dct_options(check_taxon_id = FALSE, check_status_diff = TRUE) # Reset options to those before this example was run do.call(dct_options, old_settings)
A table of valid Darwin Core terms. Only terms in the Taxon class or at the record-level are included.
dct_terms
dct_terms
Dataframe (tibble), including two columns:
group
: Darwin Core term group; either "taxon" (terms in the Taxon class)
or "record-level" (terms that are generic in that they might apply
to any type of record in a dataset.)
term
: Darwin Core term
with two additional attributes:
retrieved
: Date the terms were obtained
url
: URL from which the terms were obtained
Modified from data downloaded from TDWG Darwin Core under the Creative Commons Attribution (CC BY) 4.0 license.
https://dwc.tdwg.org/terms/#taxon
dct_terms
dct_terms
Runs a series of automated checks on a taxonomic database in Darwin Core (DwC) format.
dct_validate( tax_dat, check_taxon_id = dct_options()$check_taxon_id, check_tax_status = dct_options()$check_tax_status, check_mapping_accepted = dct_options()$check_mapping_accepted, check_mapping_parent = dct_options()$check_mapping_parent, check_mapping_parent_accepted = dct_options()$check_mapping_parent_accepted, check_mapping_original = dct_options()$check_mapping_original, check_mapping_accepted_status = dct_options()$check_mapping_accepted_status, check_sci_name = dct_options()$check_sci_name, check_status_diff = dct_options()$check_status_diff, check_col_names = dct_options()$check_col_names, valid_tax_status = dct_options()$valid_tax_status, extra_cols = dct_options()$extra_cols, on_success = dct_options()$on_success, on_fail = dct_options()$on_fail, skip_missing_cols = dct_options()$skip_missing_cols, quiet = dct_options()$quiet )
dct_validate( tax_dat, check_taxon_id = dct_options()$check_taxon_id, check_tax_status = dct_options()$check_tax_status, check_mapping_accepted = dct_options()$check_mapping_accepted, check_mapping_parent = dct_options()$check_mapping_parent, check_mapping_parent_accepted = dct_options()$check_mapping_parent_accepted, check_mapping_original = dct_options()$check_mapping_original, check_mapping_accepted_status = dct_options()$check_mapping_accepted_status, check_sci_name = dct_options()$check_sci_name, check_status_diff = dct_options()$check_status_diff, check_col_names = dct_options()$check_col_names, valid_tax_status = dct_options()$valid_tax_status, extra_cols = dct_options()$extra_cols, on_success = dct_options()$on_success, on_fail = dct_options()$on_fail, skip_missing_cols = dct_options()$skip_missing_cols, quiet = dct_options()$quiet )
tax_dat |
Dataframe; taxonomic database in DwC format. |
check_taxon_id |
Logical vector of length 1; should all instances of |
check_tax_status |
Logical vector of length 1; should all taxonomic names be required to have a valid value for taxonomic status (by default, "accepted", "synonym", or "variant")? Default |
check_mapping_accepted |
Logical vector of length 1; should all values of |
check_mapping_parent |
Logical vector of length 1; should all values of |
check_mapping_parent_accepted |
Logical vector of length 1; should all values of |
check_mapping_original |
Logical vector of length 1; should all values of |
check_mapping_accepted_status |
Logical vector of length 1; should rules about mapping of variants and synonyms be enforced? Default |
check_sci_name |
Logical vector of length 1; should all instances of |
check_status_diff |
Logical vector of length 1; should each scientific name be allowed to have only one taxonomic status? Default |
check_col_names |
Logical vector of length 1; should all column names be required to be a valid DwC term? Default |
valid_tax_status |
Character vector of length 1; valid values for |
extra_cols |
Character vector; names of columns that should be allowed beyond those defined by the DwC taxon standard. Default NULL. Providing column name(s) that are valid DwC taxon column(s) has no effect. |
on_success |
Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default |
on_fail |
Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default |
skip_missing_cols |
Logical vector of length 1; should checks be silently skipped if any of the
columns they inspect are missing? Default |
quiet |
Logical vector of length 1; should warnings be silenced? Default |
For check_mapping_accepted_status
and check_status_diff
, "accepted",
"synonym", and "variant" are determined by string matching of
taxonomicStatus
; so "provisionally accepted" is counted as "accepted",
"ambiguous synonym" is counted as "synonym", etc. (case-sensitive).
For check_mapping_accepted_status
, the following rules are enforced:
Rows with taxonomicStatus
of "synonym" (synonyms) must have an
acceptedNameUsageID
matching the taxonID
of an accepted name
(taxonomicStatus
of "accepted")
Rows with taxonomicStatus
of "variant" (orthographic variants) must
have an acceptedNameUsageID
matching the taxonID
of an accepted name or
synonym (but not another variant)
Rows with taxonomicStatus
of "accepted" must not have any value entered
for acceptedNameUsageID
Rows with a value for acceptedNameUsageID
must have a valid value for
taxonomicStatus
.
Default settings of all arguments can be modified with dct_options()
(see
Examples).
Most columns are expected to be vectors of class character, but this is not checked for all columns. Columns (DwC terms) with names including 'ID', for example 'taxonID', may be character, numeric, or integer.
Depends on the result of the check and on values of on_fail
and
on_success
:
If the check passes and on_success
is "logical", return TRUE
If the check passes and on_success
is "data", return the input dataframe
If the check fails and on_fail
is "error", return an error
If the check fails and on_fail
is "summary", issue a warning and
return a dataframe with a summary of the reasons for failure
# The example dataset dct_filmies is already correctly formatted and passes # validation dct_validate(dct_filmies) # So make some bad data on purpose with a duplicated scientific name bad_dat <- dct_filmies bad_dat$scientificName[1] <- bad_dat$scientificName[2] # The incorrectly formatted data won't pass try( dct_validate(bad_dat) ) # It will pass if we allow duplicated scientific names though dct_validate(bad_dat, check_sci_name = FALSE) # Individual checks can also be turned or off with dct_options() # First save the current settings before making any changes old_settings <- dct_options() # Let's allow duplicated scientific names by default dct_options(check_sci_name = FALSE) # The data passes validation as before, but we don't have to specify # `check_sci_name = FALSE` in the function call dct_validate(bad_dat) # Reset options to those before this example was run do.call(dct_options, old_settings)
# The example dataset dct_filmies is already correctly formatted and passes # validation dct_validate(dct_filmies) # So make some bad data on purpose with a duplicated scientific name bad_dat <- dct_filmies bad_dat$scientificName[1] <- bad_dat$scientificName[2] # The incorrectly formatted data won't pass try( dct_validate(bad_dat) ) # It will pass if we allow duplicated scientific names though dct_validate(bad_dat, check_sci_name = FALSE) # Individual checks can also be turned or off with dct_options() # First save the current settings before making any changes old_settings <- dct_options() # Let's allow duplicated scientific names by default dct_options(check_sci_name = FALSE) # The data passes validation as before, but we don't have to specify # `check_sci_name = FALSE` in the function call dct_validate(bad_dat) # Reset options to those before this example was run do.call(dct_options, old_settings)