New function bold_identify_taxonomy() to add taxonomic information to the output of bold_identify() and replace bold_identify_parents(). The old approach took the taxon names from the bold_identify() output, used bold_tax_name() to get the taxonomic ID, and then passed that ID to bold_tax_id() to get the parent names. Instead, we now take the process IDs from the bold_identify() output and pass them to bold_specimens(). This has the advantages of being faster and, more importantly, making sure the correct taxonomy is returned. The function has fewer arguments since filtering the result is no longer necessary. Since the result now has only one row per row of input, the output is always in 'wide' format (as when using bold_identify_parents() with wide=TRUE). There is one new argument, taxOnly, which is TRUE by default and returns only the taxonomic data. However, since bold_specimens() also returns other data (habitat, country, image_url, etc.), setting this argument to FALSE will also join that data to the input.
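A minimal sketch of the join described above, written in Python purely for illustration. The package itself is written in R and this is not its implementation. The sketch assumes that identification hits and specimen records are lists of dicts sharing a hypothetical 'processid' key. The taxonomy column names in TAXONOMY_COLUMNS are also assumptions. join_taxonomy() and tax_only are hypothetical stand-ins for bold_identify_taxonomy() and taxOnly.

```python
from typing import Dict, List

# Hypothetical column names, assumed for illustration only.
KEY = "processid"
TAXONOMY_COLUMNS = ["phylum_name", "class_name", "order_name",
                    "family_name", "genus_name", "species_name"]


def join_taxonomy(hits: List[dict], specimens: List[dict],
                  tax_only: bool = True) -> List[dict]:
    """Left-join specimen data onto identification hits by process ID.

    Every hit produces exactly one output row, so the result is always
    'wide'. With tax_only=True only the taxonomy columns are attached;
    otherwise every specimen column (except the key) is attached.
    """
    by_id: Dict[str, dict] = {rec[KEY]: rec for rec in specimens}

    if tax_only:
        extra = list(TAXONOMY_COLUMNS)
    else:
        # Union of all specimen columns, in first-seen order.
        extra = []
        for rec in specimens:
            for col in rec:
                if col != KEY and col not in extra:
                    extra.append(col)

    joined = []
    for hit in hits:
        rec = by_id.get(hit[KEY], {})  # no match -> empty record -> None values
        row = dict(hit)
        for col in extra:
            row[col] = rec.get(col)
        joined.append(row)
    return joined
```

A dict lookup keeps the join linear in the number of hits. If several specimen records shared a process ID, the last one would win; this sketch does not handle that edge case.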
New function bold_tax_id2(), which will eventually replace bold_tax_id(). The main changes are in the format of the output. For the dataTypes 'basic', 'stats', 'images' and 'thirdparty', the output doesn't change. For the dataTypes 'sequencinglabs', 'geo' and 'depository', the result is no longer one (sometimes very) wide data.frame. It is now in 'long' format, with the columns 'input', 'taxid', one of 'sequencinglabs', 'country' or 'depository' (depending on the data type), and 'count'. For the dataTypes 'all', or when selecting more than one dataTypes, the output is a list with one data.frame per data type. When setting includeTree to TRUE, the parents' data is rbinded to their respective data.frame. The function also checks that all arguments are the correct type and that the chosen dataTypes are valid.
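The change from one wide data.frame to 'long' format can be illustrated with a short Python sketch; this is not the package's R code. It assumes each taxon's result arrives as a mapping from category (lab, country or depository) to count. to_long() and LONG_COLUMN are hypothetical names. Mapping 'geo' to a 'country' column follows the column list given above.

```python
from typing import Dict, List, Tuple

# Name of the category column for each long-format dataTypes value.
LONG_COLUMN = {"sequencinglabs": "sequencinglabs",
               "geo": "country",
               "depository": "depository"}


def to_long(data_type: str,
            results: List[Tuple[str, int, Dict[str, int]]]) -> List[dict]:
    """Turn per-taxon {category: count} mappings into one row per category.

    results holds (input, taxid, counts) tuples; each output row has the
    keys 'input', 'taxid', the category column for data_type, and 'count'.
    """
    column = LONG_COLUMN[data_type]
    rows = []
    for input_name, taxid, counts in results:
        for category, count in counts.items():
            rows.append({"input": input_name, "taxid": taxid,
                         column: category, "count": count})
    return rows
```

The number of rows grows with the number of categories while the number of columns stays fixed. This is what keeps taxa with many labs, countries or depositories from producing very wide tables.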
The now deprecated bold_tax_id() has the same argument checks as bold_tax_id2() but throws warnings instead of errors so as not to affect existing workflows. Also, if a chosen dataTypes is invalid, it is removed to avoid making unnecessary requests.
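The difference between the strict checks in bold_tax_id2() and the lenient ones in the deprecated bold_tax_id() can be sketched in Python as follows. check_data_types() and its strict flag are hypothetical. The set of valid values is taken from the dataTypes listed above. The package's actual checks, which also cover argument types, are not reproduced here.

```python
import warnings
from typing import List

# dataTypes values mentioned in these notes.
VALID_DATA_TYPES = {"basic", "stats", "images", "thirdparty",
                    "sequencinglabs", "geo", "depository", "all"}


def check_data_types(data_types: List[str], strict: bool = True) -> List[str]:
    """Validate requested dataTypes.

    strict=True mimics the new behaviour (raise an error); strict=False
    mimics the deprecated one (warn, then drop the invalid values so that
    no request is made for them).
    """
    invalid = [d for d in data_types if d not in VALID_DATA_TYPES]
    if not invalid:
        return list(data_types)
    message = "invalid dataTypes: " + ", ".join(invalid)
    if strict:
        raise ValueError(message)
    warnings.warn(message)
    return [d for d in data_types if d in VALID_DATA_TYPES]
```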
Similarly, the now deprecated bold_identify_parents() has new argument checks and throws warnings so as not to affect existing workflows.
For bold_tax_id2() and bold_tax_name(), when querying multiple taxa, a failed request no longer breaks the loop; the API error is thrown as a warning instead. The output object also has 2 new attributes, "errors" and "params". They let you see which errors occurred for which request and which parameters were used for the request.
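A minimal Python sketch of this "keep going and record the failure" pattern, assuming a caller-supplied fetch function that raises on an API error. query_each() and QueryResult are hypothetical. The errors and params fields play the role of the two new attributes. Keying errors by input name is an assumption.

```python
import warnings
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class QueryResult:
    data: List[Any] = field(default_factory=list)
    errors: Dict[str, str] = field(default_factory=dict)  # input -> error text
    params: Dict[str, Any] = field(default_factory=dict)  # parameters sent


def query_each(names: List[str], fetch: Callable[..., Any],
               **params: Any) -> QueryResult:
    """Query every name; a failed request becomes a warning, not an abort."""
    result = QueryResult(params=dict(params))
    for name in names:
        try:
            result.data.append(fetch(name, **params))
        except Exception as exc:  # record the error and move on to the next name
            result.errors[name] = str(exc)
            warnings.warn(f"request for {name!r} failed: {exc}")
    return result
```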
To make it easy to retrieve these attributes, 3 new functions have been created: bold_get_attr() returns a list of the two attributes, bold_get_errors() returns a list of the errors, and bold_get_params() returns a list of the parameters used.
bold_specimens() and bold_seqspec() have a new parameter, cleanData. When set to TRUE, it replaces empty strings ("") with NAs and strings containing only duplicated values with their unique value (e.g., "COI-5P|COI-5P|COI-5P" becomes "COI-5P").
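The two cleanData rules can be expressed as a small Python function; this is an illustration, not the package's R code. It assumes '|' is the separator, as in the example above, and uses None in place of R's NA. It also leaves strings with differing values, such as "COI-5P|ITS", unchanged; the notes do not spell out that case, so this is an assumption. clean_value() and clean_records() are hypothetical names.

```python
from typing import Any, Dict, List


def clean_value(value: Any) -> Any:
    """Apply the two cleaning rules to a single field."""
    if not isinstance(value, str):
        return value
    parts = value.split("|")
    if len(set(parts)) == 1:
        value = parts[0]  # "A|A|A" -> "A"; strings without "|" are unchanged
    return None if value == "" else value  # empty string -> missing value


def clean_records(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the records with every field cleaned."""
    return [{key: clean_value(val) for key, val in rec.items()}
            for rec in records]
```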
New function bold_read_trace() to replace read_trace(). It can read one or multiple trace files from a boldtrace object or from provided file path(s).
New function b_sepFasta() to use after a call to bold_seqspec() where sepFasta wasn't set to TRUE.
bold_trace() function
bold_specimens() and bold_seqspec() can now also return partial output like bold_seq()
data.table is used when possible, and the dplyr and reshape dependencies were removed
stringi is used instead of stringr, which removed stringr's other dependencies
Added a note to bold_seq(), bold_seqspec() and bold_specimens() about taxa without public records. If the taxon doesn't have public records and another parameter is also used, the function will return all data for that other parameter. Users can verify the availability of public records with bold_stats(). A note was also added to bold_tax_name() that the column 'specimenrecords' relates to the records in the taxonomy browser and not in the public data portal. (#76)
bold_tax_id() (#83). Added a line in the function to change 'depositories' to 'depository' in case people had been using that.
bold_tax_name() now double escapes single quotes; otherwise the API doesn't return the data (#84, #85). Since this is related to the API, the data that comes back also contains errors, so I added a function to repair the names of 'taxon', 'taxonrep' and 'parentname' in the returned object. The function is also used in pipe_params() (which is used by bold_seq(), bold_seqspec() and bold_specimens()) to repair the taxon parameter in case users use results from previous versions.
bold_seqspec() is read (#87, #88) thanks @cjfields
bold_stats() documentation now specifies that the record counts include all gene markers (#90).
bold_seqspec(): we now set the encoding to "UTF-8" before parsing the string to XML (#71)
bold_seqspec() fix: capture "Fatal errors" returned by BOLD servers and pass them along to the user with advice (#66)
bold_seq() and bold_seqspec(). The marker section explains that the marker parameter doesn't actually filter the results you get, but you can filter them yourself. The large requests section gives some caveats associated with large data requests and outlines how to work around them (#61)
bold_identify_parents() (#64)
sangerseqR: instructions depend on which version of R is being used (#65) thanks @KevCaz
_R_CHECK_LENGTH_1_LOGIC2_ (#57)
bold_identify() fix: ampersands needed to be escaped (#62) thanks @devonorourke
vcr to cache responses, which speeds up tests significantly and means they no longer rely on an internet connection (#55) (#56)
bold_seq(): sometimes on large requests, the BOLD servers time out and give back partial output without indicating that there was an error. We now catch this kind of error and throw a message for the user, and the function gives back the partial output returned by the server. Also added to the documentation for bold_seq() and to the README: if you run into this problem, try doing many queries that each return a smaller set of results instead of one or a few larger queries (#52) (#53)
bold_seq(): remove return characters (\r and \n) from sequences (#54)
bold_identify_parents() gains many new parameters (taxid, taxon, tax_rank, tax_division, parentid, parentname, taxonrep, specimenrecords) to filter parents based on any of a number of fields. This should solve the problem where multiple parents were found for a single taxon, often in different kingdoms (#50)
bold_identify() that the function uses lapply internally, so queries with lots of sequences can take a long time
bold_specimens(): use rawToChar() on raw bytes instead of parse() from crul (#47)
crul for HTTP requests. This only really affects users in that specifying curl options works slightly differently (#42)
marker parameter in bold_seqspec was, and maybe still is, not working, in the sense that using the parameter doesn't always limit results to the marker you specify. Not really fixed: watch out for it, and filter the results you get back to keep the markers you want. (#25)
bold_identify_parents: was failing when there was no match for a parent name. (#41) thx @VascoElbrecht
tsv results were erroring in bold_specimens and other fxns (#46), fixed by switching to the new BOLD v4 API (#30)
stats and utils: replaced is with inherits (#39)
bold_identify_parents() to add taxonomic information to the output of bold_identify(). We take the taxon names from the bold_identify output and use bold_tax_name to get the taxonomic ID, passing it to bold_tax_id to get the parent names, then attach those to the input data.
There are two options depending on what you put for the wide parameter. If TRUE, you get data.frames of the same dimensions with parent rank name and ID as new columns (for each name going up the hierarchy). If FALSE, you get a long data.frame. Thanks @dougwyu for inspiring this (#36)
xml2::xml_find_one with xml2::xml_find_first (#33)
db options in bold_identify man file: COX1 and COX1_SPECIES were switched (#37) thanks for pointing that out @dougwyu
bold_tax_id for when some elements returned from the BOLD API were empty/NULL (#32) thanks @fmichonneau !!
xml2 from XML as the XML parser for this package (#26)
bold_trace() to create dir and tar file when they don't already exist
content(x, "text"), so now using rawToChar(content(x)), which works (#24)
sangerseqR package now in Suggests for reading trace files, and is only used in the bold_trace() function.
bold_trace() gains two new parameters: overwrite, to choose whether or not to overwrite an existing file of the same name, and progress, to choose whether or not to show a progress bar while downloading.
bold_trace() gains a print method to show a tidy summary of the downloaded trace file.
bold_tax_name() (#17) and bold_tax_id() (#18) in which species that were missing from the BOLD database returned empty arrays but 200 status codes. These are now parsed as failed attempts. Also fixes a problem in taxize's bold_search(), which uses these two functions.
bold_tax_name() and bold_tax_id(), which search for taxonomic data from BOLD using either names or BOLD identifiers, respectively. (#11)
jsonlite and reshape.
callopts parameter changed to ... throughout the package, so that passing on options to httr::GET is done via named parameters, e.g., config=verbose(). (#13)