Title: | Retrieve Data from the 1000 Plants Initiative (1KP) |
---|---|
Description: | The 1000 Plants Initiative (www.onekp.com) has sequenced the transcriptomes of over 1000 plant species. This package allows these sequences and metadata to be retrieved and filtered by code, species or recursively by clade. Scientific names and NCBI taxonomy IDs are both supported. |
Authors: | Dhakal Rijan [aut, cre], Zebulun Arendsee [aut], Zachary Foster [rev], Jessica Minnier [rev], Joel Nitta [ctb] |
Maintainer: | Dhakal Rijan <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.0 |
Built: | 2024-12-04 18:00:46 UTC |
Source: | https://github.com/ropensci/onekp |
These functions will return all files in the OneKP object of the given type
(protein or DNA FASTA files for download_peptides and download_nucleotides,
respectively). If you do not want to retrieve all these files (there are over
a thousand), then you should filter the OneKP object first, using the
filter_by_* functions.
download_peptides(x, dir = file.path(tempdir(), "peptides"), absolute = FALSE)

download_nucleotides(
  x,
  dir = file.path(tempdir(), "nucleotides"),
  absolute = FALSE
)
x | OneKP object |
dir | Directory in which to store the downloaded data |
absolute | If TRUE, return absolute paths (default=FALSE) |
character vector of paths to the files that were downloaded
## Not run:
data(onekp)

# Filter by 1KP code (from `onekp@table$code` column)
seqs <- filter_by_code(onekp, c('URDJ', 'ROAP'))

# Download FASTA files to temporary directory
download_peptides(seqs)
download_nucleotides(seqs)
## End(Not run)
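Both download functions return a character vector of file paths, so the result can be passed straight to ordinary file-handling code. The following is a minimal sketch in base R, assuming the returned paths point to uncompressed FASTA text files. The temporary file and its contents are made up here and only stand in for one real downloaded path.

```r
# Stand-in for one path returned by download_peptides();
# the file and its contents are hypothetical
fasta_path <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "MKV", "LLA", ">seq2", "MST"), fasta_path)

# Read the file and pull out the header lines (those starting with ">")
fasta_lines <- readLines(fasta_path)
headers <- sub("^>", "", fasta_lines[startsWith(fasta_lines, ">")])

# One header per sequence, so this is the number of sequences in the file
n_seqs <- length(headers)
```

For real analyses a dedicated FASTA reader is usually a better choice than line counting; this only shows that the return value is a plain vector of paths.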
Filter a OneKP object
filter_by_code(x, code)

filter_by_clade(x, clade)

filter_by_species(x, species)
x | OneKP object |
code | character vector of 1KP IDs (e.g. URDJ) |
clade | vector of clade-level NCBI taxonomy IDs or scientific names |
species | vector of species-level scientific names or NCBI taxonomy IDs |
OneKP object
data(onekp)

# filter by 1KP ID
filter_by_code(onekp, c('URDJ', 'ROAP'))

# filter by species name
filter_by_species(onekp, 'Pinus radiata')

# filter by species NCBI taxon ID
filter_by_species(onekp, 3347)

# filter by clade scientific name
filter_by_clade(onekp, 'Brassicaceae')

# filter by clade NCBI taxon ID
filter_by_clade(onekp, 3700)
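Filtering by clade is described as recursive: every species anywhere below the clade is kept, not only its direct children. The sketch below illustrates that idea on a toy taxonomy in base R. It is not the package's implementation; the taxon names, the 4-letter codes and the helper `descendants()` are all hypothetical.

```r
# Toy taxonomy: each named taxon maps to its parent (all names hypothetical)
parent <- c(
  genus1 = "familyA",
  genus2 = "familyA",
  genus3 = "familyB",
  sp1 = "genus1",
  sp2 = "genus1",
  sp3 = "genus2",
  sp4 = "genus3"
)

# Collect every taxon below `clade`, walking down the tree level by level
descendants <- function(clade, parent) {
  found <- character(0)
  frontier <- clade
  while (length(frontier) > 0) {
    children <- names(parent)[parent %in% frontier]
    found <- c(found, children)
    frontier <- children
  }
  found
}

# Toy metadata table; only the column names `species` and `code`
# mirror the real metadata table, the values are made up
meta <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp4"),
  code = c("AAAA", "BBBB", "CCCC", "DDDD"),
  stringsAsFactors = FALSE
)

# Keep the rows whose species sits anywhere below familyA
in_clade <- meta[meta$species %in% descendants("familyA", parent), ]
```

Here `in_clade` keeps sp1, sp2 and sp3, which sit two levels below familyA, and drops sp4, which belongs to familyB.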
The object stored here should be exactly the same as the object returned
from retrieve_onekp(). It is stored here for convenience and to save
time in examples (retrieve_onekp takes around 30 seconds to run).
The 1000 Plants Initiative (www.onekp.com) has sequenced the transcriptomes of over 1000 plant species. This package allows these sequences and metadata to be retrieved and filtered by code, species or recursively by clade. Scientific names and NCBI taxonomy IDs are both supported.
onekp
OneKP object
retrieve_onekp - retrieve all 1KP metadata
filter_by_code - filter metadata by 1KP code
filter_by_clade - filter metadata by clade
filter_by_species - filter metadata by species
download_peptides - get protein sequences linked to metadata
download_nucleotides - get DNA sequences linked to metadata
Zebulun Arendsee <email: [email protected]>
Any bugs or issues can be reported at <https://github.com/ropensci/onekp/issues>
OneKP print generic function
## S3 method for class 'OneKP'
print(x, ...)
x | OneKP object |
... | Additional arguments (unused) |
Download the table of metadata for each transcriptome from the 1KP website
(http://www.onekp.com/public_data.html). The metadata are wrapped into
a OneKP S4 object. This object contains two data.frames: 1) @table, the
main metadata table, and 2) @links, a map from resource to URL (mostly for
internal use).
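Because the result is an S4 object, its two data.frames are reached with `@` rather than `$`. The sketch below shows that access pattern on a hypothetical stand-in class, `ToyKP`, built with the `methods` package. The real OneKP class is defined by the package itself, and the `resource` and `url` column names used for the links table are assumptions made for illustration.

```r
library(methods)

# Hypothetical stand-in for the OneKP class: two data.frame slots
setClass("ToyKP", slots = c(table = "data.frame", links = "data.frame"))

toy <- new(
  "ToyKP",
  # made-up metadata row
  table = data.frame(species = "sp1", code = "AAAA", stringsAsFactors = FALSE),
  # made-up resource-to-URL row; column names are assumptions
  links = data.frame(
    resource = "example",
    url = "https://example.org",
    stringsAsFactors = FALSE
  )
)

# List the slots, then use `@` for the slot and `$` for a column inside it
slot_names <- slotNames(toy)
first_code <- toy@table$code
```

With a real object the same pattern applies, for example `head(kp@table)` after `kp <- retrieve_onekp()`.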
retrieve_onekp(add_taxids = TRUE, filter = TRUE)
add_taxids | If TRUE, add NCBI taxon IDs for each species. This requires downloading the NCBI taxonomy database, which takes a few extra minutes the first time you run the function. This step is necessary only if you wish to filter by NCBI taxon IDs. |
filter | If TRUE, filter out entries that are not associated with a single species (for example crosses or datasets pooled across a genus). If set to TRUE, then add_taxids will also be set to TRUE. |
This dataset is also saved as package data; you can access it with
data(onekp).
The metadata table contains the following columns:
species - species scientific name
code - 4-letter 1KP transcriptome unique identifier
family - the taxonomic family
tissue - the tissue(s) that were sequenced
peptides - the filename for the transcript proteins
nucleotides - the filename for the transcript DNA
tax_id (optional) - the species NCBI taxonomy ID
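Since the metadata table is an ordinary data.frame, these columns can be summarised with base R before any filtering or downloading. A minimal sketch on a toy table follows; the column names come from the list above, while every value in it is made up.

```r
# Toy stand-in for kp@table; values are hypothetical
tbl <- data.frame(
  species = c("sp1", "sp2", "sp3"),
  code = c("AAAA", "BBBB", "CCCC"),
  family = c("famA", "famA", "famB"),
  stringsAsFactors = FALSE
)

# Number of transcriptomes per family
per_family <- table(tbl$family)

# 1KP codes that belong to one family
fam_a_codes <- tbl$code[tbl$family == "famA"]
```

A character vector of codes collected this way has the form that filter_by_code expects for its code argument.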
OneKP object
## Not run:
# scrape data from the OneKP website
kp <- retrieve_onekp()

# print to see data summary
kp

# access the metadata table
head(kp@table)
## End(Not run)