Title: | Create and Query a Local Copy of 'GenBank' in R |
---|---|
Description: | Download large sections of 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> and generate a local SQL-based database. A user can then query this database using 'restez' functions or through 'rentrez' <https://CRAN.R-project.org/package=rentrez> wrappers. |
Authors: | Joel H. Nitta [aut, cre], Dom Bennett [aut] |
Maintainer: | Joel H. Nitta <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.1.4.9000 |
Built: | 2024-11-27 03:39:29 UTC |
Source: | https://github.com/ropensci/restez |
This function is called whenever sequence files have been successfully added to the nucleotide SQL database. Row entries are added to 'add_log.tsv' in the user's restez path, containing the filename, the GB release number and the time of successful addition. The log helps users keep track of when sequences have been added.
add_rcrd_log(fl)
fl | filename, character |
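As a rough illustration of the kind of logging described above, the following base-R sketch appends one row per added file to a tab-separated log. The helper name `log_row_append()`, the column names, the file names and the release number are all made up for this example; they are not taken from the restez source.

```r
# Hypothetical helper (not part of restez): append one row to a TSV log.
log_row_append <- function(log_path, filename, gb_release) {
  row <- data.frame(
    filename = filename,
    gb_release = gb_release,
    time = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    stringsAsFactors = FALSE
  )
  is_new <- !file.exists(log_path)
  # Write the header only when the log file is first created
  utils::write.table(
    row, file = log_path, sep = "\t", append = !is_new,
    col.names = is_new, row.names = FALSE, quote = FALSE
  )
  invisible(log_path)
}

# Illustrative values only
log_path <- file.path(tempdir(), "example_add_log.tsv")
unlink(log_path)
log_row_append(log_path, "gbpri1.seq.gz", 255)
log_row_append(log_path, "gbpri2.seq.gz", 255)
log_df <- utils::read.delim(log_path, stringsAsFactors = FALSE)
print(log_df)
```

Keeping the log as plain TSV means it can be inspected with any text editor or read back with `read.delim()`.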
Other private: cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Helper function for printing lines to console. Automatically formats lines by adding newlines.
cat_line(...)
... | Text to print, character |
Other private: add_rcrd_log(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Print green text to the console to indicate a name, file path or other text.
char(x)
x | Text to print, character |
coloured character encoding, character
Other private: add_rcrd_log(), cat_line(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
All retrieval functions need a stable internet connection to work properly. This internal function pings the Google homepage and throws an error if it cannot be reached.
check_connection()
Hajk-Georg Drost
Other private: add_rcrd_log(), cat_line(), char(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Removes all temporary test data created.
cleanup()
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Returns TRUE if a restez SQL database has been connected.
connected()
Logical
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Safely acquire the restez connection. Raises an error if no connection is set.
connection_get()
connection
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return the number of ids in a user's restez database.
count_db_ids(db = "nucleotide")
db | character, database name |
Requires an open connection. If there is no connection or no database, 0 is returned.
integer
Other database: db_create(), db_delete(), db_download(), demo_db_create(), is_in_db(), list_db_ids()
library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
(count_db_ids())
# delete demo after example
db_delete(everything = TRUE)
Create a new local SQL database from downloaded files. Currently only GenBank/nucleotide/nuccore database is supported.
db_create(
  db_type = "nucleotide",
  min_length = 0,
  max_length = NULL,
  acc_filter = NULL,
  invert = FALSE,
  alt_restez_path = NULL,
  scan = FALSE
)
db_type | character, database type |
min_length | Minimum sequence length, default 0. |
max_length | Maximum sequence length, default NULL. |
acc_filter | Character vector; accessions to include or exclude from the database as specified by invert. |
invert | Logical vector of length 1; if TRUE, accessions in acc_filter are excluded from the database rather than included. |
alt_restez_path | Alternative restez path if you would like to use the downloads from a different restez path. |
scan | Logical vector of length 1; should the sequence file be scanned for accessions in acc_filter before further processing? |
All .seq.gz files are added to the database by default. A user can specify
minimum/maximum sequence lengths or accession numbers to limit the sequences
to be added to the database – smaller databases are faster to search. The
final selection of sequences is the result of applying all filters
(acc_filter, min_length, max_length) in combination.

The scan option can decrease the time needed to build a database if only a
small number of sequences should be written to the database compared to the
number of the sequences downloaded from GenBank; i.e., if many of the files
downloaded from GenBank do not contain any sequences that should be written
to the database. When set to TRUE, if a file does not contain any of the
accessions in acc_filter, further processing of that file will be skipped
and none of the sequences it contains will be added to the database.

Alternatively, a user can use the alt_restez_path to add the files
from an alternative restez file path. For example, you may wish to have a
database of all environmental sequences but then an additional smaller one of
just the sequences with lengths below 100 bp. Instead of having to download
all environmental sequences twice, you can generate multiple restez databases
using the same downloaded files from a single restez path.

This function will not overwrite a pre-existing database. Old databases must
be deleted before a new one can be created. Use db_delete() with
everything=FALSE to delete an SQL database.

Connections/disconnections to the database are made automatically.
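The way the filters combine can be sketched in base R as follows. This is only an illustration of applying all filters together: `records_keep()` and the mock sequence lengths are hypothetical, and treating the length bounds as inclusive is an assumption rather than something stated above.

```r
# Hypothetical helper (not part of restez): TRUE for records passing all filters.
records_keep <- function(accessions, lengths, min_length = 0,
                         max_length = NULL, acc_filter = NULL,
                         invert = FALSE) {
  keep <- lengths >= min_length
  if (!is.null(max_length)) {
    keep <- keep & lengths <= max_length
  }
  if (!is.null(acc_filter)) {
    in_filter <- accessions %in% acc_filter
    # With invert = TRUE the listed accessions are excluded instead
    if (invert) in_filter <- !in_filter
    keep <- keep & in_filter
  }
  keep
}

# Mock records; the lengths are invented for the example
accessions <- c("AF000122", "AF000123", "AF000124", "AF000125")
lengths <- c(50, 150, 900, 2000)

included <- accessions[records_keep(
  accessions, lengths,
  min_length = 100,
  acc_filter = c("AF000122", "AF000123", "AF000124")
)]
excluded <- accessions[records_keep(
  accessions, lengths,
  acc_filter = c("AF000122", "AF000123", "AF000124"),
  invert = TRUE
)]
print(included)
print(excluded)
```

Because every filter narrows the same logical vector, a record is kept only if it satisfies all of them.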
Other database: count_db_ids(), db_delete(), db_download(), demo_db_create(), is_in_db(), list_db_ids()
## Not run:
# Example of general usage
library(restez)
restez_path_set(filepath = 'path/for/downloads/and/database')
db_download()
db_create()

# Example of using `acc_filter`
#
# Download files to temporary directory
temp_dir <- paste0(tempdir(), "/restez", collapse = "")
dir.create(temp_dir)
restez_path_set(filepath = temp_dir)
# Choose GenBank domain 20 ('unannotated'), the smallest
db_download(preselection = 20)
# Only include three accessions in database
db_create(
  acc_filter = c("AF000122", "AF000123", "AF000124")
)
list_db_ids()
db_delete()
# recursive = TRUE is needed to remove a directory
unlink(temp_dir, recursive = TRUE)
## End(Not run)
Delete the local SQL database and/or restez folder.
db_delete(everything = FALSE)
everything | T/F, delete the whole restez folder as well? |
Any connected database will be automatically disconnected.
Other database: count_db_ids(), db_create(), db_download(), demo_db_create(), is_in_db(), list_db_ids()
library(restez)
fp <- tempdir()
restez_path_set(filepath = fp)
demo_db_create(n = 10)
db_delete(everything = FALSE)
# Will not run: gb_sequence_get(id = 'demo_1')
# only the SQL database is deleted
db_delete(everything = TRUE)
# Now returns NULL
(restez_path_get())
Download .seq.tar files from the latest GenBank release.
db_download(
  db = "nucleotide",
  overwrite = FALSE,
  preselection = NULL,
  max_tries = 1
)
db | Database type, only 'nucleotide' currently available. |
overwrite | T/F, overwrite pre-existing downloaded files? |
preselection | Character vector of length 1; GenBank domains to download. If not specified (default), a menu will be provided for selection. To specify, provide either a single number or a single character string of numbers separated by spaces, e.g. "19 20" for 'Phage' (19) and 'Unannotated' (20). |
max_tries | Numeric vector of length 1; maximum number of times to attempt downloading database (default 1). |
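For illustration, a `preselection` value such as "19 20" could be split into domain numbers as shown below. `preselection_parse()` is a hypothetical helper, not a restez function, and restez may parse the value differently.

```r
# Hypothetical helper (not part of restez): convert a preselection value,
# given as a single number or a space-separated string, to integers.
preselection_parse <- function(preselection) {
  parts <- strsplit(trimws(as.character(preselection)), "\\s+")[[1]]
  as.integer(parts)
}

two_domains <- preselection_parse("19 20")
one_domain <- preselection_parse(20)
print(two_domains)
print(one_domain)
```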
In default mode, the user interactively selects the parts (i.e., "domains")
of GenBank to download (e.g. primates, plants, bacteria ...). Alternatively,
the selected domains can be provided as a character string to preselection.

The max_tries argument is useful for large databases that may otherwise
fail due to periodic lapses in internet connectivity. This value can be set
to Inf to continuously try until the database download succeeds (not
recommended if you do not have an internet connection!).
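The retry behaviour described above can be pictured with a small base-R loop. This is a sketch under stated assumptions, not the restez implementation: `try_repeatedly()` and `flaky_download()` are hypothetical names, and the mock download simply fails twice before succeeding.

```r
# Hypothetical wrapper (not part of restez): call `attempt()` until it
# returns TRUE or `max_tries` attempts have been made.
try_repeatedly <- function(attempt, max_tries = 1) {
  tries <- 0
  success <- FALSE
  while (!success && tries < max_tries) {
    tries <- tries + 1
    # Treat an error during the attempt as a failed try
    success <- isTRUE(tryCatch(attempt(), error = function(e) FALSE))
  }
  list(success = success, tries = tries)
}

# Mock download that fails twice and then succeeds
calls <- 0
flaky_download <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated connection drop")
  TRUE
}

res <- try_repeatedly(flaky_download, max_tries = 5)
print(res)
```

With `max_tries = Inf` the loop condition `tries < max_tries` never becomes false, so the loop only ends on success, which is why an unlimited setting is risky without a working connection.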
T/F, if all files download correctly, TRUE else FALSE.
Other database: count_db_ids(), db_create(), db_delete(), demo_db_create(), is_in_db(), list_db_ids()
## Not run:
library(restez)
restez_path_set(filepath = 'path/for/downloads')
db_download()
## End(Not run)
Download .seq.tar files from the latest GenBank release. The
user interactively selects the parts of GenBank to download (e.g. primates,
plants, bacteria ...).
This is an internal function so the download can be wrapped in while() to
enable persistent downloading.
db_download_intern(db = "nucleotide", overwrite = FALSE, preselection = NULL)
db | Database type, only 'nucleotide' currently available. |
overwrite | T/F, overwrite pre-existing downloaded files? |
preselection | Character vector of length 1; GenBank domains to download. If not specified (default), a menu will be provided for selection. To specify, provide either a single number or a single character string of numbers separated by spaces, e.g. "19 20" for 'Phage' (19) and 'Unannotated' (20). |
The downloaded files will appear in the restez filepath under downloads.
T/F, if all files download correctly, TRUE else FALSE.
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Returns the maximum and minimum sequence lengths as set by the user upon db creation.
db_sqlngths_get()
If no file found, returns empty character vector.
vector of integers
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Log the minimum and maximum sequence lengths used in the created db.
db_sqlngths_log(min_lngth, max_lngth)
min_lngth | Minimum length |
max_lngth | Maximum length |
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Creates a local mock SQL database from package test data for demonstration purposes. No internet connection required.
demo_db_create(db_type = "nucleotide", n = 100)
db_type | character, database type |
n | integer, number of mock sequences |
Other database: count_db_ids(), db_create(), db_delete(), db_download(), is_in_db(), list_db_ids()
library(restez)
# set the restez path to a temporary dir
restez_path_set(filepath = tempdir())
# create demo database
demo_db_create(n = 5)
# in the demo, IDs are 'demo_1', 'demo_2' ...
(gb_sequence_get(id = 'demo_1'))
# Delete a demo database after an example
db_delete(everything = TRUE)
Returns the size of a directory in GB.
dir_size(fp)
fp | File path, character |
numeric
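One way such a directory size could be computed in base R is sketched below. `dir_size_gb()` is a hypothetical stand-in, not the restez function, and the conversion of 1 GB = 1e9 bytes is an assumption; the text above does not say which convention is used.

```r
# Hypothetical stand-in (not the restez implementation):
# total size of all files under `fp`, in GB (assuming 1 GB = 1e9 bytes).
dir_size_gb <- function(fp) {
  files <- list.files(fp, recursive = TRUE, full.names = TRUE,
                      all.files = TRUE)
  sum(file.size(files), na.rm = TRUE) / 1e9
}

# Demo directory holding a single 2,000,000-byte file
demo_dir <- file.path(tempdir(), "dir_size_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
writeBin(raw(2e6), file.path(demo_dir, "two_megabytes.bin"))
size_gb <- dir_size_gb(demo_dir)
print(size_gb)
```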
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return path to folder where raw .seq files are stored.
dwnld_path_get()
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
This function is called whenever a file is successfully downloaded. A row entry is added to 'download_log.tsv' in the user's restez path containing the file name, the GB release number and the time of successful download. The log helps users keep track of when they downloaded files and determine whether the downloaded files are out of date.
dwnld_rcrd_log(fl)
fl | file name, character |
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return fasta format as expected from an Entrez call. If not all IDs are returned, will run rentrez::entrez_fetch.
entrez_fasta_get(id, ...)
id | vector, unique ID(s) for record(s) |
... | arguments passed on to rentrez |
character string containing the file created
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Wrapper for rentrez::entrez_fetch.
entrez_fetch(db, id = NULL, rettype, retmode = "", ...)
db | character, name of the database |
id | vector, unique ID(s) for record(s) |
rettype | character, data format |
retmode | character, data mode |
... | Arguments to be passed on to rentrez |
First attempts to search the local database with the user-specified parameters. If a record is missing from the database, the function then calls rentrez::entrez_fetch to search GenBank remotely.
rettype='fasta' and rettype='gb' are respectively equivalent to gb_fasta_get() and gb_record_get().
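The "local first, remote fallback" idea can be sketched in base R as follows. Everything here is hypothetical and for illustration only: `fetch_with_fallback()` is not a restez function, `local_db` is a named character vector standing in for the local database, and `mock_remote()` stands in for a remote call. Whether restez re-requests only the missing IDs, as this sketch does, is an assumption.

```r
# Hypothetical sketch (not the restez implementation): look IDs up locally,
# then pass any that are missing to a remote fetch function.
fetch_with_fallback <- function(id, local_db, remote_fetch) {
  found <- id %in% names(local_db)
  records <- local_db[id[found]]
  missing_ids <- id[!found]
  if (length(missing_ids) > 0) {
    records <- c(records, remote_fetch(missing_ids))
  }
  # Return one character string, as an Entrez-style fetch would
  paste(records, collapse = "\n")
}

# Mock local database and mock remote call; sequences are invented
local_db <- c(demo_1 = ">demo_1\nATGC", demo_2 = ">demo_2\nGGCC")
mock_remote <- function(ids) paste0(">", ids, "\nNNNN")

fasta_res <- fetch_with_fallback(c("demo_1", "S71333"), local_db, mock_remote)
cat(fasta_res, "\n")
```

Passing the remote function as an argument keeps the sketch runnable offline; a real fallback might wrap rentrez::entrez_fetch instead.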
character string containing the file created
XML retmode is not supported. Rettypes 'seqid', 'ft', 'acc' and 'uilist' are also not supported.
It is advisable to call restez and rentrez functions with '::' notation rather than library() calls to avoid namespace issues. e.g. restez::entrez_fetch().
library(restez)
restez_path_set(tempdir())
demo_db_create(n = 5)
# return fasta record
fasta_res <- entrez_fetch(db = 'nucleotide',
                          id = c('demo_1', 'demo_2'),
                          rettype = 'fasta')
cat(fasta_res)
# return whole GB record in text format
gb_res <- entrez_fetch(db = 'nucleotide',
                       id = c('demo_1', 'demo_2'),
                       rettype = 'gb')
cat(gb_res)
# NOT RUN
# whereas these requests would go through rentrez
# fasta_res <- entrez_fetch(db = 'nucleotide',
#                           id = c('S71333', 'S71334'),
#                           rettype = 'fasta')
# gb_res <- entrez_fetch(db = 'nucleotide',
#                        id = c('S71333', 'S71334'),
#                        rettype = 'gb')
# delete demo after example
db_delete(everything = TRUE)
Return gb and gbwithparts format as expected from an Entrez call. If not all IDs are returned, will run rentrez::entrez_fetch.
entrez_gb_get(id, ...)
id | vector, unique ID(s) for record(s) |
... | arguments passed on to rentrez |
character string containing the file created
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return accession ID from GenBank record
extract_accession(record)
record |
GenBank record in text format, character |
If the element is not found, '' is returned.
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Search through GenBank record for a keyword and return text up to the end_pattern.
extract_by_patterns(record, start_pattern, end_pattern = "\n")
record |
GenBank record in text format, character |
start_pattern |
REGEX pattern indicating the point to start extraction, character |
end_pattern |
REGEX pattern indicating the point to stop extraction, character |
The start_pattern should be any of the capitalized elements in a GenBank record (e.g. LOCUS, DEFINITION, ACCESSION). The end_pattern depends on how much of the selected element a user wants returned. By default, the extraction stops at the next newline. If the keyword or end pattern is not found, NULL is returned.
character or NULL
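The start/end pattern extraction described above can be sketched in base R. This is a minimal illustration and not the restez implementation: the helper name `extract_between()` and the toy record are assumptions made for the example.

```r
# Hypothetical helper (not part of restez): return the text that follows
# `start_pattern`, up to but not including the first `end_pattern`.
extract_between <- function(record, start_pattern, end_pattern = "\n") {
  start <- regexpr(start_pattern, record)
  if (start == -1) return(NULL)  # keyword not found
  # Drop everything up to and including the matched keyword
  after <- as.integer(start) + attr(start, "match.length")
  rest <- substring(record, after)
  end <- regexpr(end_pattern, rest)
  if (end == -1) return(NULL)  # end pattern not found
  trimws(substring(rest, 1, as.integer(end) - 1))
}

# Toy record with three of the capitalized GenBank elements
record <- paste0(
  "LOCUS       AB000001  120 bp\n",
  "DEFINITION  Example sequence.\n",
  "ACCESSION   AB000001\n"
)
definition <- extract_between(record, "DEFINITION")
missing <- extract_between(record, "KEYWORDS")
```

Here `definition` holds the text of the DEFINITION line, and `missing` is NULL because the toy record has no KEYWORDS element.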
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return clean sequence from seqrecpart of a GenBank record
extract_clean_sequence(seqrecpart, max_len = 1e+08)
seqrecpart |
Sequence part of a GenBank record, character |
max_len |
Number: maximum number of characters allowed in a single record before splitting the record into parts. Does not affect output, but only internal calculations, so generally should not be changed. Default = 1e8. |
If the element is not found, '' is returned.
character
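As a rough illustration of what "clean" means here, the sequence part of a GenBank record interleaves position numbers and spaces with the bases. The sketch below strips them with a single regular expression. It is an assumption about the general idea only: it does not reproduce the restez code or its max_len chunking, and the toy input is invented.

```r
# Toy sequence part: position counters followed by blocks of ten bases
seqrecpart <- paste0(
  "        1 gatcctccat atacaacggt\n",
  "       21 atctccacct\n"
)

# Remove digits and all whitespace, leaving only the sequence characters
clean_sequence <- gsub("[0-9[:space:]]", "", seqrecpart)
nchar(clean_sequence)
```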
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return definition from GenBank record.
extract_definition(record)
record |
GenBank record in text format, character |
If the element is not found, '' is returned.
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return feature table as list from GenBank record
extract_features(record)
record |
GenBank record in text format, character |
If the element is not found, an empty list is returned.
list of lists
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return information part from GenBank record
extract_inforecpart(record)
record |
GenBank record in text format, character |
If the element is not found, '' is returned.
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return keywords as list from GenBank record
extract_keywords(record)
record |
GenBank record in text format, character |
If the element is not found, '' is returned.
character vector
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return locus information from GenBank record
extract_locus(record)
record |
GenBank record in text format, character |
If the element is not found, '' is returned.
named character vector
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return organism name from GenBank record
extract_organism(record)
record |
GenBank record in text format, character |
If the element is not found, '' is returned.
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return sequence part from GenBank record
extract_seqrecpart(record)
record |
GenBank record in text format, character |
If the element is not found, '' is returned.
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return sequence from GenBank record
extract_sequence(record)
record |
GenBank record in text format, character |
If the element is not found, '' is returned.
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return accession + version ID from GenBank record
extract_version(record)
record |
GenBank record in text format, character |
If the element is not found, '' is returned.
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Download a GenBank .seq.tar file and check that it has downloaded properly; if not, FALSE is returned. If overwrite is TRUE, any previous file is overwritten.
file_download(fl, overwrite = FALSE)
fl |
character, base filename (e.g. gbpri9.seq) to be downloaded |
overwrite |
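logical (TRUE/FALSE); if TRUE, any previous file is overwritten |
logical (TRUE/FALSE); FALSE if the file did not download properly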
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Record a filename in a log file along with GB release and time.
filename_log(fl, fp)
fl |
file name, character |
fp |
filepath to log file, character |
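The kind of logging described here can be sketched as appending one tab-separated row per file. This is only an illustration under assumptions: the helper name `log_filename()`, the column names, and the `release` argument are invented for the example and are not taken from the restez source.

```r
# Hypothetical logger (not the restez implementation): append one row with
# the filename, a release label and a timestamp to a tab-separated log file.
log_filename <- function(fl, fp, release = NA_character_) {
  row <- data.frame(
    filename = fl,
    gb_release = release,
    time = format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  )
  new_file <- !file.exists(fp)
  # Write the header only when the log file is first created
  write.table(row, file = fp, sep = "\t", append = !new_file,
              col.names = new_file, row.names = FALSE, quote = FALSE)
  invisible(fp)
}

log_path <- tempfile(fileext = ".tsv")
log_filename("gbpri9.seq", log_path, release = "demo")
log_filename("gbpri10.seq", log_path, release = "demo")
log_tbl <- read.delim(log_path)
```

Reading the file back with `read.delim()` gives one row per logged file, which is what makes such a log easy to inspect later.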
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Read records from a .seq file.
flatfile_read(flpth)
flpth |
Path to .seq file |
list of GenBank records in text format
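To make the return value concrete: in the GenBank flat-file format each record ends with a line containing only `//`, so a file can be split into records on that terminator. The sketch below does this in base R. The helper name `read_records()` is hypothetical, the toy file is invented, and real .seq files also carry a header block that this sketch does not handle; it is not the restez implementation.

```r
# Hypothetical reader: split a flat file into records on the "//" terminator
read_records <- function(flpth) {
  lines <- readLines(flpth)
  ends <- which(lines == "//")
  if (length(ends) == 0) return(list())
  # Each record starts on the line after the previous terminator
  starts <- c(1L, head(ends, -1) + 1L)
  Map(function(s, e) paste(lines[s:e], collapse = "\n"), starts, ends)
}

# Toy file with two minimal records
tmp <- tempfile(fileext = ".seq")
writeLines(c("LOCUS       A", "ORIGIN", "//",
             "LOCUS       B", "ORIGIN", "//"), tmp)
records <- read_records(tmp)
length(records)
```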
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Given a list of seq_files, read and add the contents of the files to a SQL-like database. If any errors occur during the process, FALSE is returned.
gb_build(dpth, seq_files, max_length, min_length, acc_filter = NULL, invert = FALSE, scan = FALSE)
dpth |
Download path (where seq_files are stored) |
seq_files |
.seq.tar seq file names |
max_length |
Maximum sequence length, default NULL. |
min_length |
Minimum sequence length, default 0. |
acc_filter |
Character vector; accessions to include or exclude from the database as specified by invert. |
invert |
Logical vector of length 1; if TRUE, accessions in acc_filter are excluded from the database rather than included. |
scan |
Logical vector of length 1; should the sequence file be scanned for accessions in acc_filter? |
This function will automatically connect to the restez database.
Logical
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return the definition line for an accession ID.
gb_definition_get(id)
id |
character, sequence accession ID(s) |
named vector of definitions; NULL if no results are found
Other get: gb_fasta_get(), gb_organism_get(), gb_record_get(), gb_sequence_get(), gb_version_get()
library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
(def <- gb_definition_get(id = 'demo_1'))
(defs <- gb_definition_get(id = c('demo_1', 'demo_2')))
# delete demo after example
db_delete(everything = TRUE)
Make a data.frame from column vectors for nucleotide entries. Used as part of gb_df_generate().
gb_df_create(accessions, versions, organisms, definitions, sequences, records)
accessions |
character, vector of accessions |
versions |
character, vector of accessions + versions |
organisms |
character, vector of organism names |
definitions |
character, vector of sequence definitions |
sequences |
character, vector of sequences |
records |
character, vector of GenBank records in text format |
data.frame
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
For a list of records, construct a data.frame for insertion into SQL database.
gb_df_generate(records, min_length = 0, max_length = NULL, acc_filter = NULL, invert = FALSE)
records |
character, vector of GenBank records in text format |
min_length |
Minimum sequence length, default 0. |
max_length |
Maximum sequence length, default NULL. |
acc_filter |
Character vector; accessions to include or exclude from the database as specified by invert. |
invert |
Logical vector of length 1; if TRUE, accessions in acc_filter are excluded from the database rather than included. |
The resulting data.frame has five columns: accession, organism, raw_definition, raw_sequence, raw_record. The prefix 'raw_' indicates the data has been converted to the raw format, see ?charToRaw, in order to save on RAM. The raw_record contains the entire GenBank record in text format.
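For readers unfamiliar with the raw format, the round trip between a character string and a raw vector looks like this in base R. The definition string is a made-up example; the snippet only demonstrates `charToRaw()` and `rawToChar()` and does not show how restez stores its columns.

```r
# Made-up definition line, used only to demonstrate the conversion
definition <- "Example sequence, partial cds."

# Convert to a raw (byte) vector and back again
raw_definition <- charToRaw(definition)
class(raw_definition)
restored <- rawToChar(raw_definition)
```

The conversion is lossless, so `restored` is the same string as `definition`.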
Use acc_filter and the maximum and minimum sequence lengths to minimize the size of the database. All sequences have to be at least as long as min_length and less than or equal in length to max_length, unless max_length is NULL, in which case there is no maximum length. The final selection of sequences is the result of applying all filters (acc_filter, min_length, max_length) in combination.
data.frame, or NULL if no records pass filters
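The way the three filters combine can be sketched on plain vectors. This is an illustration under assumptions, not the restez code: `filter_records()` is a hypothetical helper, the accessions and lengths are invented, and `invert = TRUE` is assumed to mean that accessions in `acc_filter` are excluded.

```r
# Hypothetical helper: return the accessions that pass all filters
filter_records <- function(accessions, lengths, min_length = 0,
                           max_length = NULL, acc_filter = NULL,
                           invert = FALSE) {
  # At least as long as min_length
  keep <- lengths >= min_length
  # No upper bound when max_length is NULL
  if (!is.null(max_length)) keep <- keep & lengths <= max_length
  if (!is.null(acc_filter)) {
    in_filter <- accessions %in% acc_filter
    # Assumed semantics: invert = TRUE excludes the listed accessions
    keep <- keep & (if (invert) !in_filter else in_filter)
  }
  accessions[keep]
}

accs <- c("A1", "A2", "A3", "A4")
lens <- c(50, 500, 5000, 800)
by_length <- filter_records(accs, lens, min_length = 100, max_length = 1000)
excluded <- filter_records(accs, lens, min_length = 100,
                           acc_filter = "A4", invert = TRUE)
```

With these toy inputs, `by_length` keeps A2 and A4, while `excluded` keeps A2 and A3: A1 fails the minimum length and A4 is removed by the inverted accession filter.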
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return elements of GenBank record e.g. sequence, definition ...
gb_extract(record, what = c("accession", "version", "organism", "sequence", "definition", "locus", "features", "keywords"))
record |
GenBank record in text format, character |
what |
Which element to extract |
This function uses a REGEX to extract particular elements of a GenBank record. All of the what options return a single character string, except 'locus' and 'keywords', which return character vectors, and 'features', which returns a list of lists for all features.
The accuracy of these functions cannot be guaranteed due to the enormity of the GenBank database, but the function is regularly tested on a range of GenBank records.
Note: all non-latin1 characters are converted to '-'.
character or list of lists (what='features') or named character vector (what='locus')
library(restez)
data('record')
(gb_extract(record = record, what = 'locus'))
Get sequence and definition data in FASTA format. Equivalent to rettype='fasta' in rentrez::entrez_fetch().
gb_fasta_get(id, width = 70)
id |
character, sequence accession ID(s) |
width |
integer, maximum number of characters in a line |
named vector of fasta sequences; NULL if no results are found
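The effect of the width argument can be illustrated with a small base R sketch that wraps a sequence at a fixed number of characters per line. The helper `to_fasta()` is hypothetical and the header layout (ID followed by the definition) is an assumption made for the example; the exact layout that restez produces is not confirmed here. The sketch also assumes a non-empty sequence.

```r
# Hypothetical formatter: build a FASTA string with lines of at most `width`
to_fasta <- function(id, definition, sequence, width = 70) {
  # Start positions of each line of sequence
  starts <- seq(1, nchar(sequence), by = width)
  chunks <- substring(sequence, starts, starts + width - 1)
  paste0(">", id, " ", definition, "\n",
         paste(chunks, collapse = "\n"), "\n")
}

# 150 bases wrapped at the default width give lines of 70, 70 and 10
fasta <- to_fasta("demo_1", "An example definition", strrep("a", 150))
fasta_lines <- strsplit(fasta, "\n")[[1]]
nchar(fasta_lines[-1])
```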
Other get: gb_definition_get(), gb_organism_get(), gb_record_get(), gb_sequence_get(), gb_version_get()
library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
(fasta <- gb_fasta_get(id = 'demo_1'))
(fastas <- gb_fasta_get(id = c('demo_1', 'demo_2')))
# delete demo after example
db_delete(everything = TRUE)
Return the organism name for an accession ID.
gb_organism_get(id)
id |
character, sequence accession ID(s) |
named vector of organism names; NULL if no results are found
Other get: gb_definition_get(), gb_fasta_get(), gb_record_get(), gb_sequence_get(), gb_version_get()
library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
(org <- gb_organism_get(id = 'demo_1'))
(orgs <- gb_organism_get(id = c('demo_1', 'demo_2')))
# delete demo after example
db_delete(everything = TRUE)
library(restez) restez_path_set(filepath = tempdir()) demo_db_create(n = 5) (org <- gb_organism_get(id = 'demo_1')) (orgs <- gb_organism_get(id = c('demo_1', 'demo_2'))) # delete demo after example db_delete(everything = TRUE)
Return the entire GenBank record for an accession ID. Equivalent to rettype='gb' in rentrez::entrez_fetch().
gb_record_get(id)
id | character, sequence accession ID(s) |
named vector of records, if no results found NULL
Other get: gb_definition_get(), gb_fasta_get(), gb_organism_get(), gb_sequence_get(), gb_version_get()
library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
(rec <- gb_record_get(id = 'demo_1'))
(recs <- gb_record_get(id = c('demo_1', 'demo_2')))
# delete demo after example
db_delete(everything = TRUE)
Return the sequence(s) of one or more records from the accession ID(s).
gb_sequence_get(id, dnabin = FALSE)
id | character, sequence accession ID(s) |
dnabin | Logical vector of length 1; should the sequences be returned using the bit-level coding scheme of the ape package? Default FALSE. |
For more information about the dnabin format, see ape::DNAbin().
named vector of sequences, if no results found NULL
Other get: gb_definition_get(), gb_fasta_get(), gb_organism_get(), gb_record_get(), gb_version_get()
library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
(seq <- gb_sequence_get(id = 'demo_1'))
(seqs <- gb_sequence_get(id = c('demo_1', 'demo_2')))
(fasta_dnabin <- gb_sequence_get(id = 'demo_1', dnabin = TRUE))
# delete demo after example
db_delete(everything = TRUE)
Add records data.frame to SQL-like database.
gb_sql_add(df)
df | Records data.frame |
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Generic query function for retrieving data from the SQL database for the get functions.
gb_sql_query(nm, id)
nm | character, column name |
id | character, sequence accession ID(s) |
data.frame
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return the accession version for an accession ID.
gb_version_get(id)
id | character, sequence accession ID(s) |
named vector of versions, if no results found NULL
Other get: gb_definition_get(), gb_fasta_get(), gb_organism_get(), gb_record_get(), gb_sequence_get()
library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
(ver <- gb_version_get(id = 'demo_1'))
(vers <- gb_version_get(id = c('demo_1', 'demo_2')))
# delete demo after example
db_delete(everything = TRUE)
Returns TRUE if the GenBank release number is the most recent GenBank release available.
gbrelease_check()
logical
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Returns the GenBank release number. Returns empty character if none found.
gbrelease_get()
If no file found, returns empty character vector.
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
This function is called whenever db_download is run. It logs the GB release number in 'gb_release.txt' in the user's restez path. The log helps users keep track of whether their database is out of date.
gbrelease_log(release)
release | GenBank release number, character |
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Returns TRUE if a restez SQL database has data.
has_data()
Logical
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Searches through the release notes for a GenBank release to find all listed .seq files. Returns a data.frame for all .seq files and their description.
identify_downloadable_files()
data.frame
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Determine whether one or more IDs are present in a database.
is_in_db(id, db = "nucleotide")
id | character, sequence accession ID(s) |
db | character, database name |
named vector of booleans
Other database: count_db_ids(), db_create(), db_delete(), db_download(), demo_db_create(), list_db_ids()
library(restez)
# set the restez path to a temporary dir
restez_path_set(filepath = tempdir())
# create demo database
demo_db_create(n = 5)
# in the demo, IDs are 'demo_1', 'demo_2' ...
ids <- c('thisisnotanid', 'demo_1', 'demo_2')
(is_in_db(id = ids))
# delete demo after example
db_delete(everything = TRUE)
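The value returned by is_in_db() is a logical vector named by the queried IDs. A minimal base R sketch of that shape follows, assuming a hypothetical vector known_ids that stands in for the IDs stored in the database; it does not touch a real restez database.

```r
# Hypothetical stand-in for the IDs held in the local database
known_ids <- c("demo_1", "demo_2", "demo_3")

# IDs to check, including one that is not present
ids <- c("thisisnotanid", "demo_1", "demo_2")

# Named logical vector: TRUE where the ID is present
present <- stats::setNames(ids %in% known_ids, ids)
print(present)

# Keep only the IDs that were found
found <- names(present)[present]
print(found)
```

A vector of this shape can be used to filter IDs before passing them to the gb_*_get() functions.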
Return the date and time of the last added sequence as determined using the 'add_log.tsv'.
last_add_get()
If no file found, returns empty character vector.
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return the date and time of the last download as determined using the 'download_log.tsv'.
last_dwnld_get()
If no file found, returns empty character vector.
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return the last entry from a tab-delimited log file.
last_entry_get(fp)
fp | Filepath, character |
vector
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
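The idea behind last_entry_get() can be sketched in base R as reading the final line of a tab-delimited log and splitting it into fields. The file contents and the helper name last_row_of_tsv() are made up for illustration, and the sketch assumes the log has no header row; the actual restez logs may be laid out differently.

```r
# Illustrative helper (hypothetical): return the fields of the last
# non-empty line of a tab-delimited file as a character vector.
last_row_of_tsv <- function(fp) {
  lines <- readLines(fp, warn = FALSE)
  lines <- lines[nzchar(lines)]
  if (length(lines) == 0) return(character(0))
  strsplit(lines[length(lines)], "\t", fixed = TRUE)[[1]]
}

# Write a small made-up log to a temporary file
log_file <- tempfile(fileext = ".tsv")
writeLines(
  c("file_a.seq\t255\t2024-01-01 10:00:00",
    "file_b.seq\t255\t2024-01-02 11:30:00"),
  log_file
)

last_entry <- last_row_of_tsv(log_file)
print(last_entry)

# Remove the temporary file
unlink(log_file)
```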
Downloads the latest GenBank release number and returns it.
latest_genbank_release()
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Downloads the latest GenBank release notes to a user's restez download path.
latest_genbank_release_notes()
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return a vector of all IDs in a database.
list_db_ids(db = "nucleotide", n = 100)
db | character, database name |
n | Maximum number of IDs to return, if NULL returns all |
Warning: can return very large vectors for large databases.
vector of characters
Other database: count_db_ids(), db_create(), db_delete(), db_download(), demo_db_create(), is_in_db()
library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
# Warning: not recommended for real databases
# with potentially millions of IDs
all_ids <- list_db_ids()
# What shall we do with these IDs?
# ... how about make a mock fasta file
seqs <- gb_sequence_get(id = all_ids)
defs <- gb_definition_get(id = all_ids)
# paste together
fasta_seqs <- paste0('>', defs, '\n', seqs)
fasta_file <- paste0(fasta_seqs, collapse = '\n')
cat(fasta_file)
# delete after example
db_delete(everything = TRUE)
Sends message to console stating number of missing IDs.
message_missing(n)
n | Number of missing IDs |
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Make a mock sequence definition. Designed to be part of a loop.
mock_def(i)
i | integer, iterator |
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Make a mock nucleotide data.frame for entry into a demonstration SQL database.
mock_gb_df_generate(n)
n | integer, number of entries |
data.frame
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Make a mock sequence organism. Designed to be part of a loop.
mock_org(i)
i | integer, iterator |
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Create a mock GenBank record for demonstration and testing purposes. Designed to be part of a loop. Accession, organism, etc. are optional arguments.
mock_rec(
  i,
  definition = NULL,
  accession = NULL,
  version = NULL,
  organism = NULL,
  sequence = NULL
)
i | integer, iterator |
definition | character |
accession | character |
version | character |
organism | character |
sequence | character |
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Make a mock sequence. Designed to be part of a loop.
mock_seq(i, sqlngth = 10)
i | integer, iterator |
sqlngth | integer, sequence length |
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
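A sketch of how a mock nucleotide sequence of a given length might be generated inside a loop, in the spirit of mock_seq(). The helper make_mock_seq() and the choice to use the iterator as a random seed are assumptions for illustration; this is not restez's implementation.

```r
# Illustrative helper (hypothetical): random nucleotide string of
# length `sqlngth`.
make_mock_seq <- function(i, sqlngth = 10) {
  # Assumption: seed by the iterator so each i gives a reproducible sequence
  set.seed(i)
  bases <- sample(c("A", "T", "C", "G"), size = sqlngth, replace = TRUE)
  paste(bases, collapse = "")
}

# Generate three mock sequences, as a loop over an iterator would
mock_seqs <- vapply(1:3, make_mock_seq, character(1), sqlngth = 10)
print(mock_seqs)
```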
The query string can be formatted using GenBank advanced query terms to obtain accession numbers corresponding to a specific set of criteria.
ncbi_acc_get(query, strict = TRUE, drop_ver = TRUE)
query | Character vector of length 1; query string to search GenBank. |
strict | Logical vector of length 1; should an error be issued if the number of unique accessions retrieved does not match the number of hits from GenBank? Default TRUE. |
drop_ver | Logical vector of length 1; should the version part of the accession number (e.g., '.1' in 'AB001538.1') be dropped? Default TRUE. |
Note this queries NCBI GenBank, not the local database generated with restez. It can be used either to restrict the accessions used to construct the local database (acc_filter argument of db_create()) or to specify accessions to read from the local database (id argument of gb_fasta_get() and other gb_*_get() functions).
Character vector; accession numbers resulting from query.
## Not run:
# requires an internet connection
cmin_accs <- ncbi_acc_get("Crepidomanes minutum")
length(cmin_accs)
head(cmin_accs)
## End(Not run)
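The effect of the drop_ver argument of ncbi_acc_get() can be pictured with a short base R sketch that strips a trailing version suffix from accession strings. The regular expression and the second accession below are illustrative assumptions, not taken from restez's source, and no query is sent to NCBI.

```r
# Accessions with version suffixes: 'AB001538.1' is the example from the
# drop_ver description; 'XY000001.12' is made up for illustration
accs_with_ver <- c("AB001538.1", "XY000001.12")

# Remove a final '.<digits>' suffix, leaving the bare accession
accs <- sub("\\.[0-9]+$", "", accs_with_ver)
print(accs)
```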
Predicts the file sizes of the downloads and the database from the GenBank filesize information. Conversion factors are based on previous restez downloads.
predict_datasizes(uncompressed_filesize)
uncompressed_filesize | GBs of the stated filesize, numeric |
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Prints to screen the three sections of the status class. Not meant to be used interactively.
## S3 method for class 'status'
print(x, ...)
x | Status object |
... | Other arguments (not used by this function) |
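As a rough illustration of how an S3 print method for a three-part list can work, the sketch below defines a hypothetical class, demo_status, with invented section names. None of these names are taken from restez; the real status object may be structured and printed differently.

```r
# Hypothetical constructor: a three-part list with class "demo_status".
# Section and field names are assumptions, not the ones restez uses.
demo_status <- function() {
  structure(
    list(
      path = list(location = tempdir()),
      download = list(n_files = 0L),
      database = list(n_sequences = 0L)
    ),
    class = "demo_status"
  )
}

# S3 print method: one header per section, followed by its name/value pairs
print.demo_status <- function(x, ...) {
  for (section in names(x)) {
    cat("[", section, "]\n", sep = "")
    for (item in names(x[[section]])) {
      cat("  ", item, ": ", format(x[[section]][[item]]), "\n", sep = "")
    }
  }
  invisible(x)
}

s <- demo_status()
print(s)
```

Returning the object invisibly follows the usual convention for print methods, so the call can still be used inside a pipeline or assignment.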
Write notes for the curious sorts who peruse the restez_path.
readme_log()
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Example GenBank record in text format for demonstration purposes.
data("record")
A large character object containing record information and DNA sequence.
https://www.ncbi.nlm.nih.gov/nuccore/AY952423.1
GenBank
data(record)
cat(record)
Sets a connection to the local database.
restez_connect(read_only = FALSE)
read_only | Logical; should the connection be made in read-only mode? Read-only mode is required for multiple R processes to access the database simultaneously. Default FALSE. |
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Safely disconnect from the restez connection
restez_disconnect()
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Raises an error if the restez path does not exist.
restez_path_check()
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return filepath to where the restez database is stored.
restez_path_get()
character
Other setup: restez_path_set(), restez_path_unset(), restez_ready(), restez_status()
library(restez)
# set a restez path with a tempdir
restez_path_set(filepath = tempdir())
# check what the set path is
(restez_path_get())
Specify the filepath for the local GenBank database.
restez_path_set(filepath)
filepath | character, valid filepath to the folder where the database should be stored. |
Adds 'restez_path' to options(). In this path the folder 'restez' will be created and all downloaded and database files will be stored there.
Other setup: restez_path_get(), restez_path_unset(), restez_ready(), restez_status()
## Not run:
library(restez)
restez_path_set(filepath = 'path/to/where/you/want/files/to/download')
## End(Not run)
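The mechanism described for restez_path_set() (storing a path in options() and creating a 'restez' folder inside it) can be sketched in base R as follows. The functions demo_path_set(), demo_path_get() and demo_path_unset() and the option name demo_restez_path are hypothetical, chosen so that the real 'restez_path' option is left untouched; whether restez stores the parent folder or the 'restez' subfolder in the option is an assumption here.

```r
# Hypothetical setter: validate the folder, create a 'restez' subfolder,
# and remember its location in options() under an invented option name
demo_path_set <- function(filepath) {
  if (!dir.exists(filepath)) {
    stop("Invalid filepath: ", filepath)
  }
  demo_path <- file.path(filepath, "restez")
  dir.create(demo_path, showWarnings = FALSE)
  options(demo_restez_path = demo_path)
  invisible(demo_path)
}

# Hypothetical getter: returns NULL when no path has been set
demo_path_get <- function() getOption("demo_restez_path")

# Hypothetical unsetter: assigning NULL removes the option
demo_path_unset <- function() options(demo_restez_path = NULL)

demo_path_set(tempdir())
print(demo_path_get())
print(dir.exists(demo_path_get()))
demo_path_unset()
print(is.null(demo_path_get()))
```

Because the path lives in options(), it lasts only for the current R session; a new session would need to set it again.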
Set the restez path to NULL
restez_path_unset()
Other setup: restez_path_get(), restez_path_set(), restez_ready(), restez_status()
Returns TRUE if a restez SQL database is available. Use restez_status() for more information.
restez_ready()
Logical
Other setup: restez_path_get(), restez_path_set(), restez_path_unset(), restez_status()
library(restez)
fp <- tempdir()
restez_path_set(filepath = fp)
demo_db_create(n = 5)
(restez_ready())
db_delete(everything = TRUE)
(restez_ready())
Wrapper for base readline.
restez_rl(prompt)
prompt | character, display text |
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Report to console current setup status of restez.
restez_status(gb_check = FALSE)
gb_check | Check whether last download was from latest GenBank release? Default FALSE. |
Set gb_check=TRUE to see if your downloads are up-to-date.
Status class
Other setup: restez_path_get(), restez_path_set(), restez_path_unset(), restez_ready()
library(restez)
fp <- tempdir()
restez_path_set(filepath = fp)
demo_db_create(n = 5)
restez_status()
db_delete(everything = TRUE)
# Errors:
# restez_status()
Scans a gzipped file for text strings and returns TRUE if any are present.
search_gz(terms, path)
terms | Character vector; search terms (most likely GenBank accession numbers) |
path | Path to the gzipped file to scan |
Logical
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
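To make the behaviour of search_gz() concrete, here is a minimal base R sketch of scanning a gzipped text file for any of several terms. The helper search_gz_sketch() is hypothetical and is not the package's implementation; in particular, it reads the whole file into memory, which would likely be impractical for full-size GenBank flatfiles.

```r
# Hypothetical helper: TRUE if any term occurs anywhere in a gzipped text file
search_gz_sketch <- function(terms, path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con)
  # fixed = TRUE treats each term as a literal string, not a regex
  any(vapply(terms, function(term) any(grepl(term, lines, fixed = TRUE)),
             logical(1)))
}

# Write a tiny gzipped file to scan
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, open = "wt")
writeLines(c("LOCUS       AY952423", "ORIGIN"), con)
close(con)

print(search_gz_sketch(c("AY952423", "AB001538"), gz_path))
print(search_gz_sketch("AB001538", gz_path))
```

The first call matches because one of the two terms is present; the second finds nothing.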
Records the session and system information to file.
seshinfo_log()
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
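For readers curious how session and system information can be written to a file, the sketch below uses only base R and utils. The helper session_log_sketch() and the layout of the log file are assumptions for illustration; the file name, location and contents written by seshinfo_log() may differ.

```r
# Hypothetical helper: write session and system details to a text file
session_log_sketch <- function(path) {
  info <- c(
    paste("Logged:", format(Sys.time(), "%Y-%m-%d %H:%M:%S")),
    "",
    "Session info:",
    # capture the printed form of sessionInfo() as character lines
    utils::capture.output(print(utils::sessionInfo())),
    "",
    "System info:",
    paste(names(Sys.info()), Sys.info(), sep = ": ")
  )
  writeLines(info, path)
  invisible(path)
}

log_path <- session_log_sketch(tempfile(fileext = ".txt"))
print(head(readLines(log_path), 3))
```

Keeping such a log next to the database makes it easier to tell later which R version and platform were used to build it.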
Creates temporary test folders.
setup()
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
Returns the selections made by the user.
slctn_get()
If no file is found, returns an empty character vector.
character vector
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_log(), sql_path_get(), status_class(), stat(), testdatadir_get()
This function is called whenever a user makes a selection with db_download(). It records the selected GenBank numbers.
slctn_log(selection)
selection | selected GenBank sequences, named vector |
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), sql_path_get(), status_class(), stat(), testdatadir_get()
Return path to where SQL database is stored.
sql_path_get()
character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), status_class(), stat(), testdatadir_get()
Print to console blue text to indicate a number/statistic.
stat(...)
... | Any number of text arguments to print, character |
coloured character encoding, character
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), testdatadir_get()
Creates a three-part list for holding information on the status of the restez file path.
status_class()
Status class
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), testdatadir_get()
Get the folder containing test data.
testdatadir_get()
Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), stat()