Title: | Get 'SNP' ('Single-Nucleotide' 'Polymorphism') Data on the Web |
---|---|
Description: | A programmatic interface to various 'SNP' 'datasets' on the web: 'OpenSNP' (<https://opensnp.org>) and the 'NCBI' 'dbSNP' database (<https://www.ncbi.nlm.nih.gov/projects/SNP/>). Functions are included for searching 'NCBI'. For 'OpenSNP', functions are included for getting 'SNPs', and data for 'genotypes', 'phenotypes', annotations, and bulk downloads of data by user. |
Authors: | Julia Gustavsen [aut, cre], Sina Rüeger [aut], Scott Chamberlain [aut], Kevin Ushey [aut], Hao Zhu [aut] |
Maintainer: | Julia Gustavsen |
License: | MIT + file LICENSE |
Version: | 0.6.1 |
Built: | 2024-10-28 06:06:31 UTC |
Source: | https://github.com/ropensci/rsnps |
This package gives you access to data from OpenSNP (https://opensnp.org) via their API (https://opensnp.org/faq#api) and NCBI's dbSNP SNP database (https://www.ncbi.nlm.nih.gov/snp).
This applies to the function ncbi_snp_query():
You can optionally use an API key; doing so allows higher rate limits (more requests per time period).
To get an API key from NCBI, log in and create a key via your account settings at https://www.ncbi.nlm.nih.gov/account/settings/
Note: NCBI login is via a 3rd party account (e.g. Google, ORCID, etc.). If you already have an NCBI account, you can link it with a 3rd party login and then retire your old NCBI login (if you haven't already); otherwise, just create a new account.
Once you are logged on to your NCBI account settings (https://www.ncbi.nlm.nih.gov/account/settings/) you can go to the section "API Key Management"
Here you can select "Create an API Key", which will give you up to 10 requests per second instead of the 3 per second allowed without an API key.
After generating your key, set an environment variable named ENTREZ_KEY in .Renviron. This .Renviron file can be edited using usethis::edit_r_environ() or by locating and creating/editing this file yourself. The entry looks like this:
ENTREZ_KEY='youractualkeynotthisstring'
Once the API key is added to your .Renviron file, restart R for it to take effect.
You can optionally pass in your API key to the key parameter in NCBI functions in this package. However, it's much better from a security perspective to set an environment variable.
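As a minimal sketch of the setup described above, the following base R snippet checks whether ENTREZ_KEY is visible to the current session. The helper name has_entrez_key() is hypothetical and not part of this package; it only wraps Sys.getenv().

```r
# Hypothetical helper (not part of rsnps): report whether an NCBI API key
# is available in the current R session via the ENTREZ_KEY variable.
has_entrez_key <- function() {
  key <- Sys.getenv("ENTREZ_KEY", unset = "")
  nzchar(key)
}

if (has_entrez_key()) {
  message("ENTREZ_KEY is set.")
} else {
  message("ENTREZ_KEY is not set; add it to .Renviron and restart R.")
}
```

Running a check like this after restarting R is a quick way to confirm that the .Renviron entry was picked up, without printing the key itself.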
Scott Chamberlain
Kevin Ushey
Hao Zhu
Sina Rüeger
Julia Gustavsen
Get openSNP genotype data for all users at a particular snp.
allgensnp(snp = NA, usersubset = FALSE, ...)
snp |
(character) A SNP name |
usersubset |
Get a subset of users, integer numbers, e.g. 1-8 (default: none) |
... |
Curl options passed on to crul::HttpClient |
data.frame of genotypes for all users at a certain SNP
Other opensnp-fxns: allphenotypes(), annotations(), download_users(), fetch_genotypes(), genotypes(), phenotypes_byid(), phenotypes(), users()
## Not run:
x <- allgensnp(snp = "rs7412")
head(x)
## End(Not run)
Either return data.frame with all results, or output a list, then call the characteristic by id (parameter = "id") or name (parameter = "characteristic").
allphenotypes(df = FALSE, ...)
df |
Return a data.frame of all data. The column known_variations can take multiple values, so the other columns id, characteristic, and number_of_users are replicated in the data.frame. Default: FALSE |
... |
Curl options passed on to crul::HttpClient |
data.frame of results, or list if df=FALSE
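To illustrate why rows are replicated when df = TRUE, here is a minimal base R sketch. It is not the package's code, and the record below is made up: a single characteristic with several known variations expands to one row per variation, with the other columns repeated.

```r
# Made-up record: one characteristic with three known variations.
rec <- list(
  id = 1L,
  characteristic = "Eye color",
  number_of_users = 10L,
  known_variations = c("Brown", "Blue", "Green")
)

# data.frame() recycles the length-one columns to match known_variations,
# so id, characteristic and number_of_users are repeated on every row.
pheno_df <- data.frame(
  id = rec$id,
  characteristic = rec$characteristic,
  number_of_users = rec$number_of_users,
  known_variations = rec$known_variations,
  stringsAsFactors = FALSE
)
print(pheno_df)
```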
Other opensnp-fxns: allgensnp(), annotations(), download_users(), fetch_genotypes(), genotypes(), phenotypes_byid(), phenotypes(), users()
## Not run:
# Get all data
allphenotypes(df = TRUE)
# Output a list, then call the characteristic of interest by 'id' or
# 'characteristic'
datalist <- allphenotypes()
names(datalist) # get list of all characteristics you can call
datalist[["ADHD"]] # get data.frame for 'ADHD'
datalist[c("mouth size", "SAT Writing")] # get data.frames for 'mouth size' and 'SAT Writing'
## End(Not run)
Either return data.frame with all results, or output a list, then call the characteristic by id (parameter = "id") or name (parameter = "characteristic").
annotations( snp = NA, output = c("all", "plos", "mendeley", "snpedia", "metadata"), ... )
snp |
SNP name. |
output |
Name the source or sources you want annotations from (options are: 'plos', 'mendeley', 'snpedia', 'metadata'). 'metadata' gives the metadata for the response. |
... |
Curl options passed on to crul::HttpClient |
data.frame of results
Other opensnp-fxns: allgensnp(), allphenotypes(), download_users(), fetch_genotypes(), genotypes(), phenotypes_byid(), phenotypes(), users()
## Not run:
## get just the metadata
annotations(snp = "rs7903146", output = "metadata")
## just from plos
annotations(snp = "rs7903146", output = "plos")
## just from snpedia
annotations(snp = "rs7903146", output = "snpedia")
## get all annotations
annotations(snp = "rs7903146", output = "all")
## End(Not run)
Download openSNP user files.
download_users(name = NULL, id = NULL, dir = "~/", ...)
name |
User name |
id |
User id |
dir |
Directory to save file to |
... |
Curl options passed on to crul::HttpClient |
File downloaded to directory you specify (or default), nothing returned in R.
Other opensnp-fxns: allgensnp(), allphenotypes(), annotations(), fetch_genotypes(), genotypes(), phenotypes_byid(), phenotypes(), users()
## Not run:
# Download a single user file, by id
download_users(id = 14)
# Download a single user file, by user name
download_users(name = "kevinmcc")
# Download many user files
lapply(c(14, 22), function(x) download_users(id = x))
read_users(id = 14, nrows = 5)
## End(Not run)
Download openSNP genotype data for a user
fetch_genotypes(url, rows = 100, filepath = NULL, quiet = TRUE, ...)
url |
(character) URL for the download. See example below of function use. |
rows |
(integer) Number of rows to read in. Useful for getting a glimpse of the data. Negative and other invalid values are ignored, giving back all data. Default: 100 |
filepath |
(character) If none is given the file is saved to a temporary file, which will be lost after your session is closed. Save to a file if you want to access it later. |
quiet |
(logical) Should download progress be suppressed? Default: TRUE |
... |
Further args passed on to |
Beware, not setting the rows parameter means that you download the entire file, which can be large (e.g., 15MB), and so can take a while to download depending on your connection speed. Therefore, rows is set to 100 by default as a safeguard.
Internally, we use download.file() to download each file, then read.table() to read the file to a data.frame.
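The read step can be sketched in base R as follows. This is not the package's implementation: the file contents, the rs IDs, the column names passed to col.names, and the tab-separated layout are assumptions made for illustration, and a local temporary file stands in for the download.

```r
# Illustration only: a small tab-separated file stands in for a downloaded
# openSNP genotype file (the contents below are made up).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "# comment line, skipped by read.table",
  "rs0000001\t1\t1000\tAA",
  "rs0000002\t1\t2000\tCT",
  "rs0000003\t2\t3000\tGG"
), tmp)

# Read only the first `rows` data rows, similar in spirit to the rows argument.
rows <- 2
dat <- read.table(
  tmp,
  sep = "\t",
  nrows = rows,
  comment.char = "#",
  col.names = c("rsid", "chromosome", "position", "genotype"),
  stringsAsFactors = FALSE
)
str(dat)
```

Limiting nrows keeps the glimpse cheap to parse even when the file on disk is large.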
data.frame for a single user, with four columns:
rsid (character)
chromosome (integer)
position (integer)
genotype (character)
Other opensnp-fxns: allgensnp(), allphenotypes(), annotations(), download_users(), genotypes(), phenotypes_byid(), phenotypes(), users()
## Not run:
# get a data.frame of the users data
data <- users(df = TRUE)
head(data[[1]]) # users with links to genome data
mydata <- fetch_genotypes(
  url = data[[1]][1, "genotypes.download_url"],
  filepath = "~/myfile.txt"
)
# see some data right away
mydata
# Or read in data later separately
read.table("~/myfile.txt", nrows = 10)
## End(Not run)
Get openSNP genotype data for one or multiple users.
genotypes(snp = NA, userid = NA, df = FALSE, ...)
snp |
SNP name. |
userid |
ID of openSNP user. |
df |
Return a data.frame (TRUE) or not (FALSE). Default: FALSE |
... |
Curl options passed on to crul::HttpClient |
List (or data.frame) of genotypes for specified user(s) at a certain SNP.
Other opensnp-fxns: allgensnp(), allphenotypes(), annotations(), download_users(), fetch_genotypes(), phenotypes_byid(), phenotypes(), users()
## Not run:
genotypes(snp = "rs9939609", userid = 1)
genotypes("rs9939609", userid = "1,6,8", df = TRUE)
genotypes("rs9939609", userid = "1-2", df = FALSE)
## End(Not run)
Internal function to get the frequency of the variants from different studies.
get_frequency(Class, primary_info)
Class |
What kind of variant is the rsid. Accepted options are "snv", "snp" and "delins". |
primary_info |
refsnp entry read in JSON format |
If multiple gene names are encountered they are collapsed with a "/".
get_gene_names(primary_info)
primary_info |
refsnp entry read in JSON format |
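As a small illustration of the collapsing behaviour described above (not the package's internal code), base R's paste() with collapse = "/" joins several gene names into one string. The gene names are arbitrary placeholders.

```r
# Placeholder gene names; any character vector works the same way.
genes <- c("GENE1", "GENE2")

# Join all names into a single string separated by "/".
collapsed <- paste(genes, collapse = "/")
print(collapsed)
```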
Internal function to get the position, alleles, assembly, hgvs notation
get_placements(primary_info)
primary_info |
refsnp entry read in JSON format |
This function queries NCBI's refSNP for the vector of SNPs submitted, returning information based on the latest dbSNP build and the latest reference genome.
ncbi_snp_query(snps)
snps |
(character) A vector of SNPs (rs numbers). |
This function currently pulls data for Assembly 38. In particular, if you think the BP position is wrong, you may be expecting the BP position for a different assembly.
Note that you are limited to a maximum of one query per second, and concurrent queries are not allowed. If users want to set curl options when querying for the SNPs, they can do so by using httr::set_config/httr::with_config.
A dataframe with columns:
query: The rs ID that was queried.
chromosome: The chromosome that the marker lies on.
bp: The chromosomal position, in base pairs, of the marker, as aligned with the current genome used by dbSNP. We add 1 to the base pair position in the bp column of the output data.frame to agree with what the dbSNP website shows.
rsid: Reference SNP cluster ID. If the rs ID queried has been merged, the up-to-date name of the ID is returned here, and a warning is issued.
class: The rsid's 'class'. See https://www.ncbi.nlm.nih.gov/projects/SNP/snp_legend.cgi?legend=snpClass for more details.
gene: If the rsid lies within a gene (either within the exons or introns of a gene), the name of that gene is returned here; otherwise, NA. Note that the gene may not be returned if the rsid lies too far upstream or downstream of the particular gene of interest.
alleles: The alleles associated with the SNP if it is a SNV; otherwise, if it is an INDEL, microsatellite, or other kind of polymorphism the relevant information will be available here.
minor: The allele for which the MAF is computed, given it is an SNV; otherwise, NA.
maf: The minor allele frequency of the SNP, given it is an SNV. This is drawn from the current global reference population used by NCBI (GnomAD).
ancestral_allele: allele as described in the current assembly
variation_allele: difference to the current assembly
seqname: Chromosome RefSeq reference.
hgvs: Full HGVS notation for the variant.
assembly: Which assembly was used for the annotations.
ref_seq: Sequence in the reference assembly.
maf_population: data.frame of all minor allele frequencies reported, with columns study, reference allele, alternative allele (minor) and minor allele frequency.
https://www.ncbi.nlm.nih.gov/projects/SNP/
https://pubmed.ncbi.nlm.nih.gov/31738401/ SPDI model
## Not run:
## an example with both merged SNPs, non-SNV SNPs, regular SNPs,
## SNPs not found, microsatellite
SNPs <- c("rs332", "rs420358", "rs1837253", "rs1209415715", "rs111068718")
ncbi_snp_query(SNPs)
# ncbi_snp_query("123456") ## invalid: must prefix with 'rs'
ncbi_snp_query("rs420358")
ncbi_snp_query("rs332") # warning that it's merged into another, try that
ncbi_snp_query("rs121909001")
ncbi_snp_query("rs1837253")
ncbi_snp_query("rs1209415715")
ncbi_snp_query("rs111068718")
ncbi_snp_query(snps = "rs9970807")
ncbi_snp_query("rs121909001")
ncbi_snp_query("rs121909001", verbose = TRUE)
## End(Not run)
Get openSNP phenotype data for one or multiple users.
phenotypes(userid = NA, df = FALSE, ...)
userid |
ID of openSNP user. |
df |
Return a data.frame (TRUE) or not (FALSE). Default: FALSE |
... |
Curl options passed on to crul::HttpClient |
List of phenotypes for specified user(s).
Other opensnp-fxns: allgensnp(), allphenotypes(), annotations(), download_users(), fetch_genotypes(), genotypes(), phenotypes_byid(), users()
## Not run:
phenotypes(userid = 1)
phenotypes(userid = "1,6,8", df = TRUE)
phenotypes(userid = "1-8", df = TRUE)
# coerce to data.frame
library(plyr)
df <- ldply(phenotypes(userid = "1-8", df = TRUE))
head(df)
tail(df)
# pass on curl options
phenotypes(1, verbose = TRUE)
## End(Not run)
Get all openSNP known variations and all users sharing that phenotype for one phenotype(-ID).
phenotypes_byid( phenotypeid = NA, return_ = c("description", "knownvars", "users"), ... )
phenotypeid |
ID of openSNP phenotype. |
return_ |
Which result to return: one of "description", "knownvars", or "users" |
... |
Curl options passed on to crul::HttpClient |
List of description of phenotype, list of known variants, or data.frame of variants for each user with that phenotype.
Other opensnp-fxns: allgensnp(), allphenotypes(), annotations(), download_users(), fetch_genotypes(), genotypes(), phenotypes(), users()
## Not run:
phenotypes_byid(phenotypeid = 12, return_ = "desc")
phenotypes_byid(phenotypeid = 12, return_ = "knownvars")
phenotypes_byid(phenotypeid = 12, return_ = "users")
# pass on curl options
phenotypes_byid(phenotypeid = 12, return_ = "desc", verbose = TRUE)
## End(Not run)
Beware, these tables can be large. Check your RAM before executing, or read in only a subset of the data. This function reads in the whole kit and caboodle.
read_users(name = NULL, id = NULL, path = NULL, ...)
name |
User name |
id |
User id |
path |
Path to file to read from. |
... |
Parameters passed on to |
If you specify a name or id, this function reads environment variables written in the function download_users, and then searches against those variables for the path to the file saved. Alternatively, you can supply the path.
A data.frame.
## Not run:
# dat <- read_users(name = "kevinmcc")
# head(dat)
# dat <- read_users(id = 285)
## End(Not run)
For use with usethis::use_release_issue()
release_bullets()
LDSearch(): Function name changed to ld_search
ld_search(): The Broad Institute took the service down, see https://www.broadinstitute.org/snap/snap
NCBI_snp_query(): Function name changed to ncbi_snp_query
NCBI_snp_query2(): Function name changed to ncbi_snp_query
ncbi_snp_summary(): Function name changed to ncbi_snp_query
ncbi_snp_query2(): Function name changed to ncbi_snp_query
Get openSNP users.
users(df = FALSE, ...)
df |
Return a data.frame (TRUE) or not (FALSE). Default: FALSE |
... |
Curl options passed on to crul::HttpClient |
List of openSNP users, their ID numbers, and XX if available.
Other opensnp-fxns: allgensnp(), allphenotypes(), annotations(), download_users(), fetch_genotypes(), genotypes(), phenotypes_byid(), phenotypes()
## Not run:
# just the list
data <- users(df = FALSE)
data
# get a data.frame of the users data
data <- users(df = TRUE)
data[[1]] # users with links to genome data
data[[2]] # users without links to genome data
## End(Not run)