---
title: "The pkgmatch package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{The pkgmatch package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set (
    collapse = TRUE,
    comment = "#>"
)
```

The `pkgmatch` package is a search and matching engine for R packages. It finds
the R packages that best match an input, which can be either a text description
or a local path to an R package. `pkgmatch` was developed to enable rOpenSci to
identify packages similar to each new package submitted for [our software
peer-review scheme](https://ropensci.org/software-review/). Matching packages
can be found either in [rOpenSci's own package
suite](https://ropensci.org/packages/), or among all [packages currently on
CRAN](https://cran.r-project.org).

## What does the package do?

What the package does is best understood by example, starting with loading the
package.

```{r library}
library (pkgmatch)
```

Then match packages to an input string:

```{r match-text-1-fakey, eval = FALSE}
input <- "genomics and transcriptomics sequence location data"
pkgmatch_similar_pkgs (input, corpus = "ropensci")
```
```{r redef-sim-pkgs1, eval = TRUE, echo = FALSE}
c ("biomartr", "traits", "phylotaR", "phruta", "rebird")
```

By default, the top five matching packages are printed to the screen. The
function actually returns information on all packages, and the returned object
has a `head` method to display the first few rows:

```{r match-text-1-fakey-return, eval = FALSE}
p <- pkgmatch_similar_pkgs (input, corpus = "ropensci")
head (p)
```
```{r match-text-1-return, eval = TRUE, echo = FALSE}
data.frame (
    package = c ("biomartr", "traits", "phylotaR", "phruta", "rebird"),
    rank = 1:5
)
```

The `head` method also accepts an `n` parameter to control how many rows are
displayed. Alternatively, `as.data.frame` can be used to see the entire
`data.frame` of results.

The following lines find equivalent matches against all packages currently on
CRAN:

```{r match-text-2-cran-fakey, eval = FALSE}
pkgmatch_similar_pkgs (input, corpus = "cran")
```
```{r redef-sim-pkgs2, eval = TRUE, echo = FALSE}
c ("omicsTools", "ggalign", "omixVizR", "singleCellHaystack", "spatialGE")
```

### Using an R package as input

The package also accepts a path to a local R package as input. The following
code downloads a "tarball" (`.tar.gz` file) from CRAN and finds matching
packages from that corpus. We of course expect the best matches against CRAN
packages to include that package itself:

```{r odbc-cran-match-fakey, eval = FALSE}
u <- "https://cran.r-project.org/src/contrib/Archive/odbc/odbc_1.5.0.tar.gz"
destfile <- file.path (tempdir (), basename (u))
download.file (u, destfile = destfile, quiet = TRUE)
pkgmatch_similar_pkgs (destfile, corpus = "cran")
```
```{r odbc-cran-match, echo = FALSE, eval = TRUE}
c ("odbc", "RODBC", "DatabaseConnector", "dbplyr", "reticulate")
```

The best matches do indeed include `odbc` itself.

As explained in the documentation, the `pkgmatch_similar_pkgs()` function
derives its final rankings from [document token-frequency
analyses](https://en.wikipedia.org/wiki/Okapi_BM25). The resulting rankings can
be seen, as above, with the `head` method:

```{r odbc-match-head-fakey, eval = FALSE}
p <- pkgmatch_similar_pkgs (destfile, corpus = "cran")
head (p)
```
```{r odbc-cran-match-head, echo = FALSE, eval = TRUE}
data.frame (
    package = c ("odbc", "RODBC", "DatabaseConnector", "dbplyr", "reticulate"),
    version = c ("1.6.4.1", "1.3-26.1", "7.1.0", "2.5.2", "1.45.0"),
    rank = 1:5
)
```
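
To give a feel for how a token-frequency ranking of the kind linked above can
work, the following is a minimal base-R sketch of Okapi BM25 scoring. It is an
illustration only and should not be read as the `pkgmatch` implementation: the
helper names `tokenize()` and `bm25_scores()`, the parameter defaults
`k1 = 1.2` and `b = 0.75`, and the three toy "package descriptions" are all
assumptions made up for this example.

```{r bm25-sketch}
# Illustrative Okapi BM25 sketch in base R. NOT the pkgmatch implementation;
# helper names, defaults, and the toy corpus are hypothetical.
tokenize <- function (x) {
    # Lower-case, split on non-alphanumeric runs, drop empty tokens
    tokens <- strsplit (tolower (x), "[^a-z0-9]+") [[1]]
    tokens [nzchar (tokens)]
}

bm25_scores <- function (query, docs, k1 = 1.2, b = 0.75) {
    doc_tokens <- lapply (docs, tokenize)
    doc_len <- lengths (doc_tokens)
    avg_len <- mean (doc_len)
    n_docs <- length (docs)

    scores <- numeric (n_docs)
    for (term in unique (tokenize (query))) {
        # Frequency of `term` in each document
        tf <- vapply (doc_tokens, function (d) sum (d == term), numeric (1))
        # Number of documents containing the term
        n_t <- sum (tf > 0)
        # Inverse document frequency (non-negative variant)
        idf <- log ((n_docs - n_t + 0.5) / (n_t + 0.5) + 1)
        # Saturating term frequency, normalised by document length
        scores <- scores +
            idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    }
    scores
}

# Toy corpus (invented for illustration)
docs <- c (
    db = "connect to databases through odbc drivers",
    plots = "create statistical graphics and plots",
    seq = "download genomic sequence data from databases"
)
scores <- bm25_scores ("odbc databases", docs)
names (scores) <- names (docs)
sort (scores, decreasing = TRUE)
```

In this toy corpus, `db` ranks first because it contains both query terms,
`seq` matches only "databases", and `plots` scores zero. The sketch does no
stemming, so "database" and "databases" would count as different tokens.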