Package: textreuse 0.1.5
textreuse: Detect Text Reuse and Document Similarity
Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
Authors:
textreuse_0.1.5.tar.gz
textreuse_0.1.5.zip (r-4.5), textreuse_0.1.5.zip (r-4.4), textreuse_0.1.5.zip (r-4.3)
textreuse_0.1.5.tgz (r-4.4-x86_64), textreuse_0.1.5.tgz (r-4.4-arm64), textreuse_0.1.5.tgz (r-4.3-x86_64), textreuse_0.1.5.tgz (r-4.3-arm64)
textreuse_0.1.5.tar.gz (r-4.5-noble), textreuse_0.1.5.tar.gz (r-4.4-noble)
textreuse_0.1.5.tgz (r-4.4-emscripten), textreuse_0.1.5.tgz (r-4.3-emscripten)
textreuse.pdf | textreuse.html
textreuse/json (API)
NEWS
# Install 'textreuse' in R:
install.packages('textreuse', repos = c('https://ropensci.r-universe.dev', 'https://cloud.r-project.org'))
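The description above mentions shingled n-gram tokenizers and similarity functions. As a rough illustration of that idea, the following base-R sketch splits two sentences into overlapping word trigrams and computes their Jaccard similarity. The helpers shingle_ngrams() and jaccard() are hypothetical names written for this example; they are not the package's functions. The package exports tokenize_ngrams and jaccard_similarity for this purpose, and their exact arguments should be checked in the reference manual.

```r
# Base-R sketch of shingled word n-grams plus Jaccard similarity.
# shingle_ngrams() and jaccard() are hypothetical helpers, not textreuse functions.

shingle_ngrams <- function(text, n = 3) {
  # Lowercase, then split on anything that is not a letter or digit.
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words <- words[nzchar(words)]
  # Too few words to form even one n-gram.
  if (length(words) < n) return(character(0))
  # Each n-gram starts one word after the previous one (overlapping shingles).
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

jaccard <- function(a, b) {
  # Size of the intersection divided by size of the union of the two token sets.
  length(intersect(a, b)) / length(union(a, b))
}

a <- shingle_ngrams("The quick brown fox jumps over the lazy dog")
b <- shingle_ngrams("The quick brown fox leaps over the lazy dog")

jaccard(a, b)  # 4 shared trigrams out of 10 distinct ones: 0.4
```

Changing a single word alters every trigram that contains it, which is why one substitution in a nine-word sentence lowers the score to 0.4. Larger values of n make the comparison stricter in the same way.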
Bug tracker: https://github.com/ropensci/textreuse/issues
Pkgdown: https://docs.ropensci.org
Last updated 5 months ago from: 9cf2568ee8 (on master). Checks: OK: 1, NOTE: 8. Indexed: yes.
Target | Result | Date |
---|---|---|
Doc / Vignettes | OK | Nov 27 2024 |
R-4.5-win-x86_64 | NOTE | Nov 27 2024 |
R-4.5-linux-x86_64 | NOTE | Nov 27 2024 |
R-4.4-win-x86_64 | NOTE | Nov 27 2024 |
R-4.4-mac-x86_64 | NOTE | Nov 27 2024 |
R-4.4-mac-aarch64 | NOTE | Nov 27 2024 |
R-4.3-win-x86_64 | NOTE | Nov 27 2024 |
R-4.3-mac-x86_64 | NOTE | Nov 27 2024 |
R-4.3-mac-aarch64 | NOTE | Nov 27 2024 |
Exports: align_local, content, content<-, filenames, has_content, has_hashes, has_minhashes, has_tokens, hash_string, hashes, hashes<-, is.TextReuseCorpus, is.TextReuseTextDocument, jaccard_bag_similarity, jaccard_dissimilarity, jaccard_similarity, lsh, lsh_candidates, lsh_compare, lsh_probability, lsh_query, lsh_subset, lsh_threshold, meta, meta<-, minhash_generator, minhashes, minhashes<-, pairwise_candidates, pairwise_compare, ratio_of_matches, rehash, skipped, TextReuseCorpus, TextReuseTextDocument, tokenize, tokenize_ngrams, tokenize_sentences, tokenize_skip_ngrams, tokenize_words, tokens, tokens<-, wordcount
Dependencies: assertthat, BH, cli, cpp11, digest, dplyr, fansi, generics, glue, lifecycle, magrittr, NLP, pillar, pkgconfig, purrr, R6, Rcpp, RcppProgress, rlang, stringi, stringr, tibble, tidyr, tidyselect, utf8, vctrs, withr
Introduction to the textreuse package
Rendered from textreuse-introduction.Rmd using knitr::rmarkdown on Nov 27 2024.
Last update: 2020-05-12
Started: 2015-10-22
Minhash and locality-sensitive hashing
Rendered from textreuse-minhash.Rmd using knitr::rmarkdown on Nov 27 2024.
Last update: 2015-10-31
Started: 2015-10-22
Pairwise comparisons for document similarity
Rendered from textreuse-pairwise.Rmd using knitr::rmarkdown on Nov 27 2024.
Last update: 2015-10-31
Started: 2015-10-22
Text Alignment
Rendered from textreuse-alignment.Rmd using knitr::rmarkdown on Nov 27 2024.
Last update: 2015-10-22
Started: 2015-10-22