Package: pangoling 1.0.3

Bruno Nicenboim

pangoling: Access to Large Language Model Predictions

Provides access to word predictability estimates using large language models (LLMs) based on 'transformer' architectures via integration with the 'Hugging Face' ecosystem <https://huggingface.co/>. The package interfaces with pre-trained neural networks and supports both causal/auto-regressive LLMs (e.g., 'GPT-2') and masked/bidirectional LLMs (e.g., 'BERT') to compute the probability of words, phrases, or tokens given their linguistic context. For details on GPT-2 and causal models, see Radford et al. (2019) <https://storage.prod.researchhub.com/uploads/papers/2020/06/01/language-models.pdf>; for details on BERT and masked models, see Devlin et al. (2019) <doi:10.48550/arXiv.1810.04805>. By enabling straightforward estimation of word predictability, the package facilitates research in psycholinguistics, computational linguistics, and natural language processing (NLP).
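A minimal sketch of the core idea, assuming the default causal model ('gpt2') and that the left context is passed as the first argument of the exported causal_next_tokens_pred_tbl():

library(pangoling)

# Table of candidate next tokens and their predictability given a left context,
# using the default causal model (downloaded from Hugging Face on first use).
causal_next_tokens_pred_tbl("The apple doesn't fall far from the")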

Authors: Bruno Nicenboim [aut, cre], Chris Emmerly [ctb], Giovanni Cassani [ctb], Lisa Levinson [rev], Utku Turk [rev]

pangoling_1.0.3.tar.gz
pangoling_1.0.3.zip (r-4.6) | pangoling_1.0.3.zip (r-4.5) | pangoling_1.0.3.zip (r-4.4)
pangoling_1.0.3.tgz (r-4.5-any) | pangoling_1.0.3.tgz (r-4.4-any)
pangoling_1.0.3.tar.gz (r-4.6-noble) | pangoling_1.0.3.tar.gz (r-4.5-noble)
pangoling_1.0.3.tgz (r-4.4-emscripten) | pangoling_1.0.3.tgz (r-4.3-emscripten)
pangoling.pdf | pangoling.html
pangoling/json (API)
NEWS

# Install 'pangoling' in R:
install.packages('pangoling', repos = c('https://packages.ropensci.org', 'https://cloud.r-project.org'))
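
The transformer models run through 'reticulate', so a Python environment with the required libraries must also be available; the exported helpers below handle that setup (a minimal sketch, assuming both helpers can be called with their defaults):

library(pangoling)

# One-time setup: install the Python dependencies (e.g., 'transformers', 'torch')
# into an environment managed by reticulate.
install_py_pangoling()

# Check whether the Python side is available before querying any model.
installed_py_pangoling()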

Reviews: rOpenSci Software Review #575

Bug tracker: https://github.com/ropensci/pangoling/issues

Pkgdown site: https://docs.ropensci.org

Datasets:
  • df_jaeger14 - Self-Paced Reading Dataset on Chinese Relative Clauses
  • df_sent - Example dataset: Two word-by-word sentences

Topics: nlp, psycholinguistics, transformers

6.08 score | 10 stars | 9 scripts | 355 downloads | 24 exports | 26 dependencies

Last updated 18 days ago from: 7e6f2cf9dc (on main). Checks: 6 OK, 2 NOTE. Indexed: yes.

Target           Result  Latest binary
Doc / Vignettes  OK      Apr 23 2025
R-4.6-win        OK      Apr 23 2025
R-4.6-linux      OK      Apr 23 2025
R-4.5-win        OK      Apr 23 2025
R-4.5-mac        OK      Apr 23 2025
R-4.5-linux      OK      Apr 23 2025
R-4.4-win        NOTE    Apr 23 2025
R-4.4-mac        NOTE    Apr 23 2025

Exports: causal_config, causal_lp, causal_lp_mats, causal_next_tokens_pred_tbl, causal_next_tokens_tbl, causal_pred_mats, causal_preload, causal_targets_pred, causal_tokens_lp_tbl, causal_tokens_pred_lst, causal_words_pred, install_py_pangoling, installed_py_pangoling, masked_config, masked_lp, masked_preload, masked_targets_pred, masked_tokens_pred_tbl, masked_tokens_tbl, ntokens, perplexity_calc, set_cache_folder, tokenize_lst, transformer_vocab

Dependencies: cachem, cli, data.table, fastmap, glue, here, jsonlite, lattice, lifecycle, magrittr, Matrix, memoise, pillar, png, rappdirs, Rcpp, RcppTOML, reticulate, rlang, rprojroot, rstudioapi, tidyselect, tidytable, utf8, vctrs, withr

Troubleshooting the use of Python in R

Rendered from troubleshooting.Rmd using knitr::rmarkdown on Apr 23 2025.

Last update: 2025-03-11
Started: 2025-03-11

Using a BERT model to get the predictability of words in their context

Rendered from intro-bert.Rmd using knitr::rmarkdown on Apr 23 2025.

Last update: 2025-03-11
Started: 2025-03-11
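
A minimal sketch of what this vignette covers, using the exported masked_targets_pred(); the argument names (prev_contexts, targets, after_contexts) and the model identifier are assumptions, not taken verbatim from the vignette:

library(pangoling)

# Predictability of a target word given context on both sides,
# with a masked (bidirectional) model such as BERT.
masked_targets_pred(
  prev_contexts  = "The apple doesn't fall far from the",
  targets        = "tree",
  after_contexts = ".",
  model          = "bert-base-uncased"  # assumed model name; other Hugging Face masked models should work
)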

Using a GPT2 transformer model to get word predictability

Rendered from intro-gpt2.Rmd using knitr::rmarkdown on Apr 23 2025.

Last update: 2025-03-11
Started: 2025-03-11
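
A minimal sketch of the word-by-word use case, combining the exported causal_words_pred() with the bundled df_sent dataset; the column names (word, sent_n) and the by argument are assumptions based on the export and dataset names:

library(pangoling)
library(tidytable)  # pangoling builds on tidytable (see Dependencies)

# Log probability of each word given its preceding words, computed
# sentence by sentence with a causal model (GPT-2 by default).
df_sent |>
  mutate(lp = causal_words_pred(word, by = sent_n))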

Worked-out example: Surprisal from a causal (GPT) model as a cognitive processing bottleneck in reading

Rendered from example.Rmd using knitr::rmarkdown on Apr 23 2025.

Last update: 2025-03-11
Started: 2025-03-11
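
The quantity this vignette works with, surprisal, is the negative log probability of a word given its preceding context; assuming causal_words_pred() accepts plain vectors and returns natural-log probabilities, the conversion to bits is a one-liner:

library(pangoling)

# Natural-log probabilities per word (assumed scale), then surprisal in bits:
# surprisal = -log2(p) = -log(p) / log(2)
lp <- causal_words_pred(df_sent$word, by = df_sent$sent_n)
surprisal <- -lp / log(2)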