Package: pangoling 1.0.3

Bruno Nicenboim

pangoling: Access to Large Language Model Predictions

Provides access to word predictability estimates using large language models (LLMs) based on 'transformer' architectures via integration with the 'Hugging Face' ecosystem <https://huggingface.co/>. The package interfaces with pre-trained neural networks and supports both causal/auto-regressive LLMs (e.g., 'GPT-2') and masked/bidirectional LLMs (e.g., 'BERT') to compute the probability of words, phrases, or tokens given their linguistic context. For details on GPT-2 and causal models, see Radford et al. (2019) <https://storage.prod.researchhub.com/uploads/papers/2020/06/01/language-models.pdf>; for details on BERT and masked models, see Devlin et al. (2019) <doi:10.48550/arXiv.1810.04805>. By making word predictability straightforward to estimate, the package facilitates research in psycholinguistics, computational linguistics, and natural language processing (NLP).

Authors: Bruno Nicenboim [aut, cre], Chris Emmerly [ctb], Giovanni Cassani [ctb], Lisa Levinson [rev], Utku Turk [rev]

pangoling_1.0.3.tar.gz
pangoling_1.0.3.zip (r-4.6), pangoling_1.0.3.zip (r-4.5), pangoling_1.0.3.zip (r-4.4)
pangoling_1.0.3.tgz (r-4.5-any), pangoling_1.0.3.tgz (r-4.4-any)
pangoling_1.0.3.tar.gz (r-4.6-any), pangoling_1.0.3.tar.gz (r-4.5-any)
pangoling_1.0.3.tgz (r-4.5-emscripten), pangoling_1.0.3.tgz (r-4.4-emscripten)
pangoling.pdf | pangoling.html
pangoling/json (API)
NEWS

# Install 'pangoling' in R:

install.packages('pangoling', repos = c('https://packages.ropensci.org', 'https://cloud.r-project.org'))
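
Installing the R package does not by itself guarantee that the Python side is ready. The exports listed on this page include installed_py_pangoling and install_py_pangoling, so a first session might look like the sketch below. It assumes that installed_py_pangoling() returns a single TRUE/FALSE value and that both functions can be called without arguments; neither assumption is confirmed on this page, so check the help pages before relying on it.

```r
# Minimal sketch, assuming the package is already installed from one of
# the repositories above.
library(pangoling)

# Assumption: installed_py_pangoling() returns TRUE when the required
# Python dependencies are available, and FALSE otherwise.
if (!isTRUE(installed_py_pangoling())) {
  # Assumption: calling install_py_pangoling() with no arguments installs
  # the Python packages that 'pangoling' needs.
  install_py_pangoling()
}
```

If the check still fails after installation, the vignette "Troubleshooting the use of Python in R" is the place to look.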

Reviews: rOpenSci Software Review #575

Bug tracker: https://github.com/ropensci/pangoling/issues

Pkgdown site: https://docs.ropensci.org

Datasets:

df_jaeger14 - Self-Paced Reading Dataset on Chinese Relative Clauses
df_sent - Example dataset: Two word-by-word sentences

On CRAN:

nlp psycholinguistics transformers

6.19 score, 13 stars, 10 scripts, 456 downloads, 24 exports, 26 dependencies

Last updated 3 months ago from: 7e6f2cf9dc (on main). Checks: 9 OK, 2 NOTE. Indexed: yes.

Target	Result	Total time
linux-devel-x86_64	OK	174
pkgdown docs	OK	169
source / vignettes	OK	248
linux-release-x86_64	OK	177
macos-release-arm64	OK	91
macos-oldrel-arm64	NOTE	113
windows-devel	OK	228
windows-release	OK	210
windows-oldrel	NOTE	224
wasm-release	OK	140
wasm-oldrel	OK	176

Exports: causal_config causal_lp causal_lp_mats causal_next_tokens_pred_tbl causal_next_tokens_tbl causal_pred_mats causal_preload causal_targets_pred causal_tokens_lp_tbl causal_tokens_pred_lst causal_words_pred install_py_pangoling installed_py_pangoling masked_config masked_lp masked_preload masked_targets_pred masked_tokens_pred_tbl masked_tokens_tbl ntokens perplexity_calc set_cache_folder tokenize_lst transformer_vocab

Dependencies: cachem cli data.table fastmap glue here jsonlite lattice lifecycle magrittr Matrix memoise pillar png rappdirs Rcpp RcppTOML reticulate rlang rprojroot rstudioapi tidyselect tidytable utf8 vctrs withr

Troubleshooting the use of Python in R

Rendered from troubleshooting.Rmd using knitr::rmarkdown on Jun 28 2025.

Last update: 2025-03-11
Started: 2025-03-11

Using a Bert model to get the predictability of words in their context

Rendered from intro-bert.Rmd using knitr::rmarkdown on Jun 28 2025.

Last update: 2025-03-11
Started: 2025-03-11

Using a GPT2 transformer model to get word predictability

Rendered from intro-gpt2.Rmd using knitr::rmarkdown on Jun 28 2025.

Last update: 2025-03-11
Started: 2025-03-11

Worked-out example: Surprisal from a causal (GPT) model as a cognitive processing bottleneck in reading

Rendered from example.Rmd using knitr::rmarkdown on Jun 28 2025.

Last update: 2025-03-11
Started: 2025-03-11

Help page	Topics
Returns the configuration of a causal model	causal_config
Generate next tokens after a context and their predictability using a causal transformer model	causal_next_tokens_pred_tbl
Generate a list of predictability matrices using a causal transformer model	causal_pred_mats
Preloads a causal language model	causal_preload
Compute predictability using a causal transformer model	causal_targets_pred causal_tokens_pred_lst causal_words_pred
Self-Paced Reading Dataset on Chinese Relative Clauses	df_jaeger14
Example dataset: Two word-by-word sentences	df_sent
Install the Python packages needed for 'pangoling'	install_py_pangoling
Check if the required Python dependencies for 'pangoling' are installed	installed_py_pangoling
Returns the configuration of a masked model	masked_config
Preloads a masked language model	masked_preload
Get the predictability of a target word (or phrase) given a left and right context	masked_targets_pred
Get the possible tokens and their log probabilities for each mask in a sentence	masked_tokens_pred_tbl
The number of tokens in a string or vector of strings	ntokens
Calculates perplexity	perplexity_calc
Set cache folder for HuggingFace transformers	set_cache_folder
Tokenize an input	tokenize_lst
Returns the vocabulary of a model	transformer_vocab
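
The table above lists a help page for perplexity_calc, but its arguments are not shown on this page. The sketch below therefore does not call it. Instead, it illustrates in base R the standard definition of perplexity as the exponentiated negative mean of the log-probabilities. The function name perplexity and the probability values are hypothetical, chosen only for illustration.

```r
# Hypothetical natural-log probabilities for the four words of a sentence,
# standing in for the output of a language model.
lp <- log(c(0.20, 0.05, 0.50, 0.10))

# Perplexity under the standard definition: exp of the negative mean
# log-probability. This is an illustrative helper, not the package's
# own perplexity_calc().
perplexity <- function(log_probs, na.rm = FALSE) {
  exp(-mean(log_probs, na.rm = na.rm))
}

ppl <- perplexity(lp)
print(ppl)
```

Lower values mean the model found the sequence more predictable; a model that assigned probability 1 to every word would have a perplexity of 1.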
