NEWS

pangoling 1.0.3 (2025-04-07)

Added installed_py_pangoling() to check if required Python dependencies (transformers and torch) are installed.

Changed the ownership of the repository to rOpenSci.
Deprecated functions are now defunct; use their respective replacement functions instead.

Added word_n argument in causal_words_pred() to indicate the word order of the texts.
Added support for models whose vocabulary is larger than the tokenizer's.

Added checkpoint parameter to causal_preload() and masked_preload() to allow loading models from checkpoints.
Introduced causal_next_tokens_pred_tbl(), which replaces causal_next_tokens_tbl() and provides improved predictability calculations.
Added causal_words_pred(), causal_targets_pred(), and causal_tokens_pred_lst() to compute predictability for words, phrases, or tokens, replacing causal_lp() and causal_tokens_lp_tbl().
Introduced masked_tokens_pred_tbl(), replacing masked_tokens_tbl(), for retrieving possible tokens and their log probabilities.
Introduced masked_targets_pred(), replacing masked_lp(), for calculating predictability based on left and right context.
Introduced transformer_vocab() with an optional decode parameter to return decoded tokenized words.
New dataset df_jaeger14: Self-paced reading data on Chinese relative clauses.
New dataset df_sent: Example dataset with two word-by-word sentences.
New vignette: A worked example of a causal model.

Added sep argument in causal_words_pred() to support languages without spaces between words (e.g., Chinese).
New log.p argument across multiple functions to specify how predictability is calculated (e.g., log base e, log base 2 for bits, or raw probabilities).
Improved tokenization utilities: tokenize_lst() now supports decoded outputs via the decode parameter.
Updated install_py_pangoling() to enhance Python environment handling.
Added perplexity_calc() for computing perplexity from probabilities.
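
The relationship between the predictability scales named above (log base e, log base 2 for bits, raw probabilities) and perplexity can be illustrated with a minimal base R sketch. It does not call any pangoling function, and it makes no claim about how the log.p argument or perplexity_calc() are implemented; the probability values and all variable names are made up for illustration.

```r
# Hypothetical per-word probabilities for a four-word sentence (made-up values).
p <- c(0.5, 0.25, 0.125, 0.125)

# Predictability as natural-log probabilities (log base e).
lp_nat <- log(p)

# The same predictability expressed in bits (log base 2).
lp_bits <- log2(p)

# Changing the base only rescales the values: log2(p) equals log(p) / log(2).
same_scale <- isTRUE(all.equal(lp_bits, lp_nat / log(2)))

# Perplexity is the exponentiated negative mean log probability,
# which is the inverse of the geometric mean of the probabilities.
# All three routes below give the same number.
ppl_from_nat <- exp(-mean(lp_nat))
ppl_from_bits <- 2^(-mean(lp_bits))
ppl_from_raw <- prod(p)^(-1 / length(p))

print(c(natural = ppl_from_nat, bits = ppl_from_bits, raw = ppl_from_raw))
```

With these values the mean log probability in bits is -2.25, so each route yields a perplexity of 2^2.25, roughly 4.76. Working from log probabilities rather than the raw product is usually preferable for long texts, because a product of many small probabilities can underflow to zero.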

Deprecated causal_next_tokens_tbl(), causal_lp(), causal_tokens_lp_tbl(), and causal_lp_mats(). Use causal_next_tokens_pred_tbl(), causal_targets_pred(), causal_words_pred(), and causal_pred_mats() instead.
Deprecated masked_tokens_tbl() and masked_lp(). Use masked_tokens_pred_tbl() and masked_targets_pred() instead.