| Title: | Google's Compact Language Detector 3 |
|---|---|
| Description: | Google's Compact Language Detector 3 is a neural network model for language identification and the successor of 'cld2' (available from CRAN). The algorithm is still experimental and takes a novel approach to language detection with different properties and outcomes. It can be useful to combine this with the Bayesian classifier results from 'cld2'. See <https://github.com/google/cld3#readme> for more information. |
| Authors: | Jeroen Ooms [aut, cre] (ORCID: <https://orcid.org/0000-0002-4035-0289>), Google Inc [cph] (CLD3 C++ library) |
| Maintainer: | Jeroen Ooms <[email protected]> |
| License: | Apache License 2.0 |
| Version: | 1.6.1 |
| Built: | 2025-10-01 06:18:46 UTC |
| Source: | https://github.com/ropensci/cld3 |
The function detect_language() is vectorised and guesses the language of each string
in text, or returns NA if the language could not be reliably determined. The function
detect_language_mixed() is not vectorised and detects all languages inside the entire
character vector as a whole.
    detect_language(text)
    detect_language_mixed(text, size = 3)
| text | a string with text to classify or a connection to read from |
|---|---|
| size | number of languages to detect |
    # Vectorized best guess
    text <- c("To be or not to be?", "Ce n'est pas grave.",
      "Hij heeft de klok horen luiden maar weet niet waar de klepel hangt.")
    detect_language(text)

    # Multiple languages in one text (doesn't seem to work well)
    detect_language_mixed(text)
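
The package description suggests combining these results with the Bayesian classifier from 'cld2'. One possible way to do that is sketched below: keep a guess only when both detectors agree, and return NA otherwise. The helper `agree_or_na()` is hypothetical and not part of either package, and the "agree or NA" policy is an assumption chosen for illustration. The sketch also assumes 'cld2' is installed separately and that both packages return comparable language codes for these inputs.

```r
# Hypothetical helper (not part of cld2 or cld3): keep a guess only
# when both detectors returned the same non-NA language code.
agree_or_na <- function(a, b) {
  ifelse(!is.na(a) & !is.na(b) & a == b, a, NA_character_)
}

# Run the comparison only if both packages are available.
if (requireNamespace("cld2", quietly = TRUE) &&
    requireNamespace("cld3", quietly = TRUE)) {
  text <- c("To be or not to be?", "Ce n'est pas grave.")
  guess2 <- cld2::detect_language(text)  # Bayesian classifier
  guess3 <- cld3::detect_language(text)  # neural network model
  print(data.frame(
    cld2 = guess2,
    cld3 = guess3,
    consensus = agree_or_na(guess2, guess3)
  ))
}
```

Requiring agreement trades coverage for precision: more strings end up as NA, but the remaining guesses are backed by two independent methods. A looser policy, such as falling back to one detector when the other returns NA, may suit short or noisy text better.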