The gutenbergr package helps you download and process public domain works from Project Gutenberg. This vignette introduces the package’s metadata datasets and core downloading functionality.
gutenberg_metadata

The gutenberg_metadata dataset contains information about each work in the Project Gutenberg collection:
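The call that produced the listing below is not shown in this text. A minimal sketch, assuming the dataset is simply printed after loading the package (loading dplyr here is an assumption, made for tibble printing and for the filtering used later):

```r
library(gutenbergr)
library(dplyr)  # assumption: used for tibble printing and later filter() calls

# The dataset ships with the package, so no download is needed to inspect it
gutenberg_metadata
```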
#> # A tibble: 81,068 × 8
#> gutenberg_id title author gutenberg_author_id language gutenberg_bookshelf
#> <int> <chr> <chr> <int> <fct> <chr>
#> 1 1 "The De… Jeffe… 1638 en Politics/American …
#> 2 2 "The Un… Unite… 1 en Politics/American …
#> 3 3 "John F… Kenne… 1666 en Category: Essays, …
#> 4 4 "Lincol… Linco… 3 en US Civil War/Categ…
#> 5 5 "The Un… Unite… 1 en United States/Poli…
#> 6 6 "Give M… Henry… 4 en American Revolutio…
#> 7 7 "The Ma… <NA> NA en Category: History …
#> 8 8 "Abraha… Linco… 3 en US Civil War/Categ…
#> 9 9 "Abraha… Linco… 3 en US Civil War/Categ…
#> 10 10 "The Ki… <NA> NA en Banned Books List …
#> # ℹ 81,058 more rows
#> # ℹ 2 more variables: rights <fct>, has_text <lgl>
You can filter this to find specific works:
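For example, the listing below looks like the result of matching on the title column. This is a sketch that assumes an exact match on "Persuasion"; the variable name persuasion_rows is only illustrative.

```r
library(gutenbergr)
library(dplyr)

# Keep only the rows whose title is exactly "Persuasion"
persuasion_rows <- gutenberg_metadata |>
  filter(title == "Persuasion")

persuasion_rows
```

Note that an exact match can return several editions and translations of the same work, each with its own gutenberg_id.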
#> # A tibble: 3 × 8
#> gutenberg_id title author gutenberg_author_id language gutenberg_bookshelf
#> <int> <chr> <chr> <int> <fct> <chr>
#> 1 105 Persuasi… Auste… 68 en "Category: Novels/…
#> 2 22963 Persuasi… Auste… 68 en ""
#> 3 36777 Persuasi… Auste… 68 fr "FR Littérature/Ca…
#> # ℹ 2 more variables: rights <fct>, has_text <lgl>
The metadata currently in the package was last updated on 11 January 2026.
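If you want to check the date for the version you have installed, the dataset is documented as carrying a date_updated attribute. The sketch below assumes that attribute is present in your installed version; if it is not, the call returns NULL.

```r
library(gutenbergr)

# Assumption: the bundled dataset stores its refresh date as an attribute
metadata_date <- attr(gutenberg_metadata, "date_updated")
metadata_date
```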
gutenberg_works()

In most analyses, you’ll want to filter for English works, avoid duplicates, and include only books with downloadable text. The gutenberg_works() function does this automatically:
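A minimal sketch of the default call, which presumably produced the listing below. The name works is only illustrative.

```r
library(gutenbergr)
library(dplyr)

# With no arguments, the function applies its default filters
works <- gutenberg_works()

works
```

Comparing nrow(works) with nrow(gutenberg_metadata) shows how many records the default filters remove.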
#> # A tibble: 62,685 × 8
#> gutenberg_id title author gutenberg_author_id language gutenberg_bookshelf
#> <int> <chr> <chr> <int> <fct> <chr>
#> 1 1 "The De… Jeffe… 1638 en Politics/American …
#> 2 2 "The Un… Unite… 1 en Politics/American …
#> 3 3 "John F… Kenne… 1666 en Category: Essays, …
#> 4 4 "Lincol… Linco… 3 en US Civil War/Categ…
#> 5 5 "The Un… Unite… 1 en United States/Poli…
#> 6 6 "Give M… Henry… 4 en American Revolutio…
#> 7 7 "The Ma… <NA> NA en Category: History …
#> 8 8 "Abraha… Linco… 3 en US Civil War/Categ…
#> 9 9 "Abraha… Linco… 3 en US Civil War/Categ…
#> 10 10 "The Ki… <NA> NA en Banned Books List …
#> # ℹ 62,675 more rows
#> # ℹ 2 more variables: rights <fct>, has_text <lgl>
You can also filter directly within the function:
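Filter conditions are passed as arguments, in the same style as dplyr's filter(). The first listing below shows works by a single author, so a sketch along these lines seems likely; the exact condition is an assumption, since the original call is not shown.

```r
library(gutenbergr)
library(dplyr)

# Conditions are evaluated against the metadata columns
austen <- gutenberg_works(author == "Austen, Jane")

austen
```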
#> # A tibble: 14 × 8
#> gutenberg_id title author gutenberg_author_id language gutenberg_bookshelf
#> <int> <chr> <chr> <int> <fct> <chr>
#> 1 105 "Persua… Auste… 68 en "Category: Novels/…
#> 2 121 "Northa… Auste… 68 en "Gothic Fiction/Ca…
#> 3 141 "Mansfi… Auste… 68 en "Category: Novels/…
#> 4 158 "Emma" Auste… 68 en "Category: Novels/…
#> 5 161 "Sense … Auste… 68 en "Category: Romance…
#> 6 946 "Lady S… Auste… 68 en "Category: Novels/…
#> 7 1212 "Love a… Auste… 68 en "Category: Romance…
#> 8 1342 "Pride … Auste… 68 en "Best Books Ever L…
#> 9 31100 "The Co… Auste… 68 en "Category: Romance…
#> 10 37431 "Pride … Auste… 68 en "Category: Plays/F…
#> 11 42078 "The Le… Auste… 68 en "Category: Biograp…
#> 12 63569 "The Wa… Auste… 68 en "Category: Novels/…
#> 13 74233 "Fragme… Auste… 68 en "Category: Novels/…
#> 14 77117 "The Wa… Auste… 68 en ""
#> # ℹ 2 more variables: rights <fct>, has_text <lgl>
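The next listing includes several different author IDs, which suggests a pattern match on the author name rather than an exact comparison. A sketch using base R's grepl(); the pattern "Austen" is an assumption.

```r
library(gutenbergr)
library(dplyr)

# grepl() returns FALSE for missing authors, so NA rows are dropped
austen_like <- gutenberg_works(grepl("Austen", author))

austen_like
```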
#> # A tibble: 24 × 8
#> gutenberg_id title author gutenberg_author_id language gutenberg_bookshelf
#> <int> <chr> <chr> <int> <fct> <chr>
#> 1 105 Persuas… Auste… 68 en Category: Novels/C…
#> 2 121 Northan… Auste… 68 en Gothic Fiction/Cat…
#> 3 141 Mansfie… Auste… 68 en Category: Novels/C…
#> 4 158 Emma Auste… 68 en Category: Novels/C…
#> 5 161 Sense a… Auste… 68 en Category: Romance/…
#> 6 946 Lady Su… Auste… 68 en Category: Novels/C…
#> 7 1212 Love an… Auste… 68 en Category: Romance/…
#> 8 1342 Pride a… Auste… 68 en Best Books Ever Li…
#> 9 17797 Memoir … Auste… 7603 en Category: Biograph…
#> 10 22536 Jane Au… Auste… 25392 en Category: Biograph…
#> # ℹ 14 more rows
#> # ℹ 2 more variables: rights <fct>, has_text <lgl>
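The same approach works for any author. The listing that follows shows works attributed to Dickens; the call below is a guess at how such a result could be obtained, and the exact author string is an assumption.

```r
library(gutenbergr)
library(dplyr)

# Assumed author string, following the "Surname, Forename" form used above
dickens <- gutenberg_works(author == "Dickens, Charles")

dickens
```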
#> # A tibble: 93 × 8
#> gutenberg_id title author gutenberg_author_id language gutenberg_bookshelf
#> <int> <chr> <chr> <int> <fct> <chr>
#> 1 46 "A Chri… Dicke… 37 en Children's Literat…
#> 2 98 "A Tale… Dicke… 37 en Historical Fiction…
#> 3 564 "The My… Dicke… 37 en Mystery Fiction/Ca…
#> 4 580 "The Pi… Dicke… 37 en Best Books Ever Li…
#> 5 588 "Master… Dicke… 37 en Category: Novels/C…
#> 6 644 "The Ha… Dicke… 37 en Christmas/Category…
#> 7 650 "Pictur… Dicke… 37 en Category: Travel W…
#> 8 653 "The Ch… Dicke… 37 en Category: Novels/C…
#> 9 675 "Americ… Dicke… 37 en Category: Travel W…
#> 10 676 "The Ba… Dicke… 37 en Christmas/Category…
#> # ℹ 83 more rows
#> # ℹ 2 more variables: rights <fct>, has_text <lgl>
gutenberg_subjects

The gutenberg_subjects dataset pairs works with Library of Congress classifications and subject headings:
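As with the metadata, this dataset ships with the package and can be printed directly. The sketch below also counts rows per subject_type, which is an addition for illustration and not part of the listing that follows.

```r
library(gutenbergr)
library(dplyr)

# One row per (work, subject) pair
gutenberg_subjects

# Illustrative extra: how many rows use each type of subject label
subject_type_counts <- gutenberg_subjects |>
  count(subject_type)
```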
#> # A tibble: 260,915 × 3
#> gutenberg_id subject_type subject
#> <int> <fct> <chr>
#> 1 1 lcsh United States -- History -- Revolution, 1775-1783 …
#> 2 1 lcsh United States. Declaration of Independence
#> 3 1 lcc E201
#> 4 1 lcc JK
#> 5 2 lcsh Civil rights -- United States -- Sources
#> 6 2 lcsh United States. Constitution. 1st-10th Amendments
#> 7 2 lcc JK
#> 8 2 lcc KF
#> 9 3 lcsh United States -- Foreign relations -- 1961-1963
#> 10 3 lcsh Presidents -- United States -- Inaugural addresses
#> # ℹ 260,905 more rows
This is useful for finding works by genre or topic:
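A minimal sketch of a genre lookup, assuming an exact match on the subject heading shown in the listing below. The name detective is only illustrative.

```r
library(gutenbergr)
library(dplyr)

# Exact match on a Library of Congress subject heading
detective <- gutenberg_subjects |>
  filter(subject == "Detective and mystery stories")

detective
```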
#> # A tibble: 974 × 3
#> gutenberg_id subject_type subject
#> <int> <fct> <chr>
#> 1 170 lcsh Detective and mystery stories
#> 2 173 lcsh Detective and mystery stories
#> 3 244 lcsh Detective and mystery stories
#> 4 305 lcsh Detective and mystery stories
#> 5 330 lcsh Detective and mystery stories
#> 6 481 lcsh Detective and mystery stories
#> 7 547 lcsh Detective and mystery stories
#> 8 863 lcsh Detective and mystery stories
#> 9 905 lcsh Detective and mystery stories
#> 10 1155 lcsh Detective and mystery stories
#> # ℹ 964 more rows
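Subject headings can also be matched by pattern. The next listing shows headings that mention Sherlock Holmes; the sketch below assumes a partial match with grepl(), though an exact comparison may have been used originally.

```r
library(gutenbergr)
library(dplyr)

# fixed = TRUE treats the pattern as a literal string, not a regular expression
holmes_subjects <- gutenberg_subjects |>
  filter(grepl("Holmes, Sherlock", subject, fixed = TRUE))

holmes_subjects
```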
#> # A tibble: 59 × 3
#> gutenberg_id subject_type subject
#> <int> <fct> <chr>
#> 1 108 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 2 221 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 3 244 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 4 834 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 5 1661 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 6 2097 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 7 2343 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 8 2344 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 9 2345 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> 10 2346 lcsh Holmes, Sherlock (Fictitious character) -- Fiction
#> # ℹ 49 more rows
You can join this with gutenberg_works() to download
books by subject:
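One way to do this is a semi join on gutenberg_id, which keeps the works that have a matching subject without adding subject columns. This is a sketch under assumptions: the variable names are illustrative, and the download is limited to two books here only to keep the example quick. It needs network access.

```r
library(gutenbergr)
library(dplyr)

# IDs of works tagged with a Sherlock Holmes subject heading
holmes_ids <- gutenberg_subjects |>
  filter(grepl("Holmes, Sherlock", subject, fixed = TRUE)) |>
  distinct(gutenberg_id)

# Keep only the default-filtered works that appear in that ID list
holmes_works <- gutenberg_works() |>
  semi_join(holmes_ids, by = "gutenberg_id")

# Download the text of the first two matches (requires network access)
holmes_books <- gutenberg_download(head(holmes_works$gutenberg_id, 2))
```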
Download a book using its Gutenberg ID with
gutenberg_download():
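A minimal sketch of a single download, assuming network access to a Project Gutenberg mirror. The listing that follows also shows title and author columns. A bare call like this one is expected to return only gutenberg_id and text, so the listing was probably produced with extra metadata fields requested.

```r
library(gutenbergr)
library(dplyr)

# 105 is the ID of "Persuasion" in the metadata shown earlier
persuasion <- gutenberg_download(105)

persuasion
```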
#> # A tibble: 8,357 × 4
#> gutenberg_id text title author
#> <int> <chr> <chr> <chr>
#> 1 105 "Persuasion" Persuasion Austen, Jane
#> 2 105 "" Persuasion Austen, Jane
#> 3 105 "" Persuasion Austen, Jane
#> 4 105 "by Jane Austen" Persuasion Austen, Jane
#> 5 105 "" Persuasion Austen, Jane
#> 6 105 "(1818)" Persuasion Austen, Jane
#> 7 105 "" Persuasion Austen, Jane
#> 8 105 "" Persuasion Austen, Jane
#> 9 105 "" Persuasion Austen, Jane
#> 10 105 "" Persuasion Austen, Jane
#> # ℹ 8,347 more rows
The result is a tibble with:
gutenberg_id - the book’s ID
text - one row per line of text

Download multiple books by providing a vector of IDs:
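A sketch assuming IDs 109 and 105, the two books that appear in the listings in this section. It needs network access, and the name books is only illustrative.

```r
library(gutenbergr)
library(dplyr)

# All requested books are returned in one tibble, identified by gutenberg_id
books <- gutenberg_download(c(109, 105))

books
```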
#> # A tibble: 9,579 × 4
#> gutenberg_id text title author
#> <int> <chr> <chr> <chr>
#> 1 109 "Renascence and Other Poems" Renascence, and Other Poems Millay…
#> 2 109 "" Renascence, and Other Poems Millay…
#> 3 109 "" Renascence, and Other Poems Millay…
#> 4 109 "by" Renascence, and Other Poems Millay…
#> 5 109 "" Renascence, and Other Poems Millay…
#> 6 109 "Edna St. Vincent Millay" Renascence, and Other Poems Millay…
#> 7 109 "" Renascence, and Other Poems Millay…
#> 8 109 "" Renascence, and Other Poems Millay…
#> 9 109 "" Renascence, and Other Poems Millay…
#> 10 109 "" Renascence, and Other Poems Millay…
#> # ℹ 9,569 more rows
Use the meta_fields argument to include additional
information:
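The listing below counts lines per title, which requires the title to travel with each line of text. A sketch assuming the same two IDs as above and the title and author fields; it needs network access.

```r
library(gutenbergr)
library(dplyr)

# meta_fields names columns of gutenberg_metadata to attach to every row
books_with_meta <- gutenberg_download(
  c(109, 105),
  meta_fields = c("title", "author")
)

# Number of text lines per book
lines_per_title <- books_with_meta |>
  count(title)

lines_per_title
```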
#> # A tibble: 2 × 2
#> title n
#> <chr> <int>
#> 1 Persuasion 8357
#> 2 Renascence, and Other Poems 1222
Now that you have book texts as tibbles, you can: