---
title: "Modernising Citation Metadata in R: Introducing `bibrecord`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Modernising Citation Metadata in R: Introducing `bibrecord`}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setupvignette, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(dataset)
```

Descriptive metadata is often added as an afterthought or stored separately from the data it describes. This separation can lead to loss of context when datasets are shared, archived, or reused.

To avoid this, the `dataset` package encourages you to embed metadata when the dataset is created. For a `dataset_df`, this means providing variable-level definitions, units, and namespaces, and also a complete, standards-aligned citation record for the dataset itself.

Encoding citation information early ensures that it travels with the data, supports the FAIR principles (Findable, Accessible, Interoperable, Reusable), and is ready for export to modern metadata formats.

In the [Design Principles & Future Work Semantically Enriched, Standards-Aligned Datasets in R](https://dataset.dataobservatory.eu/articles/design.html) article, we identify three objectives for dataset-level citation metadata:

1. **Full compliance** with standards such as Dublin Core Terms (DCTERMS) and DataCite
2. **Interoperability** with the R ecosystem, including `dataset_df` and base R tools
3. **Preservation of meaning** throughout the dataset’s lifecycle, from creation to publication and reuse

## Purpose

The base R function `utils::bibentry()` offers a way to structure citation metadata and works well for simple references. However, it does not fully support DCTERMS or DataCite, which require:

- Clear separation of roles (e.g., creators vs. contributors)
- Richly typed relationships between resources
- Support for additional metadata fields such as identifiers, subjects, and funding information

The `bibrecord` class builds on `bibentry` to bridge this gap while remaining fully compatible with base R. It adds:

- Multiple `person()` entries for contributors
- Metadata fields aligned with DCTERMS and DataCite
- Safe serialization and extended printing methods

Ideally, `bibrecord` should evolve in close coordination with `utils::bibentry()` or be replaced by a modernised `bibentry` that supports these capabilities natively, achieving the three objectives described above.

## What is `bibrecord`?

A `bibrecord` is a standard `bibentry` object with additional fields stored as attributes. This means:

- It works with any function that accepts a `bibentry`
- It offers structured metadata fields such as `contributor`, `subject`, and `identifier`
- Extended methods display both the citation and the enriched metadata

## Creating a `bibrecord`

```{r bibrecord}
# Roles use MARC relator codes: "cre" = creator, "dtm" = data manager
person_jane <- person("Jane", "Doe", role = "cre")
person_alice <- person("Alice", "Smith", role = "dtm")

# Authors and contributors are passed separately so their roles stay distinct
rec <- bibrecord(
  title = "GDP of Small States",
  author = list(person_jane),
  contributor = list(person_alice),
  publisher = "Tinystat",
  identifier = "doi:10.1234/example",
  date = "2023-05-01",
  subject = "Economic indicators"
)
```

## Printing a `bibrecord`

```{r print}
print(rec)
```

When printed, a `bibrecord` shows the standard citation along with clearly labelled contributor and metadata fields.

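
The "`bibentry` plus attributes" design described under *What is `bibrecord`* can be illustrated with base R alone. The chunk below is a minimal sketch of that idea, not the actual implementation of `bibrecord()`: the class name `sketch_record` and the object name `sketch` are hypothetical, and only `utils::bibentry()`, `person()`, `attr()`, `inherits()`, and `format()` from base R are used.

```{r sketch-attributes}
# Illustrative sketch only: this is NOT how dataset::bibrecord() is implemented.
# Start from an ordinary bibentry; "Misc" has no required fields.
sketch <- utils::bibentry(
  bibtype = "Misc",
  title   = "GDP of Small States",
  author  = person("Jane", "Doe", role = "cre"),
  year    = "2023"
)

# Store the extra metadata as attributes on the bibentry object
attr(sketch, "contributor") <- list(person("Alice", "Smith", role = "dtm"))
attr(sketch, "subject") <- "Economic indicators"

# Prepend a hypothetical subclass; the object still inherits from bibentry
class(sketch) <- c("sketch_record", class(sketch))

inherits(sketch, "bibentry")  # base R tools still recognise the object
attr(sketch, "subject")       # the enriched field is retrievable
format(sketch)                # falls back to the bibentry formatting method
```

Keeping the additional fields in attributes leaves the underlying `bibentry` untouched, which is why functions written for `bibentry` can continue to work. Base R methods ignore those attributes, so a dedicated print method is needed to display them.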

## Compatibility with existing infrastructure

Because `bibrecord` inherits from `bibentry`:

- It works with `citation()` and other base R citation tools
- It integrates into existing bibliographic workflows
- It can be converted with `as_dublincore()` or `as_datacite()` without loss of information

## Future extensions

Planned enhancements to `bibrecord` include:

- Support for additional metadata fields such as `funder`, `geolocation`, and `relatedIdentifier`
- Export to JSON-LD or RDF formats
- Integration with APIs from services like Zenodo, Crossref, or Wikidata

In the broader context described in [Design Principles & Future Work Semantically Enriched, Standards-Aligned Datasets in R](https://dataset.dataobservatory.eu/articles/design.html), the long-term goal is to ensure that dataset-level citation metadata in R meets three objectives:

1. **Full compliance with modern metadata standards** such as Dublin Core Terms (DCTERMS) and DataCite
2. **Seamless interoperability with the R ecosystem**, including `dataset_df` and base R tools
3. **Preservation of meaning across the entire data lifecycle**, from dataset creation to long-term publication and reuse

To achieve this, `bibrecord` should either evolve in close coordination with `utils::bibentry()` or, ideally, be replaced entirely by a modernised version of `bibentry` that supports these capabilities natively.

## Summary

The `bibrecord` class extends base R's `bibentry` to provide structured, standards-aligned citation metadata that can be embedded directly into a `dataset_df`. It remains fully compatible with existing R workflows while adding support for contributor roles, richer metadata fields, and export to standards like DCTERMS and DataCite.

Embedding a `bibrecord` in a `dataset_df` ensures that citation information is:

- **Complete** – all key metadata is included at dataset creation
- **Portable** – metadata travels with the dataset and can be exported to common formats
- **Interoperable** – metadata remains compatible with base R and external metadata consumers

By adopting `bibrecord`, you can create datasets that are ready for FAIR-compliant publishing, are easier to share, and keep their full descriptive context throughout their lifecycle.