Title: | Create Data Frames for Exchange and Reuse |
---|---|
Description: | The 'dataset' package helps create semantically rich, machine-readable, and interoperable datasets in R. It extends tidy data frames with metadata that preserves meaning, improves interoperability, and makes datasets easier to publish, exchange, and reuse in line with ISO and W3C standards. |
Authors: | Daniel Antal [aut, cre] (ORCID: <https://orcid.org/0000-0001-7513-6760>), Marcelo Perlin [rev] (ORCID: <https://orcid.org/0000-0002-9839-4268>), Anna Márta Mester [rev] (ORCID: <https://orcid.org/0009-0008-2274-8163>), Mauro Lepore [rev] (ORCID: <https://orcid.org/0000-0002-1986-7988>) |
Maintainer: | Daniel Antal <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.4.0 |
Built: | 2025-08-27 18:33:05 UTC |
Source: | https://github.com/ropensci/dataset |
as_character() is the recommended method to convert a defined() vector to a
character type. It is metadata-aware and ensures that the underlying data is
character before coercion.
Base R's as.character() method applied to defined vectors simply strips the
class and returns the values as a plain character vector. This is equivalent
to calling as_character() with preserve_attributes = FALSE.
as_character(x, ...)

## S3 method for class 'haven_labelled_defined'
as_character(x, preserve_attributes = FALSE, ...)

## S3 method for class 'haven_labelled_defined'
as.character(x, ...)
x |
A vector created with |
... |
Reserved for future use. |
preserve_attributes |
Logical. If |
If preserve_attributes = TRUE, the returned character vector retains
semantic metadata such as unit, concept, and namespace, though the "defined"
class itself is removed. If preserve_attributes = FALSE (default), a plain
character vector is returned with all attributes stripped.
For numeric-based defined vectors, as_character() throws an informative
error to avoid accidental coercion of non-character data.
Note: as.character() (base R) is supported but simply returns the raw
values, and does not preserve or warn about metadata loss.
A character vector.
# Recommended use
fruits <- defined(c("apple", "avocado", "kiwi"),
  label = "Fruit",
  unit = "kg"
)
as_character(fruits, preserve_attributes = TRUE)

# Strip metadata
as_character(fruits, preserve_attributes = FALSE)

# Equivalent base R fallback
as.character(fruits)
Constructs a bibliographic metadata record conforming to the DataCite
Metadata Schema. The resulting object is stored as a modified
utils::bibentry() enriched with structured Dublin Core and
DataCite-compliant metadata.
as_datacite(x, type = "bibentry", ...)

datacite(
  Title,
  Creator,
  Identifier = NULL,
  Publisher = NULL,
  PublicationYear = NULL,
  Subject = subject_create(
    term = "data sets",
    subjectScheme = "Library of Congress Subject Headings (LCSH)",
    schemeURI = "https://id.loc.gov/authorities/subjects.html",
    valueURI = "http://id.loc.gov/authorities/subjects/sh2018002256"
  ),
  Type = "Dataset",
  Contributor = NULL,
  Date = ":tba",
  DateList = NULL,
  Language = NULL,
  AlternateIdentifier = ":unas",
  RelatedIdentifier = ":unas",
  Format = ":tba",
  Version = "0.1.0",
  Rights = ":tba",
  Description = ":tba",
  Geolocation = ":unas",
  FundingReference = ":unas"
)

is.datacite(x)

## S3 method for class 'datacite'
is.datacite(x)

## S3 method for class 'datacite'
print(x, ...)
x |
An object that is tested if it has a class "datacite". |
type |
A DataCite 4.4 metadata can be returned as:
|
... |
Optional parameters to add to a |
Title |
The name(s) by which the resource is known. Similar to dct:title. |
Creator |
One or more |
Identifier |
A persistent identifier (e.g., DOI or URI). May refer to a specific version or all versions of the resource. |
Publisher |
The name of the organization that holds, publishes, or
distributes the resource. Required by DataCite. See |
PublicationYear |
The year of public availability (in |
Subject |
A topic, keyword, or classification term. See |
Type |
The resource type. Defaults to |
Contributor |
An individual or institution that contributed to the development, distribution, or curation of the resource. |
Date |
A date in |
DateList |
A list of multiple dates. Currently not supported. |
Language |
Language code as per IETF BCP 47 / ISO 639-1. See |
AlternateIdentifier |
Optional local or secondary identifier. Defaults
to |
RelatedIdentifier |
Related resources (e.g., prior versions, papers).
Defaults to |
Format |
A technical format (e.g., |
Version |
A free-text version string (e.g., |
Rights |
Licensing or usage restrictions for the resource. Defaults to
|
Description |
Free-text summary or additional information. Defaults to
|
Geolocation |
Geographic location covered or referenced by the resource.
See |
FundingReference |
Information about funding or financial support.
Defaults to |
DataCite is a leading non-profit organization that provides persistent identifiers (DOIs) for research data and other research outputs. Members of the research community use DataCite to register datasets with globally resolvable metadata for citation and discovery.
This function sets "Dataset" as the default resource type. The Size
attribute (e.g., bytes, pages, etc.) is automatically added if available.
as_datacite(x, type) returns the DataCite bibliographical metadata of x
either as a list, a bibentry object, an N-Triples text serialisation,
or a dataset_df object.
datacite() returns a utils::bibentry() object with DataCite-compliant
fields. Use as_datacite() to extract the metadata as a list or bibentry
object.
is.datacite(x) returns a logical value: TRUE if the object x is of class
datacite, otherwise FALSE.
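Because the record is described as a modified utils::bibentry(), its general shape can be sketched with base R alone. The snippet below is only an illustration under that assumption: it calls utils::bibentry() and utils::person() directly, and the choice of bibtype = "Misc" and of the extra publisher field are assumptions made for this sketch, not the internal layout used by datacite().

```r
# Base R only: an illustration of the kind of object datacite() builds on.
# The bibtype "Misc" and the field selection are assumptions for this sketch.
record <- utils::bibentry(
  bibtype = "Misc",
  title = "Growth of Orange Trees",
  author = c(
    utils::person(given = "N.R.", family = "Draper", role = "cre"),
    utils::person(given = "H", family = "Smith", role = "cre")
  ),
  year = 1998,
  publisher = "Wiley"
)

# Fields of a single entry are read with `$`.
record$title
record$publisher
length(record$author)
```

The package-level constructors add DataCite and Dublin Core fields on top of this kind of structure; the sketch shows only the plain bibentry part.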
Learn more in the vignette: bibrecord
Other bibrecord functions: as_dublincore(), bibrecord()
datacite(
  Title = "Growth of Orange Trees",
  Creator = c(
    person(
      given = "N.R.",
      family = "Draper",
      role = "cre",
      comment = c(VIAF = "http://viaf.org/viaf/84585260")
    ),
    person(
      given = "H",
      family = "Smith",
      role = "cre"
    )
  ),
  Publisher = "Wiley",
  Date = 1998,
  Language = "en"
)

# Extract bibliographic metadata
as_datacite(orange_df)

# As a list
as_datacite(orange_df, "list")
Adds or retrieves metadata conforming to the Dublin Core Metadata Terms standard, enabling consistent and structured citation and retrieval of R dataset objects.
is.dublincore() checks whether an object inherits from the "dublincore" class.
as_dublincore(x, type = "bibentry", ...)

dublincore(
  title,
  creator,
  contributor = NULL,
  year = NULL,
  publisher = NULL,
  identifier = NULL,
  subject = NULL,
  type = "DCMITYPE:Dataset",
  dataset_date = NULL,
  language = NULL,
  relation = NULL,
  dataset_format = "application/r-rds",
  rights = NULL,
  datasource = NULL,
  description = NULL,
  coverage = NULL
)

is.dublincore(x)

## S3 method for class 'dublincore'
print(x, ...)
x |
An object to test. |
type |
The resource type. For datasets, use |
... |
Additional metadata fields. |
title |
A name given to the resource. See |
creator |
One or more |
contributor |
Additional contributors ( |
year |
An explicit publication year. If omitted, inferred from
|
publisher |
A character or |
identifier |
A unique persistent identifier (e.g., DOI). See |
subject |
A keyword or controlled vocabulary term. See |
dataset_date |
A publication or release date ( |
language |
ISO 639-1 language code. See |
relation |
A related resource (e.g., version, paper, or parent dataset).
Currently only supports an URI, for example,
|
dataset_format |
The technical format of the dataset (e.g., MIME type).
See |
rights |
A string describing intellectual property or usage rights.
Use a URI like |
datasource |
A URL or label for the original source of the dataset. |
description |
A free-text summary of the dataset. See |
coverage |
Geographic or temporal extent (spatial/temporal coverage). |
The Dublin Core Metadata Element Set (DCMES) is a standardized vocabulary for describing digital and physical resources. It includes 15 core fields and is formally standardized as ISO 15836, IETF RFC 5013, and ANSI/NISO Z39.85.
This function constructs a utils::bibentry() object extended with DCMI
terms and is compatible with dataset_df() objects. The resulting metadata
can be used for semantic documentation and machine-readable citation.
For compatibility with utils::bibentry(), the dataset_date parameter is
automatically used to derive both publication_date and year fields.
A bibentry object extended with class "bibrecord", storing structured
Dublin Core metadata. Use as_dublincore() to extract the metadata in list,
tabular, or RDF form.
A logical value: TRUE if x is a Dublin Core metadata record (i.e.,
inherits from "dublincore"), otherwise FALSE.
Learn more in the vignette: bibrecord
Other bibrecord functions: as_datacite(), bibrecord()
orange_bibentry <- dublincore(
  title = "Growth of Orange Trees",
  creator = c(
    person(
      given = "N.R.",
      family = "Draper",
      role = "cre",
      comment = c(VIAF = "http://viaf.org/viaf/84585260")
    ),
    person(given = "H", family = "Smith", role = "cre")
  ),
  contributor = person(given = "Antal", family = "Daniel", role = "dtm"),
  publisher = "Wiley",
  datasource = "https://isbnsearch.org/isbn/9780471170822",
  dataset_date = 1998,
  identifier = "https://doi.org/10.5281/zenodo.14917851",
  language = "en",
  description = "The Orange data frame has 35 rows and 3 columns of records of the growth of orange trees."
)

# To inspect structured metadata from a dataset_df object:
as_dublincore(orange_df, type = "list")
Converts a defined() vector with value labels into a factor using
haven::as_factor(). This allows categorical defined vectors to behave like
standard factors in models and plotting.
as_factor(x, ...)
x |
A vector created with |
... |
Reserved for future extensions; not used. |
A factor vector with levels derived from the value labels.
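The relationship between value labels and factor levels can be sketched in base R. This is an illustration of the expected result, not the package's implementation (which delegates to haven::as_factor()); the variable names are chosen here for the example.

```r
# Codes and their value labels, mirroring labels = c("Female" = 0, "Male" = 1).
codes <- c(0, 1, 1, 0)
value_labels <- c("Female" = 0, "Male" = 1)

# Base R analogue: the codes act as levels and the label names as level names.
sex_factor <- factor(codes, levels = value_labels, labels = names(value_labels))

levels(sex_factor)
as.character(sex_factor)
```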
sex <- defined(
  c(0, 1, 1, 0),
  label = "Sex",
  labels = c("Female" = 0, "Male" = 1)
)
as_factor(sex)
as_numeric() is the recommended method to convert a defined() vector to a
numeric vector. It ensures the underlying data is numeric and can
optionally preserve semantic metadata.
Base R's as.numeric() does not support custom classes like defined(). The
as.numeric() method provided for defined vectors drops all metadata and class
information, returning a plain numeric vector. It is equivalent to
as_numeric(x, preserve_attributes = FALSE).
as_numeric(x, ...)

## S3 method for class 'haven_labelled_defined'
as_numeric(x, preserve_attributes = FALSE, ...)

## S3 method for class 'haven_labelled_defined'
as.numeric(x, ...)
x |
A vector created with |
... |
Reserved for future use. |
preserve_attributes |
Logical. Whether to keep metadata attributes.
Defaults to |
If preserve_attributes = TRUE, the returned vector retains the unit,
concept, and namespace attributes, but is no longer of class "defined".
If FALSE (default), a base numeric vector is returned without metadata.
For character-based defined vectors, an error is thrown to avoid invalid
coercion.
A numeric vector (either bare or with metadata, depending on the
preserve_attributes argument).
as.character(), strip_defined()
gdp <- defined(c(3897L, 7365L), label = "GDP", unit = "million dollars")

# Drop all metadata
as_numeric(gdp)

# Preserve unit and concept
as_numeric(gdp, preserve_attributes = TRUE)

# Equivalence to base coercion (without metadata)
as.numeric(gdp)

# Metadata-aware variant preferred in pipelines
attr(as_numeric(gdp, TRUE), "unit")
Constructs a utils::bibentry() object extended with Dublin Core and
DataCite-compatible fields. This unified structure supports use with
functions such as dublincore() and datacite(), and is the internal
format for storing rich metadata with datasets.
bibrecord(
  title,
  author,
  contributor = NULL,
  publisher = NULL,
  year = NULL,
  date = Sys.Date(),
  identifier = NULL,
  subject = NULL,
  ...
)
title |
A character string specifying the dataset title. |
author |
A |
contributor |
Optional list or vector of |
publisher |
A character string or |
year |
Publication year. Automatically derived from |
date |
A Date object or character string in ISO format. |
identifier |
A persistent identifier (e.g., DOI or URL). |
subject |
Optional keyword, tag, or controlled vocabulary term. |
... |
Additional fields such as |
An object of class "bibrecord" and "bibentry", suitable for citation and
embedding in metadata-aware structures such as dataset_df().
Learn more in the vignette: bibrecord
Other bibrecord functions: as_datacite(), as_dublincore()
bibrecord(
  title = "Gross domestic product, volumes",
  author = person("Eurostat"),
  publisher = person("Eurostat"),
  identifier = "https://doi.org/10.2908/TEINA011",
  date = as.Date("2025-05-20")
)
Add rows of dataset y to dataset x, validating all semantic metadata.
Metadata (labels, units, concept definitions, namespaces) must match exactly.
Additional dataset-level metadata such as title and creator can be
overridden using the ... argument.
bind_defined_rows(x, y, ..., strict = FALSE)
x |
A |
y |
A |
... |
Optional dataset-level attributes such as |
strict |
Logical. If |
This function combines two semantically enriched datasets created
with dataset_df(). All variable-level attributes (labels, units, concept
definitions, and namespaces) must match. If strict = TRUE, the row
identifier namespace (used in the rowid column) must also match exactly.
If strict = FALSE (the default in the usage above), row identifiers from y
may differ and will be ignored; the output will inherit x's row identifier
scheme.
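The effect of the strict flag on row identifiers can be sketched with a small base R function. reconcile_prefix() is a hypothetical helper written for this illustration; it is not part of the package and only mirrors the rule described above for two identifier prefixes.

```r
# Hypothetical helper, not part of the 'dataset' package: decide how to
# reconcile the row identifier prefixes of two datasets.
reconcile_prefix <- function(prefix_x, prefix_y, strict = FALSE) {
  if (strict && !identical(prefix_x, prefix_y)) {
    stop("Row identifier namespaces differ: ", prefix_x, " vs ", prefix_y)
  }
  # In both modes the combined data follows x's identifier scheme.
  prefix_x
}

# Lenient mode: the mismatch is ignored and x's prefix is used.
reconcile_prefix("http://example.org/dataset#", "http://another.org/dataset#")

# Strict mode: the same mismatch raises an error (captured here with try()).
res <- try(
  reconcile_prefix(
    "http://example.org/dataset#",
    "http://another.org/dataset#",
    strict = TRUE
  ),
  silent = TRUE
)
inherits(res, "try-error")
```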
A new dataset_df object with rows from x and y, combined semantically.
A <- dataset_df(
  length = defined(c(10, 15),
    label = "Length",
    unit = "cm",
    namespace = "http://example.org"
  ),
  identifier = c(id = "http://example.org/dataset#"),
  dataset_bibentry = dublincore(
    title = "Dataset A",
    creator = person("Alice", "Smith")
  )
)

B <- dataset_df(
  length = defined(c(20, 25),
    label = "Length",
    unit = "cm",
    namespace = "http://example.org"
  ),
  identifier = c(id = "http://example.org/dataset#")
)

bind_defined_rows(A, B) # succeeds

C <- dataset_df(
  length = defined(c(30, 35),
    label = "Length",
    unit = "cm",
    namespace = "http://example.org"
  ),
  identifier = c(id = "http://another.org/dataset#")
)

## Not run:
bind_defined_rows(A, C, strict = TRUE) # fails: mismatched rowid
## End(Not run)

bind_defined_rows(A, C, strict = FALSE) # succeeds: rowid inherited
The c() method for defined vectors ensures that all semantic metadata
(label, unit, concept, namespace, and value labels) match exactly. This
prevents accidental loss or mixing of incompatible definitions during
concatenation.
## S3 method for class 'haven_labelled_defined'
c(...)
... |
One or more vectors created with |
All input vectors must:
- Have identical label attributes
- Have identical unit, concept, and namespace
- Have identical value labels (or none)
A single defined vector with concatenated values and retained metadata.
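The matching rule can be illustrated on plain vectors that carry attributes. c_checked() below is a hypothetical base R helper written for this sketch; it is not the package's c() method, and it checks only the four scalar attributes (value labels are left out to keep it short).

```r
# Hypothetical helper, not the package's c() method: concatenate two plain
# vectors only if the listed metadata attributes are identical.
c_checked <- function(a, b,
                      fields = c("label", "unit", "concept", "namespace")) {
  for (field in fields) {
    same <- identical(
      attr(a, field, exact = TRUE),
      attr(b, field, exact = TRUE)
    )
    if (!same) {
      stop("Metadata mismatch in '", field, "'")
    }
  }
  # Concatenate the bare values, then restore the shared metadata.
  out <- c(as.vector(a), as.vector(b))
  for (field in fields) {
    attr(out, field) <- attr(a, field, exact = TRUE)
  }
  out
}

a <- structure(1:3, label = "Length", unit = "meter")
b <- structure(4:6, label = "Length", unit = "meter")
d <- structure(7:9, label = "Length", unit = "cm")

ab <- c_checked(a, b)                         # metadata agrees
bad <- try(c_checked(a, d), silent = TRUE)    # units differ, so this stops
```

Failing early on a mismatch is the point of the design: silently keeping one of two conflicting units would produce a vector whose metadata no longer describes all of its values.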
a <- defined(1:3, label = "Length", unit = "meter")
b <- defined(4:6, label = "Length", unit = "meter")
c(a, b)
contributor() is a lightweight wrapper around creator() that works only with
contributors. It retrieves or updates only the contributor entries in the
dataset's bibliographic metadata.
contributor(x)

contributor(x, overwrite = FALSE) <- value
x |
A dataset object created with |
overwrite |
Logical. If |
value |
A |
All people are stored in the author slot of the underlying utils::bibentry.
This helper preserves primary creators and filters or updates only those
entries that represent contributors.
A contributor is defined as:
- a person with role == "ctb", or
- a person with a comment[["contributorType"]].
Primary creators (authors) typically have role %in% c("aut", "cre").
Contributors can be further annotated with metadata in comment, for example:
comment = c(contributorType = "hostingInstitution", ORCID = "0000-0000-0000-0000")
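The contributor rule above can be sketched with utils::person() objects in base R. is_contributor() is a hypothetical helper introduced for this illustration, and the example people are made up; the package's own filtering may differ in detail.

```r
# Base R sketch of the contributor rule; is_contributor() is a hypothetical
# helper, not a function exported by the 'dataset' package.
people <- c(
  utils::person("Jane", "Doe", role = "aut"),
  utils::person("GitHub",
    role = "ctb",
    comment = c(contributorType = "hostingInstitution")
  ),
  utils::person("Zenodo",
    comment = c(contributorType = "hostingInstitution")
  )
)

is_contributor <- function(p) {
  has_ctb_role <- "ctb" %in% p$role
  has_contributor_type <- "contributorType" %in% names(p$comment)
  has_ctb_role || has_contributor_type
}

# Test each person in turn and keep only those that qualify.
keep <- vapply(
  seq_along(people),
  function(i) is_contributor(people[i]),
  logical(1)
)
contributors <- people[keep]
length(contributors)
```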
contributor() returns a utils::person or a list of such objects
corresponding to contributors.
contributor<-() returns the updated dataset (invisibly).
Other bibliographic helper functions: creator(), dataset_format(),
dataset_title(), description(), geolocation(), get_bibentry(), language,
publication_year(), publisher(), relation(), rights(), subject()
df <- dataset_df(data.frame(x = 1))
creator(df) <- person("Jane", "Doe", role = "aut")

# Add a contributor
contributor(df, overwrite = FALSE) <- person("GitHub",
  role = "ctb",
  comment = c(contributorType = "hostingInstitution")
)

# Replace all contributors
contributor(df, overwrite = TRUE) <- person("Support", "Team", role = "ctb")

# Inspect only contributors
contributor(df)
Add the optional Creator property as an attribute to a dataset object.
creator(x)

creator(x, overwrite = TRUE) <- value
x |
A semantically rich data frame object created by
|
overwrite |
If the attributes should be overwritten. In case it is set
to |
value |
The |
The Creator corresponds to dct:creator in Dublin Core and Creator in
DataCite. It is the name of the entity that holds, archives, publishes,
prints, distributes, releases, issues, or produces the dataset. This property
will be used to formulate the citation, so consider the prominence of the
role.
The Creator attribute as a character of length one is added to x.
Other bibliographic helper functions: contributor(), dataset_format(),
dataset_title(), description(), geolocation(), get_bibentry(), language,
publication_year(), publisher(), relation(), rights(), subject()
creator(orange_df)

# To change author:
creator(orange_df) <- person("Jane", "Doe")

# To add author:
creator(orange_df, overwrite = FALSE) <- person("John", "Doe")
dataset_df object
The dataset_df() constructor creates semantically rich modern data frames.
These inherit from tibble::tibble and carry structured metadata using
attributes.
dataset_df(
  ...,
  identifier = c(obs = "http://example.com/dataset#obs"),
  var_labels = NULL,
  units = NULL,
  concepts = NULL,
  dataset_bibentry = NULL,
  dataset_subject = NULL
)

as_dataset_df(
  df,
  identifier = c(obs = "http://example.com/dataset#obs"),
  var_labels = NULL,
  units = NULL,
  concepts = NULL,
  dataset_bibentry = NULL,
  dataset_subject = NULL,
  ...
)

is.dataset_df(x)

## S3 method for class 'dataset_df'
print(x, ...)

is_dataset_df(x)
... |
Vectors (columns) that should be included in the dataset. |
identifier |
A named vector of one or more URI prefixes for row IDs.
Defaults to |
var_labels |
A named list of human-readable labels for each variable. |
units |
A named list of measurement units for measured variables. |
concepts |
A named list of linked concepts (URIs) for variables or dimensions. |
dataset_bibentry |
A bibliographic metadata record for the dataset,
created using |
dataset_subject |
A subject descriptor created with |
df |
A |
x |
A |
Use is.dataset_df() to check class membership.
S3 methods for dataset_df include:
- print() to display the dataset with metadata
- summary() to summarize both data and metadata
For full details, see vignette("dataset_df", package = "dataset").
A dataset_df object: a tibble with attached metadata stored in attributes.
is.dataset_df() returns a logical value: TRUE if the object is of class
dataset_df, otherwise FALSE.
A simple, serverless scaffolding for publishing dataset_df
objects
on the web (with HTML + RDF exports) is available at
https://github.com/dataobservatory-eu/dataset-template.
defined(), dublincore(), datacite(), subject()
my_dataset <- dataset_df(
  country_name = defined(
    c("AD", "LI"),
    concept = "http://data.europa.eu/bna/c_6c2bb82d",
    namespace = "https://www.geonames.org/countries/$1/"
  ),
  gdp = defined(
    c(3897, 7365),
    label = "Gross Domestic Product",
    unit = "million dollars",
    concept = "http://data.europa.eu/83i/aa/GDP"
  ),
  identifier = c(
    obs = "https://dataobservatory-eu.github.io/dataset-template#"
  ),
  dataset_bibentry = dublincore(
    title = "GDP of Andorra and Liechtenstein",
    description = "A small but semantically rich dataset example.",
    creator = person("Jane", "Doe", role = "cre"),
    publisher = "Open Data Institute",
    language = "en"
  )
)

# Basic usage
print(my_dataset)
head(my_dataset)
summary(my_dataset)

# Metadata access
as_dublincore(my_dataset)
as_datacite(my_dataset)

# Export description as RDF triples
my_description <- describe(my_dataset, con = tempfile())
my_description
Adds or retrieves the optional "format"
field of a dataset's bibentry.
This field is the dataset's technical/media type (e.g., a MIME type).
dataset_format(x)

dataset_format(x, overwrite = FALSE) <- value
x |
A semantically rich data frame created with |
overwrite |
Logical. Replace an existing non‑default value? If |
value |
A length‑one character string specifying the format
(e.g., |
The format field corresponds to dct:format in Dublin Core and to format in
DataCite. It is useful for indicating serialization such as "text/csv",
"application/parquet", or "application/r-rds".
If no format is set, this helper uses the package default
"application/r-rds".
The "format" (technical format) as a character string (length 1).
When assigning, the updated object x is returned invisibly.
Other bibliographic helper functions: contributor(), creator(),
dataset_title(), description(), geolocation(), get_bibentry(), language,
publication_year(), publisher(), relation(), rights(), subject()
dataset_format(orange_df) <- "text/csv"
dataset_format(orange_df)

# Reset to the package default
dataset_format(orange_df) <- NULL
Retrieve or assign the main title of a dataset, typically used as the primary label in metadata exports (e.g., DataCite or Dublin Core).
dataset_title(x)

dataset_title(x, overwrite = FALSE) <- value
x |
A dataset object created by |
overwrite |
Logical. If |
value |
A character string representing the new title. If |
According to the Dublin Core specification for title, the title represents
the name by which the resource is formally known.
The DataCite metadata schema supports multiple titles (e.g., translated, alternative), but this function currently supports only a single main title.
dataset_title() returns the current dataset title as a character string.
dataset_title<-() returns the updated dataset object (invisibly).
Other bibliographic helper functions: contributor(), creator(),
dataset_format(), description(), geolocation(), get_bibentry(), language,
publication_year(), publisher(), relation(), rights(), subject()
dataset_title(orange_df)

# Set a new title with overwrite = TRUE
dataset_title(orange_df, overwrite = TRUE) <- "The Growth of Orange Trees"
dataset_title(orange_df)
Converts a dataset to RDF-style triples with subject, predicate, and object columns. Supports semantic expansion via variable metadata.
dataset_to_triples(x, idcol = NULL, expand_uri = TRUE, format = "data.frame")
x |
A |
idcol |
Name or index of the subject column. If NULL, defaults to
|
expand_uri |
Logical; if TRUE, expands URIs using namespaces and definitions. |
format |
Output format: |
For publishing examples, a minimal serverless scaffold is provided at https://github.com/dataobservatory-eu/dataset-template, which shows how to host CSV + RDF serialisations on GitHub Pages without any server setup.
Either a data.frame with columns s, p, and o, or a character vector of
N-Triple lines.
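How a table with s, p, and o columns relates to N-Triple lines can be sketched in base R. The URIs below are placeholders chosen for this illustration, and the sketch treats every object as a URI, which is a simplification; it is not the serialisation code used by dataset_to_triples().

```r
# A hand-written triple table with the s, p, o columns named above.
# All URIs are placeholders chosen for this sketch.
triples <- data.frame(
  s = c("http://example.com/dataset#obs1", "http://example.com/dataset#obs2"),
  p = "http://example.com/prop/geo",
  o = c("https://example.com/geo/AD", "https://example.com/geo/LI"),
  stringsAsFactors = FALSE
)

# Simplification: every object is treated as a URI; plain or typed literals
# would need quoted forms instead of angle brackets.
nt_lines <- sprintf("<%s> <%s> <%s> .", triples$s, triples$p, triples$o)
writeLines(nt_lines)
```

Each row becomes one line ending in a full stop, which is what makes the format easy to stream and to concatenate across files.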
A simple, serverless scaffolding for publishing dataset_df
objects
on the web (with HTML + RDF exports) is available at
https://github.com/dataobservatory-eu/dataset-template.
# A minimal example with just rowid and geo
data("gdp", package = "dataset")

small_geo <- dataset_df(
  geo = defined(
    gdp$geo[1:3],
    label = "Geopolitical entity",
    concept = "http://example.com/prop/geo",
    namespace = "https://dd.eionet.europa.eu/vocabulary/eurostat/geo/$1"
  )
)

# View as triple table
dataset_to_triples(small_geo)

# View as N-Triples
dataset_to_triples(small_geo, format = "nt")
defined()
constructs a vector enriched with semantic metadata such as a
label, unit of measurement, concept URI, and optional namespace.
These vectors behave like base R vectors but retain metadata during
subsetting, comparison, and printing.
defined(
  x,
  labels = NULL,
  label = NULL,
  unit = NULL,
  concept = NULL,
  namespace = NULL,
  ...
)

is.defined(x)

## S3 method for class 'haven_labelled_defined'
summary(object, ...)
x |
A vector of type character, numeric, Date, factor, or a |
labels |
An optional named vector of value labels. Only a subset of values may be labelled. |
label |
A short human-readable label (string of length 1). |
unit |
Unit of measurement (e.g., "kg", "hours"). Must be a string of
length 1 or |
concept |
A URI or concept name representing the meaning of the variable. |
namespace |
Optional string or named character vector, used for value-level URI expansion. |
... |
Reserved for future use. |
object |
An R object to be summarised. |
The resulting object inherits from haven::labelled() and integrates with
tidyverse workflows, enabling downstream conversion to RDF and other
standards.
A vector of class "defined" (technically haven_labelled_defined), which
behaves like a standard vector with additional semantic metadata and
inherits from haven::labelled().
browseVignettes("dataset")
is.defined(), as_numeric(), as_character(), as_factor(), strip_defined()
gdp_vector <- defined(
  c(3897, 7365, 6753),
  label = "Gross Domestic Product",
  unit = "million dollars",
  concept = "http://data.europa.eu/83i/aa/GDP"
)

# To check the S3 class of the vector:
is.defined(gdp_vector)

# To print the defined vector:
print(gdp_vector)

# To summarise the defined vector:
summary(gdp_vector)

# Subsetting works as expected:
gdp_vector[1:2]
Writes provenance and Dublin Core metadata of a dataset to a file or connection in N-Triples format.
describe(x, con)
x |
A |
con |
A connection or a character string path (e.g. from |
Writes N-Triples to con and invisibly returns x.
test_ds <- dataset_df(
  rowid = defined(c("eg:1", "eg:2"),
    namespace = "http://example.com/dataset#"
  ),
  geo = defined(
    gdp$geo[1:2],
    label = "Country",
    concept = "http://example.com/prop/geo",
    namespace = "https://eionet.europa.eu/geo/$1"
  ),
  dataset_bibentry = dublincore(
    title = "Example Dataset",
    creator = person("John", "Doe")
  )
)

# describe() writes the N-Triples serialisation to the connection
# and invisibly returns its input:
testdescription <- describe(test_ds, con = tempfile())
testdescription
Get or set the optional Description property as an attribute
on a dataset object.
description(x)

description(x, overwrite = FALSE) <- value
x |
A dataset object created with |
overwrite |
Logical. If |
value |
The new description, as a character string. |
The Description is recommended for discovery in DataCite. It is a free-text
field that captures additional information that does not fit other metadata
categories, such as technical notes or dataset usage. See dct:description.
The Description attribute as a character vector of length 1.
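The overwrite argument of the replacement form can be pictured with a plain base R replacement function. The sketch below is an illustration under stated assumptions, not the package's code: `note_sketch<-` and the "note" attribute are hypothetical names, and the decision to emit a message and keep the old value when overwrite = FALSE is assumed rather than taken from the package.

```r
# Hypothetical replacement function; NOT part of the 'dataset' package.
`note_sketch<-` <- function(x, overwrite = FALSE, value) {
  current <- attr(x, "note")
  if (!is.null(current) && !overwrite) {
    # Assumed behaviour: keep the existing value and tell the user.
    message("A note is already set; use overwrite = TRUE to replace it.")
    return(x)
  }
  attr(x, "note") <- value
  x
}

df <- data.frame(x = 1:3)
note_sketch(df) <- "first note"                     # nothing set yet, so it is stored
note_sketch(df) <- "ignored note"                   # existing value is kept
kept <- attr(df, "note")
note_sketch(df, overwrite = TRUE) <- "second note"  # explicitly replaced
replaced <- attr(df, "note")
```

R passes the right-hand side of the assignment as the value argument, so any extra arguments such as overwrite go between x and value in the function definition.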
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), geolocation(), get_bibentry(),
language, publication_year(), publisher(), relation(), rights(), subject()
description(orange_df)
description(orange_df, overwrite = TRUE) <- "This dataset records orange tree growth."
description(orange_df)
A compact sample of GDP and main aggregates from Eurostat's annual international cooperation dataset. This data subset contains illustrative records for select countries and time periods.
gdp
A data frame with 10 rows and 5 variables:
geo: Country name (character)
year: Reference year (integer)
gdp: Gross Domestic Product value (numeric)
unit: Unit of measurement, e.g., "Million EUR" (character)
freq: Observation frequency, e.g., "Annual" (character)
This dataset is intended for examples, tests, and demonstration purposes. It reflects simplified GDP data as published by Eurostat. The actual Eurostat dataset includes more countries, breakdowns, and metadata.
Eurostat (2021). GDP and main aggregates - international data cooperation (annual data). doi:10.2908/NAIDA_10_GDP
head(gdp)
Access or assign the optional geolocation attribute to a semantically rich
dataset object.
geolocation(x)

geolocation(x, overwrite = TRUE) <- value
x |
A dataset object created by |
overwrite |
Logical. If |
value |
A character string specifying the |
The geolocation field describes the spatial region or named place where
the data was collected or that the dataset is about. This field is
recommended for data discovery in DataCite Metadata Schema 4.4.
See: DataCite: Geolocation Guidance
A character string of length 1, representing the geolocation
attribute attached to x.
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), get_bibentry(),
language, publication_year(), publisher(), relation(), rights(), subject()
orange_dataset <- orange_df
geolocation(orange_df) <- "US"
geolocation(orange_df)
geolocation(orange_df, overwrite = FALSE) <- "GB"
Retrieve or replace the bibliographic entry stored in a dataset's attributes.
The entry is a utils::bibentry used to hold citation metadata for
dataset_df() objects.
get_bibentry(dataset)

set_bibentry(dataset) <- value
dataset |
A dataset created with |
value |
A |
New datasets are initialized with reasonable defaults. To build a new
bibentry with sensible defaults and field names, use datacite() (DataCite)
or dublincore() (Dublin Core), then assign it with
set_bibentry(dataset) <- value.
See the vignette for more background:
vignette("bibentry", package = "dataset").
get_bibentry(dataset) returns the utils::bibentry stored in
dataset's attributes.
set_bibentry(dataset) <- value sets the attribute and returns the
modified dataset invisibly.
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(),
language, publication_year(), publisher(), relation(), rights(), subject()
# Get the bibentry of a dataset_df object:
be <- get_bibentry(orange_df)

# Create a well-formed bibentry (DataCite-style):
be2 <- datacite(
  Creator = person("Jane", "Doe"),
  Title = "The Orange Trees Dataset",
  Publisher = "MyOrg"
)

# Assign the new bibentry:
set_bibentry(orange_df) <- be2

# Inspect in different notations:
as_datacite(orange_df, type = "list")
as_dublincore(orange_df, type = "list")
Returns a named list of concept URIs (or NULLs) for all variables.
get_variable_concepts(x)
x |
A |
A named list of concept URIs for each variable.
get_variable_concepts(orange_df)
Adds a prefixed identifier (e.g., eg:) to the first column of a dataset,
useful for generating semantic row IDs (e.g., for RDF serialization).
id_to_column(x, prefix = "eg:", ids = NULL)
x |
A dataset created with |
prefix |
A character string used as the prefix for row identifiers.
Defaults to |
ids |
Optional. A character vector of custom IDs to use instead of row names. |
A dataset of the same class as x, with the first column updated to include
unique prefixed identifiers.
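The idea of prefixed row identifiers can be sketched in base R. The helper below, add_prefixed_ids(), is a hypothetical name and not the package's id_to_column(); it assumes the identifiers are built by pasting the prefix onto the row names (or onto custom ids) and placed in a new first column called rowid.

```r
# Hypothetical helper; NOT the package's id_to_column().
add_prefixed_ids <- function(df, prefix = "eg:", ids = NULL) {
  # Fall back to the row names when no custom ids are supplied.
  if (is.null(ids)) ids <- row.names(df)
  # Identifiers must cover every row and must be unique.
  stopifnot(length(ids) == nrow(df), !anyDuplicated(ids))
  cbind(
    data.frame(rowid = paste0(prefix, ids), stringsAsFactors = FALSE),
    df
  )
}

small <- data.frame(age = c(118, 484, 664))
with_ids <- add_prefixed_ids(small, prefix = "orange:")
custom_ids <- add_prefixed_ids(small, ids = c("a", "b", "c"))
with_ids$rowid
```

A prefix such as orange: only becomes a resolvable IRI once it is paired with a namespace, which is why the documented examples combine prefixed IDs with the namespace argument of defined().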
# Example with a dataset_df object:
id_to_column(orange_df)

# Example with a regular data.frame:
id_to_column(Orange, prefix = "orange:")
Retrieve or assign the identifier attribute of a dataset or
bibliographic metadata object.
identifier(x)

identifier(x, overwrite = TRUE) <- value
x |
A |
overwrite |
Logical. If |
value |
A character string giving the identifier. Can be named (e.g.,
|
An identifier provides an unambiguous reference to a resource. Recommended practice is to supply a persistent identifier string, such as a DOI, ISBN, or URN, that conforms to a recognized identification system.
Both Dublin Core and DataCite 4.4 define identifier as a core property.
If the identifier is a DOI, it will also be stored in the doi field of the
metadata record.
Although identifier is not part of the minimal Dublin Core term set, it is
always included in dataset metadata for compatibility with publishing and
indexing systems. You may omit it if working under a strict DC profile.
For best practice in choosing identifier schemes, see the IANA-registered URI schemes.
For identifier(), the current identifier as a character string. For
identifier<-(), the updated object (invisible).
orange_copy <- orange_df

# Get the current identifier
identifier(orange_copy)

# Set a new identifier (e.g., a DOI)
identifier(orange_copy) <- "https://doi.org/10.9999/example.doi"

# Prevent accidental overwrite
identifier(orange_copy, overwrite = FALSE) <- "https://example.org/id"

# Use numeric and NULL values
identifier(orange_copy) <- 12345
identifier(orange_copy) <- NULL # Sets ":unas"
Assign the primary language of a semantically rich dataset object using an
ISO 639 language code or full language name. This sets the language
attribute in the dataset's metadata.
language(x)

language(x, iso_639_code = "639-3") <- value
x |
A dataset object created by |
iso_639_code |
A character string indicating the desired return format:
either |
value |
A 2-letter or 3-letter language code (ISO 639-1 or ISO 639-2), or a full language name (case-insensitive). |
This function supports recognition of:
2-letter codes (ISO 639-1, e.g., "en", "fr")
3-letter codes from both:
  Alpha_3_B (bibliographic, e.g., "fre")
  Alpha_3_T (terminologic, e.g., "fra")
Full language names (e.g., "English", "French")
For compatibility with open science repositories and modern metadata
standards, this function returns the terminologic code (Alpha_3_T)
when available. If Alpha_3_T is missing for a language, the legacy
bibliographic code (Alpha_3_B) is used as a fallback.
Full language names (e.g., "English", "Spanish") are matched
case-insensitively against the ISO 639-2 Name field. Exact matches are
attempted first; if none are found, a prefix match is used. For example:
"English" returns "eng"
"English, Old" returns "ang"
This means that:
Both "fra" (terminologic) and "fre" (bibliographic) will be accepted
as valid input for French
The resulting value stored and returned will be "fra"
This behaviour aligns with common repository practices (Zenodo, OSF, Figshare).
If value is NULL, the language is marked as ":unas" (unspecified).
In some cases, especially for historical or moribund languages, multiple
similar names may exist. In such cases, it is safer to use a specific
language code (e.g., "ang" instead of "English, Old" and "enm"
for "English, Middle (1100-1500)"). You can also
refer directly to the definitions in ISOcodes::ISO_639_2 for clarity.
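The matching order described above (code lookup, exact name match, prefix match, then Alpha_3_T with an Alpha_3_B fallback) can be sketched in base R. Everything below is an illustrative assumption: resolve_language() and toy_codes are hypothetical names, the table is a hand-written stand-in rather than ISOcodes::ISO_639_2, and the package's actual implementation may differ.

```r
# Toy lookup table. The NA values are set on purpose so that the fallback
# branch is exercised; they are not a claim about the real ISO table.
toy_codes <- data.frame(
  Alpha_3_B = c("eng", "fre", "ang"),
  Alpha_3_T = c("eng", "fra", NA),
  Alpha_2   = c("en", "fr", NA),
  Name      = c("English", "French", "English, Old (ca.450-1100)"),
  stringsAsFactors = FALSE
)

# Hypothetical resolver; NOT the package's implementation.
resolve_language <- function(value, codes = toy_codes) {
  v <- tolower(value)
  # 1. Try the value as a 2- or 3-letter code (which() drops NA comparisons).
  hit <- which(codes$Alpha_2 == v | codes$Alpha_3_B == v | codes$Alpha_3_T == v)
  # 2. Otherwise try an exact, case-insensitive match on the name.
  if (length(hit) == 0) hit <- which(tolower(codes$Name) == v)
  # 3. Otherwise fall back to a prefix match on the name.
  if (length(hit) == 0) hit <- which(startsWith(tolower(codes$Name), v))
  if (length(hit) == 0) stop("Unknown language: ", value)
  row <- codes[hit[1], ]
  # Prefer the terminologic code; use the bibliographic one if it is missing.
  if (!is.na(row$Alpha_3_T)) row$Alpha_3_T else row$Alpha_3_B
}

resolve_language("English")
resolve_language("fre")
resolve_language("English, Old")
```

Because the exact name match runs before the prefix match, "English" resolves to the modern language even though it is also a prefix of the Old English entry.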
The dataset with an updated language attribute, typically an ISO
639-2/T code (Alpha_3_T) such as "fra", "eng", "spa", etc.
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(),
get_bibentry(), publication_year(), publisher(), relation(), rights(), subject()
df <- dataset_df(data.frame(x = 1:3))
language(df) <- "English" # Returns "eng"
language(df) <- "fre" # Legacy code; returns "fra"
language(df) <- "fra" # Returns "fra"
language(df, iso_639_code = "639-1") <- "fra" # Returns "fr"
language(df) <- NULL # Sets ":unas"
Create a single N-Triple triple.
n_triple(s, p, o)
s |
The subject of a triple. |
p |
The predicate of a triple. |
o |
The object of a triple. |
N-Triples is an easy-to-parse, line-based subset of Turtle for serializing
RDF. An N-Triple triple is a sequence of RDF terms representing the subject,
predicate and object of an RDF Triple. Use n_triples() to serialize
multiple statements.
A character vector containing one N-Triple string.
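To make the line format concrete, the following base R sketch assembles one N-Triples line by hand. It is an illustration only: format_term() and sketch_triple() are hypothetical helpers, the rule "anything starting with http(s):// is an IRI, everything else a plain literal" is a simplifying assumption, and the package's n_triple() may add datatypes or handle other term kinds that this sketch ignores.

```r
# Hypothetical helpers; NOT part of the 'dataset' package.
format_term <- function(term) {
  if (grepl("^https?://", term)) {
    # IRIs are written between angle brackets.
    paste0("<", term, ">")
  } else {
    # Plain literals are double-quoted; escape backslashes and quotes first.
    escaped <- gsub("\\", "\\\\", term, fixed = TRUE)
    escaped <- gsub("\"", "\\\"", escaped, fixed = TRUE)
    paste0("\"", escaped, "\"")
  }
}

sketch_triple <- function(s, p, o) {
  # Subject, predicate, object, then a terminating full stop.
  paste(format_term(s), format_term(p), format_term(o), ".")
}

line <- sketch_triple(
  "http://example.org/show/218",
  "http://www.w3.org/2000/01/rdf-schema#label",
  "That Seventies Show"
)
cat(line, "\n")
```

Each statement occupies exactly one line, so a set of statements can be held in an ordinary character vector and deduplicated with unique().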
s <- "http://example.org/show/218"
p <- "http://www.w3.org/2000/01/rdf-schema#label"
o <- "That Seventies Show"
n_triple(s, p, o)
Create RDF triple statements to annotate your dataset with standard, interoperable metadata.
n_triples(triples)
triples |
A character vector of concatenated N-Triples, created with
|
N-Triples is a line-based serialization format for RDF. It is easy to parse and widely supported. For details, see the W3C RDF 1.2 N-Triples specification.
A character vector of unique N-Triple strings.
triple_1 <- n_triple(
  "http://example.org/show/218",
  "http://www.w3.org/2000/01/rdf-schema#label",
  "That Seventies Show"
)
triple_2 <- n_triple(
  "http://example.org/show/218",
  "http://example.org/show/localName",
  '"Cette Série des Années Septante"@fr-be'
)
n_triples(c(triple_1, triple_2, triple_1))
A dataset recording the growth of orange trees, replicated from the classic
datasets::Orange dataset and implemented as a dataset_df
S3 class with enhanced semantic metadata.
orange_df
A data frame with 35 rows and 4 variables:
rowid: A unique identifier for each row (character)
tree: Tree identifier (ordered factor)
age: Age of the tree in days (numeric)
circumference: Trunk circumference in mm (numeric)
This is a semantically enriched version of the classic Orange dataset,
constructed using the dataset_df() and dublincore() constructors.
Each column includes semantic metadata such as units, labels, concepts,
or namespace identifiers. The dataset also embeds a machine-readable citation
for reproducibility and provenance tracking.
orange_bibentry <- dublincore(
  title = "Growth of Orange Trees",
  creator = c(
    person(
      given = "N.R.",
      family = "Draper",
      role = "cre",
      comment = c(VIAF = "http://viaf.org/viaf/84585260")
    ),
    person(
      given = "H",
      family = "Smith",
      role = "cre"
    )
  ),
  contributor = person(
    given = "Antal",
    family = "Daniel",
    role = "dtm"
  ),
  publisher = "Wiley",
  datasource = "https://isbnsearch.org/isbn/9780471170822",
  dataset_date = 1998,
  identifier = "https://doi.org/10.5281/zenodo.14917851",
  language = "en",
  description = "The Orange data frame has 35 rows and 3 columns of records of the growth of orange trees."
)

orange_df <- dataset_df(
  rowid = defined(paste0("orange:", row.names(Orange)),
    label = "ID in the Orange dataset",
    namespace = c("orange" = "datasets::Orange")
  ),
  tree = defined(Orange$Tree,
    label = "The number of the tree"
  ),
  age = defined(Orange$age,
    label = "The age of the tree",
    unit = "days since 1968/12/31"
  ),
  circumference = defined(Orange$circumference,
    label = "circumference at breast height",
    unit = "milimeter",
    concept = "https://www.wikidata.org/wiki/Property:P2043"
  ),
  dataset_bibentry = orange_bibentry
)

orange_df$rowid <- defined(orange_df$rowid,
  namespace = "https://doi.org/10.5281/zenodo.14917851"
)
Draper, N. R. & Smith, H. (1998). Applied Regression Analysis (3rd ed.). Wiley.
Pinheiro, J. C. & Bates, D. M. (2000). Mixed-effects Models in S and S-PLUS. Springer.
Becker, R. A., Chambers, J. M. & Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole.
# Print with semantic citation and data preview
print(orange_df)

# Access semantic metadata associated with variables
print(orange_df$age)

# Retrieve the embedded bibliographic record
as_dublincore(orange_df)
Retrieve or append provenance statements (in N-Triples form) stored on a
dataset_df() object.
provenance(x)

provenance(x) <- value
x |
A dataset created with |
value |
Character vector of N‑Triples created by |
Provenance is stored in the "prov" attribute as N-Triples text. Use
n_triple() or n_triples() to construct valid statements that follow
PROV-O (e.g., prov:wasGeneratedBy, prov:wasInformedBy).
provenance(x) returns the contents of the "prov" attribute (character
vector of N-Triples), or NULL if none is set.
provenance(x) <- value appends value to the "prov" attribute and
returns the modified dataset invisibly.
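The append semantics can be sketched with a base R replacement function that concatenates onto an attribute instead of overwriting it. The function name `prov_sketch<-` is hypothetical and the two statements are made-up example triples; this is a sketch of the documented behaviour, not the package's code.

```r
# Hypothetical replacement function; NOT the package's `provenance<-`.
`prov_sketch<-` <- function(x, value) {
  # c() keeps whatever is already stored and adds the new statements.
  attr(x, "prov") <- c(attr(x, "prov"), value)
  x
}

df <- data.frame(x = 1:3)

prov_sketch(df) <- paste(
  "<http://example.com/dataset#1>",
  "<http://www.w3.org/ns/prov#wasInformedBy>",
  "<http://example.com/source#1> ."
)
prov_sketch(df) <- paste(
  "<http://example.com/dataset#1>",
  "<http://www.w3.org/ns/prov#wasInformedBy>",
  "<http://example.com/source#2> ."
)

# Both statements are retained, in the order they were added.
attr(df, "prov")
```

Appending rather than replacing means earlier provenance statements survive later assignments, which suits a record that is meant to accumulate history.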
provenance(orange_df)

# Add a provenance statement:
provenance(orange_df) <- n_triple(
  "https://doi.org/10.5281/zenodo.10396807",
  "http://www.w3.org/ns/prov#wasInformedBy",
  "http://example.com/source#1"
)
Access or assign the optional publication_year attribute to a semantically
rich dataset object.
publication_year(x)

publication_year(x, overwrite = TRUE) <- value
x |
A dataset object created by |
overwrite |
Logical. If |
value |
A character string specifying the publication year. |
The publication_year represents the year when the dataset was or will be
made publicly available, in YYYY format. For additional context, see
DataCite: Publication Year-Additional Guidance.
The publication_year attribute as a character string.
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(),
get_bibentry(), language, publisher(), relation(), rights(), subject()
publication_year(orange_df)
publication_year(orange_df) <- "1998"
The publisher is the entity responsible for holding, archiving, releasing, or distributing the resource. It is typically included in dataset citation metadata.
For software, this might refer to a code repository (e.g., GitHub). If both
a hosting platform and a producing institution are involved, use the
publisher for the institution and contributor() with
contributorType = "hostingInstitution" for the platform.
publisher(x)

publisher(x, overwrite = TRUE) <- value
x |
A dataset object created with |
overwrite |
Logical. Should existing publisher metadata be overwritten?
Defaults to |
value |
A character string specifying the publisher. |
Adds or retrieves the optional "publisher" attribute for a dataset object.
This property aligns with dct:publisher (Dublin Core) and publisher
(DataCite).
A character string of length one containing the "publisher" attribute.
When assigning, the updated object x is returned invisibly.
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(),
get_bibentry(), language, publication_year(), relation(), rights(), subject()
publisher(orange_df) <- "Wiley"
publisher(orange_df)
Manage related resources for a dataset using a unified accessor.
For DataCite 4.x, this maps to relatedIdentifier (+ type & relation).
For Dublin Core, this maps to dct:relation (string).
relation(x)

relation(x) <- value

related_create(
  relatedIdentifier,
  relationType,
  relatedIdentifierType,
  resourceTypeGeneral = NULL
)

is.related(x)

related_item(x)

related_item(x) <- value
x |
A dataset object created with |
value |
A |
relatedIdentifier |
A string with the identifier of the related resource. |
relationType |
A string naming the relation type (per DataCite vocabulary). |
relatedIdentifierType |
A string naming the identifier type ( |
resourceTypeGeneral |
Optional: a string naming the general type of the related resource. |
To remain compatible with utils::bibentry(), the bibentry stores only the
string identifier (e.g., DOI/URL). The full structured object created by
related_create() is preserved in the "relation" attribute.
A "related" object is a small S3 list with the following elements:
relatedIdentifier: the related resource identifier (DOI, URL, etc.)
relationType: the DataCite relation type (e.g., "IsPartOf", "References")
relatedIdentifierType: the type of identifier ("DOI", "URL", etc.)
resourceTypeGeneral: optional, the general type of the related resource (e.g., "Text", "Dataset")
relation(x) returns:
a single structured "related" object (from related_create()) if only
one relation is present,
a list of "related" objects if multiple relations are present,
otherwise it falls back to the bibentry field (relatedidentifier for
DataCite or relation for Dublin Core).
relation(x) <- value sets the "relation" attribute (structured object
or list of objects) and the bibentry string fields (relatedidentifier and
relation), and returns the dataset invisibly.
related_create() constructs a structured "related" object.
is.related(x) returns TRUE if x inherits from class "related".
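The split between a structured object and a plain string can be pictured with a small base R S3 list. The constructor new_related() below is a hypothetical stand-in that only mirrors the documented element names; the package's related_create() may validate or normalise its inputs in ways this sketch does not.

```r
# Hypothetical constructor; NOT the package's related_create().
new_related <- function(relatedIdentifier, relationType,
                        relatedIdentifierType, resourceTypeGeneral = NULL) {
  structure(
    list(
      relatedIdentifier = relatedIdentifier,
      relationType = relationType,
      relatedIdentifierType = relatedIdentifierType,
      resourceTypeGeneral = resourceTypeGeneral
    ),
    class = "related"
  )
}

rel <- new_related("10.1234/example", "IsPartOf", "DOI")

# Class membership test, analogous in spirit to is.related():
inherits(rel, "related")

# The plain string that a bibentry field could hold:
rel$relatedIdentifier
```

Keeping the full list in an attribute while copying only the identifier string into the bibentry lets the richer DataCite fields survive without breaking tools that expect simple character fields.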
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(),
get_bibentry(), language, publication_year(), publisher(), rights(), subject()
df <- dataset_df(data.frame(x = 1))
relation(df) <- related_create(
  relatedIdentifier = "10.1234/example",
  relationType = "IsPartOf",
  relatedIdentifierType = "DOI"
)
relation(df) # structured object
get_bibentry(df)$relation # "10.1234/example"
get_bibentry(df)$relatedidentifier # "10.1234/example"

# Character input is normalized to a DOI/URL with default types
relation(df) <- "https://doi.org/10.5678/xyz"
relation(df) # structured object (relationType/Type filled with defaults)

# Create related object directly
rel <- related_create("https://doi.org/10.5678/xyz", "References", "DOI")
is.related(rel) # TRUE
Adds or retrieves the optional "rights" attribute of a dataset object.
This field contains information about intellectual property or usage rights.
rights(x)

rights(x, overwrite = FALSE) <- value
x |
A semantically rich data frame created with |
overwrite |
Logical. Should the existing value be replaced? If |
value |
A character string specifying the rights (e.g., |
The "rights" field corresponds to dct:rights from Dublin Core, and to
rights in DataCite.
Rights information typically includes statements about legal ownership, licensing, or usage conditions. It helps ensure that users understand how a dataset may be reused, cited, or shared.
The "rights" attribute of the dataset as a character string (length 1).
When assigning, the updated object x is returned invisibly.
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(),
get_bibentry(), language, publication_year(), publisher(), relation(), subject()
rights(orange_df) <- "CC-BY-SA"
rights(orange_df)
Converts a defined vector to a base R numeric or character,
retaining metadata as passive attributes.
strip_defined(x)
x |
A |
A base R vector with attributes (label, unit, etc.) intact.
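What "passive attributes" means can be shown in base R alone: dropping the class of an object leaves its other attributes in place. The class name toy_defined below is hypothetical and the sketch uses unclass(), which is an assumption about the general idea rather than a description of how strip_defined() is implemented.

```r
# A toy classed vector carrying metadata as attributes.
# "toy_defined" is a hypothetical class, NOT the package's class.
x <- structure(
  c(3897L, 7365L),
  label = "GDP",
  unit = "million dollars",
  class = "toy_defined"
)

# unclass() removes only the class attribute; label and unit remain.
plain <- unclass(x)

is.integer(plain)
inherits(plain, "toy_defined")
attributes(plain)
```

The attributes are passive in the sense that no S3 methods act on them any more: they travel with the vector but do not change how it prints, subsets, or compares.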
gdp <- defined(c(3897L, 7365L), label = "GDP", unit = "million dollars")
strip_defined(gdp)

fruits <- defined(c("apple", "avocado", "kiwi"),
  label = "Fruit",
  unit = "kg"
)
strip_defined(fruits)
Manage the subject metadata of a dataset. The subject can be stored as a
simple character term or as a structured object with subproperties created by
subject_create().
subject(x)

subject_create(
  term,
  schemeURI = NULL,
  valueURI = NULL,
  prefix = NULL,
  subjectScheme = NULL,
  classificationCode = NULL
)

subject(x) <- value

is.subject(x)
x |
A dataset object created with |
term |
A subject term, for example |
schemeURI |
URI of the subject identifier scheme, for example
|
valueURI |
URI of the subject term, for example
|
prefix |
Abbreviated prefix for a scheme URI, for example |
subjectScheme |
Name of the subject scheme, classification code, or authority if one is used. This acts as a namespace. |
classificationCode |
Classification code for schemes that do not have
|
value |
A subject object created by |
The subject property records what the dataset is about.
The DataCite subject property allows multiple subproperties, but these
cannot be stored directly in a standard utils::bibentry object.
Therefore:
If you set a character string as the subject, it is stored in both the
bibentry and the "subject" attribute.
If you set a structured subject (via subject_create()), the $term value
is stored in the bibentry, and the full object is stored in the "subject"
attribute of the dataset_df object.
subject(x) returns:
a single "subject" object if only one is present,
a list of "subject" objects if multiple are present,
otherwise falls back to the plain string from the bibentry.
subject(x) <- value accepts a character vector, a "subject" object, or
a list of "subject" objects, and updates both the bibentry slot and the
"subject" attribute. Returns the dataset invisibly.
subject_create() returns a structured "subject" object, or a list of
them if multiple terms are provided.
is.subject(x) returns TRUE if x inherits from class "subject".
Other bibliographic helper functions:
contributor(), creator(), dataset_format(), dataset_title(), description(), geolocation(),
get_bibentry(), language, publication_year(), publisher(), relation(), rights()
# Set a structured subject
subject(orange_df) <- subject_create(
  term = "Oranges",
  schemeURI = "http://id.loc.gov/authorities/subjects",
  valueURI = "http://id.loc.gov/authorities/subjects/sh85095257",
  subjectScheme = "LCCH",
  prefix = "lcch:"
)

# Retrieve subject with subproperties
subject(orange_df)
Assigns a concept URI to a vector created with defined(). This
method updates the concept attribute and validates that the input is a single
character string or NULL.
var_concept(x, ...)

var_concept(x) <- value

## Default S3 replacement method:
var_concept(x) <- value
x |
A vector to which the concept URI will be assigned. |
... |
Further parameters for inheritance, not in use. |
value |
A character string with a concept URI or NULL to remove the concept. |
get_variable_concepts() is identical to var_concept().
The (linked) concept of the meaning of the data contained by a
vector constructed with defined().
The modified vector with updated concept metadata.
small_country_dataset <- dataset_df(
  country_name = defined(c("Andorra", "Liechtenstein"), label = "Country"),
  gdp = defined(c(3897, 7365),
    label = "Gross Domestic Product",
    unit = "million dollars"
  )
)

var_concept(small_country_dataset$country_name) <- "http://data.europa.eu/bna/c_6c2bb82d"
var_concept(small_country_dataset$country_name)

# To remove the concept definition of a variable
var_concept(small_country_dataset$country_name) <- NULL

x <- defined(c(1, 2, 3), label = "Example Variable")
var_concept(x) <- "http://example.org/concept/XYZ"
var_concept(x)
Adds or retrieves a human-readable label as a metadata attribute for a variable or vector. This label is useful for making variables easier to understand than their programmatic names (e.g., column names).
label_attribute() is a low-level helper that retrieves the "label" attribute
of an object without any fallback or printing logic. It is primarily used internally.
The var_label<- assignment method sets or removes the "label" attribute
of a vector or data frame column. This allows attaching human-readable
descriptions to variables for interpretability and downstream metadata use.
## S3 method for class 'defined'
var_label(x, ...)

label_attribute(x)

var_label(x) <- value

## S3 replacement method for class 'haven_labelled_defined'
var_label(x) <- value

## S3 method for class 'dataset_df'
var_label(
  x,
  unlist = FALSE,
  null_action = c("keep", "fill", "skip", "na", "empty"),
  recurse = FALSE,
  ...
)
x |
A vector or data frame. |
... |
Further arguments passed to or used by methods. |
value |
A character string to assign as the label, or |
unlist |
For data frames, return a named vector instead of a list. |
null_action |
For data frames, controls how to handle columns without a variable label. Options are:
|
recurse |
If |
This interface builds on labelled::var_label() and is compatible with the defined() infrastructure for semantic metadata (labels, namespaces, units, and variable identifiers).
See labelled::var_label() for low-level usage. For a comprehensive guide to working with variable labels and semantic metadata, see: vignette("defined", package = "dataset").
var_label(x) returns the "label" attribute of x as a character string.
var_label(x) <- value sets, removes, or replaces the label attribute of x, returning the updated object invisibly.
A character string if the "label" attribute exists, or NULL if not present.
The modified object x, returned invisibly with the updated "label" attribute.
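The getter and setter behaviour described above can be sketched with base R attributes alone. This is a minimal illustration, not the package's implementation: get_label() and set_label<-() are hypothetical names introduced only for this example.

```r
# Hypothetical helpers (not part of the dataset package) that mimic a
# label getter and a replacement function using base R attributes.
get_label <- function(x) attr(x, "label", exact = TRUE)

`set_label<-` <- function(x, value) {
  # Assigning NULL removes the attribute; any other value sets or replaces it
  attr(x, "label") <- value
  x
}

height <- c(1.62, 1.75, 1.80)
get_label(height)                  # NULL, because no label is set yet

set_label(height) <- "Height (m)"
get_label(height)                  # "Height (m)"

set_label(height) <- NULL
get_label(height)                  # NULL again
```

The replacement-function form is what allows the assignment syntax on the left-hand side, in the same way that var_label(x) <- value is used in this section.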
labelled::var_label(), var_labels(), defined()
Other defined metadata methods and functions: var_labels(), var_namespace(), var_unit()
# Retrieve the label attribute
var_label(orange_df$circumference)

# Set or update the label attribute
var_label(orange_df$circumference) <- "circumference (breast height)"

# Example: Retrieve variable labels from a dataset_df
df <- dataset_df(
  id = defined(1:3, label = "Observation ID"),
  temp = defined(c(22.5, 23.0, 21.8), label = "Temperature (°C)"),
  site = defined(c("A", "B", "A"))
)

# List form (default)
var_label(df)

# Character vector form
var_label(df, unlist = TRUE, null_action = "empty")

# Exclude variables without labels
var_label(df, null_action = "skip")

# Replace missing labels with column names
var_label(df, null_action = "fill")
Retrieve or assign labels for all variables (columns) in a dataset.
var_labels(
  x,
  unlist = FALSE,
  null_action = c("keep", "fill", "skip", "na", "empty")
)

var_labels(x) <- value
x |
A |
unlist |
Logical; if |
null_action |
How to handle columns without labels. One of:
|
value |
|
This is the dataset-level equivalent of var_label(). It works with any data.frame-like object, including dataset_df(), and returns/sets the "label" attribute of each column.
Labels are useful for storing human-readable descriptions of variables that may have short or cryptic column names.
For internal purposes, this function uses the "var_labels" dataset attribute and delegates to var_label() and var_label<-() on individual columns.
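The column-by-column delegation can be sketched in base R as follows. collect_labels() is a hypothetical helper written for this illustration, and its handling of unlabelled columns is inferred from the examples on this page ("fill" uses the column name, "skip" drops the column, "empty" uses an empty string); the package's own logic may differ.

```r
# Hypothetical helper (not the package's implementation): read the "label"
# attribute of every column and handle unlabelled columns explicitly.
collect_labels <- function(df, null_action = c("keep", "fill", "skip", "empty")) {
  null_action <- match.arg(null_action)
  labels <- lapply(df, function(col) attr(col, "label", exact = TRUE))
  unlabelled <- vapply(labels, is.null, logical(1))

  if (null_action == "fill") labels[unlabelled] <- names(df)[unlabelled]
  if (null_action == "empty") labels[unlabelled] <- ""
  if (null_action == "skip") labels <- labels[!unlabelled]
  labels
}

df <- data.frame(id = 1:3, site = c("A", "B", "A"))
attr(df$id, "label") <- "Observation ID"

collect_labels(df, "keep")   # site stays NULL
collect_labels(df, "fill")   # site falls back to its column name
collect_labels(df, "skip")   # only id is returned
```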
Getter: a named list (or vector if unlist = TRUE) of variable labels.
Setter: the modified x with updated labels, returned invisibly.
Other defined metadata methods and functions: var_label(), var_namespace(), var_unit()
df <- dataset_df(
  id = defined(1:3, label = "Observation ID"),
  temp = defined(c(22.5, 23.0, 21.8), label = "Temperature (°C)"),
  site = defined(c("A", "B", "A"))
)

# Get all variable labels
var_labels(df)

# Set multiple labels at once
var_labels(df) <- list(site = "Site code")

# Return as a named vector with empty string for unlabeled vars
var_labels(df, unlist = TRUE, null_action = "empty")
Retrieve or assign the namespace part of a permanent, global variable identifier, independent of the current R session or instance.
var_namespace(x, ...)

var_namespace(x) <- value

get_variable_namespaces(x, ...)

namespace_attribute(x)

get_namespace_attribute(x)

set_namespace_attribute(x, value)

namespace_attribute(x) <- value
x |
A vector. |
... |
Additional arguments for method compatibility with other classes. |
value |
A character string specifying the namespace, or |
The namespace attribute is useful when working with remote, linked, or open data sources. Variable identifiers in such datasets are often qualified with a common namespace prefix. When combined, the prefix and namespace form a persistent URI or IRI for the variable.
Retaining the namespace ensures the identifiers remain valid and resolvable during validation, merging, or future updates of the vector (such as when it is used as a column in a dataset).
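As a rough illustration of how a namespace and a local identifier combine into a resolvable IRI, the sketch below uses plain string operations in base R. The values mirror the Wikidata example used elsewhere on this page; the variable names are assumptions made for this example, and this is not how the package stores or resolves identifiers internally.

```r
# Assumed example values: a named namespace (prefix = base IRI) and a
# local identifier.
namespace <- c(wd = "https://www.wikidata.org/wiki/")
identifier <- "Q42"

# Full IRI: the namespace followed by the local identifier
full_iri <- paste0(namespace[["wd"]], identifier)

# Compact, prefixed form: the prefix, a colon, and the local identifier
prefixed <- paste0(names(namespace), ":", identifier)

full_iri   # "https://www.wikidata.org/wiki/Q42"
prefixed   # "wd:Q42"
```

Without the namespace, "Q42" alone is only meaningful inside the session that created it, which is why the attribute is worth keeping alongside the values.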
get_variable_namespaces() is an alias for var_namespace().
namespace_attribute() and set_namespace_attribute() are internal helpers.
For full usage, see vignette("defined", package = "dataset"), which demonstrates the integration of variable labels, namespaces, units of measure, and machine-independent identifiers.
A character string representing the namespace attribute of a vector constructed with defined(). Returns the updated object (in setter forms).
Other defined metadata methods and functions: var_label(), var_labels(), var_unit()
# Define a vector with a namespace
x <- defined("Q42", namespace = c(wd = "https://www.wikidata.org/wiki/"))

# Get the namespace
var_namespace(x)
get_variable_namespaces(x)

# Set the namespace
var_namespace(x) <- "https://example.org/ns/"

# Remove the namespace
var_namespace(x) <- NULL

# Use lower-level helpers (not typically used directly)
namespace_attribute(x)
namespace_attribute(x) <- "https://example.org/custom/"
Sets or retrieves the unit of measure (UoM) attribute of a vector. Units give semantic meaning to numeric or character data (such as currency, weight, or time) and help prevent incorrect operations, like merging values measured in incompatible units.
The var_unit<- assignment method sets, updates, or removes the "unit" attribute of a vector. This can be used with defined() vectors or base vectors to ensure consistent semantic annotation.
unit_attribute() is a low-level helper to directly access the "unit" attribute of a vector, without applying fallback logic. It is mainly used internally.
get_unit_attribute() is an alias for unit_attribute(), included for naming consistency in codebases that distinguish getter/setter patterns.
set_unit_attribute() is the low-level assignment function that sets or removes the "unit" attribute of an object. Used internally by unit_attribute<-.
var_unit(x, ...)

var_unit(x) <- value

## Default S3 replacement method:
var_unit(x) <- value

get_variable_units(x, ...)

unit_attribute(x)

get_unit_attribute(x)

set_unit_attribute(x, value)

unit_attribute(x) <- value
x |
A vector. |
... |
Further arguments for method extensions. |
value |
A single character string or |
The "unit" attribute stores a machine-readable representation of a unit of measure (e.g., "kg", "USD", "days"). This is useful when working with linked open data or when combining data from multiple sources where silent mismatches in units could cause errors.
For full integration with semantic metadata (e.g., labels, concepts, namespaces), use defined() vectors or dataset_df() objects.
get_variable_units() is an alias for var_unit().
See vignette("defined", package = "dataset") for end-to-end examples involving semantic enrichment.
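To make the point about silent unit mismatches concrete, here is a minimal base R sketch of a guard that refuses to combine vectors whose "unit" attributes differ. combine_same_unit() is a hypothetical function written for this illustration; it is not part of the package and does not describe how the package validates units.

```r
# Hypothetical guard (not part of the dataset package): combine two vectors
# only if their "unit" attributes are identical.
combine_same_unit <- function(a, b) {
  unit_a <- attr(a, "unit", exact = TRUE)
  unit_b <- attr(b, "unit", exact = TRUE)
  show_unit <- function(u) if (is.null(u)) "<none>" else u

  if (!identical(unit_a, unit_b)) {
    stop("Unit mismatch: ", show_unit(unit_a), " vs ", show_unit(unit_b),
         call. = FALSE)
  }

  out <- c(as.vector(a), as.vector(b))  # c() would drop the attribute anyway
  attr(out, "unit") <- unit_a           # so it is re-attached explicitly
  out
}

weights_kg <- structure(c(10, 20), unit = "kg")
more_kg <- structure(30, unit = "kg")
weights_lb <- structure(c(22, 44), unit = "lb")

combine_same_unit(weights_kg, more_kg)   # values 10, 20, 30 with unit "kg"

# Mixing kilograms and pounds raises an error instead of passing silently
tryCatch(
  combine_same_unit(weights_kg, weights_lb),
  error = function(e) conditionMessage(e)
)
```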
var_unit(x) returns the "unit" attribute as a character string.
var_unit(x) <- value sets, updates, or removes the unit and returns the modified vector invisibly.
The modified object x, returned invisibly with the updated "unit" attribute.
The "unit" attribute of the object x, or NULL if not set.
The object x with updated "unit" attribute.
Other defined metadata methods and functions: var_label(), var_labels(), var_namespace()
# Retrieve the unit of measure (if defined)
var_unit(orange_df$circumference)

# Regular data.frame columns have no unit by default
var_unit(mtcars$wt)

# Add a unit to a column
var_unit(mtcars$wt) <- "1000 lbs"

# Remove the unit
var_unit(mtcars$wt) <- NULL
S3 method for vctrs::vec_cast() that converts a haven_labelled_defined vector (created by defined()) to a base numeric (double) vector, dropping all semantic metadata.
## S3 method for class 'haven_labelled_defined'
vec_cast.double(x, to, ...)
x |
|
to |
Target type (must be |
... |
Ignored; reserved for future use. |
A plain numeric (double) vector.
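What "dropping all semantic metadata" means can be shown with a base R analogue. The sketch below attaches attributes by hand with structure() rather than with defined(), so it illustrates the effect of the cast and not the vctrs method itself.

```r
# Base R analogue (assumption: attributes are attached by hand here, not
# through defined()). as.numeric() returns the bare values without attributes.
x <- structure(c(10, 20), unit = "kg", label = "Weight")
names(attributes(x))   # "unit" "label"

plain <- as.numeric(x)
attributes(plain)      # NULL: the metadata is gone
plain                  # 10 20
```

Because the cast is lossy, any unit or label needed later has to be read before converting.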
x <- defined(c(10, 20), unit = "kg")
vctrs::vec_cast(x, double())
as.numeric(x)
Converts R vectors, data frames, and dataset_df objects to XML Schema Definition (XSD) compatible string representations such as xsd:decimal, xsd:boolean, xsd:date, and xsd:dateTime.
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)

## S3 method for class 'haven_labelled_defined'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)

## S3 method for class 'data.frame'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)

## S3 method for class 'dataset_df'
xsd_convert(x, idcol = "rowid", shortform = TRUE, ...)

## S3 method for class 'tbl_df'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)

## S3 method for class 'character'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)

## S3 method for class 'numeric'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)

## S3 method for class 'integer'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)

## S3 method for class 'logical'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)

## S3 method for class 'factor'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)

## S3 method for class 'POSIXct'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)

## S3 method for class 'Date'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)

## S3 method for class 'difftime'
xsd_convert(x, idcol = NULL, shortform = TRUE, ...)
x |
An object (vector, data frame, tibble, or |
idcol |
Column name or position to use as row (observation) identifier.
If |
shortform |
Logical. If |
... |
Additional arguments passed to methods. |
This is primarily used for generating RDF-compatible typed literals.
For vectors, returns a character vector of typed literals.
For data frames or tibbles, returns a data frame with the same structure but with all values converted to XSD strings.
For dataset_df objects, behaves like the data frame method but preserves dataset-level attributes.
A character vector or data frame with values serialized as XSD-compatible RDF literals.
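For readers unfamiliar with RDF typed literals, the following base R sketch shows the general idea of pairing a lexical value with an XSD datatype. to_typed_literal() is a hypothetical function written for this illustration. It follows the common Turtle-style notation "value"^^xsd:type, and the exact strings returned by xsd_convert() may be formatted differently.

```r
# Hypothetical illustration (not the package's implementation): serialise a
# base R vector as typed literals in the form "value"^^xsd:type.
to_typed_literal <- function(x) {
  xsd_type <- if (is.logical(x)) {
    "xsd:boolean"
  } else if (is.integer(x)) {
    "xsd:integer"
  } else if (inherits(x, "Date")) {
    "xsd:date"
  } else if (is.numeric(x)) {
    "xsd:decimal"
  } else {
    "xsd:string"
  }

  # XSD booleans are written in lower case: "true" / "false"
  lexical <- if (is.logical(x)) tolower(as.character(x)) else as.character(x)

  out <- paste0('"', lexical, '"^^', xsd_type)
  out[is.na(x)] <- NA_character_   # keep missing values missing
  out
}

to_typed_literal(42L)
to_typed_literal(c(TRUE, FALSE, NA))
to_typed_literal(as.Date("2020-01-01"))
to_typed_literal("apple")
```

Keeping NA as NA, rather than serialising the string "NA", avoids emitting a literal that looks valid but carries no value.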
xsd_convert(42L)                    # integer -> xsd:integer
xsd_convert(c(TRUE, FALSE, NA))     # logical -> xsd:boolean
xsd_convert(Sys.Date())             # Date -> xsd:date
xsd_convert(Sys.time())             # POSIXct -> xsd:dateTime
xsd_convert(factor("apple"))        # factor -> xsd:string
xsd_convert(c("apple", "banana"))   # character -> xsd:string
# Simple data frame with mixed types
df <- data.frame(
  id = 1:2,
  value = c(3.14, 2.71),
  active = c(TRUE, FALSE),
  date = as.Date(c("2020-01-01", "2020-12-31"))
)

# Short vs long-form URI:
xsd_convert(120L, shortform = TRUE)
xsd_convert(121L, shortform = FALSE)