| Title: | Read and Write Frictionless Data Packages |
|---|---|
| Description: | Read and write Frictionless Data Packages. A 'Data Package' (<https://specs.frictionlessdata.io/data-package/>) is a simple container format and standard to describe and package a collection of (tabular) data. It is typically used to publish FAIR (<https://www.go-fair.org/fair-principles/>) and open datasets. |
| Authors: | Peter Desmet [aut, cre] (ORCID: <https://orcid.org/0000-0002-8442-8025>, affiliation: Research Institute for Nature and Forest (INBO)), Damiano Oldoni [aut] (ORCID: <https://orcid.org/0000-0003-3445-7562>, affiliation: Research Institute for Nature and Forest (INBO)), Pieter Huybrechts [aut] (ORCID: <https://orcid.org/0000-0002-6658-6062>, affiliation: Research Institute for Nature and Forest (INBO)), Sanne Govaert [aut] (ORCID: <https://orcid.org/0000-0002-8939-1305>, affiliation: Research Institute for Nature and Forest (INBO)), Kyle Husmann [ctb] (ORCID: <https://orcid.org/0000-0001-9875-8976>, affiliation: Pennsylvania State University), Research Institute for Nature and Forest (INBO) [cph] (ROR: <https://ror.org/00j54wy13>), Research Foundation - Flanders [fnd] (https://lifewatch.be), Beatriz Milz [rev] (ORCID: <https://orcid.org/0000-0002-3064-4486>), João Martins [rev] (ORCID: <https://orcid.org/0000-0001-7961-4280>) |
| Maintainer: | Peter Desmet <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.2.1.9000 |
| Built: | 2025-10-08 09:26:56 UTC |
| Source: | https://github.com/frictionlessdata/frictionless-r |
Adds a Data Resource to a Data Package.
The resource will be a Tabular Data Resource.
The resource name can only contain lowercase alphanumeric characters plus
., - and _.
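The naming rule above can be expressed as a regular expression. The following is a minimal sketch in base R. `is_valid_resource_name()` is a hypothetical helper written for illustration, not a function exported by frictionless, and the pattern is an assumption derived from the sentence above.

```r
# Hypothetical helper (not part of frictionless): checks whether a name
# consists only of lowercase alphanumeric characters plus ".", "-" and "_".
is_valid_resource_name <- function(name) {
  grepl("^[a-z0-9._-]+$", name)
}

# Only lowercase letters and "_"
is_valid_resource_name("positions_with_schema")

# Contains uppercase letters and a space
is_valid_resource_name("My Positions")
```

The first call should return `TRUE` and the second `FALSE`, since uppercase letters and spaces fall outside the allowed set.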

    add_resource(
      package,
      resource_name,
      data,
      schema = NULL,
      replace = FALSE,
      delim = ",",
      ...
    )

| Argument | Description |
|---|---|
| package | Data Package object, as returned by |
| resource_name | Name of the Data Resource. |
| data | Data to attach, either a data frame or path(s) to CSV file(s): |
| schema | Either a list, or path or URL to a JSON file describing a Table Schema for the |
| replace | If |
| delim | Single character used to separate the fields in the CSV file(s), e.g. |
| ... | Additional metadata properties to add to the resource, e.g. |

See vignette("data-resource") (and to a lesser extent
vignette("table-dialect")) to learn how this function implements the
Data Package standard.
package with one additional resource.
Other edit functions:
remove_resource()

    # Load the example Data Package
    package <- example_package()

    # List the resources
    resource_names(package)

    # Create a data frame
    df <- data.frame(
      multimedia_id = c(
        "aed5fa71-3ed4-4284-a6ba-3550d1a4de8d",
        "da81a501-8236-4cbd-aa95-4bc4b10a05df"
      ),
      x = c(718, 748),
      y = c(860, 900)
    )

    # Add the resource "positions" from the data frame
    package <- add_resource(package, "positions", data = df)

    # Add the resource "positions_with_schema", with a user-defined schema and title
    my_schema <- create_schema(df)
    package <- add_resource(
      package,
      resource_name = "positions_with_schema",
      data = df,
      schema = my_schema,
      title = "Positions with schema"
    )

    # Replace the resource "observations" with a file-based resource (2 TSV files)
    path_1 <-
      system.file("extdata", "v1", "observations_1.tsv", package = "frictionless")
    path_2 <-
      system.file("extdata", "v1", "observations_2.tsv", package = "frictionless")
    package <- add_resource(
      package,
      resource_name = "observations",
      data = c(path_1, path_2),
      replace = TRUE,
      delim = "\t"
    )

    # List the resources ("positions" and "positions_with_schema" added)
    resource_names(package)

Check if an object is a Data Package object with the required properties.

    check_package(package)

| Argument | Description |
|---|---|
| package | Data Package object, as returned by |

package invisibly or an error.
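The two outcomes described above (invisible return or error) can be handled with `tryCatch()`. This is a minimal sketch, assuming the frictionless package is installed. The invalid input is a made-up plain list, and the labels `"valid"` and `"invalid"` are illustrative only.

```r
library(frictionless)

# A valid Data Package is returned invisibly, so it can be assigned or piped
package <- check_package(example_package())

# A plain list is not a Data Package object, so check_package() raises an error
result <- tryCatch(
  {
    check_package(list(name = "not-a-package"))
    "valid"
  },
  error = function(e) "invalid"
)
result
```

Wrapping the call this way lets a script decide how to react to an invalid object instead of stopping at the error.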

    # Load the example Data Package
    package <- example_package()

    # Check if the Data Package is valid (invisible return)
    check_package(package)

Initiates a Data Package object, either from scratch or from an existing list. This Data Package object is a list with the following characteristics:

- A datapackage subclass.
- All properties of the original descriptor.
- A resources property, set to an empty list if undefined.
- A directory property, set to "." for the current directory if undefined. It is used as the base path to access resources with read_resource().

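The characteristics listed above can be inspected directly on a created object. The sketch below assumes frictionless is installed. The descriptor values (`"my-package"`, `"My package"`) are made up for illustration.

```r
library(frictionless)

# Illustrative descriptor; the property values are invented for this example
descriptor <- list(name = "my-package", title = "My package")
package <- create_package(descriptor)

# The object carries the datapackage subclass
class(package)

# Properties of the original descriptor are kept
package$title

# Undefined properties are filled in: an empty list and the current directory
package$resources
package$directory
```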

    create_package(descriptor = NULL)

| Argument | Description |
|---|---|
| descriptor | List to be made into a Data Package object. If undefined, an empty Data Package will be created from scratch. |

See vignette("data-package") to learn how this function implements the
Data Package standard.
check_package() is automatically called on the created package to make sure
it is valid.
A Data Package object.
Other create functions:
create_schema()

    # Create a Data Package
    package <- create_package()

    package

    # See the structure of the (empty) Data Package
    str(package)

Creates a Table Schema for a data frame, listing all column names and types as field names and (converted) types.

    create_schema(data)

| Argument | Description |
|---|---|
| data | A data frame. |

See vignette("table-schema") to learn how this function implements the
Data Package standard.
List describing a Table Schema.
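To see how column types are converted into Table Schema field types, the returned list can be flattened into vectors of names and types. This is a small sketch, assuming frictionless is installed. The data frame and its column names are invented for illustration.

```r
library(frictionless)

# Illustrative data frame with an integer, a double and a logical column
df <- data.frame(
  count = c(1L, 2L),
  weight = c(1.5, 2.5),
  is_adult = c(TRUE, FALSE)
)

schema <- create_schema(df)

# Extract the name and type of each field in the schema
field_names <- vapply(schema$fields, function(field) field$name, character(1))
field_types <- vapply(schema$fields, function(field) field$type, character(1))

field_names
field_types
```

With these inputs, the types should come out as `integer`, `number` and `boolean`, in the same order as the columns.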
Other create functions:
create_package()

    # Create a data frame
    df <- data.frame(
      id = c(as.integer(1), as.integer(2)),
      timestamp = c(
        as.POSIXct("2020-03-01 12:00:00", tz = "EET"),
        as.POSIXct("2020-03-01 18:45:00", tz = "EET")
      ),
      life_stage = factor(c("adult", "adult"), levels = c("adult", "juvenile"))
    )

    # Create a Table Schema from the data frame
    schema <- create_schema(df)
    str(schema)

Reads the example Data Package included in frictionless.
This dataset is used in examples, vignettes, and tests and contains dummy
camera trap data organized in 3 Data Resources:

- deployments: one local data file referenced in "path": "deployments.csv".
- observations: two local data files referenced in "path": ["observations_1.tsv", "observations_2.tsv"].
- media: inline data stored in data.


    example_package(version = "1.0")

| Argument | Description |
|---|---|
| version | Data Package standard version. |

The example Data Package is available in two versions:

- 1.0: specified as a Data Package v1.
- 2.0: specified as a Data Package v2.

A Data Package object, see create_package().

    # Version 1
    example_package()

    # Version 2
    example_package(version = "2.0")

Prints a human-readable summary of a Data Package, including its resources
and a link to more information (if provided in package$id).

    ## S3 method for class 'datapackage'
    print(x, ...)

| Argument | Description |
|---|---|
| x | Data Package object, as returned by |
| ... | Further arguments, they are ignored by this function. |

print() with a summary of the Data Package object.

    # Load the example Data Package
    package <- example_package()

    # Print a summary of the Data Package
    package # Or print(package)

Reads information from a datapackage.json file, i.e. the descriptor file that
describes the Data Package metadata and its Data Resources.

    read_package(file = "datapackage.json")

| Argument | Description |
|---|---|
| file | Path or URL to a |

See vignette("data-package") to learn how this function implements the
Data Package standard.
A Data Package object, see create_package().
Other read functions:
read_resource()

    # Read a datapackage.json file
    package <- read_package(
      system.file("extdata", "v1", "datapackage.json", package = "frictionless")
    )

    package

    # Access the Data Package properties
    package$name
    package$created

Reads data from a Data Resource (in a Data Package) into a tibble (a
Tidyverse data frame).
The resource must be a Tabular Data Resource.
The function uses readr::read_delim() to read CSV files, passing the
resource properties path, CSV dialect, column names, data types, etc.
Column names are taken from the provided Table Schema (schema), not from
the header in the CSV file(s).
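The idea of taking column names from a schema rather than from the file header can be illustrated with readr alone. The sketch below is an assumption-laden illustration, not necessarily how read_resource() calls readr internally. The inline CSV, the `field_names` vector and the column types are made up.

```r
library(readr)

# Inline CSV whose header differs from the names we want to use
csv <- I("ID,Label\n1,x\n2,y\n")

# Names as they might be declared in a Table Schema (invented here)
field_names <- c("id", "label")

data <- read_delim(
  csv,
  delim = ",",
  col_names = field_names, # use the schema names instead of the file header
  col_types = "ic",        # integer, character
  skip = 1                 # skip the header row in the file
)

names(data)
```

Because the header row is skipped and names are supplied explicitly, the resulting tibble uses `id` and `label` regardless of what the file header says.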

    read_resource(package, resource_name, col_select = NULL)

| Argument | Description |
|---|---|
| package | Data Package object, as returned by |
| resource_name | Name of the Data Resource. |
| col_select | Character vector of the columns to include in the result, in the order provided. Selecting columns can improve read speed. |

See vignette("data-resource"), vignette("table-dialect") and
vignette("table-schema") to learn how this function implements the
Data Package standard.
A tibble::tibble() with the Data Resource's tabular data.
If there are parsing problems, a warning will alert you.
You can retrieve the full details by calling problems() on your data
frame.
Other read functions:
read_package()

    # Read a datapackage.json file
    package <- read_package(
      system.file("extdata", "v1", "datapackage.json", package = "frictionless")
    )

    package

    # Read data from the resource "observations"
    read_resource(package, "observations")

    # The above tibble is merged from 2 files listed in the resource path
    package$resources[[2]]$path

    # The column names and types are derived from the resource schema
    purrr::map_chr(package$resources[[2]]$schema$fields, "name")
    purrr::map_chr(package$resources[[2]]$schema$fields, "type")

    # Read data from the resource "deployments" with column selection
    read_resource(package, "deployments", col_select = c("latitude", "longitude"))

Removes a Data Resource from a Data Package, i.e. it removes one of the
described resources.

    remove_resource(package, resource_name)

| Argument | Description |
|---|---|
| package | Data Package object, as returned by |
| resource_name | Name of the Data Resource. |

package with one fewer resource.
Other edit functions:
add_resource()

    # Load the example Data Package
    package <- example_package()

    # List the resources
    resource_names(package)

    # Remove the resource "observations"
    package <- remove_resource(package, "observations")

    # List the resources ("observations" removed)
    resource_names(package)

Lists the names of the Data Resources included in a Data Package.

    resource_names(package)

| Argument | Description |
|---|---|
| package | Data Package object, as returned by |

Character vector with the Data Resource names.
Other accessor functions:
schema()

    # Load the example Data Package
    package <- example_package()

    # List the resources
    resource_names(package)

Returns the Table Schema of a Data Resource (in a Data Package), i.e. the
content of its schema property, describing the resource's fields, data
types, relationships, and missing values.
The resource must be a Tabular Data Resource.

    schema(package, resource_name)

| Argument | Description |
|---|---|
| package | Data Package object, as returned by |
| resource_name | Name of the Data Resource. |

See vignette("table-schema") to learn more about Table Schema.
List describing a Table Schema.
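Since the returned Table Schema is a nested list, it can be convenient to flatten its fields into a data frame for a quick overview. This is a sketch, assuming frictionless is installed and that every field in the example "deployments" schema declares both a name and a type. The variable names are illustrative.

```r
library(frictionless)

package <- example_package()
deployments_schema <- schema(package, "deployments")

# One row per field, with its name and declared type
fields <- data.frame(
  name = vapply(deployments_schema$fields, function(f) f$name, character(1)),
  type = vapply(deployments_schema$fields, function(f) f$type, character(1))
)
fields
```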
Other accessor functions:
resource_names()

    # Load the example Data Package
    package <- example_package()

    # Get the Table Schema for the resource "observations"
    schema <- schema(package, "observations")
    str(schema)

Writes a Data Package and its related Data Resources to disk as a
datapackage.json and CSV files.
Already existing CSV files of the same name will not be overwritten.
The function can also be used to download a Data Package in its entirety.
The Data Resources are handled as follows:

- Resource path has at least one local path (e.g. deployments.csv): CSV files are copied or downloaded to directory and path points to new location of file(s).
- Resource path has only URL(s): resource stays as is.
- Resource has inline data originally: resource stays as is.
- Resource has inline data as result of adding data with add_resource(): data are written to a CSV file using readr::write_csv(), path points to location of file, data property is removed. Use compress = TRUE to gzip those CSV files.

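The last case in the list above (a data frame added with add_resource()) can be tried out in a temporary directory. The sketch below assumes frictionless is installed. The resource name `my_data`, the data frame and the output directory are made up for illustration.

```r
library(frictionless)

# Start from an empty Data Package and add a data frame as a resource
package <- create_package()
df <- data.frame(id = 1:2, name = c("a", "b"))
package <- add_resource(package, "my_data", data = df)

# Write to a temporary directory (illustrative location)
out_dir <- file.path(tempdir(), "my_package")
written <- write_package(package, directory = out_dir)

# The descriptor and a CSV file for the resource should now be on disk
files <- list.files(out_dir)
files

# In the written package, the resource refers to the file through path
written$resources[[1]]$path

# Clean up
unlink(out_dir, recursive = TRUE)
```

Passing `compress = TRUE` to the same call would gzip the CSV file written for the data frame.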

    write_package(package, directory, compress = FALSE)

| Argument | Description |
|---|---|
| package | Data Package object, as returned by |
| directory | Path to local directory to write files to. |
| compress | If |

package invisibly, as written to file.

    # Load the example Data Package from disk
    package <- read_package(
      system.file("extdata", "v1", "datapackage.json", package = "frictionless")
    )

    package

    # Write the (unchanged) Data Package to disk
    write_package(package, directory = "my_directory")

    # Check files
    list.files("my_directory")

    # Files are written for the "observations" resource, since its path lists
    # local files.
    # No files written for the "media" resource, since it has inline data.

    # Clean up (don't do this if you want to keep your files)
    unlink("my_directory", recursive = TRUE)
