Title: | Tools to Manipulate and Query Semantic Data |
---|---|
Description: | The Resource Description Framework, or 'RDF' is a widely used data representation model that forms the cornerstone of the Semantic Web. 'RDF' represents data as a graph rather than the familiar data table or rectangle of relational databases. The 'rdflib' package provides a friendly and concise user interface for performing common tasks on 'RDF' data, such as reading, writing and converting between the various serializations of 'RDF' data, including 'rdfxml', 'turtle', 'nquads', 'ntriples', and 'json-ld'; creating new 'RDF' graphs, and performing graph queries using 'SPARQL'. This package wraps the low level 'redland' R package which provides direct bindings to the 'redland' C library. Additionally, the package supports the newer and more developer friendly 'JSON-LD' format through the 'jsonld' package. The package interface takes inspiration from the Python 'rdflib' library. |
Authors: | Carl Boettiger [aut, cre, cph] , Bryce Mecum [rev] , Anna Krystalli [rev] , Viktor Senderov [ctb] |
Maintainer: | Carl Boettiger <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.9 |
Built: | 2024-11-27 03:09:20 UTC |
Source: | https://github.com/ropensci/rdflib |
The Resource Description Framework, or RDF is a widely used data representation model that forms the cornerstone of the Semantic Web. 'RDF' represents data as a graph rather than the familiar data table or rectangle of relational databases.
The rdflib package has three main goals:
Easily read, write, and convert between all major RDF serialization formats
Support SPARQL queries to extract data from an RDF graph into a data.frame
Support JSON-LD format as a first-class citizen in RDF manipulations
For more information, see the Wikipedia pages for RDF, SPARQL, and JSON-LD.
To learn more about rdflib, start with the vignettes:
browseVignettes(package = "rdflib")
Configurations via options()
rdf_print_format: NULL or "nquads" (default), or any valid serializer name, e.g. "rdfxml", "jsonld", "turtle", "ntriples".
rdf_base_uri: default base URI to use (currently applied only when serializing JSON-LD); the default is "localhost://".
rdf_max_print: maximum number of lines to print from an rdf object; the default is 10.
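As a minimal sketch of how these options might be set for a session, the snippet below switches the print format to turtle and raises the print limit, then restores the previous values. The example.org URI is a hypothetical placeholder, not something the package defines.

```r
library(rdflib)

# Save the previous values so they can be restored afterwards.
old <- options(rdf_print_format = "turtle", rdf_max_print = 20)

g <- rdf()
rdf_add(g,
        subject   = "http://example.org/alice",  # hypothetical URI
        predicate = "http://schema.org/name",
        object    = "Alice")

# Printing now goes through the "turtle" serializer instead of "nquads".
print(g)

# Restore the earlier settings.
options(old)
```

Capturing the return value of options() is ordinary base R practice; it keeps the change local to the code that needs it.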
Maintainer: Carl Boettiger [email protected] (ORCID) [copyright holder]
Other contributors:
Bryce Mecum (ORCID) [reviewer]
Anna Krystalli (ORCID) [reviewer]
Viktor Senderov [email protected] (ORCID) [contributor]
Coerce an object into RDF
as_rdf( x, rdf = NULL, prefix = NULL, base = getOption("rdf_base_uri", "localhost://"), context = NULL, key_column = NULL )
x |
an object to coerce into RDF (list, list-like, or data.frame) |
rdf |
An existing rdf object, (by default a new object will be initialized) |
prefix |
A default vocabulary (URI prefix) to assume for all predicates |
base |
A base URI to assume for blank subject nodes |
context |
a named list mapping any string to a URI |
key_column |
name of a column which should be treated as the primary key in a table. must be unique |
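The prefix and key_column arguments described above could be combined roughly as follows. This is a sketch under assumptions: the table, the "id" key column, and the example.org vocabulary are all hypothetical, and coercing a data.frame is assumed to need the same suggested packages as the as_rdf(mtcars) example.

```r
library(rdflib)

# A small hypothetical table; "id" serves as the primary key.
cars <- data.frame(
  id  = c("car1", "car2"),
  mpg = c(21.0, 22.8),
  cyl = c(6, 4),
  stringsAsFactors = FALSE
)

g <- as_rdf(cars,
            prefix     = "http://example.org/vocab/",  # assumed vocabulary
            key_column = "id")

# Inspect every triple that was generated from the table.
triples <- rdf_query(g, "SELECT ?s ?p ?o WHERE { ?s ?p ?o }")
triples
```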
as_rdf(mtcars)
as_rdf(list(repo = "rdflib", owner = list("id", "ropensci")))
All subsequent rdf objects will be appended to the first rdf object.
Note: this does not free memory from any of the individual rdf objects.
Note: it is generally better to avoid this function by passing an existing rdf object to rdf_parse or rdf_add. Multiple active rdf objects can cause problems when using disk-based storage backends.
## S3 method for class 'rdf'
c(...)
... |
objects to be concatenated |
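To illustrate the notes above, here is a small sketch that first concatenates two graphs with c() and then shows the recommended alternative of adding everything to a single rdf object. The example.org URIs are hypothetical.

```r
library(rdflib)

g1 <- rdf()
rdf_add(g1, "http://example.org/alice", "http://schema.org/name", "Alice")

g2 <- rdf()
rdf_add(g2, "http://example.org/bob", "http://schema.org/name", "Bob")

# Append the triples of g2 to g1; neither object's memory is freed by this.
combined <- c(g1, g2)

# Alternative suggested by the note above: keep adding to one object.
g <- rdf()
rdf_add(g, "http://example.org/alice", "http://schema.org/name", "Alice")
rdf_add(g, "http://example.org/bob", "http://schema.org/name", "Bob")

names_combined <- rdf_query(
  combined, "SELECT ?s ?o WHERE { ?s <http://schema.org/name> ?o }")
names_single <- rdf_query(
  g, "SELECT ?s ?o WHERE { ?s <http://schema.org/name> ?o }")
```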
Initialize an rdf Object
rdf( storage = c("memory", "BDB", "sqlite", "postgres", "mysql", "virtuoso"), host = NULL, port = NULL, user = NULL, password = NULL, database = NULL, charset = NULL, dir = NULL, dsn = "Local Virtuoso", name = "rdflib", new_db = FALSE, fallback = TRUE )
storage |
Storage backend to use; see details |
host |
host address for mysql, postgres, or virtuoso storage |
port |
port for mysql (mysql storage defaults to mysql standard port, 3306) or postgres (postgres storage defaults to postgres standard port, 5432) |
user |
user name for postgres, mysql, or virtuoso |
password |
password for postgres, mysql, or virtuoso |
database |
name of the database to be created/used |
charset |
charset for virtuoso database, if desired |
dir |
directory of where to write sqlite or berkeley database. |
dsn |
Virtuoso dsn, either "Local Virtuoso" or "Remote Virtuoso" |
name |
name for the storage object created. Default is usually fine. |
new_db |
logical, default FALSE. Create new database or connect to existing? |
fallback |
logical, default TRUE. If requested storage system cannot initialize, should rdf() fall back to in-memory storage? |
An rdf object is a list of class 'rdf', consisting of three pointers to external C objects managed by the redland library: the world object, basically a top-level pointer for all RDF models; the model object, a collection of RDF statements; and the storage object, indicating how these statements are stored.
rdflib defaults to an in-memory hash-based storage structure, which should be best for most use cases. For very large triplestores, disk-based storage will be necessary. Enabling external storage devices will require additional libraries and custom compiling. See the storage vignette for details.
an rdf object
x <- rdf()
add a triple (subject, predicate, object) to the RDF graph
rdf_add( rdf, subject, predicate, object, subjectType = as.character(NA), objectType = as.character(NA), datatype_uri = as.character(NA) )
rdf |
an rdf object |
subject |
character string containing the subject |
predicate |
character string containing the predicate |
object |
character string containing the object |
subjectType |
the Node type of the subject, i.e. "uri", "blank" |
objectType |
the Node type of the object, i.e. "literal", "uri", "blank" |
datatype_uri |
the datatype URI to associate with an object literal value |
rdf_add() will automatically 'duck type' nodes (if it looks like a duck...). That is, strings that look like URIs will be declared as URIs (see URI). The predicate should always be a URI (e.g. a URL or a prefix:string); it cannot be blank or literal. Subjects that look like strings will be treated as blank nodes (i.e. will be prefixed with _:). An empty subject, "", will create a blank node with a random name. Objects that look like URIs will be typed as resource nodes, otherwise as literals. An empty object, "", will be treated as a blank node. Set subjectType or objectType explicitly to override this behavior, e.g. to treat an object URI as a literal string. NAs are also treated as blank nodes in subject or object. See examples for details.
Silently returns the updated RDF graph (rdf object). Since the rdf object simply contains external pointers to the model object in C code, note that the input object is modified directly, so you need not assign the output of rdf_add() to anything.
https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
rdf <- rdf()
rdf_add(rdf, subject="http://www.dajobe.org/", predicate="http://purl.org/dc/elements/1.1/language", object="en")

## non-URI string in subject indicates a blank subject
## (prefixes to "_:b0")
rdf_add(rdf, "b0", "http://schema.org/jobTitle", "Professor")

## identically a blank subject.
## Note rdf is unchanged when we add the same triple twice.
rdf_add(rdf, "b0", "http://schema.org/jobTitle", "Professor", subjectType = "blank")

## blank node with empty string creates a default blank node id
rdf_add(rdf, "", "http://schema.org/jobTitle", "Professor")

## Subject and Object both recognized as URI resources:
rdf_add(rdf, "https://orcid.org/0000-0002-1642-628X", "http://schema.org/homepage", "http://carlboettiger.info")

## Force object to be literal, not URI resource
rdf_add(rdf, "https://orcid.org/0000-0002-1642-628X", "http://schema.org/homepage", "http://carlboettiger.info", objectType = "literal")
Free Memory Associated with RDF object
rdf_free(rdf, rm = TRUE)
rdf |
an rdf object |
rm |
logical, default TRUE. Remove pointer from parent.frame()? Usually a good idea since referring to a pointer after it has been removed can crash R. |
Free all pointers associated with an rdf object. Frees memory associated with the storage, world, and model objects.
rdf <- rdf()
rdf_free(rdf)
rm(rdf)
Detect whether Berkeley Database (BDB) support for disk-based storage of RDF graphs is available. Disk-based storage requires the redland package to be installed from source with support for Berkeley DB (libdb-dev on Ubuntu, berkeley-db on homebrew); otherwise rdf() will fall back to in-memory storage with a warning.
rdf_has_bdb()
TRUE if BDB support is detected, FALSE otherwise
rdf_has_bdb()
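One way this check might be used is to pick a storage backend explicitly rather than relying on the fallback warning. The sketch below is an assumption about typical usage, with tempdir() chosen only so the example leaves no files in the working directory.

```r
library(rdflib)

has_bdb <- rdf_has_bdb()

if (has_bdb) {
  # Disk-based store; tempdir() is used here purely for illustration.
  g <- rdf(storage = "BDB", dir = tempdir(), new_db = TRUE)
} else {
  # In-memory store, the package default.
  g <- rdf()
}

rdf_add(g, "http://example.org/alice", "http://schema.org/name", "Alice")
```

Branching on the result keeps the choice of backend visible in the script, which may matter when the same code runs on machines with different redland builds.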
Parse RDF Files
rdf_parse( doc, format = c("guess", "rdfxml", "nquads", "ntriples", "turtle", "jsonld"), rdf = NULL, base = getOption("rdf_base_uri", "localhost://"), ... )
doc |
path, URL, or literal string of the rdf document to parse |
format |
rdf serialization format of the doc, one of "rdfxml", "nquads", "ntriples", "turtle" or "jsonld". If not provided, will try to guess based on file extension and fall back on rdfxml. |
rdf |
an existing rdf triplestore to extend with triples from the parsed file. Default will create a new rdf object. |
base |
the base URI to assume for any relative URIs (blank nodes) |
... |
additional parameters (not implemented) |
an rdf object, containing the redland world and model objects
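Since doc may also be a literal string, and rdf may be an existing object to extend, a short sketch of both follows. The example.org URIs are hypothetical, and the format is named explicitly because a literal string has no file extension to guess from.

```r
library(rdflib)

# Two N-Triples statements as a literal string (hypothetical URIs).
nt <- paste(
  '<http://example.org/alice> <http://schema.org/name> "Alice" .',
  '<http://example.org/bob> <http://schema.org/name> "Bob" .',
  sep = "\n"
)

# Parse the string into a new rdf object.
g <- rdf_parse(nt, format = "ntriples")

# Extend the same graph with a second parse instead of creating a new object.
more <- '<http://example.org/carol> <http://schema.org/name> "Carol" .'
rdf_parse(more, format = "ntriples", rdf = g)

people <- rdf_query(
  g, "SELECT ?s ?name WHERE { ?s <http://schema.org/name> ?name }")
```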
doc <- system.file("extdata", "dc.rdf", package="redland")
rdf <- rdf_parse(doc)
Perform a SPARQL Query
rdf_query(rdf, query, data.frame = TRUE, ...)
rdf |
an rdf object (e.g. from |
query |
a SPARQL query, as text string |
data.frame |
logical, should the results be returned as a data.frame? |
... |
additional arguments to a redland initialize-Query |
A data.frame of all query results (the default). Columns will be named according to variable names in the SPARQL query. Returned object values will be coerced to the R type corresponding to any associated datatype URI, if provided. If a column would result in mixed classes (e.g. strings and numerics), all values in the column will be coerced to character strings. If data.frame is FALSE, results will be returned as a list with each element typed by its datatype URI.
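The return value described above can be explored with a small graph built by hand. This sketch assumes a hypothetical example.org vocabulary and attaches the XML Schema integer datatype to each literal; str() is used to inspect whichever column types result rather than asserting them.

```r
library(rdflib)

g <- rdf()
xsd_int <- "http://www.w3.org/2001/XMLSchema#integer"
age <- "http://example.org/vocab/age"  # hypothetical predicate

rdf_add(g, "http://example.org/alice", age, "34", datatype_uri = xsd_int)
rdf_add(g, "http://example.org/bob",   age, "51", datatype_uri = xsd_int)

query <- "SELECT ?person ?age WHERE { ?person <http://example.org/vocab/age> ?age }"

# Default: a data.frame whose columns follow the SPARQL variable names.
ages <- rdf_query(g, query)
str(ages)

# With data.frame = FALSE the results come back as a list instead.
ages_list <- rdf_query(g, query, data.frame = FALSE)
```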
doc <- system.file("extdata", "dc.rdf", package="redland")
sparql <- 'PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?a ?c WHERE { ?a dc:creator ?c . }'
rdf <- rdf_parse(doc)
rdf_query(rdf, sparql)
Serialize an RDF Document
rdf_serialize( rdf, doc = NULL, format = c("guess", "rdfxml", "nquads", "ntriples", "turtle", "jsonld"), namespace = NULL, prefix = names(namespace), base = getOption("rdf_base_uri", "localhost://"), ... )
rdf |
an rdf object to serialize |
doc |
file path to write out to. If null, will write to character. |
format |
rdf serialization format of the doc, one of "rdfxml", "nquads", "ntriples", "turtle" or "jsonld". If not provided, will try to guess based on file extension and fall back on rdfxml. |
namespace |
a named character vector containing the prefix to namespace bindings. |
prefix |
(optional) for backward compatibility. See |
base |
the base URI to assume for any relative URIs (blank nodes) |
... |
additional arguments to |
rdf_serialize returns the output file path doc invisibly. This makes it easier to use rdf_serialize in pipe chains with rdf_parse.
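Because the file path is returned invisibly, a serialize-then-parse round trip can be written as one nested call. The following is a minimal sketch with a hypothetical example.org subject; the nquads format is named explicitly on both sides rather than guessed.

```r
library(rdflib)

g <- rdf()
rdf_add(g, "http://example.org/alice", "http://schema.org/name", "Alice")

out <- tempfile(fileext = ".nq")

# rdf_serialize() hands back `out`, which rdf_parse() then reads.
copy <- rdf_parse(rdf_serialize(g, out, format = "nquads"), format = "nquads")

round_trip <- rdf_query(
  copy, "SELECT ?s ?name WHERE { ?s <http://schema.org/name> ?name }")
```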
infile <- system.file("extdata", "dc.rdf", package="redland")
out <- tempfile("file", fileext = ".rdf")
some_rdf <- rdf_parse(infile)
rdf_add(some_rdf, subject = "http://www.dajobe.org/dave-beckett", predicate = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", object = "http://xmlns.com/foaf/0.1/Person")
rdf_serialize(some_rdf, out)

## With a namespace
rdf_serialize(some_rdf, out, format = "turtle", namespace = c(dc = "http://purl.org/dc/elements/1.1/", foaf = "http://xmlns.com/foaf/0.1/") )
readLines(out)
read an nquads file
read_nquads(file, ...)
file |
path to nquads file |
... |
additional arguments to |
an rdf object. See rdf_parse()
tmp <- tempfile(fileext = ".nq")
library(datasets)
write_nquads(iris, tmp)
read_nquads(tmp)
write object out as nquads
write_nquads(x, file, ...)
x |
an object that can be represented as nquads |
file |
output filename |
... |
additional parameters, see examples |
tmp <- tempfile(fileext = ".nq")
library(datasets)

## convert data.frame to nquads
write_nquads(iris, tmp)
rdf <- read_nquads(tmp)

## or starting a native rdf object
write_nquads(rdf, tempfile(fileext = ".nq"))