Title: | Build Reproducible Analytical Pipelines with Nix |
---|---|
Description: | Streamlines the creation of reproducible analytical pipelines using `default.nix` expressions generated via `{rix}` for reproducibility. Define derivations in R, Python or Julia, chain them into a composition of pure functions and build the resulting pipeline using `Nix` as the underlying end-to-end build tool. Functions to plot a DAG representation of the pipeline are included, as well as functions to load and inspect intermediary results for interactive analysis. User experience heavily inspired by the `{targets}` package. |
Authors: | Bruno Rodrigues [aut, cre] (ORCID: <https://orcid.org/0000-0002-3211-3689>), William Michael Landau [rev] (William reviewed the package (v. 0.2.0) for rOpenSci, see <https://github.com/ropensci/software-review/issues/706>), Anthony Martinez [rev] (ORCID: <https://orcid.org/0000-0002-4295-0261>, Anthony reviewed the package (v. 0.2.0) for rOpenSci, see <https://github.com/ropensci/software-review/issues/706>) |
Maintainer: | Bruno Rodrigues <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.3.0 |
Built: | 2025-09-16 16:30:09 UTC |
Source: | https://github.com/ropensci/rixpress |
This function appends a specified import statement to the end of each Python file within the _rixpress folder and its subdirectories, but only for files whose base name matches the provided Nix environment.
add_import(import_statement, nix_env, project_path = ".")
import_statement |
A character string representing the import statement
to be added. For example, |
nix_env |
A character string naming the Nix environment file (e.g.
|
project_path |
Path to root of project, typically ".". |
No return value; the function performs in-place modifications of the files.
Other python import:
adjust_import()
## Not run:
# Assuming project is in current working directory
add_import("import numpy as np", "default.nix")

# If project is elsewhere:
# add_import("import numpy as np", "default.nix", project_path = "path/to/project")
## End(Not run)
When calling rxp_populate(), a file containing Python import statements is automatically generated inside the _rixpress folder. For example, if the numpy package is needed, the file will include a line like "import numpy". However, Python programmers often write "import numpy as np" instead.
adjust_import(old_import, new_import, project_path = ".")
old_import |
A character string representing the import statement to be
replaced. For example, |
new_import |
A character string representing the new import statement to
replace with. For example, |
project_path |
Path to root of project, typically ".". |
In some cases, the correct import statement is entirely different. For example, for the pillow package, the generated file will contain "import pillow", which is incorrect: Python code should import from the PIL namespace instead, e.g., "from PIL import Image".
Because these adjustments cannot be automated reliably, the adjust_import() function allows you to search and replace import statements programmatically. It reads each file in the _rixpress folder, performs the replacement, and writes the modified content back to the file.
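The read, replace and write-back cycle described above can be sketched in a few lines. This is only an illustration of the idea in Python, not the package's implementation: the helper name `replace_import` is hypothetical, and restricting the search to `*.py` files is an assumption made to keep the sketch safe on non-text files.

```python
from pathlib import Path


def replace_import(folder, old_import, new_import):
    """Replace old_import with new_import in every .py file under folder.

    Hypothetical helper illustrating a search-and-replace over generated
    import files. Returns the list of files that were modified.
    """
    changed = []
    # rglob also visits subdirectories; sorting keeps the result stable.
    for path in sorted(Path(folder).rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        if old_import in text:
            # Write the modified content back to the same file.
            path.write_text(text.replace(old_import, new_import), encoding="utf-8")
            changed.append(path)
    return changed
```

Files that do not contain the old statement are left untouched, so running the replacement a second time is harmless.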
No return value; the function performs in-place modifications of the files.
Other python import:
add_import()
## Not run:
# Assuming your project is in the current working directory
adjust_import("import pillow", "from PIL import Image")

# If project is elsewhere:
# adjust_import("import pillow", "from PIL import Image", project_path = "path/to/project")
## End(Not run)
Print method for derivation objects
## S3 method for class 'rxp_derivation'
print(x, ...)
x |
An object of class "rxp_derivation" |
... |
Additional arguments passed to print methods |
Nothing, prints a summary of the derivation object to the console.
Other utilities: rxp_copy(), rxp_gc(), rxp_init(), rxp_inspect(), rxp_list_logs(), rxp_load(), rxp_read(), rxp_trace()
## Not run:
# d0 is a previously defined derivation
print(d0)
## End(Not run)
When Nix builds a derivation, its output is saved in the Nix store located under /nix/store/. Even though you can import the derivations into the current R session using rxp_read() or rxp_load(), it can be useful to copy the outputs to the current working directory. This is especially useful for Quarto documents, where there can be more than one output file, as is the case for html output.
rxp_copy(derivation_name = NULL, dir_mode = "0755", file_mode = "0644")
derivation_name |
The name of the derivation to copy. If empty, then all the derivations are copied. |
dir_mode |
Character, default "0755". POSIX permission mode to apply to directories under the copied output (including the top-level output directory). |
file_mode |
Character, default "0644". POSIX permission mode to apply to files under the copied output. |
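To make the `dir_mode` and `file_mode` arguments concrete, here is a small Python sketch of what applying POSIX permission strings to a copied output tree could look like. It is an illustration under stated assumptions, not the package's code: `copy_with_modes` is a hypothetical helper, and it assumes a POSIX file system.

```python
import os
import shutil
from pathlib import Path


def copy_with_modes(src, dest, dir_mode="0755", file_mode="0644"):
    """Copy the directory src to dest, then apply POSIX permission modes.

    Hypothetical helper: mode strings are parsed as octal numbers.
    """
    dir_bits = int(dir_mode, 8)    # "0755" is read as octal 755
    file_bits = int(file_mode, 8)  # "0644" is read as octal 644
    shutil.copytree(src, dest, dirs_exist_ok=True)
    # The top-level output directory is included, as the argument text says.
    os.chmod(dest, dir_bits)
    for root, dirs, files in os.walk(dest):
        for name in dirs:
            os.chmod(Path(root) / name, dir_bits)
        for name in files:
            os.chmod(Path(root) / name, file_bits)
```

Resetting the modes matters because files in the Nix store are read-only, so a plain copy would usually leave the copied files read-only as well.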
Nothing, the contents of the Nix store are copied to the current working directory.
Other utilities: print.rxp_derivation(), rxp_gc(), rxp_init(), rxp_inspect(), rxp_list_logs(), rxp_load(), rxp_read(), rxp_trace()
## Not run:
# Copy all derivations to the current working directory
rxp_copy()

# Copy a specific derivation
rxp_copy("mtcars")

# Copy with custom permissions (e.g., make scripts executable)
rxp_copy("my_deriv", dir_mode = "0755", file_mode = "0755")

# Copy a Quarto document output with multiple files
rxp_copy("my_quarto_doc")
## End(Not run)
This function generates a DOT file representation of the pipeline DAG, suitable for visualization, potentially on CI platforms. It is called by rxp_ga().
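For readers unfamiliar with DOT, the sketch below shows how a list of nodes and edges can be turned into DOT text. It is a Python illustration only: `edges_to_dot` is a hypothetical helper, and the shape of its inputs (a list of names and a list of pairs) is an assumption, not the structure returned by get_nodes_edges().

```python
def edges_to_dot(nodes, edges, graph_name="pipeline"):
    """Return DOT text for a directed graph.

    nodes is a list of node names; edges is a list of (source, target) pairs.
    """
    lines = [f"digraph {graph_name} {{"]
    for node in nodes:
        lines.append(f'  "{node}";')
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines) + "\n"
```

The resulting text can be written to a `.dot` file and rendered by any Graphviz-compatible tool.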
rxp_dag_for_ci( nodes_and_edges = get_nodes_edges(), output_file = "_rixpress/dag.dot" )
nodes_and_edges |
List, output of |
output_file |
Character, the path where the DOT file should be saved.
Defaults to |
Nothing, writes the DOT file to the specified output_file.
Other ci utilities: rxp_ga(), rxp_write_dag()
## Not run:
# Generate the default _rixpress/dag.dot
rxp_dag_for_ci()
## End(Not run)
Creates a single archive file containing the specified Nix store paths and their dependencies. This archive can be transferred to another machine and imported into its Nix store.
rxp_export_artifacts( archive_file = "_rixpress/pipeline_outputs.nar", which_log = NULL, project_path = "." )
archive_file |
Character, path to the archive, defaults to "_rixpress/pipeline_outputs.nar" |
which_log |
Character or NULL, regex pattern to match a specific log file. If NULL (default), the most recent log file will be used. |
project_path |
Character, defaults to ".". Path to the root directory of the project. |
Nothing, creates an archive file at the specified location.
Other archive caching functions:
rxp_import_artifacts()
## Not run:
# Export the most recent build to the default location
rxp_export_artifacts()

# Export a specific build to a custom location
rxp_export_artifacts(
  archive_file = "my_archive.nar",
  which_log = "20250510"
)
## End(Not run)
Run a pipeline on GitHub Actions
rxp_ga()
This function puts a .yaml file inside the .github/workflows/ folder at the root of your project. This workflow file expects both scripts generated by rxp_init(), gen-env.R and gen-pipeline.R, to be present. If that's not the case, edit the .yaml file accordingly. Build artifacts are archived and restored automatically between runs. Make sure to give read and write permissions to the GitHub Actions bot.
Nothing, copies file to a directory.
Other ci utilities: rxp_dag_for_ci(), rxp_write_dag()
## Not run:
rxp_ga()
## End(Not run)
This function performs garbage collection on Nix store paths and build log files generated by rixpress. It can operate in two modes: full garbage collection (when keep_since = NULL) or targeted deletion based on log file age.
rxp_gc( keep_since = NULL, project_path = ".", dry_run = FALSE, timeout_sec = 300, verbose = FALSE, ask = TRUE )
keep_since |
Date or character string (YYYY-MM-DD format). If provided,
only build logs older than this date will be targeted for deletion, along
with their associated Nix store paths. If |
project_path |
Character string specifying the path to the project
directory containing the |
dry_run |
Logical. If |
timeout_sec |
Numeric. Timeout in seconds for individual Nix commands. Also used for concurrency lock expiration. Default is 300 seconds. |
verbose |
Logical. If |
ask |
Logical. If |
The function operates in two modes:
Full Garbage Collection Mode (keep_since = NULL):
- Runs nix-store --gc to delete all unreferenced store paths
- Does not delete any build log files
- Suitable for complete cleanup of unused Nix store paths
Targeted Deletion Mode (keep_since specified):
- Identifies build logs older than the specified date
- Extracts store paths from old logs using rxp_inspect()
- Protects recent store paths by creating temporary GC roots
- Attempts to delete old store paths individually using nix-store --delete
- Deletes the corresponding build log .json files from _rixpress/
- Handles referenced paths gracefully (paths that cannot be deleted due to dependencies)
Concurrency Safety: The function uses a lock file mechanism to prevent multiple instances from running simultaneously, which could interfere with each other's GC root management.
Reference Handling: Some store paths may not be deletable because they are still referenced by:
- User or system profile generations
- Active Nix shell environments
- Result symlinks in project directories
- Other store paths that depend on them
These paths are reported but not considered errors.
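The selection logic of the targeted deletion mode can be pictured with plain sets. The Python sketch below is a simplification under stated assumptions: both helper names are hypothetical, build dates are passed in directly rather than read from log files, and protection through temporary GC roots is modelled as a set difference rather than actual Nix calls.

```python
from datetime import date


def partition_logs(log_dates, keep_since):
    """Split build logs into kept logs and logs targeted for deletion.

    log_dates maps a log filename to its build date; logs strictly older
    than keep_since are targeted.
    """
    kept = sorted(name for name, day in log_dates.items() if day >= keep_since)
    deleted = sorted(name for name, day in log_dates.items() if day < keep_since)
    return {"kept": kept, "deleted": deleted}


def deletable_paths(log_paths, partition):
    """Store paths found only in old logs.

    log_paths maps a log filename to the set of store paths it recorded.
    Paths still used by a kept log are treated as protected.
    """
    protected = set().union(*(log_paths[name] for name in partition["kept"]))
    candidates = set().union(*(log_paths[name] for name in partition["deleted"]))
    return sorted(candidates - protected)
```

A path shared between an old and a recent build therefore survives, which mirrors the intent of protecting recent store paths before deleting old ones.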
Invisibly returns a list with cleanup summary information:
- kept: Vector of build log filenames that were kept
- deleted: Vector of build log filenames targeted for deletion
- protected: Number of store paths protected via GC roots (date-based mode)
- deleted_count: Number of store paths successfully deleted
- failed_count: Number of store paths that failed to delete
- referenced_count: Number of store paths skipped due to references
- log_files_deleted: Number of build log files successfully deleted
- log_files_failed: Number of build log files that failed to delete
- dry_run_details: List of detailed information when dry_run = TRUE
Other utilities: print.rxp_derivation(), rxp_copy(), rxp_init(), rxp_inspect(), rxp_list_logs(), rxp_load(), rxp_read(), rxp_trace()
## Not run:
# Preview what would be deleted (dry run)
rxp_gc(keep_since = "2025-08-01", dry_run = TRUE, verbose = TRUE)

# Delete artifacts from builds older than August 1st, 2025
rxp_gc(keep_since = "2025-08-01")

# Full garbage collection of all unreferenced store paths
rxp_gc()

# Clean up artifacts older than 30 days ago
rxp_gc(keep_since = Sys.Date() - 30)
## End(Not run)
{ggplot2}
Uses {ggdag} to generate the plot. {ggdag} is a soft dependency of {rixpress} so you need to install it to use this function.
rxp_ggdag(nodes_and_edges = get_nodes_edges())
nodes_and_edges |
List, output of |
A {ggplot2} object.
Other visualisation functions: rxp_visnetwork()
## Not run:
rxp_ggdag()
## End(Not run)
Imports the store paths contained in an archive file into the local Nix store. Useful for transferring built outputs between machines.
rxp_import_artifacts(archive_file = "_rixpress/pipeline_outputs.nar")
archive_file |
Character, path to the archive, defaults to "_rixpress/pipeline_outputs.nar" |
Nothing, imports the archive contents into the local Nix store.
Other archive caching functions:
rxp_export_artifacts()
## Not run:
# Import from the default archive location
rxp_import_artifacts()

# Import from a custom archive file
rxp_import_artifacts("path/to/my_archive.nar")
## End(Not run)
Generates gen-env.R and gen-pipeline.R scripts in the specified project directory, after asking the user for confirmation. If the user declines, no changes are made.
rxp_init(project_path = ".", skip_prompt = FALSE)
project_path |
Character string specifying the project's path. |
skip_prompt |
Logical. If TRUE, skips all confirmation prompts and proceeds with initialization, useful on continuous integration. Defaults to FALSE. |
Creates (overwriting if they already exist):
- gen-env.R: Script to define an execution environment with {rix}.
- gen-pipeline.R: Defines a data pipeline with {rixpress}.
Logical. Returns TRUE if initialization was successful, FALSE if the operation was cancelled by the user.
Other utilities: print.rxp_derivation(), rxp_copy(), rxp_gc(), rxp_inspect(), rxp_list_logs(), rxp_load(), rxp_read(), rxp_trace()
# Default usage (will prompt before any action)
## Not run:
rxp_init()
## End(Not run)
Returns a data frame with four columns:
- derivation: the name of the derivation
- build_success: whether the build was successful or not
- path: the path of this derivation in the Nix store
- output: the output, if this derivation was built successfully.
Empty outputs mean that this derivation was not built
successfully. Several outputs for a single derivation
are possible.
In the derivation column you will find an object called all-derivations. This object is generated automatically for internal purposes, and you can safely ignore it.
rxp_inspect(project_path = ".", which_log = NULL)
project_path |
Character, defaults to ".". Path to the root directory of the project. |
which_log |
Character, defaults to NULL. If NULL the most recent build log is used. If a string is provided, it's used as a regular expression to match against available log files. |
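The `which_log` rule (most recent log by default, otherwise a regular-expression match) can be sketched as follows. This Python snippet is only an illustration: `select_log` is a hypothetical helper, the inputs are (filename, modification time) pairs chosen for the example, and picking the newest log when several match is an assumption.

```python
import re


def select_log(log_files, which_log=None):
    """Pick a build log from a list of (filename, modification_time) pairs."""
    if not log_files:
        raise ValueError("no build logs found")
    if which_log is None:
        candidates = log_files
    else:
        # The string is used as a regular expression against the filenames.
        candidates = [entry for entry in log_files if re.search(which_log, entry[0])]
        if not candidates:
            raise ValueError(f"no build log matches {which_log!r}")
    # Assumption: when several logs qualify, the most recent one is used.
    return max(candidates, key=lambda entry: entry[1])[0]
```

Because the pattern is a regular expression, a partial string such as a date fragment is enough to single out one build.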
A data frame with derivation names, whether their build was successful, their paths in the /nix/store, and their build outputs.
Other utilities: print.rxp_derivation(), rxp_copy(), rxp_gc(), rxp_init(), rxp_list_logs(), rxp_load(), rxp_read(), rxp_trace()
## Not run:
# Inspect the most recent build
build_results <- rxp_inspect()

# Inspect a specific build log
build_results <- rxp_inspect(which_log = "20250510")

# Check which derivations failed
failed <- subset(build_results, !build_success)
## End(Not run)
Create a Nix expression running a Julia function
rxp_jl( name, expr, additional_files = "", user_functions = "", nix_env = "default.nix", encoder = NULL, decoder = NULL, env_var = NULL, noop_build = FALSE )
name |
Symbol, name of the derivation. |
expr |
Character, Julia code to generate the expression. Ideally it should be a call to a pure function. Multi-line expressions are not supported. |
additional_files |
Character vector, additional files to include during the build process. For example, if a function expects a certain file to be available, this is where you should include it. |
user_functions |
Character vector, user-defined functions to include. This should be a script (or scripts) containing user-defined functions to include during the build process for this derivation. It is recommended to use one script per function, and only include the required script(s) in the derivation. |
nix_env |
Character, path to the Nix environment file, default is "default.nix". |
encoder |
Character, defaults to NULL. The name of the Julia
function used to serialize the object. It must accept two arguments: the
object to serialize (first), and the target file path (second). If NULL,
the default behaviour uses the built‐in |
decoder |
Character or named vector/list, defaults to NULL. Can be:
|
env_var |
Character vector, defaults to NULL. A named vector of
environment variables to set before running the Julia script, e.g.,
|
noop_build |
Logical, defaults to FALSE. If TRUE, the derivation produces a no-op build (a stub output with no actual build steps). Any downstream derivations depending on a no-op build will themselves also become no-op builds. |
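The propagation rule for `noop_build` (a no-op build turns everything downstream into a no-op build) amounts to a closure over the dependency graph. The Python sketch below illustrates that rule only; `propagate_noop` and the shape of its inputs are hypothetical and do not reflect how the package stores its DAG.

```python
def propagate_noop(depends_on, noop):
    """Return every derivation that ends up as a no-op build.

    depends_on maps each derivation name to the names it depends on;
    noop is the set of derivations explicitly marked noop_build = TRUE.
    """
    result = set(noop)
    changed = True
    while changed:
        changed = False
        for name, deps in depends_on.items():
            # A derivation becomes a no-op as soon as one dependency is one.
            if name not in result and result.intersection(deps):
                result.add(name)
                changed = True
    return result
```

Derivations that do not depend, directly or indirectly, on a no-op build are unaffected and still get built.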
At a basic level, rxp_jl(filtered_data, "filter(df, :col .> 10)") is equivalent to filtered_data = filter(df, :col .> 10) in Julia. rxp_jl() generates the required Nix boilerplate to output a so-called "derivation" in Nix jargon. A Nix derivation is a recipe that defines how to create an output (in this case filtered_data) including its dependencies, build steps, and output paths.
An object of class derivation which inherits from lists.
Other derivations: rxp_jl_file(), rxp_py(), rxp_py_file(), rxp_qmd(), rxp_r(), rxp_r_file(), rxp_rmd()
## Not run:
# Basic usage, no custom serializer
rxp_jl(
  name = filtered_df,
  expr = "filter(df, :col .> 10)"
)

# Skip building this derivation
rxp_jl(
  name = model_result,
  expr = "train_model(data)",
  noop_build = TRUE
)

# Custom serialization: assume `save_my_obj(obj, path)` is defined in functions.jl
rxp_jl(
  name = model_output,
  expr = "train_model(data)",
  encoder = "save_my_obj",
  user_functions = "functions.jl"
)
## End(Not run)
Creates a Nix expression that reads in a file (or folder of data) using Julia.
rxp_jl_file(...)
... |
Arguments passed on to
|
The basic usage is to provide a path to a file, and the function to read it. For example: rxp_r_file(mtcars, path = "data/mtcars.csv", read_function = read.csv).
It is also possible instead to point to a folder that contains many files that should all be read at once, for example: rxp_r_file(many_csvs, path = "data", read_function = \(x)(readr::read_csv(list.files(x, full.names = TRUE, pattern = ".csv$")))).
See vignette("importing-data") for more detailed examples.
An object of class rxp_derivation.
Other derivations: rxp_jl(), rxp_py(), rxp_py_file(), rxp_qmd(), rxp_r(), rxp_r_file(), rxp_rmd()
Returns a data frame with information about all build logs in the project's _rixpress directory.
rxp_list_logs(project_path = ".")
project_path |
Character, defaults to ".". Path to the root directory of the project. |
A data frame with log filenames, modification times, and file sizes.
Other utilities: print.rxp_derivation(), rxp_copy(), rxp_gc(), rxp_init(), rxp_inspect(), rxp_load(), rxp_read(), rxp_trace()
## Not run:
# List all build logs in the current project
logs <- rxp_list_logs()

# List logs from a specific project directory
logs <- rxp_list_logs("path/to/project")
## End(Not run)
Loads the output of derivations in the parent frame of the current session, returns a path if reading directly is not possible.
rxp_load(derivation_name, which_log = NULL, project_path = ".")
derivation_name |
Character, the name of the derivation. |
which_log |
Character, defaults to NULL. If NULL the most recent build log is used. If a string is provided, it's used as a regular expression to match against available log files. |
project_path |
Character, defaults to ".". Path to the root directory of the project. |
When derivation_name points to a single R object, it gets loaded in the current session using assign(..., envir = parent.frame()), which corresponds to the global environment in a regular interactive session. If you're trying to load a Python object and {reticulate} is available, reticulate::py_load_object() is used and then the object gets loaded into the global environment. In case the derivation is pointing to several outputs (which can happen when building a Quarto document for example) or loading fails, the path to the object is returned instead.
Nothing, this function has the side effect of loading objects into the parent frame.
Other utilities: print.rxp_derivation(), rxp_copy(), rxp_gc(), rxp_init(), rxp_inspect(), rxp_list_logs(), rxp_read(), rxp_trace()
## Not run:
# Load an R object
rxp_load("mtcars")

# Load a Python object
rxp_load("my_python_model")

# Load from a specific build log
rxp_load("mtcars", which_log = "2025-05-10")
## End(Not run)
Runs nix-build with a quiet flag, outputting to _rixpress/result.
rxp_make(verbose = 0L, max_jobs = 1, cores = 1)
verbose |
Integer, defaults to 0L. Verbosity level: 0 = show progress indicators only, 1+ = show nix output with increasing verbosity. 0: "Progress only", 1: "Informational", 2: "Talkative", 3: "Chatty", 4: "Debug", 5: "Vomit". Values higher than 5 are capped to 5. Each level adds one --verbose flag to the nix-store command. |
max_jobs |
Integer, number of derivations to be built in parallel. |
cores |
Integer, number of cores a derivation can use during build. |
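The mapping from the `verbose` level to command-line flags follows directly from the description of that argument: one `--verbose` flag per level, capped at five. A minimal Python sketch of that rule, with a hypothetical helper name, could look like this:

```python
def verbose_flags(verbose=0):
    """Return the list of --verbose flags for a given verbosity level.

    Levels above 5 are capped to 5; negative values are treated as 0
    (the latter is an assumption, the manual only documents the cap).
    """
    level = max(0, min(int(verbose), 5))
    return ["--verbose"] * level
```

With the default level of 0 no flag is added, which corresponds to the "progress only" behaviour.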
A character vector of paths to the built outputs.
Other pipeline functions: rxp_populate()
## Not run:
# Build the pipeline with progress indicators (default)
rxp_make()

# Build with verbose output and parallel execution
rxp_make(verbose = 2, max_jobs = 4, cores = 2)

# Maximum verbosity
rxp_make(verbose = 5)
## End(Not run)
This function generates a pipeline.nix file based on a list of derivation objects. Each derivation defines a build step, and rxp_populate() chains these steps and handles the serialization and conversion of Python objects into R objects (or vice-versa). Derivations are created with rxp_r(), rxp_py() and so on. By default, the pipeline is only generated and not built (build defaults to FALSE); set build to TRUE to build it immediately after generation, or build it at a later stage using rxp_make().
rxp_populate(derivs, project_path = ".", build = FALSE, py_imports = NULL, ...)
derivs |
A list of derivation objects, where each object is a list of
five elements:
- |
project_path |
Path to root of project, defaults to ".". |
build |
Logical, defaults to FALSE. Should the pipeline get built right
after being generated? When FALSE, use |
py_imports |
Named character vector of Python import rewrites. Names are
the base modules that rixpress auto-imports as "import |
... |
Further arguments passed down to methods. Use |
The generated pipeline.nix expression includes:
- the required imports of environments, typically default.nix files generated by the rix package;
- correct handling of interdependencies of the different derivations;
- serialization and deserialization of both R and Python objects, and conversion between them when objects are passed from one language to another;
- correct loading of R and Python packages, or extra functions needed to build specific targets.
The _rixpress folder contains:
- R, Python or Julia scripts to load the required packages that need to be available to the pipeline.
- a JSON file with the DAG of the pipeline, used for visualisation, and to allow rxp_populate() to generate the right dependencies between derivations.
- .rds files with build logs, required for rxp_inspect() and rxp_gc().
See vignette("debugging") for more details.
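To give an intuition for how interdependencies between derivations lead to a build order, here is a Python sketch. It rests on a loudly stated assumption: dependencies are inferred by looking for other derivation names inside each expression, which is one plausible approach and not necessarily what the package does. Both helper names are hypothetical, and in practice Nix itself decides the build order.

```python
import re
from graphlib import TopologicalSorter


def infer_dependencies(exprs):
    """Map each derivation to the other derivations named in its expression.

    exprs maps a derivation name to the code of its expression.
    """
    names = set(exprs)
    deps = {}
    for name, code in exprs.items():
        identifiers = set(re.findall(r"[A-Za-z_][A-Za-z0-9_.]*", code))
        deps[name] = (identifiers & names) - {name}
    return deps


def build_order(deps):
    """Return the derivations in an order where dependencies come first."""
    return list(TopologicalSorter(deps).static_order())
```

Applied to the two derivations from the example section, `mtcars_head` uses `mtcars_am` in its expression, so `mtcars_am` is ordered first.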
Inline Python import adjustments
In some cases, due to the automatic handling of Python packages, users might want to change import statements. By default if, say, pandas is needed to build a derivation, it will be imported with import pandas. However, Python programmers typically use import pandas as pd. You can either:
- use py_imports to rewrite these automatically during population, or
- use adjust_import() and add_import() for advanced/manual control.
See vignette("polyglot") for more details.
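The effect of a `py_imports` mapping such as `c(pandas = "import pandas as pd")` can be pictured as a lookup applied to each generated import line. The Python sketch below illustrates that idea only; `rewrite_imports` is a hypothetical helper and exact matching on lines of the form `import <module>` is an assumption.

```python
def rewrite_imports(lines, py_imports):
    """Rewrite auto-generated import lines using a module-to-statement map.

    lines is a list of import statements; py_imports maps a base module
    name to the statement that should replace "import <module>".
    """
    rewritten = []
    for line in lines:
        stripped = line.strip()
        # Only plain "import <module>" lines are candidates for rewriting.
        module = stripped[len("import "):] if stripped.startswith("import ") else None
        rewritten.append(py_imports.get(module, line))
    return rewritten
```

Modules that are absent from the mapping keep their generated statement unchanged.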
Nothing, writes a file called pipeline.nix with the Nix code to build the pipeline, as well as a folder called _rixpress with required internal files.
Other pipeline functions: rxp_make()
## Not run:
# Create derivation objects
d1 <- rxp_r(mtcars_am, filter(mtcars, am == 1))
d2 <- rxp_r(mtcars_head, head(mtcars_am))
list_derivs <- list(d1, d2)

# Generate and build in one go
rxp_populate(derivs = list_derivs, project_path = ".", build = TRUE)

# Or only populate, with inline Python import adjustments
rxp_populate(
  derivs = list_derivs,
  project_path = ".",
  build = FALSE,
  py_imports = c(pandas = "import pandas as pd")
)
# Then later:
rxp_make()
## End(Not run)
Create a Nix expression running a Python function
rxp_py( name, expr, additional_files = "", user_functions = "", nix_env = "default.nix", encoder = NULL, decoder = NULL, env_var = NULL, noop_build = FALSE )
name |
Symbol, name of the derivation. |
expr |
Character, Python code to generate the expression. Ideally it should be a call to a pure function. Multi-line expressions are not supported. |
additional_files |
Character vector, additional files to include during the build process. For example, if a function expects a certain file to be available, this is where you should include it. |
user_functions |
Character vector, user-defined functions to include. This should be a script (or scripts) containing user-defined functions to include during the build process for this derivation. It is recommended to use one script per function, and only include the required script(s) in the derivation. |
nix_env |
Character, path to the Nix environment file, default is "default.nix". |
encoder |
Character, defaults to NULL. The name of the Python
function used to serialize the object. It must accept two arguments: the
object to serialize (first), and the target file path (second). If NULL,
the default behaviour uses |
decoder |
Character or named vector/list, defaults to NULL. Can be:
|
env_var |
Character vector, defaults to NULL. A named vector of environment variables to set before running the Python script, e.g., c(PYTHONPATH = "/path/to/modules"). Each entry will be added as an export statement in the build phase. |
noop_build |
Logical, defaults to FALSE. If TRUE, the derivation produces a no-op build (a stub output with no actual build steps). Any downstream derivations depending on a no-op build will themselves also become no-op builds. |
At a basic level, rxp_py(mtcars_am, "mtcars.filter(polars.col('am') == 1).to_pandas()") is equivalent to mtcars_am = mtcars.filter(polars.col('am') == 1).to_pandas(). rxp_py() generates the required Nix boilerplate to output a so-called "derivation" in Nix jargon. A Nix derivation is a recipe that defines how to create an output (in this case mtcars_am) including its dependencies, build steps, and output paths.
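The `encoder` argument expects a Python function that takes the object first and the target file path second. A minimal sketch of such a function, which could live in a user script like the `functions.py` of the examples, is shown below. The name `serialize_model` is borrowed from the examples; the choice of pickle and the matching `load_model` reader (with a single path argument) are assumptions made for illustration.

```python
import pickle


def serialize_model(obj, path):
    """Custom encoder: the object comes first, the target file path second."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Hypothetical matching reader, assumed to take only the file path."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Keeping the encoder free of side effects other than writing to `path` fits the recommendation that expressions be calls to pure functions.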
An object of class derivation which inherits from lists.
Other derivations: rxp_jl(), rxp_jl_file(), rxp_py_file(), rxp_qmd(), rxp_r(), rxp_r_file(), rxp_rmd()
## Not run:
rxp_py(
  mtcars_pl_am,
  expr = "mtcars_pl.filter(polars.col('am') == 1).to_pandas()"
)

# Skip building this derivation
rxp_py(
  data_prep,
  expr = "preprocess_data(raw_data)",
  noop_build = TRUE
)

# Custom serialization
rxp_py(
  mtcars_pl_am,
  expr = "mtcars_pl.filter(polars.col('am') == 1).to_pandas()",
  user_functions = "functions.py",
  encoder = "serialize_model",
  additional_files = "some_required_file.bin")
## End(Not run)
Creates a Nix expression that reads in a file (or folder of data) using Python.
rxp_py_file(...)
... |
Arguments passed on to
|
The basic usage is to provide a path to a file, and the function to read it. For example: `rxp_r_file(mtcars, path = "data/mtcars.csv", read_function = read.csv)`.
It is also possible to point to a folder containing many files that should all be read at once, for example:
`rxp_r_file(many_csvs, path = "data", read_function = \(x)(readr::read_csv(list.files(x, full.names = TRUE, pattern = ".csv$"))))`.
See `vignette("importing-data")` for more detailed examples.
An object of class `rxp_derivation`.
Other derivations: `rxp_jl()`, `rxp_jl_file()`, `rxp_py()`, `rxp_qmd()`, `rxp_r()`, `rxp_r_file()`, `rxp_rmd()`
Transfer Python object into an R session.
rxp_py2r(name, expr, nix_env = "default.nix")
name |
Symbol, name of the derivation. |
expr |
Symbol, Python object to be loaded into R. |
nix_env |
Character, path to the Nix environment file, default is "default.nix". |
`rxp_py2r(my_obj, my_python_object)` loads a serialized Python object and saves it as an RDS file using `reticulate::py_load_object()`.
An object of class `rxp_derivation`.
Other interop functions: `rxp_r2py()`
Render a Quarto document as a Nix derivation
rxp_qmd( name, qmd_file, additional_files = "", nix_env = "default.nix", args = "", env_var = NULL, noop_build = FALSE )
name |
Symbol, derivation name. |
qmd_file |
Character, path to .qmd file. |
additional_files |
Character vector, additional files to include, for example a folder containing images to include in the Quarto document. |
nix_env |
Character, path to the Nix environment file, default is "default.nix". |
args |
A character of additional arguments to be passed directly to the
|
env_var |
List, defaults to NULL. A named list of environment variables to set before running the Quarto render command, e.g., c(QUARTO_PROFILE = "production"). Each entry will be added as an export statement in the build phase. |
noop_build |
Logical, defaults to FALSE. If TRUE, the derivation produces a no-op build (a stub output with no actual build steps). Any downstream derivations depending on a no-op build will themselves also become no-op builds. |
To include built derivations in the document, `rxp_read("derivation_name")` should be put in the .qmd file.
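The `env_var` argument is described above as producing one export statement per entry in the build phase. A minimal base R sketch of that mapping follows. The helper `make_exports()` is hypothetical and the exact text `{rixpress}` writes into the Nix expression may differ; the sketch only shows how a named vector or list could be turned into shell `export` lines. The `LANG` entry is an extra value added for illustration.

```r
# Illustrative helper (not part of rixpress): turn a named character
# vector or list into shell export statements.
make_exports <- function(env_var) {
  if (is.null(env_var)) {
    return(character(0))
  }
  var_names <- names(env_var)
  stopifnot(!is.null(var_names), all(nzchar(var_names)))
  # Quote values for a POSIX shell so spaces and special characters survive.
  values <- shQuote(unname(unlist(env_var)), type = "sh")
  unname(sprintf("export %s=%s", var_names, values))
}

export_lines <- make_exports(c(QUARTO_PROFILE = "production", LANG = "C.UTF-8"))
cat(export_lines, sep = "\n")
```

Quoting the values is a design choice of this sketch; it keeps a value such as `"my profile"` from being split by the shell.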
An object of class derivation which inherits from lists.
Other derivations: `rxp_jl()`, `rxp_jl_file()`, `rxp_py()`, `rxp_py_file()`, `rxp_r()`, `rxp_r_file()`, `rxp_rmd()`
## Not run:
# Compile a .qmd file to a pdf using typst
# `images` is a folder containing images to include in the Quarto doc
rxp_qmd(
  name = report,
  qmd_file = "report.qmd",
  additional_files = "images",
  args = "--to typst"
)

# Skip building this derivation
rxp_qmd(
  name = draft_report,
  qmd_file = "draft.qmd",
  noop_build = TRUE
)
## End(Not run)
Create a Nix expression running an R function
rxp_r( name, expr, additional_files = "", user_functions = "", nix_env = "default.nix", encoder = NULL, decoder = NULL, env_var = NULL, noop_build = FALSE )
name |
Symbol, name of the derivation. |
expr |
R code to generate the expression. Ideally it should be a call to a pure function, or a piped expression. Multi-line expressions are not supported. |
additional_files |
Character vector, additional files to include during the build process. For example, if a function expects a certain file to be available, this is where you should include it. |
user_functions |
Character vector, user-defined functions to include. This should be a script (or scripts) containing user-defined functions to include during the build process for this derivation. It is recommended to use one script per function, and only include the required script(s) in the derivation. |
nix_env |
Character, path to the Nix environment file, default is "default.nix". |
encoder |
Function or character, defaults to NULL. A function used to encode (serialize) objects for transfer between derivations. It must accept two arguments: the object to encode (first), and the target file path (second). If your function has a different signature, wrap it to match this interface. By default, |
decoder |
Function, character, or named vector/list, defaults to NULL. Can be:
|
env_var |
Character vector, defaults to NULL. A named vector of environment variables to set before running the R script, e.g., |
noop_build |
Logical, defaults to FALSE. If TRUE, the derivation produces a no-op build (a stub output with no actual build steps). Any downstream derivations depending on a no-op build will themselves also become no-op builds. |
At a basic level, `rxp_r(mtcars_am, filter(mtcars, am == 1))` is equivalent to `mtcars_am <- filter(mtcars, am == 1)`.
`rxp_r()` generates the required Nix boilerplate to output a so-called "derivation" in Nix jargon. A Nix derivation is a recipe that defines how to create an output (in this case `mtcars_am`), including its dependencies, build steps, and output paths.
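The `encoder` argument above expects a function taking the object first and the target file path second. The sketch below shows one way to wrap base R's CSV functions to fit that interface. The names `encode_csv()` and `decode_csv()` are hypothetical, and the assumption that a decoder receives a file path is inferred from the `qs::qread` example rather than stated by the documentation.

```r
# Hypothetical encoder: object first, target path second.
encode_csv <- function(object, path) {
  utils::write.csv(object, file = path, row.names = FALSE)
}

# Hypothetical matching decoder: takes the path and returns the object.
decode_csv <- function(path) {
  utils::read.csv(path, stringsAsFactors = FALSE)
}

# Round trip outside of any pipeline, to check the pair is consistent.
input <- data.frame(id = 1:3, group = c("a", "b", "c"))
tmp <- tempfile(fileext = ".csv")
encode_csv(input, tmp)
round_trip <- decode_csv(tmp)
print(round_trip)
```

A pair like this would presumably be passed as `encoder = encode_csv` in one derivation and `decoder = decode_csv` in the next, mirroring the `qs::qsave` and `qs::qread` pattern from the examples.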
An object of class derivation which inherits from lists.
Other derivations: `rxp_jl()`, `rxp_jl_file()`, `rxp_py()`, `rxp_py_file()`, `rxp_qmd()`, `rxp_r_file()`, `rxp_rmd()`
## Not run:
# Basic usage
rxp_r(name = filtered_mtcars, expr = filter(mtcars, am == 1))

# Skip building this derivation
rxp_r(
  name = turtles,
  expr = occurrence(species, geometry = atlantic),
  noop_build = TRUE
)

# Serialize object using qs
rxp_r(
  name = filtered_mtcars,
  expr = filter(mtcars, am == 1),
  encoder = qs::qsave
)

# Unserialize using qs::qread in the next derivation
rxp_r(
  name = mtcars_mpg,
  expr = select(filtered_mtcars, mpg),
  decoder = qs::qread
)
## End(Not run)
Creates a Nix expression that reads in a file (or folder of data) using R.
rxp_r_file(...)
... |
Arguments passed on to
|
The basic usage is to provide a path to a file, and the function to read it. For example: `rxp_r_file(mtcars, path = "data/mtcars.csv", read_function = read.csv)`.
It is also possible to point to a folder containing many files that should all be read at once, for example:
`rxp_r_file(many_csvs, path = "data", read_function = \(x)(readr::read_csv(list.files(x, full.names = TRUE, pattern = ".csv$"))))`.
See `vignette("importing-data")` for more detailed examples.
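The folder-reading pattern above can be tried outside of a pipeline. The following base R sketch defines a hypothetical `read_csv_folder()` that could serve as a `read_function`, and exercises it on a temporary folder. It uses `utils::read.csv()` with `rbind()` instead of `readr::read_csv()` to stay dependency-free, and it escapes the dot in the pattern (`"\\.csv$"`) so that only names ending in a literal `.csv` match.

```r
# Hypothetical read function: takes a folder path, returns one data frame.
read_csv_folder <- function(path) {
  files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) {
    stop("No .csv files found in: ", path)
  }
  # Assumes every file has the same columns.
  do.call(rbind, lapply(files, utils::read.csv))
}

# Build a throwaway folder with two CSV files and one unrelated file.
demo_dir <- tempfile("rxp_demo_csvs_")
dir.create(demo_dir)
utils::write.csv(data.frame(x = 1:2, y = c("a", "b")),
                 file.path(demo_dir, "part1.csv"), row.names = FALSE)
utils::write.csv(data.frame(x = 3:4, y = c("c", "d")),
                 file.path(demo_dir, "part2.csv"), row.names = FALSE)
writeLines("not a csv", file.path(demo_dir, "notes.txt"))

combined <- read_csv_folder(demo_dir)
print(combined)
```

Testing the read function on its own like this makes it easier to tell a reading problem apart from a build problem later on.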
An object of class `rxp_derivation`.
Other derivations: `rxp_jl()`, `rxp_jl_file()`, `rxp_py()`, `rxp_py_file()`, `rxp_qmd()`, `rxp_r()`, `rxp_rmd()`
Transfer R object into a Python session.
rxp_r2py(name, expr, nix_env = "default.nix")
name |
Symbol, name of the derivation. |
expr |
Symbol, R object to be saved into a Python pickle. |
nix_env |
Character, path to the Nix environment file, default is "default.nix". |
`rxp_r2py(my_obj, my_r_object)` saves an R object to a Python pickle using `reticulate::py_save_object()`.
An object of class `rxp_derivation`.
Other interop functions: `rxp_py2r()`
Reads the output of derivations in the current session, returns a path if reading directly is not possible.
rxp_read(derivation_name, which_log = NULL, project_path = ".")
derivation_name |
Character, the name of the derivation. |
which_log |
Character, defaults to NULL. If NULL the most recent build log is used. If a string is provided, it's used as a regular expression to match against available log files. |
project_path |
Character, defaults to ".". Path to the root directory of the project. |
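The `which_log` behaviour described above (most recent log when NULL, otherwise a regular expression matched against the available log files) can be sketched in base R. The helper `select_log()` and the log file names are hypothetical. The sketch assumes names embed a timestamp so that sorting them orders the logs chronologically, and it assumes the most recent match is used when several logs match; the package may resolve ties differently.

```r
# Hypothetical log file names, assumed to sort chronologically.
logs <- c(
  "build_log_2025-05-10_a1.json",
  "build_log_2025-06-02_b2.json",
  "build_log_2025-05-10_c3.json"
)

# Illustrative helper (not part of rixpress).
select_log <- function(log_files, which_log = NULL) {
  log_files <- sort(log_files, method = "radix")
  if (is.null(which_log)) {
    # No pattern: fall back to the last (most recent) log.
    return(log_files[length(log_files)])
  }
  matches <- grep(which_log, log_files, value = TRUE)
  if (length(matches) == 0) {
    stop("No build log matches: ", which_log)
  }
  # Assumption: when several logs match, take the most recent one.
  matches[length(matches)]
}

latest <- select_log(logs)
from_may <- select_log(logs, which_log = "2025-05-10")
print(c(latest, from_may))
```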
When `derivation_name` points to a single R object, it gets read in the current session using `readRDS()`. If it's a Python object and `{reticulate}` is available, `reticulate::py_load_object()` is used. If the derivation points to several outputs (which can happen when building a Quarto document, for example), or if neither `readRDS()` nor `reticulate::py_load_object()` successfully reads the object, the path to the object is returned instead.
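The fallback behaviour described above can be approximated with a few lines of base R. The helper `read_or_path()` is hypothetical and covers only the `readRDS()` branch (the `{reticulate}` step is left out to keep the sketch dependency-free): a single readable RDS file is returned as an object, anything else is returned as a path.

```r
# Illustrative helper (not part of rixpress).
read_or_path <- function(paths) {
  # Several outputs: nothing sensible to read, hand back the paths.
  if (length(paths) != 1) {
    return(paths)
  }
  # Single output: try readRDS(), fall back to the path on failure.
  tryCatch(readRDS(paths), error = function(e) paths)
}

rds_file <- tempfile(fileext = ".rds")
saveRDS(mtcars, rds_file)

html_file <- tempfile(fileext = ".html")
writeLines("<html><body>report</body></html>", html_file)

as_object <- read_or_path(rds_file)              # a data frame
as_path <- read_or_path(html_file)               # the path itself
as_paths <- read_or_path(c(rds_file, html_file)) # both paths
```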
The derivation's output.
Other utilities: `print.rxp_derivation()`, `rxp_copy()`, `rxp_gc()`, `rxp_init()`, `rxp_inspect()`, `rxp_list_logs()`, `rxp_load()`, `rxp_trace()`
## Not run:
mtcars <- rxp_read("mtcars")

# Read from a specific build log
mtcars <- rxp_read("mtcars", which_log = "2025-05-10")
## End(Not run)
Render an R Markdown document as a Nix derivation
rxp_rmd( name, rmd_file, additional_files = "", nix_env = "default.nix", params = NULL, env_var = NULL, noop_build = FALSE )
name |
Symbol, derivation name. |
rmd_file |
Character, path to .Rmd file. |
additional_files |
Character vector, additional files to include, for example a folder containing the pictures to include in the R Markdown document. |
nix_env |
Character, path to the Nix environment file, default is "default.nix". |
params |
List, parameters to pass to the R Markdown document. Default is NULL. |
env_var |
List, defaults to NULL. A named list of environment variables to set before running the R Markdown render command, e.g., c(RSTUDIO_PANDOC = "/path/to/pandoc"). Each entry will be added as an export statement in the build phase. |
noop_build |
Logical, defaults to FALSE. If TRUE, the derivation produces a no-op build (a stub output with no actual build steps). Any downstream derivations depending on a no-op build will themselves also become no-op builds. |
To include objects built in the pipeline, `rxp_read("derivation_name")` should be put in the .Rmd file.
An object of class derivation which inherits from lists.
Other derivations: `rxp_jl()`, `rxp_jl_file()`, `rxp_py()`, `rxp_py_file()`, `rxp_qmd()`, `rxp_r()`, `rxp_r_file()`
## Not run:
# Compile a .Rmd file to a pdf
# `images` is a folder containing images to include in the R Markdown doc
rxp_rmd(
  name = report,
  rmd_file = "report.Rmd",
  additional_files = "images"
)

# Skip building this derivation
rxp_rmd(
  name = draft_report,
  rmd_file = "draft.Rmd",
  noop_build = TRUE
)
## End(Not run)
Trace lineage of derivations
rxp_trace( name = NULL, dag_file = file.path("_rixpress", "dag.json"), transitive = TRUE, include_self = FALSE )
name |
Character, defaults to NULL. Name of the derivation to inspect. If NULL, the function prints the whole pipeline (inverted global view). |
dag_file |
Character, defaults to "_rixpress/dag.json". Path to dag.json. |
transitive |
Logical, defaults to TRUE. If TRUE, show transitive closure and mark transitive-only nodes with "*". If FALSE, show immediate neighbours only. |
include_self |
Logical, defaults to FALSE. If TRUE, include |
Invisibly, a named list mapping each inspected derivation name to a list with elements:
- dependencies
- reverse_dependencies
The function also prints a tree representation to the console.
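The distinction between immediate neighbours and the transitive closure, and between dependencies and reverse dependencies, can be shown on a toy graph. The sketch below is not the `rxp_trace()` implementation and does not read `dag.json`; the list `deps` and the helpers `trace_dependencies()` and `invert_edges()` are hypothetical.

```r
# Hypothetical pipeline: each element lists direct dependencies.
deps <- list(
  mtcars     = character(0),
  mtcars_am  = "mtcars",
  mtcars_mpg = "mtcars_am",
  report     = c("mtcars_mpg", "mtcars")
)

# Breadth-first walk over the dependency lists.
trace_dependencies <- function(deps, name, transitive = TRUE) {
  direct <- deps[[name]]
  if (!transitive) {
    return(direct)
  }
  seen <- character(0)
  queue <- direct
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (current %in% seen) next
    seen <- c(seen, current)
    queue <- c(queue, deps[[current]])
  }
  seen
}

# Flip every edge so the same walk yields reverse dependencies.
invert_edges <- function(deps) {
  reversed <- lapply(deps, function(x) character(0))
  for (child in names(deps)) {
    for (parent in deps[[child]]) {
      reversed[[parent]] <- c(reversed[[parent]], child)
    }
  }
  reversed
}

direct_deps <- trace_dependencies(deps, "report", transitive = FALSE)
all_deps <- trace_dependencies(deps, "report")
# Nodes reached only indirectly (the ones rxp_trace() marks with "*").
transitive_only <- setdiff(all_deps, direct_deps)
reverse_deps <- trace_dependencies(invert_edges(deps), "mtcars")
```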
Other utilities: `print.rxp_derivation()`, `rxp_copy()`, `rxp_gc()`, `rxp_init()`, `rxp_inspect()`, `rxp_list_logs()`, `rxp_load()`, `rxp_read()`
{visNetwork}
Uses `{visNetwork}` to generate the plot. `{visNetwork}` is a soft dependency of `{rixpress}`, so you need to install it to use this function.
rxp_visnetwork(nodes_and_edges = get_nodes_edges())
nodes_and_edges |
List, output of |
Nothing, this function opens a new tab in your browser with the DAG generated using `{visNetwork}`.
Other visualisation functions: `rxp_ggdag()`
## Not run:
rxp_visnetwork()
## End(Not run)
Creates a JSON representation of a directed acyclic graph (DAG) based on dependencies between derivations. It is called automatically by `rxp_populate()`.
rxp_write_dag(rxp_list, output_file = "_rixpress/dag.json")
rxp_list |
A list of derivations. |
output_file |
Path to the output JSON file. Defaults to "_rixpress/dag.json". |
Nothing, writes a JSON file representing the DAG.
Other ci utilities: `rxp_dag_for_ci()`, `rxp_ga()`
## Not run:
rxp_write_dag(rxp_list)
## End(Not run)