This vignette details how to
effectively use the {cmdstanr}
package within a
{rixpress}
pipeline for Bayesian statistical modelling with
Stan. For a general introduction to {rixpress}
and its core
concepts, please refer to vignette("intro-concepts")
and
vignette("core-functions")
.
{cmdstanr}
provides a user-friendly R interface to
cmdstan
, Stan’s command-line interface. While powerful, its
reliance on external processes and file system interactions requires
careful handling within the hermetic build environment of
{rixpress}
.
As with any {rixpress}
pipeline, the first step is to
define the execution environment using {rix}
:
library(rix)
rix(
date = "2025-04-29",
r_pkgs = c("readr", "dplyr", "ggplot2"), # Add other R packages as needed
system_pkgs = "cmdstan", # Crucial: include cmdstan as a system dependency
git_pkgs = list(
list(
package_name = "cmdstanr",
repo_url = "https://github.com/stan-dev/cmdstanr",
commit = "79d37792d8e4ffcf3cf721b8d7ee4316a1234b0c" # Pin to a specific commit
),
list(
package_name = "rixpress",
repo_url = "https://github.com/ropensci/rixpress",
commit = "HEAD" # Or pin to a specific commit
)
),
ide = "none", # Or your preferred IDE
project_path = ".",
overwrite = TRUE
)
Key points in this environment definition:
cmdstan
is included in system_pkgs
. This
makes the cmdstan
executables available to the
pipeline.{cmdstanr}
is installed from its GitHub repository, as
it’s not available on CRAN. Pinning to a specific commit is recommended
for maximum reproducibility.With the environment set up, we can define the pipeline:
The Stan model code itself should reside in a .stan
file. We use rxp_r_file()
to bring its contents into the
pipeline as a character string.
Next, we define parameters and simulate some data for our model.
rxp_r(
parameters,
list(
N = 100,
alpha = 2,
beta = -0.5,
sigma = 1.e-1
)
),
rxp_r(
x,
rnorm(parameters$N, 0, 1)
),
rxp_r(
y,
rnorm(
n = parameters$N,
mean = parameters$alpha + parameters$beta * x,
sd = parameters$sigma
)
),
rxp_r(
# Prepare the data list for cmdstanr
inputs,
list(N = parameters$N, x = x, y = y)
),
Interfacing with cmdstan
from within
{rixpress}
requires a specific strategy due to the hermetic
nature of Nix sandboxes. We’ll use a wrapper function to handle model
compilation and sampling within a single rxp_r()
step.
First, let’s define the wrapper function (e.g., in a
functions.R
file that we’ll include):
# In functions.R
cmdstan_model_wrapper <- function(
stan_string = NULL, # The Stan model code as a character string
inputs, # Data list for the model
seed, # Seed for reproducibility
... # Additional arguments for cmdstan_model or sample
) {
# Create a temporary .stan file within the sandbox
stan_file <- tempfile(pattern = "model_", fileext = ".stan")
writeLines(stan_string, con = stan_file)
# Compile the Stan model
# cmdstanr will find cmdstan via the CMDSTAN environment variable
model <- cmdstanr::cmdstan_model(
stan_file = stan_file,
...
)
# Sample from the posterior
fitted_model <- model$sample(
data = inputs,
seed = seed,
...
)
return(fitted_model)
}
Now, we use this wrapper in our pipeline:
# ... (continuation of pipeline_steps list)
rxp_r(
model, # Target name for the fitted model object
cmdstan_model_wrapper(
stan_string = bayesian_linear_regression_model,
inputs = inputs,
seed = 22
),
user_functions = "functions.R",
encoder = "save_model",
env_var = c("CMDSTAN" = "${defaultPkgs.cmdstan}/opt/cmdstan")
)
Explanation of the Wrapper Approach:
stan_string = bayesian_linear_regression_model
:
We pass the model code (read by rxp_r_file
) as a string to
our wrapper.writeLines(stan_string, con = stan_file)
:
Inside the wrapper, the Stan code is written to a temporary
.stan
file. This file exists within the sandbox of
the current rxp_r
step. This is crucial because
cmdstan_model
needs a file path. Attempting to pass the
original model.stan
path directly via
additional_files
to cmdstan_model
can lead to
permission or path issues when cmdstan
tries to compile it
from a different working directory or context.cmdstanr::cmdstan_model()
: Compiles
the model from the temporary stan_file
.model$sample()
: Samples from the
compiled model.rxp_r
step (and thus
the same sandbox). This is because the model
object
returned by cmdstan_model()
contains paths to the compiled
executable. If these were separate steps, the paths from the compilation
sandbox wouldn’t be valid in the sampling sandbox.env_var = c("CMDSTAN" = "${defaultPkgs.cmdstan}/opt/cmdstan")
:
This sets the CMDSTAN
environment variable within the
sandbox for this specific step. {cmdstanr}
uses this
variable to locate the cmdstan
installation. The
${defaultPkgs.cmdstan}
is a Nix interpolation that resolves
to the path of the cmdstan
package in the Nix store. If the
environment providing cmdstan
were named differently, for
example cmdstan-env.nix
, then you would need to use
${cmdstan_envPkgs.cmdstan}
.{cmdstanr}
provides a specific method for saving fitted
model objects to ensure all necessary components are preserved. We
define a simple wrapper for this to use with
{rixpress}
.
By specifying encoder = "save_model"
in the
rxp_r()
call, {rixpress}
will use this
function instead of the default saveRDS()
. The fitted model
can then be read using rxp_read("model")
, which will
internally use readRDS()
.
Using {cmdstanr}
with {rixpress}
involves
these key considerations:
Include cmdstan
in system_pkgs
and
{cmdstanr}
(from Git) in your {rix}
environment definition.
Read your .stan
file into the pipeline using
rxp_r_file()
.
Implement a wrapper function that:
.stan
file inside the wrapper.cmdstanr::cmdstan_model()
on this temporary
file.model$sample()
to fit the model.Perform model compilation and sampling within the same
rxp_r()
call using the wrapper.
Set the CMDSTAN
environment variable for the
rxp_r()
step that runs the wrapper, pointing to the Nix
store path of cmdstan
.
Use {cmdstanr}
’s $save_object()
method
via a custom encoder
for robust saving of the fitted
model.
This approach ensures that cmdstan
can operate correctly
within the isolated and reproducible environment provided by
{rixpress}
and Nix.