Title: | Bayesian Modeling and Causal Inference for Multivariate Longitudinal Data |
---|---|
Description: | Easy-to-use and efficient interface for Bayesian inference of complex panel (time series) data using dynamic multivariate panel models by Helske and Tikka (2024) <doi:10.1016/j.alcr.2024.100617>. The package supports joint modeling of multiple measurements per individual, time-varying and time-invariant effects, and a wide range of discrete and continuous distributions. Estimation of these dynamic multivariate panel models is carried out via 'Stan'. For an in-depth tutorial of the package, see (Tikka and Helske, 2024) <doi:10.48550/arXiv.2302.01607>. |
Authors: | Santtu Tikka [aut, cre] , Jouni Helske [aut] , Nicholas Clark [rev], Lucy D'Agostino McGowan [rev] |
Maintainer: | Santtu Tikka <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.5.4 |
Built: | 2024-10-16 13:35:26 UTC |
Source: | https://github.com/ropensci/dynamite |
Easy-to-use and efficient interface for Bayesian inference of complex panel data consisting of multiple individuals with multiple measurements over time using dynamic multivariate panel models. Supports several observational distributions, time-varying effects and realistic counterfactual predictions which take into account the dynamic structure of the model.
The package vignettes
dynamiteformula()
for information on defining models.
dynamite()
for information on fitting models.
https://github.com/ropensci/dynamite/issues/ to submit a bug report or a feature request.
Santtu Tikka (author) | [email protected] |
Jouni Helske (author) | [email protected] |
Maintainer: Santtu Tikka [email protected] (ORCID)
Authors:
Jouni Helske [email protected] (ORCID)
Other contributors:
Nicholas Clark [reviewer]
Lucy D'Agostino McGowan [reviewer]
Useful links:
Report bugs at https://github.com/ropensci/dynamite/issues/
dynamite
Output to draws_df
FormatConverts the output from a dynamite()
call to a
draws_df
format of the posterior package, enabling the use
of diagnostics and plotting methods of posterior and bayesplot
packages. Note that this function returns variables in a wide format,
whereas as.data.frame.dynamitefit()
uses the long format.
## S3 method for class 'dynamitefit' as_draws_df( x, parameters = NULL, responses = NULL, types = NULL, times = NULL, groups = NULL, ... ) ## S3 method for class 'dynamitefit' as_draws(x, parameters = NULL, responses = NULL, types = NULL, ...)
## S3 method for class 'dynamitefit' as_draws_df( x, parameters = NULL, responses = NULL, types = NULL, times = NULL, groups = NULL, ... ) ## S3 method for class 'dynamitefit' as_draws(x, parameters = NULL, responses = NULL, types = NULL, ...)
x |
[ |
parameters |
[ |
responses |
[ |
types |
[ |
times |
[ |
groups |
[ |
... |
Ignored. |
You can use the arguments parameters
, responses
and types
to extract
only a subset of the model parameters (i.e., only certain types of
parameters related to a certain response variable).
See potential values for the types argument in as.data.frame.dynamitefit()
and get_parameter_names()
for potential values for parameters
argument.
A draws_df
object.
A draws_df
object.
Model outputs
as.data.frame.dynamitefit()
,
as.data.table.dynamitefit()
,
coef.dynamitefit()
,
confint.dynamitefit()
,
dynamite()
,
get_code()
,
get_data()
,
get_parameter_dims()
,
get_parameter_names()
,
get_parameter_types()
,
ndraws.dynamitefit()
,
nobs.dynamitefit()
data.table::setDTthreads(1) # For CRAN as_draws(gaussian_example_fit, types = c("sigma", "beta")) # Compute MCMC diagnostics using the posterior package posterior::summarise_draws(as_draws(gaussian_example_fit))
data.table::setDTthreads(1) # For CRAN as_draws(gaussian_example_fit, types = c("sigma", "beta")) # Compute MCMC diagnostics using the posterior package posterior::summarise_draws(as_draws(gaussian_example_fit))
dynamitefit
Object as a Data FrameProvides a data.frame
representation of the posterior samples of the model
parameters.
## S3 method for class 'dynamitefit' as.data.frame( x, row.names = NULL, optional = FALSE, types = NULL, parameters = NULL, responses = NULL, times = NULL, groups = NULL, summary = FALSE, probs = c(0.05, 0.95), include_fixed = TRUE, ... )
## S3 method for class 'dynamitefit' as.data.frame( x, row.names = NULL, optional = FALSE, types = NULL, parameters = NULL, responses = NULL, times = NULL, groups = NULL, summary = FALSE, probs = c(0.05, 0.95), include_fixed = TRUE, ... )
x |
[ |
row.names |
Ignored. |
optional |
Ignored. |
types |
[ |
parameters |
[ |
responses |
[ |
times |
[ |
groups |
[ |
summary |
[ |
probs |
[ |
include_fixed |
[ |
... |
Ignored. |
The arguments responses
and types
can be used to extract only a subset
of the model parameters (i.e., only certain types of parameters related to a
certain response variable).
Potential values for the types
argument are:
alpha
Intercept terms (time-invariant or time-varying).
beta
Time-invariant regression coefficients.
cutpoint
Cutpoints for ordinal regression.
delta
Time-varying regression coefficients.
nu
Group-level random effects.
lambda
Factor loadings.
kappa
Contribution of the latent factor loadings in the total
variation.
psi
Latent factors.
tau
Standard deviations of the spline coefficients of delta
.
tau_alpha
Standard deviations of the spline coefficients of
time-varying alpha
.
sigma_nu
Standard deviations of the random effects nu
.
corr_nu
Pairwise within-group correlations of random effects nu
.
Samples of the full correlation matrix can be extracted manually as
rstan::extract(fit$stanfit, pars = "corr_matrix_nu")
if necessary.
sigma_lambda
Standard deviations of the latent factor loadings
lambda
.
corr_psi
Pairwise correlations of the noise terms of the latent
factors. Samples of the full correlation matrix can be extracted
manually as rstan::extract(fit$stanfit, pars = "corr_matrix_psi")
if
necessary.
sigma
Standard deviations of Gaussian responses.
corr
Pairwise correlations of multivariate Gaussian responses.
phi
Describes various distributional parameters, such as:
Dispersion parameter of the Negative Binomial distribution.
Shape parameter of the Gamma distribution.
Precision parameter of the Beta distribution.
Degrees of freedom of the Student t-distribution.
omega
Spline coefficients of the regression coefficients delta
.
omega_alpha
Spline coefficients of time-varying alpha
.
omega_psi
Spline coefficients of the latent factors psi
. Note that
in case of nonzero_lambda = FALSE
, mean of these are used to flip the
sign of psi
to avoid multimodality due to sign-switching, but
omega_psi
variables are not modified.
zeta
Total variation of latent factors, i.e., the sum
of sigma_lambda
and tau_psi
.
A tibble
containing either samples or summary statistics of the
model parameters in a long format. For a wide format, see
as_draws.dynamitefit()
.
Model outputs
as.data.table.dynamitefit()
,
as_draws_df.dynamitefit()
,
coef.dynamitefit()
,
confint.dynamitefit()
,
dynamite()
,
get_code()
,
get_data()
,
get_parameter_dims()
,
get_parameter_names()
,
get_parameter_types()
,
ndraws.dynamitefit()
,
nobs.dynamitefit()
data.table::setDTthreads(1) # For CRAN as.data.frame( gaussian_example_fit, responses = "y", types = "beta" ) # Basic summaries can be obtained automatically with summary = TRUE as.data.frame( gaussian_example_fit, responses = "y", types = "beta", summary = TRUE ) # Time-varying coefficients "delta" as.data.frame( gaussian_example_fit, responses = "y", types = "delta", summary = TRUE ) # Obtain summaries for a specific parameters as.data.frame( gaussian_example_fit, parameters = c("tau_y_x", "sigma_y"), summary = TRUE )
data.table::setDTthreads(1) # For CRAN as.data.frame( gaussian_example_fit, responses = "y", types = "beta" ) # Basic summaries can be obtained automatically with summary = TRUE as.data.frame( gaussian_example_fit, responses = "y", types = "beta", summary = TRUE ) # Time-varying coefficients "delta" as.data.frame( gaussian_example_fit, responses = "y", types = "delta", summary = TRUE ) # Obtain summaries for a specific parameters as.data.frame( gaussian_example_fit, parameters = c("tau_y_x", "sigma_y"), summary = TRUE )
dynamitefit
Object as a Data TableProvides a data.table
representation of the posterior samples of the model
parameters. See as.data.frame.dynamitefit()
for details.
## S3 method for class 'dynamitefit' as.data.table( x, keep.rownames = FALSE, row.names = NULL, optional = FALSE, types = NULL, parameters = NULL, responses = NULL, times = NULL, groups = NULL, summary = FALSE, probs = c(0.05, 0.95), include_fixed = TRUE, ... )
## S3 method for class 'dynamitefit' as.data.table( x, keep.rownames = FALSE, row.names = NULL, optional = FALSE, types = NULL, parameters = NULL, responses = NULL, times = NULL, groups = NULL, summary = FALSE, probs = c(0.05, 0.95), include_fixed = TRUE, ... )
x |
[ |
keep.rownames |
[ |
row.names |
Ignored. |
optional |
Ignored. |
types |
[ |
parameters |
[ |
responses |
[ |
times |
[ |
groups |
[ |
summary |
[ |
probs |
[ |
include_fixed |
[ |
... |
Ignored. |
A data.table
containing either samples or summary statistics of
the model parameters.
Model outputs
as.data.frame.dynamitefit()
,
as_draws_df.dynamitefit()
,
coef.dynamitefit()
,
confint.dynamitefit()
,
dynamite()
,
get_code()
,
get_data()
,
get_parameter_dims()
,
get_parameter_names()
,
get_parameter_types()
,
ndraws.dynamitefit()
,
nobs.dynamitefit()
data.table::setDTthreads(1) # For CRAN as.data.table( gaussian_example_fit, responses = "y", types = "beta", summary = FALSE )
data.table::setDTthreads(1) # For CRAN as.data.table( gaussian_example_fit, responses = "y", types = "beta", summary = FALSE )
A simulated data containing multiple individuals with two categorical response variables.
categorical_example
categorical_example
A data frame with 2000 rows and 5 variables:
Variable defining individuals (1 to 100).
Variable defining the time point of the measurement (1 to 20).
Categorical variable with three levels, A, B, and C.
Categorical variable with three levels, a, b, and c.
A continuous covariate.
The data was generated via categorical_example.R
in
https://github.com/ropensci/dynamite/tree/main/data-raw/
Example models
categorical_example_fit
,
gaussian_example
,
gaussian_example_fit
,
multichannel_example
,
multichannel_example_fit
A dynamitefit
object obtained by running dynamite
on the
categorical_example
dataset as
set.seed(1) library(dynamite) f <- obs(x ~ z + lag(x) + lag(y), family = "categorical") + obs(y ~ z + lag(x) + lag(y), family = "categorical") categorical_example_fit <- dynamite( f, data = categorical_example, time = "time", group = "id", chains = 1, refresh = 0, thin = 5, save_warmup = FALSE )
Note the small number of samples due to size restrictions on CRAN.
categorical_example_fit
categorical_example_fit
A dynamitefit
object.
The data was generated via categorical_example_fit.R
in
https://github.com/ropensci/dynamite/tree/main/data-raw/
Example models
categorical_example
,
gaussian_example
,
gaussian_example_fit
,
multichannel_example
,
multichannel_example_fit
Extracts either time-varying or time-invariant parameters of the model.
## S3 method for class 'dynamitefit' coef( object, types = c("alpha", "beta", "delta"), parameters = NULL, responses = NULL, times = NULL, groups = NULL, summary = TRUE, probs = c(0.05, 0.95), ... )
## S3 method for class 'dynamitefit' coef( object, types = c("alpha", "beta", "delta"), parameters = NULL, responses = NULL, times = NULL, groups = NULL, summary = TRUE, probs = c(0.05, 0.95), ... )
object |
[ |
types |
[ |
parameters |
[ |
responses |
[ |
times |
[ |
groups |
[ |
summary |
[ |
probs |
[ |
... |
Ignored. |
A tibble
containing either samples or summary statistics of the
model parameters in a long format.
Model outputs
as.data.frame.dynamitefit()
,
as.data.table.dynamitefit()
,
as_draws_df.dynamitefit()
,
confint.dynamitefit()
,
dynamite()
,
get_code()
,
get_data()
,
get_parameter_dims()
,
get_parameter_names()
,
get_parameter_types()
,
ndraws.dynamitefit()
,
nobs.dynamitefit()
data.table::setDTthreads(1) # For CRAN betas <- coef(gaussian_example_fit, type = "beta") deltas <- coef(gaussian_example_fit, type = "delta")
data.table::setDTthreads(1) # For CRAN betas <- coef(gaussian_example_fit, type = "beta") deltas <- coef(gaussian_example_fit, type = "delta")
Extracts credible intervals from dynamitefit
object.
## S3 method for class 'dynamitefit' confint(object, parm, level = 0.95, ...)
## S3 method for class 'dynamitefit' confint(object, parm, level = 0.95, ...)
object |
[ |
parm |
Ignored. |
level |
[ |
... |
Ignored. |
The rows of the resulting matrix
will be named using the following
logic: {parameter}_{time}_{category}_{group}
where parameter
is the
name of the parameter, time
is the time index of the parameter,
category
specifies the level of the response the parameter
is related to if the response is categorical, and group
determines which
group of observations the parameter is related to in the case of random
effects and loadings. Non-applicable fields in the this syntax are set
to NA
.
Model outputs
as.data.frame.dynamitefit()
,
as.data.table.dynamitefit()
,
as_draws_df.dynamitefit()
,
coef.dynamitefit()
,
dynamite()
,
get_code()
,
get_data()
,
get_parameter_dims()
,
get_parameter_names()
,
get_parameter_types()
,
ndraws.dynamitefit()
,
nobs.dynamitefit()
data.table::setDTthreads(1) # For CRAN confint(gaussian_example_fit, level = 0.9)
data.table::setDTthreads(1) # For CRAN confint(gaussian_example_fit, level = 0.9)
Applies multiple imputation using mice::mice()
to the supplied data
and fits a dynamic multivariate panel model to each imputed data set using
dynamite()
. Posterior samples from each imputation run are
combined. When using wide format imputation, the long format data
is
automatically converted to a wide format before imputation to preserve the
longitudinal structure, and then converted back to long format for
estimation.
dynamice( dformula, data, time, group = NULL, priors = NULL, backend = "rstan", verbose = TRUE, verbose_stan = FALSE, stanc_options = list("O0"), threads_per_chain = 1L, grainsize = NULL, custom_stan_model = NULL, debug = NULL, mice_args = list(), impute_format = "wide", keep_imputed = FALSE, stan_csv_dir = tempdir(), ... )
dynamice( dformula, data, time, group = NULL, priors = NULL, backend = "rstan", verbose = TRUE, verbose_stan = FALSE, stanc_options = list("O0"), threads_per_chain = 1L, grainsize = NULL, custom_stan_model = NULL, debug = NULL, mice_args = list(), impute_format = "wide", keep_imputed = FALSE, stan_csv_dir = tempdir(), ... )
dformula |
[ |
data |
[ |
time |
[ |
group |
[ |
priors |
[ |
backend |
[ |
verbose |
[ |
verbose_stan |
[ |
stanc_options |
[ |
threads_per_chain |
[ |
grainsize |
[ |
custom_stan_model |
[ |
debug |
[ |
mice_args |
[ |
impute_format |
[ |
keep_imputed |
[ |
stan_csv_dir |
[ |
... |
For |
Model fitting
dynamite()
,
get_priors()
,
update.dynamitefit()
Fit a Bayesian dynamic multivariate panel model (DMPM) using Stan for
Bayesian inference. The dynamite package supports a wide range of
distributions and allows the user to flexibly customize the priors for the
model parameters. The dynamite model is specified using standard R formula
syntax via dynamiteformula()
. For more information and examples,
see 'Details' and the package vignettes.
The formula
method returns the model definition as a quoted expression.
Information on the estimated dynamite
model can be obtained via
print()
including the following: The model formula, the data,
the smallest effective sample sizes, largest Rhat and summary statistics of
the time-invariant and group-invariant model parameters.
The summary()
method provides statistics of the posterior samples of the
model; this is an alias of as.data.frame.dynamitefit()
with
summary = TRUE
.
dynamite( dformula, data, time, group = NULL, priors = NULL, backend = "rstan", verbose = TRUE, verbose_stan = FALSE, stanc_options = list("O0"), threads_per_chain = 1L, grainsize = NULL, custom_stan_model = NULL, debug = NULL, ... ) ## S3 method for class 'dynamitefit' formula(x, ...) ## S3 method for class 'dynamitefit' print(x, full_diagnostics = FALSE, ...) ## S3 method for class 'dynamitefit' summary(object, ...)
dynamite( dformula, data, time, group = NULL, priors = NULL, backend = "rstan", verbose = TRUE, verbose_stan = FALSE, stanc_options = list("O0"), threads_per_chain = 1L, grainsize = NULL, custom_stan_model = NULL, debug = NULL, ... ) ## S3 method for class 'dynamitefit' formula(x, ...) ## S3 method for class 'dynamitefit' print(x, full_diagnostics = FALSE, ...) ## S3 method for class 'dynamitefit' summary(object, ...)
dformula |
[ |
data |
[ |
time |
[ |
group |
[ |
priors |
[ |
backend |
[ |
verbose |
[ |
verbose_stan |
[ |
stanc_options |
[ |
threads_per_chain |
[ |
grainsize |
[ |
custom_stan_model |
[ |
debug |
[ |
... |
For |
x |
[ |
full_diagnostics |
By default, the effective sample size (ESS) and Rhat
are computed only for the time- and group-invariant parameters
( |
object |
[ |
The best-case scalability of dynamite
in terms of data size should be
approximately linear in terms of number of time points and and number of
groups, but as wall-clock time of the MCMC algorithms provided by Stan can
depend on the discrepancy of the data and the model (and the subsequent
shape of the posterior), this can vary greatly.
dynamite
returns a dynamitefit
object which is a list containing
the following components:
stanfit
A stanfit
object, see rstan::sampling()
for details.
dformulas
A list of dynamiteformula
objects for internal use.
data
A processed version of the input data
.
data_name
Name of the input data object.
stan
A list
containing various elements related to Stan model
construction and sampling.
group_var
Name of the variable defining the groups.
time_var
Name of the variable defining the time index.
priors
Data frame containing the used priors.
backend
Either "rstan"
or "cmdstanr"
indicating which
package was used in sampling.
permutation
Randomized permutation of the posterior draws.
call
Original function call as an object of class call
.
formula
returns a quoted expression.
print
returns x
invisibly.
summary
returns a data.frame
.
Santtu Tikka and Jouni Helske (2023). dynamite: An R Package for Dynamic Multivariate Panel Models. arXiv preprint, https://arxiv.org/abs/2302.01607.
Jouni Helske and Santtu Tikka (2022). Estimating Causal Effects from Panel Data with Dynamic Multivariate Panel Models. SocArxiv preprint, https://osf.io/preprints/socarxiv/mdwu5/.
Model fitting
dynamice()
,
get_priors()
,
update.dynamitefit()
Model formula construction
dynamiteformula()
,
lags()
,
lfactor()
,
random_spec()
,
splines()
Model outputs
as.data.frame.dynamitefit()
,
as.data.table.dynamitefit()
,
as_draws_df.dynamitefit()
,
coef.dynamitefit()
,
confint.dynamitefit()
,
get_code()
,
get_data()
,
get_parameter_dims()
,
get_parameter_names()
,
get_parameter_types()
,
ndraws.dynamitefit()
,
nobs.dynamitefit()
data.table::setDTthreads(1) # For CRAN # Please update your rstan and StanHeaders installation before running # on Windows if (!identical(.Platform$OS.type, "windows")) { fit <- dynamite( dformula = obs(y ~ -1 + varying(~x), family = "gaussian") + lags(type = "varying") + splines(df = 20), gaussian_example, "time", "id", chains = 1, refresh = 0 ) } data.table::setDTthreads(1) # For CRAN formula(gaussian_example_fit) data.table::setDTthreads(1) # For CRAN print(gaussian_example_fit) data.table::setDTthreads(1) # For CRAN summary(gaussian_example_fit, types = "beta", probs = c(0.05, 0.1, 0.9, 0.95) )
data.table::setDTthreads(1) # For CRAN # Please update your rstan and StanHeaders installation before running # on Windows if (!identical(.Platform$OS.type, "windows")) { fit <- dynamite( dformula = obs(y ~ -1 + varying(~x), family = "gaussian") + lags(type = "varying") + splines(df = 20), gaussian_example, "time", "id", chains = 1, refresh = 0 ) } data.table::setDTthreads(1) # For CRAN formula(gaussian_example_fit) data.table::setDTthreads(1) # For CRAN print(gaussian_example_fit) data.table::setDTthreads(1) # For CRAN summary(gaussian_example_fit, types = "beta", probs = c(0.05, 0.1, 0.9, 0.95) )
These functions are provided for compatibility with older versions of the package. They will eventually be completely removed.
plot_betas(x, ...) plot_deltas(x, ...) plot_nus(x, ...) plot_lambdas(x, ...) plot_psis(x, ...)
plot_betas(x, ...) plot_deltas(x, ...) plot_nus(x, ...) plot_lambdas(x, ...) plot_psis(x, ...)
x |
[ |
... |
Not used. |
A ggplot
object.
plot_betas
is now called via plot(., types = "beta")
plot_deltas
is now called via plot(., types = "delta")
plot_nus
is now called via plot(., types = "nu")
plot_lambdas
is now called via plot(., types = "lambda")
plot_psis
is now called via plot(., types = "psi")
See plot.dynamitefit()
for documentation of the
parameters of these functions
Defines a new observational or a new auxiliary channel for the model using
standard R formula syntax. Formulas of individual response variables can be
joined together via +
. See 'Details' and the package vignettes for more
information. The function obs
is a shorthand alias for dynamiteformula
,
and aux
is a shorthand alias for
dynamiteformula(formula, family = "deterministic")
.
dynamiteformula(formula, family, link = NULL) obs(formula, family, link = NULL) aux(formula) ## S3 method for class 'dynamiteformula' e1 + e2 ## S3 method for class 'dynamiteformula' print(x, ...)
dynamiteformula(formula, family, link = NULL) obs(formula, family, link = NULL) aux(formula) ## S3 method for class 'dynamiteformula' e1 + e2 ## S3 method for class 'dynamiteformula' print(x, ...)
formula |
[ |
family |
[ |
link |
[ |
e1 |
[ |
e2 |
[ |
x |
[ |
... |
Ignored. |
Currently the dynamite package supports the following distributions for the observations:
Categorical: categorical
(with a softmax link using the first category
as reference). See the documentation of the categorical_logit_glm
in the
Stan function reference manual https://mc-stan.org/users/documentation/.
Multinomial: multinomial
(softmax link, first category is reference).
Gaussian: gaussian
(identity link, parameterized using mean and standard
deviation).
Multivariate Gaussian: mvgaussian
(identity link, parameterized using
mean vector, standard deviation vector and the Cholesky decomposition of
the correlation matrix).
Poisson: poisson
(log-link, with an optional known offset variable).
Negative-binomial: negbin
(log-link, using mean and dispersion
parameterization, with an optional known offset variable). See the
documentation on NegBinomial2
in the Stan function reference manual.
Bernoulli: bernoulli
(logit-link).
Binomial: binomial
(logit-link).
Exponential: exponential
(log-link).
Gamma: gamma
(log-link, using mean and shape parameterization).
Beta: beta
(logit-link, using mean and precision parameterization).
Student t: student
(identity link, parameterized using degrees of
freedom, location and scale)
The models in the dynamite package are defined by combining the
channel-specific formulas defined via R formula syntax.
Each channel is defined via the obs
function, and the channels are
combined with +
. For example a formula
obs(y ~ lag(x), family = "gaussian") + obs(x ~ z, family = "poisson")
defines a model with two channels;
first we declare that y
is a Gaussian variable depending on a previous
value of x
(lag(x)
), and then we add a second channel declaring x
as
Poisson distributed depending on some exogenous variable z
(for which we do not define any distribution).
Number of trials for binomial channels should be defined via a trials
model component, e.g., obs(y ~ x + trials(n), family = "binomial")
,
where n
is a data variable defining the number of trials. For multinomial
channels, the number of trials is automatically defined to be the sum
of the observations over the categories, but can also be defined using
the trials
component, for example for prediction.
Multivariate channels are defined by providing a single formula for all
components or by providing component-specific formulas separated by a |
.
The response variables that correspond to the components should be joined by
c()
. For instance, the following would define c(y1, y2)
as multivariate
gaussian with x
as a predictor for the mean of the first component and
x
and z
as predictors for the mean of the second component:
obs(c(y1, y2) ~ x | x + z, family = "mvgaussian")
. A multinomial channel
should only have a single formula.
In addition to declaring response variables via obs
, we can also use
the function aux
to define auxiliary channels which are deterministic
functions of other variables. The values of auxiliary variables are computed
dynamically during prediction, making the use of lagged values and other
transformations possible. The function aux
also does not use the
family
argument, which is automatically set to deterministic
and is a
special channel type of obs
. Note that lagged values of deterministic
aux
channels do not imply fixed time points. Instead they must be given
starting values using a special function init
that directly initializes
the lags to specified values, or by past
which computes the initial values
based on an R expression. Both init
and past
should appear on the
right hand side of the model formula, separated from the primary defining
expression via |
.
The formula within obs
can also contain an additional special
function varying
, which defines the time-varying part of the model
equation, in which case we could write for example
obs(x ~ z + varying(~ -1 + w), family = "poisson")
, which defines a model
equation with a constant intercept and time-invariant effect of z
, and a
time-varying effect of w
. We also remove the duplicate intercept with -1
in order to avoid identifiability issues in the model estimation
(we could also define a time varying intercept, in which case we would write
obs(x ~ -1 + z + varying(~ w), family = "poisson)
). The part of the formula
not wrapped with varying
is assumed to correspond to the fixed part of the
model, so obs(x ~ z + varying(~ -1 + w), family = "poisson")
is actually
identical to
obs(x ~ -1 + fixed(~ z) + varying(~ -1 + w), family = "poisson")
and
obs(x ~ fixed(~ z) + varying(~ -1 + w), family = "poisson")
.
When defining varying effects, we also need to define how the these
time-varying regression coefficient behave. For this, a splines
component
should be added to the model, e.g.,
obs(x ~ varying(~ -1 + w), family = "poisson) + splines(df = 10)
defines a
cubic B-spline with 10 degrees of freedom for the time-varying coefficient
corresponding to the w
. If the model contains multiple time-varying
coefficients, same spline basis is used for all coefficients, with unique
spline coefficients and their standard deviation.
If the desired model contains lagged predictors of each response in each
channel, these can be quickly added to the model as either time-invariant
or time-varying predictors via lags()
instead of writing them manually
for each channel.
It is also possible to define group-specific (random) effects term
using the special syntax random()
similarly as varying()
. For example,
random(~1)
leads to a model where in addition to the common intercept,
each individual/group has their own intercept with zero-mean normal prior and
unknown standard deviation analogously with the typical mixed models. An
additional model component random_spec()
can be used to define
whether the random effects are allowed to correlate within and across
channels and whether to use centered or noncentered parameterization for
the random effects.
A dynamiteformula
object.
Model formula construction
dynamite()
,
lags()
,
lfactor()
,
random_spec()
,
splines()
data.table::setDTthreads(1) # For CRAN # A single gaussian response channel with a time-varying effect of 'x', # and a time-varying effect of the lag of 'y' using B-splines with # 20 degrees of freedom for the coefficients of the time-varying terms. obs(y ~ -1 + varying(~x), family = "gaussian") + lags(type = "varying") + splines(df = 20) # A two-channel categorical model with time-invariant predictors # here, lag terms are specified manually obs(x ~ z + lag(x) + lag(y), family = "categorical") + obs(y ~ z + lag(x) + lag(y), family = "categorical") # The same categorical model as above, but with the lag terms # added using 'lags' obs(x ~ z, family = "categorical") + obs(y ~ z, family = "categorical") + lags(type = "fixed") # A multichannel model with a gaussian, Poisson and a Bernoulli response and # an auxiliary channel for the logarithm of 'p' plus one obs(g ~ lag(g) + lag(logp), family = "gaussian") + obs(p ~ lag(g) + lag(logp) + lag(b), family = "poisson") + obs(b ~ lag(b) * lag(logp) + lag(b) * lag(g), family = "bernoulli") + aux(numeric(logp) ~ log(p + 1)) data.table::setDTthreads(1) # For CRAN obs(y ~ x, family = "gaussian") + obs(z ~ w, family = "exponential") data.table::setDTthreads(1) # For CRAN x <- obs(y ~ x + random(~ 1 + lag(d)), family = "gaussian") + obs(z ~ varying(~w), family = "exponential") + aux(numeric(d) ~ log(y) | init(c(0, 1))) + lags(k = 2) + splines(df = 5) + random_spec(correlated = FALSE) print(x)
data.table::setDTthreads(1) # For CRAN # A single gaussian response channel with a time-varying effect of 'x', # and a time-varying effect of the lag of 'y' using B-splines with # 20 degrees of freedom for the coefficients of the time-varying terms. obs(y ~ -1 + varying(~x), family = "gaussian") + lags(type = "varying") + splines(df = 20) # A two-channel categorical model with time-invariant predictors # here, lag terms are specified manually obs(x ~ z + lag(x) + lag(y), family = "categorical") + obs(y ~ z + lag(x) + lag(y), family = "categorical") # The same categorical model as above, but with the lag terms # added using 'lags' obs(x ~ z, family = "categorical") + obs(y ~ z, family = "categorical") + lags(type = "fixed") # A multichannel model with a gaussian, Poisson and a Bernoulli response and # an auxiliary channel for the logarithm of 'p' plus one obs(g ~ lag(g) + lag(logp), family = "gaussian") + obs(p ~ lag(g) + lag(logp) + lag(b), family = "poisson") + obs(b ~ lag(b) * lag(logp) + lag(b) * lag(g), family = "bernoulli") + aux(numeric(logp) ~ log(p + 1)) data.table::setDTthreads(1) # For CRAN obs(y ~ x, family = "gaussian") + obs(z ~ w, family = "exponential") data.table::setDTthreads(1) # For CRAN x <- obs(y ~ x + random(~ 1 + lag(d)), family = "gaussian") + obs(z ~ varying(~w), family = "exponential") + aux(numeric(d) ~ log(y) | init(c(0, 1))) + lags(k = 2) + splines(df = 5) + random_spec(correlated = FALSE) print(x)
Fitted values for a dynamitefit
object, i.e.,
$E(y_t | newdata, \theta)$
where $\theta$
contains all the
model parameters. See also predict.dynamitefit()
for multi-step
predictions.
## S3 method for class 'dynamitefit' fitted( object, newdata = NULL, n_draws = NULL, thin = 1, expand = TRUE, df = TRUE, ... )
## S3 method for class 'dynamitefit' fitted( object, newdata = NULL, n_draws = NULL, thin = 1, expand = TRUE, df = TRUE, ... )
object |
[ |
newdata |
[ |
n_draws |
[ |
thin |
[ |
expand |
[ |
df |
[ |
... |
Ignored. |
A data.frame
containing the fitted values.
Obtaining predictions
predict.dynamitefit()
data.table::setDTthreads(1) # For CRAN fitted(gaussian_example_fit, n_draws = 2L) set.seed(1) # Please update your rstan and StanHeaders installation before running # on Windows if (!identical(.Platform$OS.type, "windows")) { fit <- dynamite( dformula = obs(LakeHuron ~ 1, "gaussian") + lags(), data = data.frame(LakeHuron, time = seq_len(length(LakeHuron)), id = 1), time = "time", group = "id", chains = 1, refresh = 0 ) if (requireNamespace("dplyr") && requireNamespace("tidyr") && base::getRversion() >= "4.1.0") { # One-step ahead samples (fitted values) from the posterior # (first time point is fixed due to lag in the model): fitted(fit) |> dplyr::filter(time > 2) |> ggplot2::ggplot(ggplot2::aes(time, LakeHuron_fitted, group = .draw)) + ggplot2::geom_line(alpha = 0.5) + # observed values ggplot2::geom_line(ggplot2::aes(y = LakeHuron), colour = "tomato") + ggplot2::theme_bw() # Posterior predictive distribution given the first time point: predict(fit, type = "mean") |> dplyr::filter(time > 2) |> ggplot2::ggplot(ggplot2::aes(time, LakeHuron_mean, group = .draw)) + ggplot2::geom_line(alpha = 0.5) + # observed values ggplot2::geom_line(ggplot2::aes(y = LakeHuron), colour = "tomato") + ggplot2::theme_bw() } }
data.table::setDTthreads(1) # For CRAN fitted(gaussian_example_fit, n_draws = 2L) set.seed(1) # Please update your rstan and StanHeaders installation before running # on Windows if (!identical(.Platform$OS.type, "windows")) { fit <- dynamite( dformula = obs(LakeHuron ~ 1, "gaussian") + lags(), data = data.frame(LakeHuron, time = seq_len(length(LakeHuron)), id = 1), time = "time", group = "id", chains = 1, refresh = 0 ) if (requireNamespace("dplyr") && requireNamespace("tidyr") && base::getRversion() >= "4.1.0") { # One-step ahead samples (fitted values) from the posterior # (first time point is fixed due to lag in the model): fitted(fit) |> dplyr::filter(time > 2) |> ggplot2::ggplot(ggplot2::aes(time, LakeHuron_fitted, group = .draw)) + ggplot2::geom_line(alpha = 0.5) + # observed values ggplot2::geom_line(ggplot2::aes(y = LakeHuron), colour = "tomato") + ggplot2::theme_bw() # Posterior predictive distribution given the first time point: predict(fit, type = "mean") |> dplyr::filter(time > 2) |> ggplot2::ggplot(ggplot2::aes(time, LakeHuron_mean, group = .draw)) + ggplot2::geom_line(alpha = 0.5) + # observed values ggplot2::geom_line(ggplot2::aes(y = LakeHuron), colour = "tomato") + ggplot2::theme_bw() } }
Simulated data containing a Gaussian response variable y
with two
covariates. The dataset was generated from a model with time-varying effects
of covariate x
and the lagged value of the response variable, time-varying
intercept, and time-invariant effect of covariate z
. The time-varying
coefficients vary according to a spline with 20 degrees of freedom.
gaussian_example
gaussian_example
A data frame with 3000 rows and 5 variables:
The response variable.
A continuous covariate.
A binary covariate.
Variable defining individuals (1 to 50).
Variable defining the time point of the measurement (1 to 30).
The data was generated via gaussian_example.R
in
https://github.com/ropensci/dynamite/tree/main/data-raw/
Example models
categorical_example
,
categorical_example_fit
,
gaussian_example_fit
,
multichannel_example
,
multichannel_example_fit
A dynamitefit
object obtained by running dynamite
on the
gaussian_example
dataset as
set.seed(1) library(dynamite) gaussian_example_fit <- dynamite( obs(y ~ -1 + z + varying(~ x + lag(y)) + random(~1), family = "gaussian") + random_spec() + splines(df = 20), data = gaussian_example, time = "time", group = "id", iter = 2000, warmup = 1000, thin = 10, chains = 2, cores = 2, refresh = 0, save_warmup = FALSE, pars = c("omega_alpha_1_y", "omega_raw_alpha_y", "nu_raw", "nu", "L", "sigma_nu", "a_y"), include = FALSE )
Note the very small number of samples due to size restrictions on CRAN.
gaussian_example_fit
gaussian_example_fit
A dynamitefit
object.
The data was generated via gaussian_example_fit.R
in
https://github.com/ropensci/dynamite/tree/main/data-raw/
Example models
categorical_example
,
categorical_example_fit
,
gaussian_example
,
multichannel_example
,
multichannel_example_fit
Returns the Stan code of the model. Mostly useful for debugging or for building a customized version of the model.
get_code(x, ...) ## S3 method for class 'dynamiteformula' get_code(x, data, time, group = NULL, blocks = NULL, ...) ## S3 method for class 'dynamitefit' get_code(x, blocks = NULL, ...)
get_code(x, ...) ## S3 method for class 'dynamiteformula' get_code(x, data, time, group = NULL, blocks = NULL, ...) ## S3 method for class 'dynamitefit' get_code(x, blocks = NULL, ...)
x |
[ |
... |
Ignored. |
data |
[ |
time |
[ |
group |
[ |
blocks |
[ |
The Stan model blocks as a character
string.
Model outputs
as.data.frame.dynamitefit()
,
as.data.table.dynamitefit()
,
as_draws_df.dynamitefit()
,
coef.dynamitefit()
,
confint.dynamitefit()
,
dynamite()
,
get_data()
,
get_parameter_dims()
,
get_parameter_names()
,
get_parameter_types()
,
ndraws.dynamitefit()
,
nobs.dynamitefit()
data.table::setDTthreads(1) # For CRAN d <- data.frame(y = rnorm(10), x = 1:10, time = 1:10, id = 1) cat(get_code(obs(y ~ x, family = "gaussian"), data = d, time = "time", group = "id" )) # same as cat(dynamite(obs(y ~ x, family = "gaussian"), data = d, time = "time", group = "id", debug = list(model_code = TRUE, no_compile = TRUE) )$model_code)
data.table::setDTthreads(1) # For CRAN d <- data.frame(y = rnorm(10), x = 1:10, time = 1:10, id = 1) cat(get_code(obs(y ~ x, family = "gaussian"), data = d, time = "time", group = "id" )) # same as cat(dynamite(obs(y ~ x, family = "gaussian"), data = d, time = "time", group = "id", debug = list(model_code = TRUE, no_compile = TRUE) )$model_code)
Returns the input data to the Stan model. Mostly useful for debugging.
get_data(x, ...) ## S3 method for class 'dynamiteformula' get_data(x, data, time, group = NULL, ...) ## S3 method for class 'dynamitefit' get_data(x, ...)
get_data(x, ...) ## S3 method for class 'dynamiteformula' get_data(x, data, time, group = NULL, ...) ## S3 method for class 'dynamitefit' get_data(x, ...)
x |
[ |
... |
Ignored. |
data |
[ |
time |
[ |
group |
[ |
A list
containing the input data to Stan.
Model outputs
as.data.frame.dynamitefit()
,
as.data.table.dynamitefit()
,
as_draws_df.dynamitefit()
,
coef.dynamitefit()
,
confint.dynamitefit()
,
dynamite()
,
get_code()
,
get_parameter_dims()
,
get_parameter_names()
,
get_parameter_types()
,
ndraws.dynamitefit()
,
nobs.dynamitefit()
data.table::setDTthreads(1) # For CRAN d <- data.frame(y = rnorm(10), x = 1:10, time = 1:10, id = 1) str(get_data(obs(y ~ x, family = "gaussian"), data = d, time = "time", group = "id" ))
data.table::setDTthreads(1) # For CRAN d <- data.frame(y = rnorm(10), x = 1:10, time = 1:10, id = 1) str(get_data(obs(y ~ x, family = "gaussian"), data = d, time = "time", group = "id" ))
Extracts the names and dimensions of all parameters used in the
dynamite
model. See also get_parameter_types()
and
get_parameter_names()
. The returned dimensions match those of
the stanfit
element of the dynamitefit
object. When applied to
dynamiteformula
objects, the model is compiled and sampled for 1 iteration
to get the parameter dimensions.
get_parameter_dims(x, ...) ## S3 method for class 'dynamiteformula' get_parameter_dims(x, data, time, group = NULL, ...) ## S3 method for class 'dynamitefit' get_parameter_dims(x, ...)
get_parameter_dims(x, ...) ## S3 method for class 'dynamiteformula' get_parameter_dims(x, data, time, group = NULL, ...) ## S3 method for class 'dynamitefit' get_parameter_dims(x, ...)
x |
[ |
... |
Ignored. |
data |
[ |
time |
[ |
group |
[ |
A named list with all parameter dimensions of the input model.
Model outputs
as.data.frame.dynamitefit()
,
as.data.table.dynamitefit()
,
as_draws_df.dynamitefit()
,
coef.dynamitefit()
,
confint.dynamitefit()
,
dynamite()
,
get_code()
,
get_data()
,
get_parameter_names()
,
get_parameter_types()
,
ndraws.dynamitefit()
,
nobs.dynamitefit()
data.table::setDTthreads(1) # For CRAN get_parameter_dims(multichannel_example_fit)
data.table::setDTthreads(1) # For CRAN get_parameter_dims(multichannel_example_fit)
Extracts all parameter names of used in the dynamitefit
object.
get_parameter_names(x, types = NULL, ...) ## S3 method for class 'dynamitefit' get_parameter_names(x, types = NULL, ...)
get_parameter_names(x, types = NULL, ...) ## S3 method for class 'dynamitefit' get_parameter_names(x, types = NULL, ...)
x |
[ |
types |
[ |
... |
Ignored. |
The naming of parameters generally follows style where the name starts with the parameter type (e.g. beta for time-invariant regression coefficient), followed by underscore and the name of the response variable, and in case of time-invariant, time-varying or random effect, the name of the predictor. An exception to this is spline coefficients omega, which also contain the number denoting the knot number.
A character
vector with parameter names of the input model.
Model outputs
as.data.frame.dynamitefit()
,
as.data.table.dynamitefit()
,
as_draws_df.dynamitefit()
,
coef.dynamitefit()
,
confint.dynamitefit()
,
dynamite()
,
get_code()
,
get_data()
,
get_parameter_dims()
,
get_parameter_types()
,
ndraws.dynamitefit()
,
nobs.dynamitefit()
data.table::setDTthreads(1) # For CRAN get_parameter_names(multichannel_example_fit)
data.table::setDTthreads(1) # For CRAN get_parameter_names(multichannel_example_fit)
Extracts all parameter types of used in the dynamitefit
object. See
as.data.frame.dynamitefit()
for explanations of different types.
get_parameter_types(x, ...) ## S3 method for class 'dynamitefit' get_parameter_types(x, ...)
get_parameter_types(x, ...) ## S3 method for class 'dynamitefit' get_parameter_types(x, ...)
x |
[ |
... |
Ignored. |
A character
vector with all parameter types of the input model.
Model outputs
as.data.frame.dynamitefit()
,
as.data.table.dynamitefit()
,
as_draws_df.dynamitefit()
,
coef.dynamitefit()
,
confint.dynamitefit()
,
dynamite()
,
get_code()
,
get_data()
,
get_parameter_dims()
,
get_parameter_names()
,
ndraws.dynamitefit()
,
nobs.dynamitefit()
data.table::setDTthreads(1) # For CRAN get_parameter_types(multichannel_example_fit)
data.table::setDTthreads(1) # For CRAN get_parameter_types(multichannel_example_fit)
Extracts the priors used in the dynamite
model as a data frame. You
can then alter the priors by changing the contents of the prior
column and
supplying this data frame to dynamite
function using the argument
priors
. See vignettes for details.
get_priors(x, ...) ## S3 method for class 'dynamiteformula' get_priors(x, data, time, group = NULL, ...) ## S3 method for class 'dynamitefit' get_priors(x, ...)
get_priors(x, ...) ## S3 method for class 'dynamiteformula' get_priors(x, data, time, group = NULL, ...) ## S3 method for class 'dynamitefit' get_priors(x, ...)
x |
[ |
... |
Ignored. |
data |
[ |
time |
[ |
group |
[ |
A data.frame
containing the prior definitions.
Only the prior
column of the output should be altered when defining
the user-defined priors for dynamite
.
Model fitting
dynamice()
,
dynamite()
,
update.dynamitefit()
data.table::setDTthreads(1) # For CRAN d <- data.frame(y = rnorm(10), x = 1:10, time = 1:10, id = 1) get_priors(obs(y ~ x, family = "gaussian"), data = d, time = "time", group = "id" )
data.table::setDTthreads(1) # For CRAN d <- data.frame(y = rnorm(10), x = 1:10, time = 1:10, id = 1) get_priors(obs(y ~ x, family = "gaussian"), data = d, time = "time", group = "id" )
Prints the divergences, saturated treedepths, and low E-BFMI warnings.
hmc_diagnostics(x, ...) ## S3 method for class 'dynamitefit' hmc_diagnostics(x, ...)
hmc_diagnostics(x, ...) ## S3 method for class 'dynamitefit' hmc_diagnostics(x, ...)
x |
[ |
... |
Ignored. |
Returns x
(invisibly).
data.table::setDTthreads(1) # For CRAN
hmc_diagnostics(gaussian_example_fit)
Model diagnostics
lfo()
,
loo.dynamitefit()
,
mcmc_diagnostics()
Adds the lagged value of the response of each channel specified via
dynamiteformula()
as a predictor to each channel. The added predictors
can be either time-varying or time-invariant.
lags(k = 1L, type = c("fixed", "varying", "random"))
lags(k = 1L, type = c("fixed", "varying", "random"))
k |
[ |
type |
[ |
An object of class lags
.
Model formula construction
dynamite()
,
dynamiteformula()
,
lfactor()
,
random_spec()
,
splines()
data.table::setDTthreads(1) # For CRAN obs(y ~ -1 + varying(~x), family = "gaussian") + lags(type = "varying") + splines(df = 20) # A two-channel categorical model with time-invariant predictors # here, lag terms are specified manually obs(x ~ z + lag(x) + lag(y), family = "categorical") + obs(y ~ z + lag(x) + lag(y), family = "categorical") # The same categorical model as above, but with the lag terms # added using 'lags' obs(x ~ z, family = "categorical") + obs(y ~ z, family = "categorical") + lags(type = "fixed")
data.table::setDTthreads(1) # For CRAN obs(y ~ -1 + varying(~x), family = "gaussian") + lags(type = "varying") + splines(df = 20) # A two-channel categorical model with time-invariant predictors # here, lag terms are specified manually obs(x ~ z + lag(x) + lag(y), family = "categorical") + obs(y ~ z + lag(x) + lag(y), family = "categorical") # The same categorical model as above, but with the lag terms # added using 'lags' obs(x ~ z, family = "categorical") + obs(y ~ z, family = "categorical") + lags(type = "fixed")
This function can be used as part of a dynamiteformula()
to define
a common latent factor component. The latent factor is modeled as a spline
similarly as a time-varying intercept, but instead of having equal effect on
each group, there is an additional loading variable for each group so that
in the linear predictor we have a term $\lambda_i \psi_t$
for each
group $i$
.
lfactor( responses = NULL, nonzero_lambda = TRUE, correlated = TRUE, noncentered_psi = FALSE, flip_sign = TRUE )
lfactor( responses = NULL, nonzero_lambda = TRUE, correlated = TRUE, noncentered_psi = FALSE, flip_sign = TRUE )
responses |
[ |
nonzero_lambda |
[ |
correlated |
[ |
noncentered_psi |
[ |
flip_sign |
[ |
An object of class latent_factor
.
Model formula construction
dynamite()
,
dynamiteformula()
,
lags()
,
random_spec()
,
splines()
data.table::setDTthreads(1) # For CRAN # three channel model with common factor affecting for responses x and y obs(y ~ 1, family = "gaussian") + obs(x ~ 1, family = "poisson") + obs(z ~ 1, family = "gaussian") + lfactor( responses = c("y", "x"), nonzero_lambda = c(TRUE, FALSE), correlated = TRUE, noncentered_psi = FALSE )
data.table::setDTthreads(1) # For CRAN # three channel model with common factor affecting for responses x and y obs(y ~ 1, family = "gaussian") + obs(x ~ 1, family = "poisson") + obs(z ~ 1, family = "gaussian") + lfactor( responses = c("y", "x"), nonzero_lambda = c(TRUE, FALSE), correlated = TRUE, noncentered_psi = FALSE )
Estimates the leave-future-out (LFO) information criterion for dynamite
models using Pareto smoothed importance sampling.
lfo(x, ...) ## S3 method for class 'dynamitefit' lfo(x, L, verbose = TRUE, k_threshold = 0.7, ...)
lfo(x, ...) ## S3 method for class 'dynamitefit' lfo(x, L, verbose = TRUE, k_threshold = 0.7, ...)
x |
[ |
... |
Additional arguments passed to |
L |
[ |
verbose |
[ |
k_threshold |
[ |
For multichannel models, the log-likelihoods of all channels are combined. For models with groups, expected log predictive densities (ELPDs) are computed independently for each group, but the re-estimation of the model is triggered if Pareto k values of any group exceeds the threshold.
An lfo
object which is a list
with the following components:
ELPD
Expected log predictive density estimate.
ELPD_SE
Standard error of ELPD. This is a crude approximation which
does not take into account potential serial correlations.
pareto_k
Pareto k values.
refits
Time points where model was re-estimated.
L
L value used in the LFO estimation.
k_threshold
Threshold used in the LFO estimation.
Paul-Christian Bürkner, Jonah Gabry, and Aki Vehtari (2020). Approximate leave-future-out cross-validation for Bayesian time series models, Journal of Statistical Computation and Simulation, 90:14, 2499-2523.
Model diagnostics
hmc_diagnostics()
,
loo.dynamitefit()
,
mcmc_diagnostics()
data.table::setDTthreads(1) # For CRAN # Please update your rstan and StanHeaders installation before running # on Windows if (!identical(.Platform$OS.type, "windows")) { # this gives warnings due to the small number of iterations out <- suppressWarnings( lfo(gaussian_example_fit, L = 20, chains = 1, cores = 1) ) out$ELPD out$ELPD_SE plot(out) }
data.table::setDTthreads(1) # For CRAN # Please update your rstan and StanHeaders installation before running # on Windows if (!identical(.Platform$OS.type, "windows")) { # this gives warnings due to the small number of iterations out <- suppressWarnings( lfo(gaussian_example_fit, L = 20, chains = 1, cores = 1) ) out$ELPD out$ELPD_SE plot(out) }
Estimates the leave-one-out (LOO) information criterion for dynamite
models using Pareto smoothed importance sampling with the loo package.
## S3 method for class 'dynamitefit' loo(x, separate_channels = FALSE, thin = 1L, ...)
## S3 method for class 'dynamitefit' loo(x, separate_channels = FALSE, thin = 1L, ...)
x |
[ |
separate_channels |
[ |
thin |
[ |
... |
Ignored. |
An output from loo::loo()
or a list of such outputs (if
separate_channels
was TRUE
).
Aki Vehtari, Andrew, Gelman, and Johah Gabry (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432.
Model diagnostics
hmc_diagnostics()
,
lfo()
,
mcmc_diagnostics()
data.table::setDTthreads(1) # For CRAN # Please update your rstan and StanHeaders installation before running # on Windows if (!identical(.Platform$OS.type, "windows")) { # this gives warnings due to the small number of iterations suppressWarnings(loo(gaussian_example_fit)) suppressWarnings(loo(gaussian_example_fit, separate_channels = TRUE)) }
data.table::setDTthreads(1) # For CRAN # Please update your rstan and StanHeaders installation before running # on Windows if (!identical(.Platform$OS.type, "windows")) { # this gives warnings due to the small number of iterations suppressWarnings(loo(gaussian_example_fit)) suppressWarnings(loo(gaussian_example_fit, separate_channels = TRUE)) }
Prints HMC diagnostics and lists parameters with smallest effective sample
sizes and largest Rhat values. See hmc_diagnostics()
and
posterior::default_convergence_measures()
for details.
mcmc_diagnostics(x, ...) ## S3 method for class 'dynamitefit' mcmc_diagnostics(x, n = 3L, ...)
mcmc_diagnostics(x, ...) ## S3 method for class 'dynamitefit' mcmc_diagnostics(x, n = 3L, ...)
x |
[ |
... |
Ignored. |
n |
[ |
Returns x
(invisibly).
Model diagnostics
hmc_diagnostics()
,
lfo()
,
loo.dynamitefit()
data.table::setDTthreads(1) # For CRAN mcmc_diagnostics(gaussian_example_fit)
data.table::setDTthreads(1) # For CRAN mcmc_diagnostics(gaussian_example_fit)
A simulated multichannel data containing multiple individuals with multiple response variables of different distributions.
multichannel_example
multichannel_example
A data frame with 3000 rows and 5 variables:
Variable defining individuals (1 to 50).
Variable defining the time point of the measurement (1 to 20).
Response variable following gaussian distribution.
Response variable following Poisson distribution.
Response variable following Bernoulli distribution.
The data was generated via multichannel_example.R
in
https://github.com/ropensci/dynamite/tree/main/data-raw/
Example models
categorical_example
,
categorical_example_fit
,
gaussian_example
,
gaussian_example_fit
,
multichannel_example_fit
A dynamitefit
object obtained by running dynamite
on the
multichannel_example
dataset as
set.seed(1) library(dynamite) f <- obs(g ~ lag(g) + lag(logp), family = "gaussian") + obs(p ~ lag(g) + lag(logp) + lag(b), family = "poisson") + obs(b ~ lag(b) * lag(logp) + lag(b) * lag(g), family = "bernoulli") + aux(numeric(logp) ~ log(p + 1)) multichannel_example_fit <- dynamite( f, data = multichannel_example, time = "time", group = "id", chains = 1, cores = 1, iter = 2000, warmup = 1000, init = 0, refresh = 0, thin = 5, save_warmup = FALSE )
Note the small number of samples due to size restrictions on CRAN.
multichannel_example_fit
multichannel_example_fit
A dynamitefit
object.
THe data was generated via multichannel_example_fit.R
in
https://github.com/ropensci/dynamite/tree/main/data-raw/
Example models
categorical_example
,
categorical_example_fit
,
gaussian_example
,
gaussian_example_fit
,
multichannel_example
dynamitefit
ObjectReturn the Number of Posterior Draws of a dynamitefit
Object
## S3 method for class 'dynamitefit' ndraws(x)
## S3 method for class 'dynamitefit' ndraws(x)
x |
[ |
Number of posterior draws as a single integer
value.
Model outputs
as.data.frame.dynamitefit()
,
as.data.table.dynamitefit()
,
as_draws_df.dynamitefit()
,
coef.dynamitefit()
,
confint.dynamitefit()
,
dynamite()
,
get_code()
,
get_data()
,
get_parameter_dims()
,
get_parameter_names()
,
get_parameter_types()
,
nobs.dynamitefit()
data.table::setDTthreads(1) # For CRAN ndraws(gaussian_example_fit)
data.table::setDTthreads(1) # For CRAN ndraws(gaussian_example_fit)
Extract the Number of Observations Used to Fit a dynamite Model
## S3 method for class 'dynamitefit' nobs(object, ...)
## S3 method for class 'dynamitefit' nobs(object, ...)
object |
[ |
... |
Not used. |
Total number of non-missing observations as an integer
.
Model outputs
as.data.frame.dynamitefit()
,
as.data.table.dynamitefit()
,
as_draws_df.dynamitefit()
,
coef.dynamitefit()
,
confint.dynamitefit()
,
dynamite()
,
get_code()
,
get_data()
,
get_parameter_dims()
,
get_parameter_names()
,
get_parameter_types()
,
ndraws.dynamitefit()
data.table::setDTthreads(1) # For CRAN nobs(gaussian_example_fit)
data.table::setDTthreads(1) # For CRAN nobs(gaussian_example_fit)
dynamitefit
ObjectsProduces the traceplots and the density plots of the model parameters. Can
also be used to plot the time-varying and time-invariant parameters of the
model along with their posterior intervals. See the plot_type
argument
for details on available plots.
## S3 method for class 'dynamitefit' plot( x, plot_type = c("default", "trace", "dag"), types = NULL, parameters = NULL, responses = NULL, groups = NULL, times = NULL, level = 0.05, alpha = 0.5, facet = TRUE, scales = c("fixed", "free"), n_params = NULL, ... )
## S3 method for class 'dynamitefit' plot( x, plot_type = c("default", "trace", "dag"), types = NULL, parameters = NULL, responses = NULL, groups = NULL, times = NULL, level = 0.05, alpha = 0.5, facet = TRUE, scales = c("fixed", "free"), n_params = NULL, ... )
x |
[ |
plot_type |
[ |
types |
[ |
parameters |
[ |
responses |
[ |
groups |
[ |
times |
[ |
level |
[ |
alpha |
[ |
facet |
[ |
scales |
[ |
n_params |
[ |
... |
Arguments passed to |
A ggplot
object.
Drawing plots
plot.dynamiteformula()
data.table::setDTthreads(1) # For CRAN plot(gaussian_example_fit, type = "beta")
data.table::setDTthreads(1) # For CRAN plot(gaussian_example_fit, type = "beta")
Plot a snapshot of the model structure at a specific time point with a window of the highest-order lag dependency both into the past and the future as a directed acyclic graph (DAG). Only response variables are shown in the plot. This function can also produce a TikZ code of the DAG to be used in reports and publications.
## S3 method for class 'dynamiteformula' plot( x, show_auxiliary = TRUE, show_covariates = FALSE, tikz = FALSE, vertex_size = 0.25, label_size = 18, ... )
## S3 method for class 'dynamiteformula' plot( x, show_auxiliary = TRUE, show_covariates = FALSE, tikz = FALSE, vertex_size = 0.25, label_size = 18, ... )
x |
[ |
show_auxiliary |
[ |
show_covariates |
[ |
tikz |
[ |
vertex_size |
[ |
label_size |
[ |
... |
Not used.. |
A ggplot
object, or a character
string if tikz = TRUE
.
Drawing plots
plot.dynamitefit()
data.table::setDTthreads(1) # For CRAN multichannel_formula <- obs(g ~ lag(g) + lag(logp), family = "gaussian") + obs(p ~ lag(g) + lag(logp) + lag(b), family = "poisson") + obs(b ~ lag(b) * lag(logp) + lag(b) * lag(g), family = "bernoulli") + aux(numeric(logp) ~ log(p + 1)) # A ggplot plot(multichannel_formula) # TikZ format plot(multichannel_formula, tikz = TRUE)
data.table::setDTthreads(1) # For CRAN multichannel_formula <- obs(g ~ lag(g) + lag(logp), family = "gaussian") + obs(p ~ lag(g) + lag(logp) + lag(b), family = "poisson") + obs(b ~ lag(b) * lag(logp) + lag(b) * lag(g), family = "bernoulli") + aux(numeric(logp) ~ log(p + 1)) # A ggplot plot(multichannel_formula) # TikZ format plot(multichannel_formula, tikz = TRUE)
Plots Pareto k values per each time point (with one point per group), together with a horizontal line representing the used threshold.
## S3 method for class 'lfo' plot(x, ...)
## S3 method for class 'lfo' plot(x, ...)
x |
[ |
... |
Ignored. |
A ggplot
object.
data.table::setDTthreads(1) # For CRAN # Please update your rstan and StanHeaders installation before running # on Windows if (!identical(.Platform$OS.type, "windows")) { # This gives warnings due to the small number of iterations plot(suppressWarnings( lfo(gaussian_example_fit, L = 20, chains = 1, cores = 1) )) }
data.table::setDTthreads(1) # For CRAN # Please update your rstan and StanHeaders installation before running # on Windows if (!identical(.Platform$OS.type, "windows")) { # This gives warnings due to the small number of iterations plot(suppressWarnings( lfo(gaussian_example_fit, L = 20, chains = 1, cores = 1) )) }
Obtain counterfactual predictions for a dynamitefit
object.
## S3 method for class 'dynamitefit' predict( object, newdata = NULL, type = c("response", "mean", "link"), funs = list(), impute = c("none", "locf", "nocb"), new_levels = c("none", "bootstrap", "gaussian", "original"), global_fixed = FALSE, n_draws = NULL, thin = 1, expand = TRUE, df = TRUE, ... )
## S3 method for class 'dynamitefit' predict( object, newdata = NULL, type = c("response", "mean", "link"), funs = list(), impute = c("none", "locf", "nocb"), new_levels = c("none", "bootstrap", "gaussian", "original"), global_fixed = FALSE, n_draws = NULL, thin = 1, expand = TRUE, df = TRUE, ... )
object |
[ |
newdata |
[ |
type |
[ |
funs |
[ |
impute |
[ |
new_levels |
[
This argument is ignored if the model does not contain random effects. |
global_fixed |
[ |
n_draws |
[ |
thin |
[ |
expand |
[ |
df |
[ |
... |
Ignored. |
Note that forecasting (i.e., predictions for time indices beyond the last
time index in the original data) is not supported by the dynamite
package. However, such predictions can be obtained by augmenting the
original data with NA
values before model estimation.
A data.frame
containing the predicted values or a list
of two
data.frames
. See the expand
argument for details. Note that the
.draw
column is not the same as .draw
from as.data.frame
and
as_draws
methods as predict
uses permuted samples. A mapping between
these variables can be done using information in
object$stanfit@sim$permutation
.
Obtaining predictions
fitted.dynamitefit()
data.table::setDTthreads(1) # For CRAN out <- predict(gaussian_example_fit, type = "response", n_draws = 2L) head(out) # using summary functions sumr <- predict(multichannel_example_fit, type = "mean", funs = list(g = list(m = mean, s = sd), b = list(sum = sum)), n_draws = 2L) head(sumr$simulated) # Please update your rstan and StanHeaders installation before running # on Windows if (!identical(.Platform$OS.type, "windows")) { # Simulate from the prior predictive distribution f <- obs(y ~ lag(y) + varying(~ -1 + x), "gaussian") + splines(df = 10, noncentered = TRUE) # Create data with missing observations # Note that due to the lagged term in the model, # we need to fix the first time point d <- data.frame(y = c(0, rep(NA, 49)), x = rnorm(50), time = 1:50) # Suppress warnings due to the lack of data suppressWarnings( priors <- get_priors(f, data = d, time = "time") ) # Modify default priors which can produce exploding behavior when used # without data priors$prior <- c( "normal(0, 1)", "normal(0.6, 0.1)", "normal(-0.2, 0.5)", "normal(0.2, 0.1)", "normal(0.5, 0.1)" ) # Samples from the prior conditional on the first time point and x fit <- dynamite( dformula = f, data = d, time = "time", verbose = FALSE, priors = priors, chains = 1 ) # Simulate new data pp <- predict(fit) ggplot2::ggplot(pp, ggplot2::aes(time, y_new, group = .draw)) + ggplot2::geom_line(alpha = 0.1) + ggplot2::theme_bw() }
data.table::setDTthreads(1) # For CRAN out <- predict(gaussian_example_fit, type = "response", n_draws = 2L) head(out) # using summary functions sumr <- predict(multichannel_example_fit, type = "mean", funs = list(g = list(m = mean, s = sd), b = list(sum = sum)), n_draws = 2L) head(sumr$simulated) # Please update your rstan and StanHeaders installation before running # on Windows if (!identical(.Platform$OS.type, "windows")) { # Simulate from the prior predictive distribution f <- obs(y ~ lag(y) + varying(~ -1 + x), "gaussian") + splines(df = 10, noncentered = TRUE) # Create data with missing observations # Note that due to the lagged term in the model, # we need to fix the first time point d <- data.frame(y = c(0, rep(NA, 49)), x = rnorm(50), time = 1:50) # Suppress warnings due to the lack of data suppressWarnings( priors <- get_priors(f, data = d, time = "time") ) # Modify default priors which can produce exploding behavior when used # without data priors$prior <- c( "normal(0, 1)", "normal(0.6, 0.1)", "normal(-0.2, 0.5)", "normal(0.2, 0.1)", "normal(0.5, 0.1)" ) # Samples from the prior conditional on the first time point and x fit <- dynamite( dformula = f, data = d, time = "time", verbose = FALSE, priors = priors, chains = 1 ) # Simulate new data pp <- predict(fit) ggplot2::ggplot(pp, ggplot2::aes(time, y_new, group = .draw)) + ggplot2::geom_line(alpha = 0.1) + ggplot2::theme_bw() }
Prints the summary of the leave-future-out cross-validation.
## S3 method for class 'lfo' print(x, ...)
## S3 method for class 'lfo' print(x, ...)
x |
[ |
... |
Ignored. |
Returns x
invisibly.
data.table::setDTthreads(1) # For CRAN # Please update your rstan and StanHeaders installation before running # on Windows if (!identical(.Platform$OS.type, "windows")) { # This gives warnings due to the small number of iterations suppressWarnings(lfo(gaussian_example_fit, L = 20)) }
data.table::setDTthreads(1) # For CRAN # Please update your rstan and StanHeaders installation before running # on Windows if (!identical(.Platform$OS.type, "windows")) { # This gives warnings due to the small number of iterations suppressWarnings(lfo(gaussian_example_fit, L = 20)) }
This function can be used as part of dynamiteformula()
to define
whether the group-level random effects should be modeled as correlated or
not.
random_spec(correlated = TRUE, noncentered = TRUE)
random_spec(correlated = TRUE, noncentered = TRUE)
correlated |
[ |
noncentered |
[ |
With a large number of time points random intercepts can become challenging sample with default priors. This is because with large group sizes the group-level intercepts tend to be behave similarly to fixed group-factor variable so the model becomes overparameterized given these and the common intercept term. Another potential cause for sampling problems is relatively large variation in the intercepts (large sigma_nu) compared to the sampling variation (sigma) in the Gaussian case.
An object of class random_spec
.
Model formula construction
dynamite()
,
dynamiteformula()
,
lags()
,
lfactor()
,
splines()
data.table::setDTthreads(1) # For CRAN # two channel model with correlated random effects for responses x and y obs(y ~ 1 + random(~1), family = "gaussian") + obs(x ~ 1 + random(~1 + z), family = "poisson") + random_spec(correlated = TRUE)
data.table::setDTthreads(1) # For CRAN # two channel model with correlated random effects for responses x and y obs(y ~ 1 + random(~1), family = "gaussian") + obs(x ~ 1 + random(~1 + z), family = "poisson") + random_spec(correlated = TRUE)
This function can be used as part of dynamiteformula()
to define the
splines used for the time-varying coefficients $\delta$
.
splines( df = NULL, degree = 3L, lb_tau = 0, noncentered = FALSE, override = FALSE )
splines( df = NULL, degree = 3L, lb_tau = 0, noncentered = FALSE, override = FALSE )
df |
[ |
degree |
[ |
lb_tau |
[ |
noncentered |
[ |
override |
[ |
An object of class splines
.
Model formula construction
dynamite()
,
dynamiteformula()
,
lags()
,
lfactor()
,
random_spec()
data.table::setDTthreads(1) # For CRAN # Two channel model with varying effects, with explicit lower bounds for the # random walk prior standard deviations, with noncentered parameterization # for the first channel and centered for the second channel. obs(y ~ 1, family = "gaussian") + obs(x ~ 1, family = "gaussian") + lags(type = "varying") + splines( df = 20, degree = 3, lb_tau = c(0, 0.1), noncentered = c(TRUE, FALSE) )
data.table::setDTthreads(1) # For CRAN # Two channel model with varying effects, with explicit lower bounds for the # random walk prior standard deviations, with noncentered parameterization # for the first channel and centered for the second channel. obs(y ~ 1, family = "gaussian") + obs(x ~ 1, family = "gaussian") + lags(type = "varying") + splines( df = 20, degree = 3, lb_tau = c(0, 0.1), noncentered = c(TRUE, FALSE) )
Note that using a different backend for the original model fit and when
updating can lead to an error due to different naming in cmdstanr
and
rstan
sampling arguments.
## S3 method for class 'dynamitefit' update( object, dformula = NULL, data = NULL, priors = NULL, recompile = NULL, ... )
## S3 method for class 'dynamitefit' update( object, dformula = NULL, data = NULL, priors = NULL, recompile = NULL, ... )
object |
[ |
dformula |
[ |
data |
[ |
priors |
[ |
recompile |
[ |
... |
Additional parameters to |
An updated dynamitefit
object.
Model fitting
dynamice()
,
dynamite()
,
get_priors()
data.table::setDTthreads(1) # For CRAN ## Not run: # re-estimate the example fit without thinning: # As the model is compiled on Windows, this will fail on other platforms if (identical(.Platform$OS.type, "windows")) { fit <- update(gaussian_example_fit, thin = 1) } ## End(Not run)
data.table::setDTthreads(1) # For CRAN ## Not run: # re-estimate the example fit without thinning: # As the model is compiled on Windows, this will fail on other platforms if (identical(.Platform$OS.type, "windows")) { fit <- update(gaussian_example_fit, thin = 1) } ## End(Not run)