| Title: | Create and Evaluate Probability Distributions |
|---|---|
| Description: | Create and evaluate probability distribution objects from a variety of families or define custom distributions. Automatically compute distributional properties, even when they have not been specified. This package supports statistical modeling and simulations, and forms the core of the probaverse suite of R packages. |
| Authors: | Vincenzo Coia [aut, cre, cph], Amogh Joshi [ctb], Shuyi Tan [ctb], Zhipeng Zhu [ctb] |
| Maintainer: | Vincenzo Coia <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2025-11-10 13:42:14 UTC |
| Source: | https://github.com/probaverse/distionary |
The distionary package provides a comprehensive framework for working with probability distributions in R. With distionary, you can:
Specify probability distributions from common families or create custom distributions.
Evaluate distributional properties and representations.
Access distributional calculations even when they're not directly specified.
The main purpose of distionary is to implement a distribution object that powers the wider probaverse ecosystem for making probability distributions that are representative of your data.
Use the dst_*() family of functions to create distributions from common families:
dst_norm(), dst_exp(), dst_unif(), etc. are some continuous distributions.
dst_pois(), dst_binom(), dst_geom(), etc. are some discrete distributions.
dst_empirical() is useful for creating a non-parametric distribution from data.
You can also make your own distribution using the distribution() function, which allows you to specify any combination of distributional representations and properties. For this version of distionary, the CDF and density/PMF are required in order to access all functionality.
A distribution's representations are functions that fully describe the distribution. They can be accessed with the eval_*() functions. For example, eval_cdf() and eval_quantile() invoke the distribution's cumulative distribution function (CDF) and quantile function.
Other properties of the distribution can be calculated by functions of the property's name, such as mean() and range().
Generate random samples from a distribution using realise().
New users should start with the package vignettes:
vignette("specify", package = "distionary") - Learn how to specify distributions.
vignette("evaluate", package = "distionary") - Learn how to evaluate distributions.
Maintainer: Vincenzo Coia [email protected] [copyright holder]
Other contributors:
Amogh Joshi [contributor]
Shuyi Tan [contributor]
Zhipeng Zhu [contributor]
Useful links:
Report bugs at https://github.com/probaverse/distionary/issues
    # Create a Poisson distribution.
    poisson <- dst_pois(lambda = 1.5)
    poisson

    # Evaluate the probability mass function.
    eval_pmf(poisson, at = 0:4)
    plot(poisson)

    # Get distribution properties.
    mean(poisson)
    variance(poisson)

    # Create a continuous distribution (Normal).
    normal <- dst_norm(mean = 0, sd = 1)

    # Evaluate quantiles.
    eval_quantile(normal, at = c(0.025, 0.5, 0.975))

    # Create a custom distribution.
    my_dist <- distribution(
      density = function(x) ifelse(x >= 0 & x <= 1, 2 * (1 - x), 0),
      # The CDF is 0 to the left of the support and 1 to the right of it.
      cdf = function(x) ifelse(x < 0, 0, ifelse(x > 1, 1, 1 - (1 - x)^2)),
      .vtype = "continuous",
      .name = "Linear"
    )
    plot(my_dist)
    plot(my_dist, "cdf")

    # Even without specifying all properties, they can still be computed.
    mean(my_dist)
Distribution Objects
    distribution(..., .vtype = NULL, .name = NULL, .parameters = list())
    is_distribution(object)
    is.distribution(object)

...: Name-value pairs for defining the distribution.
.vtype: The variable type, typically "discrete" or "continuous". Can be any character vector of length 1, but is converted to lowercase.
.name: A name to give to the distribution. Can be any character vector of length 1.
.parameters: A named list with one entry per distribution parameter, each of which can be any data type. In this version of distionary, these parameters are only stored for the benefit of the user to know what distribution they are working with; the code never looks at these parameters to inform its calculations. This is anticipated to change in a future version of distionary.
object: Object to be tested.

Currently, the CDF (cdf) is required to be specified, along with the PMF
(pmf) for discrete distributions and density (density) for continuous
distributions. Otherwise, the full extent of distribution properties will
not be accessible.
A distributional representation is a function that fully describes the
distribution. Besides cdf, density, and pmf, other options
understood by distionary include:
survival: the survival function, or one minus the cdf.
hazard: the hazard function, for continuous variables only.
chf: the cumulative hazard function, for continuous variables only.
quantile: the quantile function, or left-inverse of the cdf.
realise or realize: a function that takes an integer and generates
a vector of that many random draws from the distribution.
odds: for discrete variables, the probability odds function
(pmf / (1 - pmf))
return: the quantiles associated with the provided return periods,
where events are exceedances.
All functions should be vectorized.
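The representations listed above are linked by standard identities. The following is a minimal base R sketch (it does not call distionary, and the variable names are only illustrative) that checks several of them for an Exponential distribution with rate 2:

```r
# Illustrative only: base R, not distionary internals.
rate <- 2
x <- c(0.1, 0.5, 1, 2)

cdf      <- pexp(x, rate)
density  <- dexp(x, rate)
survival <- 1 - cdf               # survival = one minus the cdf
hazard   <- density / survival    # hazard = density / survival
chf      <- -log(survival)        # cumulative hazard = -log(survival)

# For the Exponential, the hazard is constant and the chf is linear in x.
all.equal(hazard, rep(rate, length(x)))
all.equal(chf, rate * x)

# The quantile function inverts the cdf.
all.equal(qexp(cdf, rate), x)
```

All operations are vectorized over x, which mirrors the requirement that user-supplied representations be vectorized.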
Other properties that are understood by distionary include:
mean, stdev, variance, skewness, median are self-explanatory.
kurtosis_exc and kurtosis are the distribution's excess
kurtosis and regular kurtosis.
range: A vector of the minimum and maximum value of a distribution's
support.
A distribution object.
    linear <- distribution(
      density = function(x) {
        d <- 2 * (1 - x)
        d[x < 0 | x > 1] <- 0
        d
      },
      cdf = function(x) {
        p <- 2 * x * (1 - x / 2)
        p[x < 0] <- 0
        p[x > 1] <- 1
        p
      },
      .vtype = "continuous",
      .name = "My Linear",
      .parameters = list(could = "include", anything = data.frame(x = 1:10))
    )

    # Inspect
    linear

    # Plot
    plot(linear)
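Properties such as the mean are not specified for this distribution, yet they can be recovered from the density. How distionary does this internally is not described here; the sketch below only shows, in base R and with hypothetical variable names, that numerical integration of the density is enough to obtain the mean and variance:

```r
# Illustrative only: base R, not distionary internals.
# Density of the "My Linear" example: 2 * (1 - x) on [0, 1], 0 elsewhere.
linear_density <- function(x) {
  d <- 2 * (1 - x)
  d[x < 0 | x > 1] <- 0
  d
}

# The density integrates to 1 over its support.
total <- integrate(linear_density, lower = 0, upper = 1)$value

# First and second moments by numerical integration.
mu <- integrate(function(x) x * linear_density(x), 0, 1)$value
second_moment <- integrate(function(x) x^2 * linear_density(x), 0, 1)$value
sigma2 <- second_moment - mu^2

c(total = total, mean = mu, variance = sigma2)
```

The exact values are a mean of 1/3 and a variance of 1/18, which the numerical results reproduce to within integration error.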
Makes a Bernoulli distribution, representing the outcome of a single trial with a given success probability.
    dst_bern(prob)

prob: Probability of success; single numeric between 0 and 1.

A Bernoulli distribution.

    dst_bern(0.3)
Makes a Beta distribution.
    dst_beta(shape1, shape2)

shape1, shape2: Shape parameters of the distribution; single positive numerics.

A Beta distribution.

    dst_beta(2, 3)
Makes a Binomial distribution, representing the number of successes in a fixed number of independent trials.
    dst_binom(size, prob)

size: Number of trials; single positive integer.
prob: Success probability of each trial; single numeric between 0 and 1.

A Binomial distribution.

    dst_binom(10, 0.6)
Makes a Cauchy distribution.
    dst_cauchy(location, scale)

location: Location parameter; single numeric.
scale: Scale parameter; single positive numeric.

A Cauchy distribution.

    d <- dst_cauchy(0, 1)

    # Moments do not exist for the Cauchy distribution.
    mean(d)
    variance(d)
Makes a Chi-Squared distribution.
    dst_chisq(df)

df: Degrees of freedom parameter; single positive numeric.

A Chi-Squared distribution.

    dst_chisq(3)
A degenerate distribution assigns a 100% probability to one outcome.
    dst_degenerate(location)

location: Outcome of the distribution; single positive numeric.

A degenerate distribution.

    d <- dst_degenerate(5)
    realise(d)
    variance(d)
An empirical distribution is a non-parametric way to
estimate a distribution using data. By default,
it assigns equal probability to all observations
(this can be overridden with the weights argument).
Identical to dst_finite() with NA handling and with weights not needing
to add to 1.
    dst_empirical(
      y,
      weights = 1,
      data = NULL,
      na_action_y = c("null", "drop", "fail"),
      na_action_w = c("null", "drop", "fail")
    )
y |
< |
weights |
< |
data |
Optionally, a data frame to compute |
na_action_y, na_action_w
|
What should be done with |
y and weights are recycled to have the same length, but only
if one of them has length 1 (via vctrs::vec_recycle_common()).
na_action_y and na_action_w specify the NA action for y and weights.
Options are, in order of precedence:
"fail": Throw an error in the presence of NA.
"null": Return a Null distribution (dst_null()) in the presence
of NA.
"drop" (the default for na_action_w): Remove outcome-weight pairs
having an NA value in the specified vector.
A finite distribution. If only one outcome, returns a degenerate
distribution. Returns a Null distribution if NA values are present
and "null" is specified as an NA action.
    t <- -2:7
    dst_empirical(t)

    # Using a data frame
    df <- data.frame(time = c(NA, NA, t))
    dst_empirical(time * 60, data = df) # Null, since `NA` in `time`.

    # Drop NA `time` values.
    dst_empirical(time * 60, data = df, na_action_y = "drop")

    # Weights explicit. Zero-weight outcomes ("-120") are gone.
    df$w <- c(1, 1, 0:9)
    dst_empirical(time * 60, w, data = df, na_action_y = "drop")

    # "Null" takes precedence over "drop".
    df$w <- c(NA, NA, 0:9)
    df$time[1] <- -3
    df$time[12] <- NA
    dst_empirical(time, w, data = df, na_action_w = "null", na_action_y = "drop")
    dst_empirical(time, w, data = df, na_action_w = "drop", na_action_y = "null")
    dst_empirical(time, w, data = df, na_action_w = "drop", na_action_y = "drop")
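To make the role of the weights concrete, here is a rough base R sketch of how outcome-weight pairs could be turned into probabilities under the "drop" action. It is not distionary's implementation; in particular, pooling the weights of repeated outcomes is an assumption of this sketch, and all variable names are hypothetical.

```r
# Illustrative only: base R, not distionary's implementation.
y <- c(-2, 0, 0, 3, NA)
w <- c( 1, 2, 1, 0,  1)

# "drop" action: remove outcome-weight pairs with an NA in either vector.
keep <- !is.na(y) & !is.na(w)
y <- y[keep]
w <- w[keep]

# Pool the weights of repeated outcomes (an assumption of this sketch),
# then discard outcomes whose total weight is zero.
agg <- tapply(w, y, sum)
agg <- agg[agg > 0]

outcomes <- as.numeric(names(agg))
probs <- as.numeric(agg) / sum(agg)   # weights need not add to 1

data.frame(outcome = outcomes, prob = probs)
```

Here the outcome 3 disappears because its weight is zero, and the remaining outcomes -2 and 0 receive probabilities 0.25 and 0.75.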
Makes an Exponential distribution.
    dst_exp(rate)

rate: Rate parameter; single positive numeric.

An Exponential distribution.

    dst_exp(1)
Makes an F distribution.
    dst_f(df1, df2)

df1, df2: Degrees of freedom of the numerator and denominator, both single positive numerics.

An F distribution.

    dst_f(2, 3)
Makes a finite distribution, which is a distribution with a finite number of possible outcomes.
    dst_finite(outcomes, probs)
outcomes |
Numeric vector representing the potential outcomes of the distribution. |
probs |
Numeric vector of probabilities corresponding to the outcomes
in |
A distribution with finite outcomes.
    dst_finite(2:5, probs = 1:4 / 10)
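For a finite distribution, the CDF and moments follow directly from the outcomes and their probabilities. The base R sketch below (hypothetical helper names, not distionary code) works through the example with outcomes 2:5 and probabilities 1:4 / 10:

```r
# Illustrative only: base R, not distionary internals.
outcomes <- 2:5
probs <- 1:4 / 10

# The CDF is a right-continuous step function: P(X <= x).
finite_cdf <- function(x) {
  vapply(x, function(xi) sum(probs[outcomes <= xi]), numeric(1))
}

finite_cdf(c(1, 2, 3.5, 5))

# Mean and variance are probability-weighted sums over the outcomes.
mu <- sum(outcomes * probs)
sigma2 <- sum((outcomes - mu)^2 * probs)
c(mean = mu, variance = sigma2)
```

The CDF jumps at each outcome by that outcome's probability, so it is 0 below 2 and reaches 1 at 5.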
Makes a Gamma distribution.
    dst_gamma(shape, rate)

shape: Shape parameter; single positive numeric.
rate: Rate parameter; single positive numeric.

A Gamma distribution.

    dst_gamma(2, 1)
Makes a Geometric distribution, corresponding to the number of failures in a sequence of independent trials before observing a success.
    dst_geom(prob)

prob: Probability of success in each trial; single numeric between 0 and 1.

A Geometric distribution.

    d <- dst_geom(0.4)

    # This version of the Geometric distribution does not count the success.
    range(d)
Makes a Generalised Extreme Value (GEV) distribution, which is the limiting distribution of the maximum.
    dst_gev(location, scale, shape)
location |
Location parameter; single numeric. |
scale |
Scale parameter; single positive numeric. |
shape |
Shape parameter; single numeric.
This is also the extreme value index,
so that |
A GEV distribution.
    # Short-tailed example
    short <- dst_gev(0, 1, -1)
    range(short)
    mean(short)

    # Heavy-tailed example
    heavy <- dst_gev(0, 1, 1)
    range(heavy)
    mean(heavy)

    # Light-tailed example (a Gumbel distribution)
    light <- dst_gev(0, 1, 0)
    range(light)
    mean(light)
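The three examples differ in where the support ends. As a sketch only, and assuming the common GEV parameterisation in which a positive shape gives a heavy upper tail (the helper names gev_cdf and gev_endpoint are hypothetical, not part of distionary), the CDF and the finite endpoint of the support can be written in base R as:

```r
# Illustrative only: base R, assuming the common GEV parameterisation.
gev_cdf <- function(x, location, scale, shape) {
  z <- (x - location) / scale
  if (shape == 0) {
    return(exp(-exp(-z)))              # Gumbel case
  }
  t <- pmax(1 + shape * z, 0)          # clamp to 0 outside the support
  exp(-t^(-1 / shape))
}

# Finite endpoint of the support, at location - scale / shape:
# an upper bound when shape < 0, a lower bound when shape > 0.
gev_endpoint <- function(location, scale, shape) location - scale / shape

gev_endpoint(0, 1, -1)   # upper endpoint of the short-tailed example
gev_endpoint(0, 1, 1)    # lower endpoint of the heavy-tailed example

gev_cdf(c(0, 1, 2), location = 0, scale = 1, shape = -1)
gev_cdf(c(-2, -1, 0), location = 0, scale = 1, shape = 1)
```

Under this parameterisation the short-tailed example is bounded above at 1, the heavy-tailed example is bounded below at -1, and the Gumbel case has no finite endpoint.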
Makes a Generalized Pareto (GP) distribution, corresponding to the limiting distribution of excesses over a threshold.
    dst_gp(scale, shape)
scale |
Scale parameter; single positive numeric. |
shape |
Shape parameter; single numeric (the examples use negative, zero, and positive values).
This is also the extreme value index, so that |
A Generalised Pareto distribution.

    # Short-tailed example
    short <- dst_gp(1, -1)
    range(short)
    mean(short)

    # Heavy-tailed example
    heavy <- dst_gp(1, 1)
    range(heavy)
    mean(heavy)

    # Light-tailed example (an Exponential distribution)
    light <- dst_gp(1, 0)
    range(light)
    mean(light)
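Two special cases help interpret the shape parameter. Assuming the common GP parameterisation with CDF 1 - (1 + shape * x / scale)^(-1 / shape), the sketch below (base R, with a hypothetical helper gp_cdf that is not part of distionary) checks that shape 0 gives an Exponential distribution and that shape -1 with scale 1 gives a Uniform distribution on [0, 1]:

```r
# Illustrative only: base R, assuming the common GP parameterisation.
gp_cdf <- function(x, scale, shape) {
  x <- pmax(x, 0)                      # no mass below the threshold 0
  if (shape == 0) {
    return(1 - exp(-x / scale))        # Exponential case
  }
  t <- pmax(1 + shape * x / scale, 0)  # clamp to 0 beyond a finite endpoint
  1 - t^(-1 / shape)
}

x <- c(0.5, 1, 2)

# shape = 0: Exponential with rate 1 / scale.
all.equal(gp_cdf(x, scale = 1, shape = 0), pexp(x, rate = 1))

# shape = -1, scale = 1: Uniform on [0, 1].
all.equal(gp_cdf(x, scale = 1, shape = -1), punif(x, 0, 1))
```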
Creates a Hypergeometric distribution. The parameterization used here
is the same as for stats::phyper(), where the outcome can be thought
of as the number of red balls drawn from an urn of coloured balls,
using a scoop that holds a fixed number of balls.
    dst_hyper(m, n, k)
m |
The number of red balls in the urn; single positive integer. |
n |
The number of non-red balls in the urn; single positive integer. |
k |
the number of balls drawn from the urn (between 0 and |
A Hypergeometric distribution.
    dst_hyper(15, 50, 10)
Makes a Log Normal distribution, which is the distribution of the exponential of a Normally distributed random variable.
    dst_lnorm(meanlog, sdlog)

meanlog: Mean of the log of the random variable; single numeric.
sdlog: Standard deviation of the log of the random variable; single positive numeric.

A Log Normal distribution.

    dst_lnorm(0, 1)
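The description of the Log Normal as the exponential of a Normal variable can be checked with base R's own distribution functions. This short sketch does not use distionary; it only verifies the relationship for meanlog = 0 and sdlog = 1:

```r
# Illustrative only: base R check of the Log Normal / Normal relationship.
meanlog <- 0
sdlog <- 1

# If X = exp(Z) with Z Normal, then P(X <= x) = P(Z <= log(x)).
x <- c(0.5, 1, 2, 5)
all.equal(plnorm(x, meanlog, sdlog), pnorm(log(x), meanlog, sdlog))

# Quantiles transform the same way: q_X(p) = exp(q_Z(p)).
p <- c(0.1, 0.5, 0.9)
all.equal(qlnorm(p, meanlog, sdlog), exp(qnorm(p, meanlog, sdlog)))

# The mean is exp(meanlog + sdlog^2 / 2), not exp(meanlog).
lnorm_mean <- exp(meanlog + sdlog^2 / 2)
lnorm_mean
```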
Makes a Log Pearson Type III distribution, which is the distribution of the exponential of a random variable following a Pearson Type III distribution.
    dst_lp3(meanlog, sdlog, skew)

meanlog: Mean of the log of the random variable; single numeric.
sdlog: Standard deviation of the log of the random variable; single positive numeric.
skew: Skewness of the log of the random variable; single numeric.

A Log Pearson Type III distribution.

    dst_lp3(0, 1, 1)
Makes a Negative Binomial distribution, corresponding to the number of failures in a sequence of independent trials until a given number of successes are observed.
    dst_nbinom(size, prob)

size: Number of successful trials; single positive numeric.
prob: Probability of a successful trial; single numeric between 0 and 1.

A Negative Binomial distribution.

    d <- dst_nbinom(10, 0.5)

    # This version of the Negative Binomial distribution does not count
    # the successes.
    range(d)
Makes a Normal (Gaussian) distribution.
    dst_norm(mean, sd)

mean: Mean of the distribution. Single numeric.
sd: Standard deviation of the distribution. Single positive numeric.

A Normal distribution.

    dst_norm(0, 1)
Sometimes it's convenient to work with a distribution object that is
akin to a missing value. This is especially true when programmatically
outputting distributions, such as when a distribution fails to fit to
data. This function makes such a distribution object. It always evaluates
to NA.
    dst_null()

A Null distribution.

    x <- dst_null()
    mean(x)
    eval_pmf(x, at = 1:10)
Makes a Pearson Type III distribution, which is a Gamma distribution, but shifted.
    dst_pearson3(location, scale, shape)

location: Location parameter, specifying how to shift the Gamma distribution; single numeric.
scale: Scale parameter of the Gamma distribution; single positive numeric.
shape: Shape parameter of the Gamma distribution; single positive numeric.

A Pearson Type III distribution.

    dst_pearson3(1, 1, 1)
Makes a Poisson distribution.
    dst_pois(lambda)

lambda: Mean of the Poisson distribution; single positive numeric.

A Poisson distribution.

    dst_pois(1)
Makes a Student t distribution.
    dst_t(df)

df: Degrees of freedom; single positive numeric.

A Student t distribution.

    dst_t(3)
Makes a Uniform distribution.
    dst_unif(min, max)

min, max: Minimum and maximum of the distribution. Single numerics.

A Uniform distribution.

    dst_unif(0, 1)
Makes a Weibull distribution.
    dst_weibull(shape, scale)

shape: Shape parameter; single positive numeric.
scale: Scale parameter; single positive numeric.

A Weibull distribution.

    dst_weibull(1, 1)
Access a distribution's cumulative distribution function (cdf).
    eval_cdf(distribution, at)

    enframe_cdf(..., at, arg_name = ".arg", fn_prefix = "cdf", sep = "_")
distribution, ...
|
A distribution, or possibly multiple
distributions in the case of |
at |
Vector of values to evaluate the representation at. |
arg_name |
For |
fn_prefix |
For |
sep |
When |
The evaluated representation in vector form (for eval_)
with length matching the length of at, and data frame
or tibble form (for enframe_) with number of rows matching the
length of at. The at input occupies the first column,
named .arg by default, or the specification in arg_name;
the evaluated representations for each distribution in ...
go in the subsequent columns (one column per distribution). For a
single distribution, this column is named according to the
representation by default (cdf, survival, quantile, etc.),
or the value in fn_prefix. For multiple distributions, unnamed
distributions are auto-named, and columns are named
<fn_prefix><sep><distribution_name> (e.g., cdf_distribution1).
Other distributional representations:
eval_chf(),
eval_density(),
eval_hazard(),
eval_odds(),
eval_pmf(),
eval_quantile(),
eval_return(),
eval_survival()
    d1 <- dst_unif(0, 4)
    d2 <- dst_pois(1.1)
    eval_cdf(d1, at = 0:4)
    enframe_cdf(d1, at = 0:4)
    enframe_cdf(d1, d2, at = 0:4)
    enframe_cdf(model1 = d1, model2 = d2, at = 0:4)
    enframe_cdf(
      model1 = d1, model2 = d2, at = 0:4, arg_name = "value"
    )
Access a distribution's cumulative hazard function (chf).
    eval_chf(distribution, at)

    enframe_chf(..., at, arg_name = ".arg", fn_prefix = "chf", sep = "_")
distribution, ...
|
A distribution, or possibly multiple
distributions in the case of |
at |
Vector of values to evaluate the representation at. |
arg_name |
For |
fn_prefix |
For |
sep |
When |
The evaluated representation in vector form (for eval_)
with length matching the length of at, and data frame
or tibble form (for enframe_) with number of rows matching the
length of at. The at input occupies the first column,
named .arg by default, or the specification in arg_name;
the evaluated representations for each distribution in ...
go in the subsequent columns (one column per distribution). For a
single distribution, this column is named according to the
representation by default (cdf, survival, quantile, etc.),
or the value in fn_prefix. For multiple distributions, unnamed
distributions are auto-named, and columns are named
<fn_prefix><sep><distribution_name> (e.g., cdf_distribution1).
Other distributional representations:
eval_cdf(),
eval_density(),
eval_hazard(),
eval_odds(),
eval_pmf(),
eval_quantile(),
eval_return(),
eval_survival()
    d <- dst_unif(0, 4)
    eval_chf(d, at = 0:4)
    enframe_chf(d, at = 0:4)
Access a distribution's probability density function (pdf).
    eval_density(distribution, at)

    enframe_density(..., at, arg_name = ".arg", fn_prefix = "density", sep = "_")
distribution, ...
|
A distribution, or possibly multiple
distributions in the case of |
at |
Vector of values to evaluate the representation at. |
arg_name |
For |
fn_prefix |
For |
sep |
When |
The evaluated representation in vector form (for eval_)
with length matching the length of at, and data frame
or tibble form (for enframe_) with number of rows matching the
length of at. The at input occupies the first column,
named .arg by default, or the specification in arg_name;
the evaluated representations for each distribution in ...
go in the subsequent columns (one column per distribution). For a
single distribution, this column is named according to the
representation by default (cdf, survival, quantile, etc.),
or the value in fn_prefix. For multiple distributions, unnamed
distributions are auto-named, and columns are named
<fn_prefix><sep><distribution_name> (e.g., cdf_distribution1).
Other distributional representations:
eval_cdf(),
eval_chf(),
eval_hazard(),
eval_odds(),
eval_pmf(),
eval_quantile(),
eval_return(),
eval_survival()
    d <- dst_unif(0, 4)
    eval_density(d, at = 0:4)
    enframe_density(d, at = 0:4)
Access a distribution's hazard function.
    eval_hazard(distribution, at)

    enframe_hazard(..., at, arg_name = ".arg", fn_prefix = "hazard", sep = "_")
distribution, ...
|
A distribution, or possibly multiple
distributions in the case of |
at |
Vector of values to evaluate the representation at. |
arg_name |
For |
fn_prefix |
For |
sep |
When |
The evaluated representation in vector form (for eval_)
with length matching the length of at, and data frame
or tibble form (for enframe_) with number of rows matching the
length of at. The at input occupies the first column,
named .arg by default, or the specification in arg_name;
the evaluated representations for each distribution in ...
go in the subsequent columns (one column per distribution). For a
single distribution, this column is named according to the
representation by default (cdf, survival, quantile, etc.),
or the value in fn_prefix. For multiple distributions, unnamed
distributions are auto-named, and columns are named
<fn_prefix><sep><distribution_name> (e.g., cdf_distribution1).
Other distributional representations:
eval_cdf(),
eval_chf(),
eval_density(),
eval_odds(),
eval_pmf(),
eval_quantile(),
eval_return(),
eval_survival()
    d <- dst_unif(0, 4)
    eval_hazard(d, at = 0:4)
    enframe_hazard(d, at = 0:4)
Access a distribution's odds function. The odds of an event having
probability p is p / (1 - p).
    eval_odds(distribution, at)

    enframe_odds(..., at, arg_name = ".arg", fn_prefix = "odds", sep = "_")
distribution, ...
|
A distribution, or possibly multiple
distributions in the case of |
at |
Vector of values to evaluate the representation at. |
arg_name |
For |
fn_prefix |
For |
sep |
When |
The evaluated representation in vector form (for eval_)
with length matching the length of at, and data frame
or tibble form (for enframe_) with number of rows matching the
length of at. The at input occupies the first column,
named .arg by default, or the specification in arg_name;
the evaluated representations for each distribution in ...
go in the subsequent columns (one column per distribution). For a
single distribution, this column is named according to the
representation by default (cdf, survival, quantile, etc.),
or the value in fn_prefix. For multiple distributions, unnamed
distributions are auto-named, and columns are named
<fn_prefix><sep><distribution_name> (e.g., cdf_distribution1).
Other distributional representations:
eval_cdf(),
eval_chf(),
eval_density(),
eval_hazard(),
eval_pmf(),
eval_quantile(),
eval_return(),
eval_survival()
    d <- dst_pois(1)
    eval_pmf(d, at = c(1, 2, 2.5))
    eval_odds(d, at = c(1, 2, 2.5))
    enframe_odds(d, at = 0:4)
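The odds function is the PMF transformed by p / (1 - p). As a sanity check outside distionary, the same quantity can be computed in base R for a Poisson distribution with mean 1 (variable names here are illustrative):

```r
# Illustrative only: base R, not distionary internals.
lambda <- 1
at <- 0:4

pmf <- dpois(at, lambda)
odds <- pmf / (1 - pmf)    # odds of an event with probability p: p / (1 - p)

data.frame(at = at, pmf = pmf, odds = odds)
```

For small probabilities the odds are close to the PMF itself, and they grow without bound as a probability approaches 1.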
Access a distribution's probability mass function (pmf).
    eval_pmf(distribution, at)

    enframe_pmf(..., at, arg_name = ".arg", fn_prefix = "pmf", sep = "_")
distribution, ...
|
A distribution, or possibly multiple
distributions in the case of |
at |
Vector of values to evaluate the representation at. |
arg_name |
For |
fn_prefix |
For |
sep |
When |
The evaluated representation in vector form (for eval_)
with length matching the length of at, and data frame
or tibble form (for enframe_) with number of rows matching the
length of at. The at input occupies the first column,
named .arg by default, or the specification in arg_name;
the evaluated representations for each distribution in ...
go in the subsequent columns (one column per distribution). For a
single distribution, this column is named according to the
representation by default (cdf, survival, quantile, etc.),
or the value in fn_prefix. For multiple distributions, unnamed
distributions are auto-named, and columns are named
<fn_prefix><sep><distribution_name> (e.g., cdf_distribution1).
Other distributional representations:
eval_cdf(),
eval_chf(),
eval_density(),
eval_hazard(),
eval_odds(),
eval_quantile(),
eval_return(),
eval_survival()
    d <- dst_pois(5)
    eval_pmf(d, at = c(1, 2, 2.5))
    enframe_pmf(d, at = 0:4)
    eval_pmf(dst_norm(0, 1), at = -3:3)
Evaluate a distribution property. The distribution itself is first searched for the property; if it cannot be found there, an attempt is made to calculate it from other entries.

    eval_property(distribution, entry, ...)

distribution: Distribution object.
entry: Name of the property, such as "cdf" or "mean". Length 1 character vector.
...: If the property is a function, arguments to the function go here. Need not be named; inserted in the order they appear.

The distribution's property, evaluated. If it cannot be evaluated, returns NULL.

    d <- distribution(
      cdf = \(x) (x > 0) * pmin(x^2, 1),
      g = 9.81,
      .vtype = "continuous"
    )
    eval_property(d, "g")
    eval_property(d, "quantile", 1:9 / 10)
    eval_property(d, "mean")
    eval_property(d, "realise", 10)
    eval_property(d, "foofy")
    eval_property(d, "foofy", 1:10)
Access a distribution's quantiles.
    eval_quantile(distribution, at)

    enframe_quantile(..., at, arg_name = ".arg", fn_prefix = "quantile", sep = "_")
distribution, ...
|
A distribution, or possibly multiple
distributions in the case of |
at |
Vector of values to evaluate the representation at. |
arg_name |
For |
fn_prefix |
For |
sep |
When |
When a quantile function does not exist, an algorithm is deployed that calculates the left inverse of the CDF. It works by progressively cutting an initially wide search range in half, moving into the left or right half depending on where the solution is. The tolerance is less than 1e-9, unless the maximum number of iterations (200) is reached. The algorithm is not currently fast and is subject to improvement.
The algorithm is not new, and is rather simple: I found the idea on Stack Overflow somewhere, but unfortunately cannot find the location anymore.
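The halving idea described above is ordinary bisection. The following base R function is a generic sketch of it, not distionary's actual code; the name quantile_bisect and its arguments are hypothetical, while the tolerance and iteration cap are taken from the description above.

```r
# Illustrative only: a generic bisection, not distionary's actual code.
# Finds (approximately) the smallest x in [lower, upper] with cdf(x) >= p.
quantile_bisect <- function(cdf, p, lower, upper, tol = 1e-9, maxiter = 200) {
  for (i in seq_len(maxiter)) {
    if (upper - lower < tol) break
    mid <- (lower + upper) / 2
    if (cdf(mid) >= p) {
      upper <- mid     # the solution is at or to the left of mid
    } else {
      lower <- mid     # the solution is to the right of mid
    }
  }
  upper
}

# CDF x^2 on [0, 1]; its quantile function is sqrt(p).
cdf <- function(x) (x > 0) * pmin(x^2, 1)
q <- quantile_bisect(cdf, p = 0.25, lower = 0, upper = 1)
q

# Comparison against a known quantile function.
q_norm <- quantile_bisect(pnorm, p = 0.975, lower = -10, upper = 10)
c(bisection = q_norm, exact = qnorm(0.975))
```

Each iteration halves the search range, so a range of width 20 shrinks below 1e-9 in about 35 steps, well within the 200-iteration cap.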
The evaluated representation in vector form (for eval_)
with length matching the length of at, and data frame
or tibble form (for enframe_) with number of rows matching the
length of at. The at input occupies the first column,
named .arg by default, or the specification in arg_name;
the evaluated representations for each distribution in ...
go in the subsequent columns (one column per distribution). For a
single distribution, this column is named according to the
representation by default (cdf, survival, quantile, etc.),
or the value in fn_prefix. For multiple distributions, unnamed
distributions are auto-named, and columns are named
<fn_prefix><sep><distribution_name> (e.g., cdf_distribution1).
Other distributional representations:
eval_cdf(),
eval_chf(),
eval_density(),
eval_hazard(),
eval_odds(),
eval_pmf(),
eval_return(),
eval_survival()
    d <- dst_unif(0, 4)
    eval_quantile(d, at = 1:9 / 10)
    enframe_quantile(d, at = 1:9 / 10)
Compute return levels (quantiles) from a distribution by inputting return periods. The return periods correspond to events that are exceedances of a quantile, not non-exceedances.
    eval_return(distribution, at)

    enframe_return(..., at, arg_name = ".arg", fn_prefix = "return", sep = "_")
distribution, ...
|
A distribution, or possibly multiple
distributions in the case of |
at |
Vector of return periods >=1. |
arg_name |
For |
fn_prefix |
For |
sep |
When |
This function is simply the quantile
function evaluated at 1 - 1 / at.
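The relationship between return periods and quantiles can be illustrated without distionary. The sketch below uses base R's qexp() as a stand-in quantile function; return_level_sketch is a hypothetical helper name, not a package function.

```r
# Return level for return period `at`: the quantile at 1 - 1 / at.
# `return_level_sketch` is an illustrative helper, not part of distionary.
return_level_sketch <- function(quantile_fn, at) {
  stopifnot(all(at >= 1))
  quantile_fn(1 - 1 / at)
}

# Exponential distribution with rate 1, via base R's qexp().
periods <- c(2, 25, 100, 200)
return_level_sketch(function(p) qexp(p, rate = 1), at = periods)

# For this particular distribution the closed form is log(periods).
log(periods)
```

A return period of 100 therefore maps to the 0.99-quantile: the level exceeded with probability 1/100.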
The evaluated representation in vector form (for eval_)
with length matching the length of at, and data frame
or tibble form (for enframe_) with number of rows matching the
length of at. The at input occupies the first column,
named .arg by default, or the specification in arg_name;
the evaluated representations for each distribution in ...
go in the subsequent columns (one column per distribution). For a
single distribution, this column is named according to the
representation by default (cdf, survival, quantile, etc.),
or the value in fn_prefix. For multiple distributions, unnamed
distributions are auto-named, and columns are named
<fn_prefix><sep><distribution_name> (e.g., cdf_distribution1).
Other distributional representations:
eval_cdf(),
eval_chf(),
eval_density(),
eval_hazard(),
eval_odds(),
eval_pmf(),
eval_quantile(),
eval_survival()
d <- dst_gp(24, 0.3)
eval_return(d, at = c(2, 25, 100, 200))
Access a distribution's survival function.
eval_survival(distribution, at)
enframe_survival(..., at, arg_name = ".arg", fn_prefix = "survival", sep = "_")
| Argument | Description |
|---|---|
| distribution, ... | A distribution, or possibly multiple distributions in the case of |
| at | Vector of values to evaluate the representation at. |
| arg_name | For |
| fn_prefix | For |
| sep | When |
The evaluated representation in vector form (for eval_)
with length matching the length of at, and data frame
or tibble form (for enframe_) with number of rows matching the
length of at. The at input occupies the first column,
named .arg by default, or the specification in arg_name;
the evaluated representations for each distribution in ...
go in the subsequent columns (one column per distribution). For a
single distribution, this column is named according to the
representation by default (cdf, survival, quantile, etc.),
or the value in fn_prefix. For multiple distributions, unnamed
distributions are auto-named, and columns are named
<fn_prefix><sep><distribution_name> (e.g., cdf_distribution1).
Other distributional representations:
eval_cdf(),
eval_chf(),
eval_density(),
eval_hazard(),
eval_odds(),
eval_pmf(),
eval_quantile(),
eval_return()
d <- dst_unif(0, 4)
eval_survival(d, at = 0:4)
enframe_survival(d, at = 0:4)
Get common moment-related quantities of a
distribution: mean, variance, standard deviation (stdev),
skewness, and kurtosis or excess kurtosis (kurtosis_exc).
If these quantities are not supplied in the
distribution's definition, a numerical algorithm may be used.
kurtosis(distribution)
kurtosis_exc(distribution)
## S3 method for class 'dst'
mean(x, ...)
skewness(distribution)
stdev(distribution)
variance(distribution)
| Argument | Description |
|---|---|
| x, distribution | Distribution to evaluate. |
| ... | When calculating the mean via integration of the quantile function, arguments passed to |
If there is no method associated with a subclass of
x, then moments are calculated using
stats::integrate() from the density function.
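As a rough picture of what integrating the density looks like, the sketch below computes a mean and variance with stats::integrate() for a Normal density taken from base R. It illustrates the general technique only; the exact integrals distionary evaluates internally are not shown on this page.

```r
# Mean and variance of a continuous distribution by integrating its density.
# Here the density is base R's dnorm() with mean 3 and standard deviation 4.
dens <- function(x) dnorm(x, mean = 3, sd = 4)

mean_num <- integrate(function(x) x * dens(x), -Inf, Inf)$value
var_num <- integrate(function(x) (x - mean_num)^2 * dens(x), -Inf, Inf)$value

mean_num        # close to 3
sqrt(var_num)   # close to 4
```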
A single numeric.
Beware that if a quantity is being calculated numerically for a non-continuous (e.g., discrete) distribution, the calculation could be highly approximate. An upcoming version of distionary will resolve this issue.
a <- dst_gp(1, 0.5)
b <- dst_unif(0, 1)
c <- dst_norm(3, 4)
mean(a)
variance(b)
kurtosis(c)
kurtosis_exc(c)
Finds the median of a distribution.
## S3 method for class 'dst'
median(x, ...)
| Argument | Description |
|---|---|
| x | Distribution to calculate median from. |
| ... | Not used. |
The median is calculated as the 0.5-quantile when it is not specified in the distribution. As a result, when the median is non-unique, the smallest of the possible values is taken.
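A small base R illustration of the non-unique case, using a hand-built discrete uniform distribution rather than a distionary object (the variable names are illustrative only):

```r
# Discrete uniform on 1, 2, 3, 4: every value in [2, 3] is a median.
values <- 1:4
cdf <- cumsum(rep(0.25, 4))    # 0.25 0.50 0.75 1.00

# The 0.5-quantile is the smallest value whose CDF reaches 0.5.
values[which(cdf >= 0.5)[1]]   # 2
```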
Median of a distribution; single numeric.
d <- dst_gamma(3, 3)
median(d)
Get or set the parameters of a distribution, if applicable. See details.
parameters(distribution)
parameters(distribution) <- value
| Argument | Description |
|---|---|
| distribution | Distribution. |
| value | A list of named parameter values, or |
If a distribution is made by specifying parameter values
(e.g., mean and variance for a Normal distribution; shape parameters
for a Beta distribution), it is useful to keep track of what
these parameters are. This is done by adding parameters
to the list of objects defining the distribution; for instance,
distribution(parameters = c(shape1 = 1.4, shape2 = 3.4)).
Note that no checks are made to ensure the parameters are valid.
It's important to note that, in this version of distionary,
manually changing the parameters after the distribution has been
created will not change the functionality of the distribution,
because the parameters are never referred to when making calculations.
A list of the distribution parameters. More specifically,
returns the "parameters" entry of the list making up the
probability distribution.
a <- dst_beta(1, 2)
parameters(a)
b <- distribution(mean = 5)
parameters(b)
parameters(b) <- list(t = 7)
parameters(b)
Representations of the Generalized Extreme Value Distribution
pgev(q, location, scale, shape)
qgev(p, location, scale, shape)
dgev(x, location, scale, shape)
| Argument | Description |
|---|---|
| location | Location parameter; numeric vector. |
| scale | Scale parameter; positive numeric vector. |
| shape | Shape parameter; numeric vector. This is also the extreme value index, so that |
| p | Vector of probabilities. |
| x, q | Vector of quantiles. |
Vector of evaluated GEV distribution, with length
equal to the recycled lengths of q/x/p, location, scale, and
shape.
pgev(1:10, 0, 1, 1)
dgev(1:10, 1:10, 2, 0)
qgev(1:9 / 10, 2, 10, -2)
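For orientation, the GEV CDF in its standard textbook parameterization can be sketched in base R. The helper name gev_cdf_sketch is hypothetical and handles only a scalar shape; that pgev() follows the same parameterization is an assumption, not something this page states.

```r
# Textbook GEV CDF; `gev_cdf_sketch` is an illustrative name and is not
# part of distionary. Only a single (scalar) shape value is supported.
gev_cdf_sketch <- function(q, location, scale, shape) {
  z <- (q - location) / scale
  if (shape == 0) {
    return(exp(-exp(-z)))        # Gumbel case
  }
  t <- pmax(1 + shape * z, 0)    # clamps to zero outside the support
  exp(-t^(-1 / shape))
}

gev_cdf_sketch(1:10, location = 0, scale = 1, shape = 1)
```

With a positive shape the support is bounded below, and with a negative shape it is bounded above; the clamp makes the CDF evaluate to 0 or 1 beyond those endpoints.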
Representations of the Generalized Pareto Distribution
pgp(q, scale, shape, lower.tail = TRUE)
qgp(p, scale, shape)
dgp(x, scale, shape)
| Argument | Description |
|---|---|
| scale | Vector of scale parameters; positive numeric. |
| shape | Vector of shape parameters; positive numeric. |
| lower.tail | Single logical. If |
| p | Vector of probabilities. |
| x, q | Vector of quantiles. |
Vector of evaluated GP distribution, with length
equal to the recycled lengths of q/x/p, scale, and shape.
pgp(1:10, 1, 1)
dgp(1:10, 2, 0)
qgp(1:9 / 10, 10, -2)
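The textbook Generalized Pareto CDF and quantile function (with the lower endpoint at zero) can be sketched in base R as below. The helper names are hypothetical, only a scalar shape is handled, and agreement with pgp() and qgp() in every edge case is assumed rather than confirmed here.

```r
# Textbook GP CDF and quantile function; both helpers are illustrative
# and not part of distionary. Only a scalar shape is supported.
gp_cdf_sketch <- function(q, scale, shape) {
  z <- pmax(q, 0) / scale
  if (shape == 0) return(1 - exp(-z))       # Exponential case
  1 - pmax(1 + shape * z, 0)^(-1 / shape)   # clamp handles the upper endpoint
}

gp_quantile_sketch <- function(p, scale, shape) {
  if (shape == 0) return(-scale * log(1 - p))
  scale / shape * ((1 - p)^(-shape) - 1)
}

# Round trip: the CDF of the quantiles recovers the probabilities.
p <- 1:9 / 10
gp_cdf_sketch(gp_quantile_sketch(p, scale = 10, shape = -2),
              scale = 10, shape = -2)
```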
Plot a distribution's representation.
## S3 method for class 'dst'
plot(
  x,
  what = c("density", "pmf", "cdf", "survival", "quantile", "hazard", "chf"),
  ...
)
| Argument | Description |
|---|---|
| x | Distribution object |
| what | Name of the representation to plot. |
| ... | Other arguments to pass to the |
This function is run for its graphics byproduct, and therefore returns the original distribution, invisibly.
d <- dst_norm(0, 1)
plot(d, from = -4, to = 4)
plot(d, "cdf", n = 1000)
plot(d, "survival")
plot(d, "quantile")
plot(d, "hazard")
plot(d, "chf")
p <- dst_pois(4)
plot(p)
Print the name of a distribution, possibly with parameters.
pretty_name(distribution, param_digits = 0)
| Argument | Description |
|---|---|
| distribution | Distribution object. |
| param_digits | How many significant digits to include when displaying the parameters? |
A character containing the distribution's name, possibly followed by parameters in brackets.
d <- dst_norm(0.3552, 1.1453)
pretty_name(d)
pretty_name(d, 2)
Probability to the left or right of a number, inclusive or not.
prob_left() is a more general cdf defined using either < or <=, and
prob_right() is a more general survival function defined using either
> or >=.
prob_left(distribution, of, inclusive)
prob_right(distribution, of, inclusive)
| Argument | Description |
|---|---|
| distribution | Distribution to find probabilities of. |
| of | Find the probability to the left or right of this number. Could be a vector. |
| inclusive | Should |
A vector of probabilities.
d <- dst_pois(5)
prob_left(d, of = 3, inclusive = TRUE)
prob_left(d, of = 3, inclusive = FALSE)
prob_right(d, of = 0:3, inclusive = TRUE)
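The difference between the inclusive and exclusive versions matters only where the distribution places positive probability on a single point. A base R sketch for a Poisson distribution with mean 5, using ppois() and dpois() rather than distionary objects (the variable names are illustrative):

```r
# Poisson(5): probability to the left and right of 3, with and without 3.
lambda <- 5
left_inclusive <- ppois(3, lambda)                      # P(X <= 3)
left_exclusive <- ppois(3, lambda) - dpois(3, lambda)   # P(X < 3)
right_exclusive <- 1 - left_inclusive                   # P(X > 3)
right_inclusive <- 1 - left_exclusive                   # P(X >= 3)

c(left_inclusive, left_exclusive, right_exclusive, right_inclusive)
```

For a continuous distribution the point mass is zero, so the inclusive and exclusive probabilities coincide.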
Range returns a vector of length two, with the minimum and maximum values of the (support of the) distribution.
## S3 method for class 'dst'
range(distribution, ...)
| Argument | Description |
|---|---|
| distribution | Distribution to compute range from. |
| ... | Not used; vestige of the |
If there are no methods for the distribution's class,
the range is calculated
using eval_quantile() at 0 and at 1.
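The same idea can be seen with base R quantile functions, which stand in here for eval_quantile(): evaluating at probabilities 0 and 1 gives the endpoints of the support.

```r
# Quantiles at 0 and 1 give the endpoints of the support.
qunif(c(0, 1), min = 0, max = 1)   # 0 1
qnorm(c(0, 1), mean = 3, sd = 2)   # -Inf Inf
qexp(c(0, 1), rate = 1)            # 0 Inf
```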
Vector of length two, containing the minimum and maximum values of a distribution.
a <- dst_gp(1, 0.5)
b <- dst_unif(0, 1)
c <- dst_norm(3, 4)
range(a)
range(b)
range(c)
Draw n independent observations from a distribution.
realise(distribution, n = 1)
realize(distribution, n = 1)
| Argument | Description |
|---|---|
| distribution | Distribution object. |
| n | Number of observations to generate. |
Vector of independent values drawn from the inputted distribution.
realise() and realize() are aliases and do the same thing.
d <- dst_pois(5)
set.seed(2)
realise(d, n = 10)
Retrieve the variable type of a distribution, such as "continuous" or "discrete".
vtype(distribution)
| Argument | Description |
|---|---|
| distribution | Distribution object. |
Single character with the variable type.
vtype(dst_beta(1, 2))
vtype(dst_bern(0.4))
vtype(distribution())