---
title: "Evaluate a Distribution"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Evaluate a Distribution}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(distionary)
```

This vignette covers the second goal of `distionary`: to evaluate probability distributions, even when the quantity being evaluated is not specified in the distribution's definition.

## Distributional Representations

A _distributional representation_ is a mathematical function that completely defines a probability distribution. Unlike a simple property (such as the mean or variance), a representation contains enough information that any other property or representation can be calculated from it.

The key innovation in `distionary` is that these representations are interconnected through a network of relationships, allowing you to specify a distribution using any available representation and automatically derive others as needed. For example, if you specify only a CDF, `distionary` can compute the quantile function, mean, variance, and other properties.

Here is a list of the representations recognised by `distionary`, and the functions for accessing them.


| Representation                   | `distionary` Functions                  |
|----------------------------------|-----------------------------------------|
| Cumulative Distribution Function | `eval_cdf()`, `enframe_cdf()`           |
| Survival Function                | `eval_survival()`, `enframe_survival()` |
| Quantile Function                | `eval_quantile()`, `enframe_quantile()` |
| Hazard Function                  | `eval_hazard()`, `enframe_hazard()`     |
| Cumulative Hazard Function       | `eval_chf()`, `enframe_chf()`           |
| Probability Density Function     | `eval_density()`, `enframe_density()`   |
| Probability Mass Function (PMF)  | `eval_pmf()`, `enframe_pmf()`           |
| Odds Function                    | `eval_odds()`, `enframe_odds()`         |
| Return Level Function            | `eval_return()`, `enframe_return()`     |

All representations can be accessed with the `eval_*()` family of functions, which return a vector of the evaluated representation.

```{r}
d1 <- dst_geom(0.6)
eval_pmf(d1, at = 0:5)
```

Alternatively, the `enframe_*()` family of functions returns the results in a tibble or data frame paired with the inputs, which is useful in a data wrangling workflow.

```{r}
enframe_pmf(d1, at = 0:5)
```

The `enframe_*()` functions accept multiple distributions, adding one evaluation column per distribution. The column names can be changed in three ways:

1. The input column `.arg` can be renamed with the `arg_name` argument.
2. The `pmf` prefix on the evaluation columns can be changed with the `fn_prefix` argument.
3. The distribution names can be changed by assigning name-value pairs for the input distributions.

Let's practice this with the addition of a second distribution.

```{r}
d2 <- dst_geom(0.4)
enframe_pmf(
  model1 = d1, model2 = d2, at = 0:5,
  arg_name = "num_failures",
  fn_prefix = "probability"
)
```

## Drawing a Random Sample

To draw a random sample from a distribution, use the `realise()` or `realize()` function:

```{r}
set.seed(42)
realise(d1, n = 5)
```

You can read this call as "realise distribution `d1` five times".
By default, `n` is set to 1, so realising a distribution returns a single numeric draw:

```{r}
realise(d1)
```

While random sampling falls into the same family as the `p*/d*/q*/r*` functions from the `stats` package (e.g., `rnorm()`), it is not a distributional representation, and hence has no `eval_*()` or `enframe_*()` counterpart. This is because a finite sample cannot perfectly describe a distribution.

## Properties of Distributions

`distionary` distinguishes between _distributional representations_ (which fully define a distribution) and _distributional properties_ (which are characteristics that can be computed from representations).

A distributional _property_ is any measurable characteristic that can be calculated from a distribution's representation. Unlike representations, properties do not contain enough information to fully reconstruct the distribution. For example, knowing the mean and variance of a distribution doesn't tell you whether it's a Normal, Gamma, or some other distribution family. Properties include statistical moments and other summary measures.

Below is a table of the properties incorporated in `distionary`, and the corresponding functions for accessing them.

| Property           | `distionary` Function |
|--------------------|-----------------------|
| Mean               | `mean()`              |
| Median             | `median()`            |
| Variance           | `variance()`          |
| Standard Deviation | `sd()`                |
| Skewness           | `skewness()`          |
| Excess Kurtosis    | `kurtosis_exc()`      |
| Kurtosis           | `kurtosis()`          |
| Range              | `range()`             |

Here's the mean and variance of our original distribution.

```{r}
mean(d1)
variance(d1)
```
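
To make concrete the idea that properties follow from a representation, here is a minimal sketch that uses only base R's `stats` functions, not `distionary`. It recovers the mean, variance, and median of a Geometric(0.6) distribution from its PMF. It assumes the same parameterisation as `stats::dgeom()` (the number of failures before the first success) and truncates the infinite support at 200, an arbitrary cutoff chosen for illustration. It is not meant to describe how `distionary` performs these calculations internally.

```r
# Base-R sketch (stats only): derive properties of a Geometric(0.6)
# distribution from its PMF representation.
p <- 0.6

# Truncate the infinite support; the tail mass beyond 200 is negligible here.
x <- 0:200
pmf <- dgeom(x, prob = p)

# Mean and variance as probability-weighted sums over the support.
mean_from_pmf <- sum(x * pmf)
var_from_pmf <- sum((x - mean_from_pmf)^2 * pmf)

# Accumulate the PMF into a CDF, then take the median as the
# smallest x whose CDF value reaches 0.5.
cdf <- cumsum(pmf)
median_from_cdf <- x[which(cdf >= 0.5)[1]]

# Compare with the closed-form values (1 - p) / p and (1 - p) / p^2.
c(mean = mean_from_pmf, closed_form = (1 - p) / p)
c(variance = var_from_pmf, closed_form = (1 - p) / p^2)
median_from_cdf
```

The reverse direction does not work: the mean and variance computed here could equally belong to many other distributions, which is why they count as properties rather than representations.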