Stats

General statistics

ArviZ.ArviZStats.hdi — Function
Note

This function is forwarded to Python's arviz.hdi. The docstring of that function is included below.

    Calculate highest density interval (HDI) of array for given probability.

    The HDI is the minimum width Bayesian credible interval (BCI).

    Parameters
    ----------
    ary: obj
        object containing posterior samples.
        Any object that can be converted to an :class:`arviz.InferenceData` object.
        Refer to documentation of :func:`arviz.convert_to_dataset` for details.
    hdi_prob: float, optional
        Prob for which the highest density interval will be computed. Defaults to
        ``stats.hdi_prob`` rcParam.
    circular: bool, optional
        Whether to compute the hdi taking into account `x` is a circular variable
(in the range [-np.pi, np.pi]) or not. Defaults to False (i.e. non-circular variables).
        Only works if multimodal is False.
    multimodal: bool, optional
        If true it may compute more than one hdi if the distribution is multimodal and the
        modes are well separated.
    skipna: bool, optional
        If true ignores nan values when computing the hdi. Defaults to false.
    group: str, optional
        Specifies which InferenceData group should be used to calculate hdi.
        Defaults to 'posterior'
    var_names: list, optional
        Names of variables to include in the hdi report. Prefix the variables by ``~``
        when you want to exclude them from the report: `["~beta"]` instead of `["beta"]`
        (see :func:`arviz.summary` for more details).
    filter_vars: {None, "like", "regex"}, optional, default=None
        If `None` (default), interpret var_names as the real variables names. If "like",
        interpret var_names as substrings of the real variables names. If "regex",
        interpret var_names as regular expressions on the real variables names. A la
        ``pandas.filter``.
    coords: mapping, optional
        Specifies the subset over which to calculate hdi.
    max_modes: int, optional
        Specifies the maximum number of modes for multimodal case.
    dask_kwargs : dict, optional
        Dask related kwargs passed to :func:`~arviz.wrap_xarray_ufunc`.
    kwargs: dict, optional
        Additional keywords passed to :func:`~arviz.wrap_xarray_ufunc`.

    Returns
    -------
    np.ndarray or xarray.Dataset, depending upon input
        lower(s) and upper(s) values of the interval(s).

    See Also
    --------
    plot_hdi : Plot highest density intervals for regression data.
    xarray.Dataset.quantile : Calculate quantiles of array for given probabilities.

    Examples
    --------
    Calculate the HDI of a Normal random variable:

    .. ipython::

        In [1]: import arviz as az
           ...: import numpy as np
           ...: data = np.random.normal(size=2000)
           ...: az.hdi(data, hdi_prob=.68)

    Calculate the HDI of a dataset:

    .. ipython::

        In [1]: import arviz as az
           ...: data = az.load_arviz_data('centered_eight')
           ...: az.hdi(data)

    We can also calculate the HDI of some of the variables of dataset:

    .. ipython::

        In [1]: az.hdi(data, var_names=["mu", "theta"])

    By default, ``hdi`` is calculated over the ``chain`` and ``draw`` dimensions. We can use the
    ``input_core_dims`` argument of :func:`~arviz.wrap_xarray_ufunc` to change this. In this example
    we calculate the HDI also over the ``school`` dimension:

    .. ipython::

        In [1]: az.hdi(data, var_names="theta", input_core_dims = [["chain","draw", "school"]])

    We can also calculate the hdi over a particular selection:

    .. ipython::

        In [1]: az.hdi(data, coords={"chain":[0, 1, 3]}, input_core_dims = [["draw"]])

    
source
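
For reference, a minimal Julia sketch of the same call through this wrapper (arguments and the hdi_prob keyword are forwarded to Python as above; randn supplies toy draws):

using ArviZ

data = randn(2000)          # draws from a standard normal
hdi(data; hdi_prob = 0.68)  # 68% HDI, expected close to (-1, 1)
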
ArviZ.ArviZStats.summary — Function
summary(
    data; group = :posterior, coords, dims, kwargs...,
) -> Union{Dataset,DataFrames.DataFrame}

Compute summary statistics on any object that can be passed to convert_to_dataset.

Keywords

  • coords: Map from named dimension to named indices.
  • dims: Map from variable name to names of its dimensions.
  • kwargs: Keyword arguments passed to summarystats.
source
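
A minimal sketch of this function on a NamedTuple of posterior draws; the qualified ArviZ.summary call, the (chains, draws, ...) shape convention, and the dims format shown are assumptions about what convert_to_dataset accepts:

using ArviZ

# toy posterior: mu has shape (chains, draws); theta has an extra school dimension
posterior = (mu = randn(4, 500), theta = randn(4, 500, 8))
ArviZ.summary(posterior; dims = Dict(:theta => [:school]))  # assumed dims format: variable => dimension names
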
StatsBase.summarystats — Function
summarystats(
    data::InferenceData;
    group = :posterior,
    kwargs...,
) -> Union{Dataset,DataFrames.DataFrame}
summarystats(data::Dataset; kwargs...) -> Union{Dataset,DataFrames.DataFrame}

Compute summary statistics on data.

Arguments

  • data::Union{Dataset,InferenceData}: The data on which to compute summary statistics. If data is an InferenceData, only the dataset corresponding to group is used.

Keywords

  • var_names: Collection of names of variables as Symbols to include in the summary.
  • include_circ::Bool=false: Whether to include circular statistics
  • digits::Int: Number of decimals used to round results. If not provided, numbers are not rounded.
  • stat_funcs::Union{Dict{String,Function},Vector{Function}}=nothing: A vector of functions or a dict of functions with function names as keys used to calculate statistics. By default, the mean, standard deviation, simulation standard error, and highest posterior density intervals are included. Each function is given one argument, the samples for a variable as an array, and should return a single number. For example, Statistics.mean or Statistics.var would both work.
  • extend::Bool=true: If true, use the statistics returned by stat_funcs in addition to, rather than in place of, the default statistics. This is only meaningful when stat_funcs is not nothing.
  • hdi_prob::Real=0.94: Probability for which the highest density interval is computed. This is only meaningful when stat_funcs is nothing.
  • skipna::Bool=false: If true, ignores NaN values when computing the summary statistics. It does not affect the behaviour of the functions passed to stat_funcs.

Returns

  • DataFrames.DataFrame: Summary statistics for each variable. Default statistics are:
    • mean
    • sd
    • hdi_3%
    • hdi_97%
    • mcse_mean
    • mcse_sd
    • ess_bulk
    • ess_tail
    • r_hat (only computed for traces with 2 or more chains)

Examples

using ArviZ
idata = load_example_data("centered_eight")
summarystats(idata; var_names=(:mu, :tau))

Other statistics can be calculated by passing a list of functions or a dictionary with key, function pairs:

using Statistics
function median_sd(x)
    med = median(x)
    sd = sqrt(mean((x .- med).^2))
    return sd
end

func_dict = Dict(
    "std" => x -> std(x; corrected = false),
    "median_std" => median_sd,
    "5%" => x -> quantile(x, 0.05),
    "median" => median,
    "95%" => x -> quantile(x, 0.95),
)

summarystats(idata; var_names = (:mu, :tau), stat_funcs = func_dict, extend = false)
source
ArviZ.ArviZStats.r2_score — Function

R² for Bayesian regression models. Only valid for linear models.

Note

This function is forwarded to Python's arviz.r2_score. The docstring of that function is included below.


    Parameters
    ----------
    y_true: array-like of shape = (n_outputs,)
        Ground truth (correct) target values.
    y_pred: array-like of shape = (n_posterior_samples, n_outputs)
        Estimated target values.

    Returns
    -------
    Pandas Series with the following indices:
    r2: Bayesian R²
    r2_std: standard deviation of the Bayesian R².

    See Also
    --------
    plot_lm : Posterior predictive and mean plots for regression-like data.

    Examples
    --------
    Calculate R² for Bayesian regression models :

    .. ipython::

        In [1]: import arviz as az
           ...: data = az.load_arviz_data('regression1d')
           ...: y_true = data.observed_data["y"].values
           ...: y_pred = data.posterior_predictive.stack(sample=("chain", "draw"))["y"].values.T
           ...: az.r2_score(y_true, y_pred)

    
source
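
A minimal Julia sketch with synthetic data, using the same (y_true, y_pred) shapes as the Python signature above:

using ArviZ

y_true = randn(100)                        # ground truth, shape (n_outputs,)
y_pred = y_true' .+ 0.1 .* randn(500, 100) # shape (n_posterior_samples, n_outputs)
r2_score(y_true, y_pred)                   # returns r2 and r2_std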

Pareto-smoothed importance sampling

PSIS.PSISResult — Type
PSISResult

Result of Pareto-smoothed importance sampling (PSIS) using psis.

Properties

  • log_weights: un-normalized Pareto-smoothed log weights
  • weights: normalized Pareto-smoothed weights (allocates a copy)
  • pareto_shape: Pareto $k=ξ$ shape parameter
  • nparams: number of parameters in log_weights
  • ndraws: number of draws in log_weights
  • nchains: number of chains in log_weights
  • reff: the ratio of the effective sample size of the unsmoothed importance ratios and the actual sample size.
  • ess: estimated effective sample size of estimate of mean using smoothed importance samples (see ess_is)
  • tail_length: length of the upper tail of log_weights that was smoothed
  • tail_dist: the generalized Pareto distribution that was fit to the tail of log_weights. Note that the tail weights are scaled to have a maximum of 1, so tail_dist * exp(maximum(log_ratios)) is the corresponding fit directly to the tail of log_ratios.
  • normalized::Bool: indicates whether log_weights are log-normalized along the sample dimensions.

Diagnostic

The pareto_shape parameter $k=ξ$ of the generalized Pareto distribution tail_dist can be used to diagnose reliability and convergence of estimates using the importance weights [VehtariSimpson2021].

  • if $k < \frac{1}{3}$, importance sampling is stable, and importance sampling (IS) and PSIS both are reliable.
  • if $k ≤ \frac{1}{2}$, then the importance ratio distribution has finite variance, and the central limit theorem holds. As $k$ approaches the upper bound, IS becomes less reliable, while PSIS still works well but with a higher RMSE.
  • if $\frac{1}{2} < k ≤ 0.7$, then the variance is infinite, and IS can behave quite poorly. However, PSIS works well in this regime.
  • if $0.7 < k ≤ 1$, then it quickly becomes impractical to collect enough importance weights to reliably compute estimates, and importance sampling is not recommended.
  • if $k > 1$, then neither the variance nor the mean of the raw importance ratios exists. The convergence rate is close to zero, and bias can be large with practical sample sizes.

See PSISPlots.paretoshapeplot for a diagnostic plot.

PSIS.psis — Function
psis(log_ratios, reff = 1.0; kwargs...) -> PSISResult
psis!(log_ratios, reff = 1.0; kwargs...) -> PSISResult

Compute Pareto smoothed importance sampling (PSIS) log weights [VehtariSimpson2021].

While psis computes smoothed log weights out-of-place, psis! smooths them in-place.

Arguments

  • log_ratios: an array of logarithms of importance ratios, with size (draws, [chains, [parameters...]]), where chains>1 would be used when chains are generated using Markov chain Monte Carlo.
  • reff::Union{Real,AbstractArray}: the ratio(s) of effective sample size of log_ratios and the actual sample size reff = ess/(draws * chains), used to account for autocorrelation, e.g. due to Markov chain Monte Carlo. If an array, it must have the size (parameters...,) to match log_ratios.

Keywords

  • warn=true: If true, warning messages are delivered
  • normalize=true: If true, the log-weights will be log-normalized so that exp.(log_weights) sums to 1 along the sample dimensions.

Returns

  • result: a PSISResult object containing the results of the Pareto-smoothing.

A warning is raised if the Pareto shape parameter $k ≥ 0.7$. See PSISResult for details and PSISPlots.paretoshapeplot for a diagnostic plot.
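
A minimal sketch on toy log-ratios, reading off the diagnostics described under PSISResult (standard-normal log-ratios are well behaved, so a small pareto_shape is expected):

using PSIS

log_ratios = randn(1000, 4)  # toy importance log-ratios with size (draws, chains)
result = psis(log_ratios)

# apply the k < 0.7 rule of thumb from the diagnostic table above
result.pareto_shape < 0.7 || @warn "PSIS estimates may be unreliable"
result.ess  # estimated effective sample size of the smoothed weights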

PSIS.psis! — Function
psis!(log_ratios, reff = 1.0; kwargs...) -> PSISResult

In-place version of psis: the log weights in log_ratios are smoothed in-place, and a PSISResult is returned. Arguments, keywords, the returned value, and the $k ≥ 0.7$ warning are identical to psis above.

Model assessment and selection

LOO and WAIC

ArviZ.ArviZStats.AbstractELPDResult — Type
abstract type AbstractELPDResult

An abstract type representing the result of an ELPD computation.

Every subtype stores estimates of both the expected log predictive density (elpd) and the effective number of parameters p, as well as standard errors and pointwise estimates of each, from which other relevant estimates can be computed.

Subtypes implement the following functions:

  • elpd_estimates
  • information_criterion

source
ArviZ.ArviZStats.PSISLOOResult — Type

Results of Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO).

See also: loo, AbstractELPDResult

  • estimates: Estimates of the expected log pointwise predictive density (ELPD) and effective number of parameters (p)

  • pointwise: Pointwise estimates

  • psis_result: Pareto-smoothed importance sampling (PSIS) results

source
ArviZ.ArviZStats.WAICResult — Type

Results of computing the widely applicable information criterion (WAIC).

See also: waic, AbstractELPDResult

  • estimates: Estimates of the expected log pointwise predictive density (ELPD) and effective number of parameters (p)

  • pointwise: Pointwise estimates

source
ArviZ.ArviZStats.information_criterion — Function
information_criterion(elpd, scale::Symbol)

Compute the information criterion for the given scale from the elpd estimate.

scale must be one of (:deviance, :log, :negative_log).

See also: effective_number_of_parameters, loo, waic

source
information_criterion(result::AbstractELPDResult, scale::Symbol; pointwise=false)

Compute information criterion for the given scale from the existing ELPD result.

scale must be one of (:deviance, :log, :negative_log).

If pointwise=true, then pointwise estimates are returned.

source
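
A minimal sketch converting an existing ELPD result between scales (:log is the elpd itself, :negative_log is -1 * elpd, and :deviance is -2 * elpd):

using ArviZ

idata = load_example_data("centered_eight")
result = loo(idata)                                        # PSIS-LOO ELPD result
information_criterion(result, :deviance)                   # -2 * the elpd estimate
information_criterion(result, :deviance; pointwise = true) # per-observation values
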
ArviZ.ArviZStats.loo — Function
loo(log_likelihood; reff=nothing, kwargs...) -> PSISLOOResult{<:NamedTuple,<:NamedTuple}

Compute the Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO). [Vehtari2017][LOOFAQ]

log_likelihood must be an array of log-likelihood values with shape (chains, draws[, params...]).

Keywords

  • reff::Union{Real,AbstractArray{<:Real}}: The relative effective sample size(s) of the likelihood values. If an array, it must have the same data dimensions as the corresponding log-likelihood variable. If not provided, then this is estimated using ess.
  • kwargs: Remaining keywords are forwarded to psis.

See also: PSISLOOResult, waic

source
loo(data::Dataset; [var_name::Symbol,] kwargs...) -> PSISLOOResult{<:NamedTuple,<:Dataset}
loo(data::InferenceData; [var_name::Symbol,] kwargs...) -> PSISLOOResult{<:NamedTuple,<:Dataset}

Compute PSIS-LOO from log-likelihood values in data.

If more than one log-likelihood variable is present, then var_name must be provided.

source
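
A minimal sketch on a toy log-likelihood array with the (chains, draws[, params...]) shape described above:

using ArviZ

log_likelihood = randn(4, 1000, 8)  # toy log-likelihoods: 4 chains, 1000 draws, 8 observations
loo(log_likelihood)                 # reff is estimated via ess when not provided
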
ArviZ.ArviZStats.waic — Function
waic(log_likelihood::AbstractArray) -> WAICResult{<:NamedTuple,<:NamedTuple}

Compute the widely applicable information criterion (WAIC). [Watanabe2010][Vehtari2017][LOOFAQ]

log_likelihood must be an array of log-likelihood values with shape (chains, draws[, params...]).

See also: WAICResult, loo

source
waic(data::Dataset; [var_name::Symbol]) -> WAICResult{<:NamedTuple,<:Dataset}
waic(data::InferenceData; [var_name::Symbol]) -> WAICResult{<:NamedTuple,<:Dataset}

Compute WAIC from log-likelihood values in data.

If more than one log-likelihood variable is present, then var_name must be provided.

source
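
A minimal sketch of the InferenceData method; centered_eight contains a single log-likelihood variable, so var_name can be omitted:

using ArviZ

idata = load_example_data("centered_eight")
waic(idata)  # WAIC estimates of the ELPD and effective number of parameters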

Others

ArviZ.ArviZStats.compare — Function

Compare models based on their expected log pointwise predictive density (ELPD).

Note

This function is forwarded to Python's arviz.compare. The docstring of that function is included below.


    The ELPD is estimated either by Pareto smoothed importance sampling leave-one-out
    cross-validation (LOO) or using the widely applicable information criterion (WAIC).
    We recommend loo. Read more theory in this paper by some of the leading
    authorities on model comparison: https://dx.doi.org/10.1111/1467-9868.00353

    Parameters
    ----------
    compare_dict: dict of {str: InferenceData or ELPDData}
        A dictionary of model names and :class:`arviz.InferenceData` or ``ELPDData``.
    ic: str, optional
        Method to estimate the ELPD, available options are "loo" or "waic". Defaults to
        ``rcParams["stats.information_criterion"]``.
    method: str, optional
        Method used to estimate the weights for each model. Available options are:

        - 'stacking' : stacking of predictive distributions.
        - 'BB-pseudo-BMA' : pseudo-Bayesian Model averaging using Akaike-type
          weighting. The weights are stabilized using the Bayesian bootstrap.
        - 'pseudo-BMA': pseudo-Bayesian Model averaging using Akaike-type
          weighting, without Bootstrap stabilization (not recommended).

        For more information read https://arxiv.org/abs/1704.02030
    b_samples: int, optional, default = 1000
        Number of samples taken by the Bayesian bootstrap estimation.
        Only useful when method = 'BB-pseudo-BMA'.
    alpha: float, optional
        The shape parameter in the Dirichlet distribution used for the Bayesian bootstrap. Only
        useful when method = 'BB-pseudo-BMA'. When alpha=1 (default), the distribution is uniform
        on the simplex. A smaller alpha keeps the final weights further from 0 and 1.
    seed: int or np.random.RandomState instance, optional
        If int or RandomState, use it to seed the Bayesian bootstrap. Only
        useful when method = 'BB-pseudo-BMA'. If None (default), the global
        :mod:`numpy.random` state is used.
    scale: str, optional
        Output scale for IC. Available options are:

        - `log` : (default) log-score (after Vehtari et al. (2017))
        - `negative_log` : -1 * (log-score)
        - `deviance` : -2 * (log-score)

        A higher log-score (or a lower deviance) indicates a model with better predictive
        accuracy.
    var_name: str, optional
        If there is more than one observed variable in the ``InferenceData``, the name
        of the variable that should be used as the basis for comparison.

    Returns
    -------
    A DataFrame, ordered from best to worst model (measured by the ELPD).
    The index reflects the key with which the models are passed to this function. The columns are:
    rank: The rank-order of the models. 0 is the best.
    elpd: ELPD estimated using either PSIS-LOO-CV (`elpd_loo`) or WAIC (`elpd_waic`).
        Higher ELPD indicates higher out-of-sample predictive fit ("better" model).
        If `scale` is `deviance` or `negative_log`, smaller values indicate
        higher out-of-sample predictive fit ("better" model).
    pIC: Estimated effective number of parameters.
    elpd_diff: The difference in ELPD between two models.
        If more than two models are compared, the difference is computed relative to the
        top-ranked model, which always has an elpd_diff of 0.
    weight: Relative weight for each model.
        This can be loosely interpreted as the probability of each model (among the compared models)
        given the data. By default the uncertainty in the weights estimation is considered using
        Bayesian bootstrap.
    SE: Standard error of the ELPD estimate.
        If method = BB-pseudo-BMA these values are estimated using Bayesian bootstrap.
    dSE: Standard error of the difference in ELPD between each model and the top-ranked model.
        It's always 0 for the top-ranked model.
    warning: A value of 1 indicates that the computation of the ELPD may not be reliable.
        This could be an indication of WAIC/LOO starting to fail; see
        https://arxiv.org/abs/1507.04544 for details.
    scale: Scale used for the ELPD.

    Examples
    --------
    Compare the centered and non-centered models of the eight schools problem:

    .. ipython::

        In [1]: import arviz as az
           ...: data1 = az.load_arviz_data("non_centered_eight")
           ...: data2 = az.load_arviz_data("centered_eight")
           ...: compare_dict = {"non centered": data1, "centered": data2}
           ...: az.compare(compare_dict)

    Compare the models using PSIS-LOO-CV, returning the ELPD in log scale and calculating the
    weights using the stacking method.

    .. ipython::

        In [1]: az.compare(compare_dict, ic="loo", method="stacking", scale="log")

    See Also
    --------
    loo :
        Compute the ELPD using the Pareto smoothed importance sampling Leave-one-out
        cross-validation method.
    waic : Compute the ELPD using the widely applicable information criterion.
    plot_compare : Summary plot for model comparison.

    References
    ----------
    .. [1] Vehtari, A., Gelman, A. & Gabry, J. Practical Bayesian model evaluation using
        leave-one-out cross-validation and WAIC. Stat Comput 27, 1413–1432 (2017)
        see https://doi.org/10.1007/s11222-016-9696-4

    
source
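
The same comparison as the first Python example above, sketched through the Julia wrapper (the Dict of models and any keywords are forwarded to arviz.compare):

using ArviZ

data1 = load_example_data("non_centered_eight")
data2 = load_example_data("centered_eight")
compare(Dict("non centered" => data1, "centered" => data2))
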
ArviZ.ArviZStats.loo_pit — Function

Compute leave-one-out (PSIS-LOO) probability integral transform (PIT) values.

Note

This function is forwarded to Python's arviz.loo_pit. The docstring of that function is included below.


    Parameters
    ----------
    idata: InferenceData
        :class:`arviz.InferenceData` object.
    y: array, DataArray or str
        Observed data. If str, ``idata`` must be present and contain the observed data group
    y_hat: array, DataArray or str
        Posterior predictive samples for ``y``. It must have the same shape as y plus an
        extra dimension at the end of size n_samples (chains and draws stacked). If str or
        None, ``idata`` must contain the posterior predictive group. If None, y_hat is taken
        to be equal to y, in which case y must be str too.
    log_weights: array or DataArray
        Smoothed log_weights. It must have the same shape as ``y_hat``
    dask_kwargs : dict, optional
        Dask related kwargs passed to :func:`~arviz.wrap_xarray_ufunc`.

    Returns
    -------
    loo_pit: array or DataArray
        Value of the LOO-PIT at each observed data point.

    See Also
    --------
    plot_loo_pit : Plot Leave-One-Out probability integral transformation (PIT) predictive checks.
    loo : Compute Pareto-smoothed importance sampling leave-one-out
          cross-validation (PSIS-LOO-CV).
    plot_elpd : Plot pointwise elpd differences between two or more models.
    plot_khat : Plot Pareto tail indices for diagnosing convergence.

    Examples
    --------
    Calculate LOO-PIT values using the observed values themselves as the test quantity.

    .. ipython::

        In [1]: import arviz as az
           ...: data = az.load_arviz_data("centered_eight")
           ...: az.loo_pit(idata=data, y="obs")

    Calculate LOO-PIT values using the square of the difference between
    each observation and `mu` as the test quantity. Both ``y`` and ``y_hat`` inputs
    will be array-like, but ``idata`` is still passed in order to calculate the
    ``log_weights`` from it.

    .. ipython::

        In [1]: T = data.observed_data.obs - data.posterior.mu.median(dim=("chain", "draw"))
           ...: T_hat = data.posterior_predictive.obs - data.posterior.mu
           ...: T_hat = T_hat.stack(__sample__=("chain", "draw"))
           ...: az.loo_pit(idata=data, y=T**2, y_hat=T_hat**2)

    
source
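
A Julia sketch mirroring the first example above; passing idata positionally with y as a keyword is an assumption based on the forwarding note:

using ArviZ

idata = load_example_data("centered_eight")
loo_pit(idata; y = "obs")  # assumed calling convention for the forwarded function
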
  • VehtariSimpson2021: Vehtari A, Simpson D, Gelman A, Yao Y, Gabry J. (2021). Pareto smoothed importance sampling. arXiv:1507.02646v7 [stat.CO]
  • Vehtari2017: Vehtari A, Gelman A, Gabry J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput 27, 1413–1432. doi: 10.1007/s11222-016-9696-4. arXiv: 1507.04544
  • LOOFAQ: Aki Vehtari. Cross-validation FAQ. https://mc-stan.org/loo/articles/online-only/faq.html
  • Watanabe2010: Watanabe S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research 11(116):3571−3594. https://jmlr.csail.mit.edu/papers/v11/watanabe10a.html