Stats

General statistics

ArviZ.hdiFunction
Note

This function is forwarded to Python's arviz.hdi. The docstring of that function is included below.

    Calculate highest density interval (HDI) of array for given probability.

    The HDI is the minimum width Bayesian credible interval (BCI).

    Parameters
    ----------
    ary: obj
        object containing posterior samples.
        Any object that can be converted to an :class:`arviz.InferenceData` object.
        Refer to documentation of :func:`arviz.convert_to_dataset` for details.
    hdi_prob: float, optional
        Prob for which the highest density interval will be computed. Defaults to
        ``stats.hdi_prob`` rcParam.
    circular: bool, optional
        Whether to compute the hdi taking into account `x` is a circular variable
        (in the range [-np.pi, np.pi]) or not. Defaults to False (i.e., non-circular variables).
        Only works if multimodal is False.
    multimodal: bool, optional
        If true it may compute more than one hdi if the distribution is multimodal and the
        modes are well separated.
    skipna: bool, optional
        If true ignores nan values when computing the hdi. Defaults to false.
    group: str, optional
        Specifies which InferenceData group should be used to calculate hdi.
        Defaults to 'posterior'
    var_names: list, optional
        Names of variables to include in the hdi report. Prefix the variables by ``~``
        when you want to exclude them from the report: `["~beta"]` instead of `["beta"]`
        (see :func:`arviz.summary` for more details).
    filter_vars: {None, "like", "regex"}, optional, default=None
        If `None` (default), interpret var_names as the real variables names. If "like",
        interpret var_names as substrings of the real variables names. If "regex",
        interpret var_names as regular expressions on the real variables names. A la
        ``pandas.filter``.
    coords: mapping, optional
        Specifies the subset over which to calculate the hdi.
    max_modes: int, optional
        Specifies the maximum number of modes for multimodal case.
    dask_kwargs : dict, optional
        Dask related kwargs passed to :func:`~arviz.wrap_xarray_ufunc`.
    kwargs: dict, optional
        Additional keywords passed to :func:`~arviz.wrap_xarray_ufunc`.

    Returns
    -------
    np.ndarray or xarray.Dataset, depending upon input
        lower(s) and upper(s) values of the interval(s).

    See Also
    --------
    plot_hdi : Plot highest density intervals for regression data.
    xarray.Dataset.quantile : Calculate quantiles of array for given probabilities.

    Examples
    --------
    Calculate the HDI of a Normal random variable:

    .. ipython::

        In [1]: import arviz as az
           ...: import numpy as np
           ...: data = np.random.normal(size=2000)
           ...: az.hdi(data, hdi_prob=.68)

    Calculate the HDI of a dataset:

    .. ipython::

        In [1]: import arviz as az
           ...: data = az.load_arviz_data('centered_eight')
           ...: az.hdi(data)

    We can also calculate the HDI of some of the variables of the dataset:

    .. ipython::

        In [1]: az.hdi(data, var_names=["mu", "theta"])

    By default, ``hdi`` is calculated over the ``chain`` and ``draw`` dimensions. We can use the
    ``input_core_dims`` argument of :func:`~arviz.wrap_xarray_ufunc` to change this. In this example
    we calculate the HDI also over the ``school`` dimension:

    .. ipython::

        In [1]: az.hdi(data, var_names="theta", input_core_dims = [["chain","draw", "school"]])

    We can also calculate the hdi over a particular selection:

    .. ipython::

        In [1]: az.hdi(data, coords={"chain":[0, 1, 3]}, input_core_dims = [["draw"]])
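
The examples above use the Python API directly; from Julia the same call goes through ArviZ.hdi. A minimal sketch, assuming a plain vector of draws is accepted and that keywords such as hdi_prob are forwarded unchanged to Python:

using ArviZ

x = randn(2000)          # hypothetical draws from a standard normal
hdi(x; hdi_prob = 0.68)  # roughly (-1.0, 1.0) for a standard normal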

    
source
ArviZ.summaryFunction
summary(
    data; group = :posterior, coords, dims, kwargs...,
) -> Union{Dataset,DataFrames.DataFrame}

Compute summary statistics on any object that can be passed to convert_to_dataset.

Keywords

  • coords: Map from named dimension to named indices.
  • dims: Map from variable name to names of its dimensions.
  • kwargs: Keyword arguments passed to summarystats.
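
No example is shown above; a minimal usage sketch, assuming load_example_data is available as in the summarystats examples below and that the defaults for coords and dims are used:

using ArviZ
idata = load_example_data("centered_eight")
ArviZ.summary(idata; group = :posterior)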
source
StatsBase.summarystatsFunction
summarystats(
    data::InferenceData;
    group = :posterior,
    kwargs...,
) -> Union{Dataset,DataFrames.DataFrame}
summarystats(data::Dataset; kwargs...) -> Union{Dataset,DataFrames.DataFrame}

Compute summary statistics on data.

Arguments

  • data::Union{Dataset,InferenceData}: The data on which to compute summary statistics. If data is an InferenceData, only the dataset corresponding to group is used.

Keywords

  • var_names: Collection of names of variables as Symbols to include in summary
  • include_circ::Bool=false: Whether to include circular statistics
  • digits::Int: Number of decimals used to round results. If not provided, numbers are not rounded.
  • stat_funcs::Union{Dict{String,Function},Vector{Function}}=nothing: A vector of functions or a dict of functions with function names as keys used to calculate statistics. By default, the mean, standard deviation, simulation standard error, and highest posterior density intervals are included. Each function is given one argument, the samples for a variable as an array, and should return a single number. For example, Statistics.mean or Statistics.var would both work.
  • extend::Bool=true: If true, use the statistics returned by stat_funcs in addition to, rather than in place of, the default statistics. This is only meaningful when stat_funcs is not nothing.
  • hdi_prob::Real=0.94: Probability for which to compute the highest density interval (HDI). This is only meaningful when stat_funcs is nothing.
  • skipna::Bool=false: If true, ignores NaN values when computing the summary statistics. It does not affect the behaviour of the functions passed to stat_funcs.

Returns

  • DataFrames.DataFrame: Summary statistics for each variable. Default statistics are:
    • mean
    • sd
    • hdi_3%
    • hdi_97%
    • mcse_mean
    • mcse_sd
    • ess_bulk
    • ess_tail
    • r_hat (only computed for traces with 2 or more chains)

Examples

using ArviZ
idata = load_example_data("centered_eight")
summarystats(idata; var_names=(:mu, :tau))

Other statistics can be calculated by passing a vector of functions or a dictionary of name-function pairs:

using Statistics
function median_sd(x)
    med = median(x)
    sd = sqrt(mean((x .- med).^2))
    return sd
end

func_dict = Dict(
    "std" => x -> std(x; corrected = false),
    "median_std" => median_sd,
    "5%" => x -> quantile(x, 0.05),
    "median" => median,
    "95%" => x -> quantile(x, 0.95),
)

summarystats(idata; var_names = (:mu, :tau), stat_funcs = func_dict, extend = false)
source
ArviZ.r2_scoreFunction

R² for Bayesian regression models. Only valid for linear models.

Note

This function is forwarded to Python's arviz.r2_score. The docstring of that function is included below.


    Parameters
    ----------
    y_true: array-like of shape = (n_outputs,)
        Ground truth (correct) target values.
    y_pred: array-like of shape = (n_posterior_samples, n_outputs)
        Estimated target values.

    Returns
    -------
    Pandas Series with the following indices:
    r2: Bayesian R²
    r2_std: standard deviation of the Bayesian R².

    See Also
    --------
    plot_lm : Posterior predictive and mean plots for regression-like data.

    Examples
    --------
    Calculate R² for Bayesian regression models:

    .. ipython::

        In [1]: import arviz as az
           ...: data = az.load_arviz_data('regression1d')
           ...: y_true = data.observed_data["y"].values
           ...: y_pred = data.posterior_predictive.stack(sample=("chain", "draw"))["y"].values.T
           ...: az.r2_score(y_true, y_pred)
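
A corresponding Julia sketch using synthetic data (the names here are purely illustrative; the arrays follow the (n_posterior_samples, n_outputs) layout documented above):

using ArviZ

y_true = randn(20)                          # 20 "observed" outputs
y_pred = y_true' .+ 0.1 .* randn(200, 20)   # 200 posterior draws of predictions
r2_score(y_true, y_pred)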

    
source

Pareto-smoothed importance sampling

PSIS.PSISResultType
PSISResult

Result of Pareto-smoothed importance sampling (PSIS) using psis.

Properties

  • log_weights: un-normalized Pareto-smoothed log weights
  • weights: normalized Pareto-smoothed weights (allocates a copy)
  • pareto_shape: Pareto $k=ξ$ shape parameter
  • nparams: number of parameters in log_weights
  • ndraws: number of draws in log_weights
  • nchains: number of chains in log_weights
  • reff: the ratio of the effective sample size of the unsmoothed importance ratios and the actual sample size.
  • ess: estimated effective sample size of estimate of mean using smoothed importance samples (see ess_is)
  • log_weights_norm: the logarithm of the normalization constant of log_weights
  • tail_length: length of the upper tail of log_weights that was smoothed
  • tail_dist: the generalized Pareto distribution that was fit to the tail of log_weights. Note that the tail weights are scaled to have a maximum of 1, so tail_dist * exp(maximum(log_ratios)) is the corresponding fit directly to the tail of log_ratios.

Diagnostic

The pareto_shape parameter $k=ξ$ of the generalized Pareto distribution tail_dist can be used to diagnose reliability and convergence of estimates using the importance weights [VehtariSimpson2021].

  • if $k < \frac{1}{3}$, importance sampling is stable, and importance sampling (IS) and PSIS both are reliable.
  • if $k ≤ \frac{1}{2}$, then the importance ratio distribution has finite variance, and the central limit theorem holds. As $k$ approaches the upper bound, IS becomes less reliable, while PSIS still works well but with a higher RMSE.
  • if $\frac{1}{2} < k ≤ 0.7$, then the variance is infinite, and IS can behave quite poorly. However, PSIS works well in this regime.
  • if $0.7 < k ≤ 1$, then it quickly becomes impractical to collect enough importance weights to reliably compute estimates, and importance sampling is not recommended.
  • if $k > 1$, then neither the variance nor the mean of the raw importance ratios exists. The convergence rate is close to zero, and bias can be large with practical sample sizes.

See PSISPlots.paretoshapeplot for a diagnostic plot.
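
As a hedged illustration, given some result::PSISResult (a hypothetical variable), the pareto_shape property listed above can be checked against these thresholds:

k = result.pareto_shape   # scalar or vector of Pareto k̂ estimates
count(>(0.7), k)          # number of parameters in the unreliable regime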

PSIS.psisFunction
psis(log_ratios, reff = 1.0; kwargs...) -> PSISResult
psis!(log_ratios, reff = 1.0; kwargs...) -> PSISResult

Compute Pareto smoothed importance sampling (PSIS) log weights [VehtariSimpson2021].

While psis computes smoothed log weights out-of-place, psis! smooths them in-place.

Arguments

  • log_ratios: an array of logarithms of importance ratios, with one of the following sizes:

    • (ndraws,): a vector of draws for a single parameter from a single chain
    • (ndraws, nparams): a matrix of draws for multiple parameters from a single chain
    • (ndraws, nchains, nparams): an array of draws for multiple parameters from multiple chains, e.g. as might be generated with Markov chain Monte Carlo.
  • reff::Union{Real,AbstractVector}: the ratio(s) of effective sample size of log_ratios and the actual sample size reff = ess/(ndraws * nchains), used to account for autocorrelation, e.g. due to Markov chain Monte Carlo.

Keywords

Returns

  • result: a PSISResult object containing the results of the Pareto-smoothing.

A warning is raised if the Pareto shape parameter $k ≥ 0.7$. See PSISResult for details and PSISPlots.paretoshapeplot for a diagnostic plot.
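
A minimal sketch with simulated log importance ratios (synthetic numbers; in practice these would come from a fitted model):

using PSIS

log_ratios = randn(1000, 4, 8)   # (ndraws, nchains, nparams)
result = psis(log_ratios)        # out-of-place Pareto smoothing
result.pareto_shape              # one shape estimate per parameter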

PSIS.psis!Function
psis(log_ratios, reff = 1.0; kwargs...) -> PSISResult
psis!(log_ratios, reff = 1.0; kwargs...) -> PSISResult

Compute Pareto smoothed importance sampling (PSIS) log weights [VehtariSimpson2021].

While psis computes smoothed log weights out-of-place, psis! smooths them in-place.

Arguments

  • log_ratios: an array of logarithms of importance ratios, with one of the following sizes:

    • (ndraws,): a vector of draws for a single parameter from a single chain
    • (ndraws, nparams): a matrix of draws for multiple parameters from a single chain
    • (ndraws, nchains, nparams): an array of draws for multiple parameters from multiple chains, e.g. as might be generated with Markov chain Monte Carlo.
  • reff::Union{Real,AbstractVector}: the ratio(s) of effective sample size of log_ratios and the actual sample size reff = ess/(ndraws * nchains), used to account for autocorrelation, e.g. due to Markov chain Monte Carlo.

Keywords

Returns

  • result: a PSISResult object containing the results of the Pareto-smoothing.

A warning is raised if the Pareto shape parameter $k ≥ 0.7$. See PSISResult for details and PSISPlots.paretoshapeplot for a diagnostic plot.

Model assessment and selection

ArviZ.compareFunction

Compare models based on their expected log pointwise predictive density (ELPD).

Note

This function is forwarded to Python's arviz.compare. The docstring of that function is included below.


    The ELPD is estimated either by Pareto smoothed importance sampling leave-one-out
    cross-validation (LOO) or using the widely applicable information criterion (WAIC).
    We recommend loo. Read more theory in this paper by some of the
    leading authorities on model comparison: dx.doi.org/10.1111/1467-9868.00353

    Parameters
    ----------
    compare_dict: dict of {str: InferenceData or ELPDData}
        A dictionary of model names and :class:`arviz.InferenceData` or ``ELPDData``.
    ic: str, optional
        Method to estimate the ELPD, available options are "loo" or "waic". Defaults to
        ``rcParams["stats.information_criterion"]``.
    method: str, optional
        Method used to estimate the weights for each model. Defaults to
        ``rcParams["stats.ic_compare_method"]``. Available options are:

        - 'stacking' : stacking of predictive distributions.
        - 'BB-pseudo-BMA' : pseudo-Bayesian Model averaging using Akaike-type
          weighting. The weights are stabilized using the Bayesian bootstrap.
        - 'pseudo-BMA': pseudo-Bayesian Model averaging using Akaike-type
          weighting, without Bootstrap stabilization (not recommended).

        For more information read https://arxiv.org/abs/1704.02030
    b_samples: int, optional default = 1000
        Number of samples taken by the Bayesian bootstrap estimation.
        Only useful when method = 'BB-pseudo-BMA'.
        Defaults to ``rcParams["stats.ic_compare_method"]``.
    alpha: float, optional
        The shape parameter in the Dirichlet distribution used for the Bayesian bootstrap. Only
        useful when method = 'BB-pseudo-BMA'. When alpha=1 (default), the distribution is uniform
        on the simplex. A smaller alpha keeps the final weights further away from 0 and 1.
    seed: int or np.random.RandomState instance, optional
        If int or RandomState, use it for seeding the Bayesian bootstrap. Only
        useful when method = 'BB-pseudo-BMA'. If None (default), the global
        :mod:`numpy.random` state is used.
    scale: str, optional
        Output scale for IC. Available options are:

        - `log` : (default) log-score (after Vehtari et al. (2017))
        - `negative_log` : -1 * (log-score)
        - `deviance` : -2 * (log-score)

        A higher log-score (or a lower deviance) indicates a model with better predictive
        accuracy.
    var_name: str, optional
        If there is more than a single observed variable in the ``InferenceData``, which
        should be used as the basis for comparison.

    Returns
    -------
    A DataFrame, ordered from best to worst model (measured by the ELPD).
    The index reflects the key with which the models are passed to this function. The columns are:
    rank: The rank-order of the models. 0 is the best.
    elpd: ELPD estimated using either PSIS-LOO-CV (`elpd_loo`) or WAIC (`elpd_waic`).
        Higher ELPD indicates higher out-of-sample predictive fit ("better" model).
        If `scale` is `deviance` or `negative_log`, smaller values indicate
        higher out-of-sample predictive fit ("better" model).
    pIC: Estimated effective number of parameters.
    elpd_diff: The difference in ELPD between two models.
        If more than two models are compared, the difference is computed relative to the
        top-ranked model, which always has an elpd_diff of 0.
    weight: Relative weight for each model.
        This can be loosely interpreted as the probability of each model (among the compared
        models) given the data. By default the uncertainty in the weight estimates is accounted
        for using the Bayesian bootstrap.
    SE: Standard error of the ELPD estimate.
        If method = BB-pseudo-BMA, these values are estimated using the Bayesian bootstrap.
    dSE: Standard error of the difference in ELPD between each model and the top-ranked model.
        It's always 0 for the top-ranked model.
    warning: A value of 1 indicates that the computation of the ELPD may not be reliable.
        This could be an indication of WAIC/LOO starting to fail; see
        http://arxiv.org/abs/1507.04544 for details.
    scale: Scale used for the ELPD.

    Examples
    --------
    Compare the centered and non-centered models of the eight schools problem:

    .. ipython::

        In [1]: import arviz as az
           ...: data1 = az.load_arviz_data("non_centered_eight")
           ...: data2 = az.load_arviz_data("centered_eight")
           ...: compare_dict = {"non centered": data1, "centered": data2}
           ...: az.compare(compare_dict)

    Compare the models using PSIS-LOO-CV, returning the ELPD in log scale and calculating the
    weights using the stacking method.

    .. ipython::

        In [1]: az.compare(compare_dict, ic="loo", method="stacking", scale="log")

    See Also
    --------
    loo :
        Compute the ELPD using the Pareto smoothed importance sampling Leave-one-out
        cross-validation method.
    waic : Compute the ELPD using the widely applicable information criterion.
    plot_compare : Summary plot for model comparison.

    References
    ----------
    .. [1] Vehtari, A., Gelman, A. & Gabry, J. Practical Bayesian model evaluation using
        leave-one-out cross-validation and WAIC. Stat Comput 27, 1413–1432 (2017)
        see https://doi.org/10.1007/s11222-016-9696-4
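
A hedged Julia sketch of the first example above, assuming load_example_data also provides the "non_centered_eight" dataset and that compare accepts a Dict of name => InferenceData pairs:

using ArviZ
data1 = load_example_data("non_centered_eight")
data2 = load_example_data("centered_eight")
compare(Dict("non centered" => data1, "centered" => data2))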

    
source
ArviZ.looFunction

Compute Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV).

Note

This function is forwarded to Python's arviz.loo. The docstring of that function is included below.


    Estimates the expected log pointwise predictive density (elpd) using Pareto-smoothed
    importance sampling leave-one-out cross-validation (PSIS-LOO-CV). Also calculates LOO's
    standard error and the effective number of parameters. Read more theory here
    https://arxiv.org/abs/1507.04544 and here https://arxiv.org/abs/1507.02646

    Parameters
    ----------
    data: obj
        Any object that can be converted to an :class:`arviz.InferenceData` object.
        Refer to documentation of
        :func:`arviz.convert_to_dataset` for details.
    pointwise: bool, optional
        If True the pointwise predictive accuracy will be returned. Defaults to
        ``stats.ic_pointwise`` rcParam.
    var_name : str, optional
        The name of the variable in log_likelihood groups storing the pointwise log
        likelihood data to use for loo computation.
    reff: float, optional
        Relative MCMC efficiency, ``ess / n`` i.e. number of effective samples divided by the number
        of actual samples. Computed from trace by default.
    scale: str
        Output scale for loo. Available options are:

        - ``log`` : (default) log-score
        - ``negative_log`` : -1 * log-score
        - ``deviance`` : -2 * log-score

        A higher log-score (or a lower deviance or negative log_score) indicates a model with
        better predictive accuracy.

    Returns
    -------
    ELPDData object (inherits from :class:`pandas.Series`) with the following row/attributes:
    elpd: approximated expected log pointwise predictive density (elpd)
    se: standard error of the elpd
    p_loo: effective number of parameters
    shape_warn: bool
        True if the estimated shape parameter of
        Pareto distribution is greater than 0.7 for one or more samples
    loo_i: array of pointwise predictive accuracy, only if pointwise True
    pareto_k: array of Pareto shape values, only if pointwise True
    scale: scale of the elpd

        The returned object has a custom print method that overrides pd.Series method.

    See Also
    --------
    compare : Compare models based on PSIS-LOO-CV (loo) or WAIC (waic) cross-validation.
    waic : Compute the widely applicable information criterion.
    plot_compare : Summary plot for model comparison.
    plot_elpd : Plot pointwise elpd differences between two or more models.
    plot_khat : Plot Pareto tail indices for diagnosing convergence.

    Examples
    --------
    Calculate LOO of a model:

    .. ipython::

        In [1]: import arviz as az
           ...: data = az.load_arviz_data("centered_eight")
           ...: az.loo(data)

    Calculate LOO of a model and return the pointwise values:

    .. ipython::

        In [2]: data_loo = az.loo(data, pointwise=True)
           ...: data_loo.loo_i
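
From Julia, a hedged sketch of the same calls (assuming loo accepts an InferenceData and forwards the pointwise keyword):

using ArviZ
idata = load_example_data("centered_eight")
loo(idata)
loo(idata; pointwise = true)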
    
source
ArviZ.loo_pitFunction

Compute leave-one-out (PSIS-LOO) probability integral transform (PIT) values.

Note

This function is forwarded to Python's arviz.loo_pit. The docstring of that function is included below.


    Parameters
    ----------
    idata: InferenceData
        :class:`arviz.InferenceData` object.
    y: array, DataArray or str
        Observed data. If str, ``idata`` must be present and contain the observed data group
    y_hat: array, DataArray or str
        Posterior predictive samples for ``y``. It must have the same shape as y plus an
        extra dimension at the end of size n_samples (chains and draws stacked). If str or
        None, ``idata`` must contain the posterior predictive group. If None, ``y_hat`` is taken
        to be equal to ``y``; thus, ``y`` must be a str too.
    log_weights: array or DataArray
        Smoothed log_weights. It must have the same shape as ``y_hat``
    dask_kwargs : dict, optional
        Dask related kwargs passed to :func:`~arviz.wrap_xarray_ufunc`.

    Returns
    -------
    loo_pit: array or DataArray
        Value of the LOO-PIT at each observed data point.

    See Also
    --------
    plot_loo_pit : Plot Leave-One-Out probability integral transformation (PIT) predictive checks.
    loo : Compute Pareto-smoothed importance sampling leave-one-out
          cross-validation (PSIS-LOO-CV).
    plot_elpd : Plot pointwise elpd differences between two or more models.
    plot_khat : Plot Pareto tail indices for diagnosing convergence.

    Examples
    --------
    Calculate LOO-PIT values using as test quantity the observed values themselves.

    .. ipython::

        In [1]: import arviz as az
           ...: data = az.load_arviz_data("centered_eight")
           ...: az.loo_pit(idata=data, y="obs")

    Calculate LOO-PIT values using as test quantity the square of the difference between
    each observation and `mu`. Both ``y`` and ``y_hat`` inputs will be array-like,
    but ``idata`` will still be passed in order to calculate the ``log_weights`` from
    there.

    .. ipython::

        In [1]: T = data.observed_data.obs - data.posterior.mu.median(dim=("chain", "draw"))
           ...: T_hat = data.posterior_predictive.obs - data.posterior.mu
           ...: T_hat = T_hat.stack(__sample__=("chain", "draw"))
           ...: az.loo_pit(idata=data, y=T**2, y_hat=T_hat**2)
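
From Julia, a hedged sketch of the first example (assuming the first positional argument maps to idata and the y keyword is forwarded unchanged):

using ArviZ
idata = load_example_data("centered_eight")
loo_pit(idata; y = "obs")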

    
source
ArviZ.waicFunction

Compute the widely applicable information criterion.

Note

This function is forwarded to Python's arviz.waic. The docstring of that function is included below.


    Estimates the expected log pointwise predictive density (elpd) using WAIC. Also calculates the
    WAIC's standard error and the effective number of parameters.
    Read more theory here https://arxiv.org/abs/1507.04544 and here https://arxiv.org/abs/1004.2316

    Parameters
    ----------
    data: obj
        Any object that can be converted to an :class:`arviz.InferenceData` object.
        Refer to documentation of :func:`arviz.convert_to_inference_data` for details.
    pointwise: bool
        If True the pointwise predictive accuracy will be returned. Defaults to
        ``stats.ic_pointwise`` rcParam.
    var_name : str, optional
        The name of the variable in log_likelihood groups storing the pointwise log
        likelihood data to use for waic computation.
    scale: str
        Output scale for WAIC. Available options are:

        - `log` : (default) log-score
        - `negative_log` : -1 * log-score
        - `deviance` : -2 * log-score

        A higher log-score (or a lower deviance or negative log_score) indicates a model with
        better predictive accuracy.
    dask_kwargs : dict, optional
        Dask related kwargs passed to :func:`~arviz.wrap_xarray_ufunc`.

    Returns
    -------
    ELPDData object (inherits from :class:`pandas.Series`) with the following row/attributes:
    elpd_waic: approximated expected log pointwise predictive density (elpd)
    se: standard error of the elpd
    p_waic: effective number of parameters
    var_warn: bool
        True if posterior variance of the log predictive densities exceeds 0.4
    waic_i: :class:`~xarray.DataArray` with the pointwise predictive accuracy,
            only if pointwise=True
    scale: scale of the elpd

        The returned object has a custom print method that overrides pd.Series method.

    See Also
    --------
    loo : Compute Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV).
    compare : Compare models based on PSIS-LOO-CV or WAIC.
    plot_compare : Summary plot for model comparison.

    Examples
    --------
    Calculate WAIC of a model:

    .. ipython::

        In [1]: import arviz as az
           ...: data = az.load_arviz_data("centered_eight")
           ...: az.waic(data)

    Calculate WAIC of a model and return the pointwise values:

    .. ipython::

        In [2]: data_waic = az.waic(data, pointwise=True)
           ...: data_waic.waic_i
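
From Julia, a hedged sketch (assuming waic accepts an InferenceData and forwards the pointwise keyword):

using ArviZ
idata = load_example_data("centered_eight")
waic(idata)
waic(idata; pointwise = true)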
    
source