Stats
ArviZ.ArviZStats.AbstractELPDResult
ArviZ.ArviZStats.PSISLOOResult
ArviZ.ArviZStats.WAICResult
PSIS.PSISResult
ArviZ.ArviZStats.compare
ArviZ.ArviZStats.elpd_estimates
ArviZ.ArviZStats.hdi
ArviZ.ArviZStats.information_criterion
ArviZ.ArviZStats.loo
ArviZ.ArviZStats.loo_pit
ArviZ.ArviZStats.r2_score
ArviZ.ArviZStats.summary
ArviZ.ArviZStats.waic
PSIS.psis
PSIS.psis!
StatsBase.summarystats
General statistics
ArviZ.ArviZStats.hdi — Function
This function is forwarded to Python's arviz.hdi. The docstring of that function is included below.
Calculate highest density interval (HDI) of array for given probability.
The HDI is the minimum width Bayesian credible interval (BCI).
Parameters
----------
ary: obj
object containing posterior samples.
Any object that can be converted to an :class:`arviz.InferenceData` object.
Refer to documentation of :func:`arviz.convert_to_dataset` for details.
hdi_prob: float, optional
Prob for which the highest density interval will be computed. Defaults to
``stats.hdi_prob`` rcParam.
circular: bool, optional
Whether to compute the hdi taking into account `x` is a circular variable
(in the range [-np.pi, np.pi]) or not. Defaults to False (i.e. non-circular variables).
Only works if multimodal is False.
multimodal: bool, optional
If true it may compute more than one hdi if the distribution is multimodal and the
modes are well separated.
skipna: bool, optional
If true ignores nan values when computing the hdi. Defaults to false.
group: str, optional
Specifies which InferenceData group should be used to calculate hdi.
Defaults to 'posterior'.
var_names: list, optional
Names of variables to include in the hdi report. Prefix the variables by ``~``
when you want to exclude them from the report: `["~beta"]` instead of `["beta"]`
(see :func:`arviz.summary` for more details).
filter_vars: {None, "like", "regex"}, optional, default=None
If `None` (default), interpret var_names as the real variables names. If "like",
interpret var_names as substrings of the real variables names. If "regex",
interpret var_names as regular expressions on the real variables names. A la
``pandas.filter``.
coords: mapping, optional
Specifies the subset over which to calculate the hdi.
max_modes: int, optional
Specifies the maximum number of modes for multimodal case.
dask_kwargs : dict, optional
Dask related kwargs passed to :func:`~arviz.wrap_xarray_ufunc`.
kwargs: dict, optional
Additional keywords passed to :func:`~arviz.wrap_xarray_ufunc`.
Returns
-------
np.ndarray or xarray.Dataset, depending upon input
lower(s) and upper(s) values of the interval(s).
See Also
--------
plot_hdi : Plot highest density intervals for regression data.
xarray.Dataset.quantile : Calculate quantiles of array for given probabilities.
Examples
--------
Calculate the HDI of a Normal random variable:
.. ipython::
In [1]: import arviz as az
...: import numpy as np
...: data = np.random.normal(size=2000)
...: az.hdi(data, hdi_prob=.68)
Calculate the HDI of a dataset:
.. ipython::
In [1]: import arviz as az
...: data = az.load_arviz_data('centered_eight')
...: az.hdi(data)
We can also calculate the HDI of some of the variables of dataset:
.. ipython::
In [1]: az.hdi(data, var_names=["mu", "theta"])
By default, ``hdi`` is calculated over the ``chain`` and ``draw`` dimensions. We can use the
``input_core_dims`` argument of :func:`~arviz.wrap_xarray_ufunc` to change this. In this example
we calculate the HDI also over the ``school`` dimension:
.. ipython::
In [1]: az.hdi(data, var_names="theta", input_core_dims = [["chain","draw", "school"]])
We can also calculate the hdi over a particular selection:
.. ipython::
In [1]: az.hdi(data, coords={"chain":[0, 1, 3]}, input_core_dims = [["draw"]])
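From Julia, the same call goes through ArviZ.jl's forwarding; a minimal sketch, assuming the keywords above (here hdi_prob) pass through unchanged:
using ArviZ
data = randn(2000)          # 2000 draws from a standard normal
hdi(data; hdi_prob=0.68)    # lower and upper bounds of the 68% HDI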
ArviZ.ArviZStats.summary — Function
summary(
    data; group = :posterior, coords, dims, kwargs...,
) -> Union{Dataset,DataFrames.DataFrame}
Compute summary statistics on any object that can be passed to convert_to_dataset.
Keywords
coords: Map from named dimension to named indices.
dims: Map from variable name to names of its dimensions.
kwargs: Keyword arguments passed to summarystats.
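A minimal usage sketch, assuming summary is in scope (qualify as ArviZ.summary if it is shadowed by Base.summary); load_example_data is used as in the summarystats examples below:
using ArviZ
idata = load_example_data("centered_eight")
ArviZ.summary(idata; group=:posterior)   # summary statistics for the posterior group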
StatsBase.summarystats — Function
summarystats(
    data::InferenceData;
    group = :posterior,
    kwargs...,
) -> Union{Dataset,DataFrames.DataFrame}
summarystats(data::Dataset; kwargs...) -> Union{Dataset,DataFrames.DataFrame}
Compute summary statistics on data.
Arguments
data::Union{Dataset,InferenceData}: The data on which to compute summary statistics. If data is an InferenceData, only the dataset corresponding to group is used.
Keywords
var_names: Collection of names of variables as Symbols to include in the summary.
include_circ::Bool=false: Whether to include circular statistics.
digits::Int: Number of decimals used to round results. If not provided, numbers are not rounded.
stat_funcs::Union{Dict{String,Function},Vector{Function}}=nothing: A vector of functions or a dict of functions with function names as keys used to calculate statistics. By default, the mean, standard deviation, simulation standard error, and highest posterior density intervals are included. Each function is given one argument, the samples for a variable as an array, and should return a single number. For example, Statistics.mean or Statistics.var would both work.
extend::Bool=true: If true, use the statistics returned by stat_funcs in addition to, rather than in place of, the default statistics. This is only meaningful when stat_funcs is not nothing.
hdi_prob::Real=0.94: HDI interval to compute. This is only meaningful when stat_funcs is nothing.
skipna::Bool=false: If true, ignores NaN values when computing the summary statistics. It does not affect the behaviour of the functions passed to stat_funcs.
Returns
DataFrames.DataFrame: Summary statistics for each variable. Default statistics are:
mean
sd
hdi_3%
hdi_97%
mcse_mean
mcse_sd
ess_bulk
ess_tail
r_hat (only computed for traces with 2 or more chains)
Examples
using ArviZ
idata = load_example_data("centered_eight")
summarystats(idata; var_names=(:mu, :tau))
Other statistics can be calculated by passing a list of functions or a dictionary with key, function pairs:
using Statistics
function median_sd(x)
med = median(x)
sd = sqrt(mean((x .- med).^2))
return sd
end
func_dict = Dict(
"std" => x -> std(x; corrected = false),
"median_std" => median_sd,
"5%" => x -> quantile(x, 0.05),
"median" => median,
"95%" => x -> quantile(x, 0.95),
)
summarystats(idata; var_names = (:mu, :tau), stat_funcs = func_dict, extend = false)
ArviZ.ArviZStats.r2_score — Function
R² for Bayesian regression models. Only valid for linear models.
This function is forwarded to Python's arviz.r2_score. The docstring of that function is included below.
Parameters
----------
y_true: array-like of shape = (n_outputs,)
Ground truth (correct) target values.
y_pred: array-like of shape = (n_posterior_samples, n_outputs)
Estimated target values.
Returns
-------
Pandas Series with the following indices:
r2: Bayesian R²
r2_std: standard deviation of the Bayesian R².
See Also
--------
plot_lm : Posterior predictive and mean plots for regression-like data.
Examples
--------
Calculate R² for Bayesian regression models :
.. ipython::
In [1]: import arviz as az
...: data = az.load_arviz_data('regression1d')
...: y_true = data.observed_data["y"].values
...: y_pred = data.posterior_predictive.stack(sample=("chain", "draw"))["y"].values.T
...: az.r2_score(y_true, y_pred)
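The forwarded function also accepts plain Julia arrays; a minimal sketch with synthetic data, where the shapes follow the Parameters section above:
using ArviZ
y_true = randn(100)                   # hypothetical observations, shape (n_outputs,)
y_pred = y_true' .+ randn(500, 100)   # hypothetical draws, shape (n_posterior_samples, n_outputs)
r2_score(y_true, y_pred)              # Bayesian R² and its standard deviation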
Pareto-smoothed importance sampling
PSIS.PSISResult — Type
PSISResult
Result of Pareto-smoothed importance sampling (PSIS) using psis.
Properties
log_weights: un-normalized Pareto-smoothed log weights
weights: normalized Pareto-smoothed weights (allocates a copy)
pareto_shape: Pareto $k=ξ$ shape parameter
nparams: number of parameters in log_weights
ndraws: number of draws in log_weights
nchains: number of chains in log_weights
reff: the ratio of the effective sample size of the unsmoothed importance ratios and the actual sample size
ess: estimated effective sample size of estimate of mean using smoothed importance samples (see ess_is)
tail_length: length of the upper tail of log_weights that was smoothed
tail_dist: the generalized Pareto distribution that was fit to the tail of log_weights. Note that the tail weights are scaled to have a maximum of 1, so tail_dist * exp(maximum(log_ratios)) is the corresponding fit directly to the tail of log_ratios.
normalized::Bool: indicates whether log_weights are log-normalized along the sample dimensions.
Diagnostic
The pareto_shape
parameter $k=ξ$ of the generalized Pareto distribution tail_dist
can be used to diagnose reliability and convergence of estimates using the importance weights [VehtariSimpson2021].
- if $k < \frac{1}{3}$, importance sampling is stable, and importance sampling (IS) and PSIS both are reliable.
- if $k ≤ \frac{1}{2}$, then the importance ratio distribution has finite variance, and the central limit theorem holds. As $k$ approaches the upper bound, IS becomes less reliable, while PSIS still works well but with a higher RMSE.
- if $\frac{1}{2} < k ≤ 0.7$, then the variance is infinite, and IS can behave quite poorly. However, PSIS works well in this regime.
- if $0.7 < k ≤ 1$, then it quickly becomes impractical to collect enough importance weights to reliably compute estimates, and importance sampling is not recommended.
- if $k > 1$, then neither the variance nor the mean of the raw importance ratios exists. The convergence rate is close to zero, and bias can be large with practical sample sizes.
See PSISPlots.paretoshapeplot
for a diagnostic plot.
PSIS.psis — Function
psis(log_ratios, reff = 1.0; kwargs...) -> PSISResult
psis!(log_ratios, reff = 1.0; kwargs...) -> PSISResult
Compute Pareto smoothed importance sampling (PSIS) log weights [VehtariSimpson2021].
While psis computes smoothed log weights out-of-place, psis! smooths them in-place.
Arguments
log_ratios: an array of logarithms of importance ratios, with size (draws, [chains, [parameters...]]), where chains>1 would be used when chains are generated using Markov chain Monte Carlo.
reff::Union{Real,AbstractArray}: the ratio(s) of effective sample size of log_ratios and the actual sample size reff = ess/(draws * chains), used to account for autocorrelation, e.g. due to Markov chain Monte Carlo. If an array, it must have the size (parameters...,) to match log_ratios.
Keywords
warn=true: If true, warning messages are delivered.
normalize=true: If true, the log-weights will be log-normalized so that exp.(log_weights) sums to 1 along the sample dimensions.
Returns
result: a PSISResult object containing the results of the Pareto-smoothing.
A warning is raised if the Pareto shape parameter $k ≥ 0.7$. See PSISResult for details and PSISPlots.paretoshapeplot for a diagnostic plot.
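A minimal usage sketch with synthetic log-ratios, whose sizes follow the Arguments section above (real log-ratios would come from a model):
using PSIS
log_ratios = randn(1_000, 4, 8)   # 1000 draws × 4 chains × 8 parameters
result = psis(log_ratios)         # smooths the upper tail of the weights for each parameter
result.pareto_shape               # per-parameter k̂; values at or above 0.7 trigger a warning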
PSIS.psis! — Function
psis!(log_ratios, reff = 1.0; kwargs...) -> PSISResult
psis! shares the docstring shown above for psis: it computes the same Pareto-smoothed log weights but smooths log_ratios in-place rather than allocating a copy.
Model assessment and selection
LOO and WAIC
ArviZ.ArviZStats.AbstractELPDResult — Type
abstract type AbstractELPDResult
An abstract type representing the result of an ELPD computation.
Every subtype stores estimates of both the expected log predictive density (elpd) and the effective number of parameters p, as well as standard errors and pointwise estimates of each, from which other relevant estimates can be computed.
Subtypes implement the following functions:
elpd_estimates
ArviZ.ArviZStats.PSISLOOResult — Type
Results of Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO).
See also: loo, AbstractELPDResult
Fields
estimates: Estimates of the expected log pointwise predictive density (ELPD) and effective number of parameters (p)
pointwise: Pointwise estimates
psis_result: Pareto-smoothed importance sampling (PSIS) results
ArviZ.ArviZStats.WAICResult — Type
Results of computing the widely applicable information criterion (WAIC).
See also: waic, AbstractELPDResult
Fields
estimates: Estimates of the expected log pointwise predictive density (ELPD) and effective number of parameters (p)
pointwise: Pointwise estimates
ArviZ.ArviZStats.elpd_estimates — Function
elpd_estimates(result::AbstractELPDResult; pointwise=false) -> (; elpd, elpd_mcse, lpd)
Return the (E)LPD estimates from the result.
ArviZ.ArviZStats.information_criterion — Function
information_criterion(elpd, scale::Symbol)
Compute the information criterion for the given scale from the elpd estimate.
scale must be one of (:deviance, :log, :negative_log).
See also: effective_number_of_parameters, loo, waic
information_criterion(result::AbstractELPDResult, scale::Symbol; pointwise=false)
Compute information criterion for the given scale from the existing ELPD result.
scale must be one of (:deviance, :log, :negative_log). If pointwise=true, then pointwise estimates are returned.
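For example, converting an ELPD result to the classical deviance scale; a sketch, assuming a PSIS-LOO result as computed with loo below:
using ArviZ
idata = load_example_data("centered_eight")
result = loo(idata)
information_criterion(result, :deviance)   # -2 * elpd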
ArviZ.ArviZStats.loo — Function
loo(log_likelihood; reff=nothing, kwargs...) -> PSISLOOResult{<:NamedTuple,<:NamedTuple}
Compute the Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO). [Vehtari2017][LOOFAQ]
log_likelihood must be an array of log-likelihood values with shape (chains, draws[, params...]).
Keywords
reff::Union{Real,AbstractArray{<:Real}}: The relative effective sample size(s) of the likelihood values. If an array, it must have the same data dimensions as the corresponding log-likelihood variable. If not provided, then this is estimated using ess.
kwargs: Remaining keywords are forwarded to psis.
See also: PSISLOOResult, waic
loo(data::Dataset; [var_name::Symbol,] kwargs...) -> PSISLOOResult{<:NamedTuple,<:Dataset}
loo(data::InferenceData; [var_name::Symbol,] kwargs...) -> PSISLOOResult{<:NamedTuple,<:Dataset}
Compute PSIS-LOO from log-likelihood values in data.
If more than one log-likelihood variable is present, then var_name must be provided.
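A minimal usage sketch (centered_eight contains a single log-likelihood variable, so var_name can be omitted):
using ArviZ
idata = load_example_data("centered_eight")
result = loo(idata)      # PSISLOOResult with estimates, pointwise values, and PSIS diagnostics
elpd_estimates(result)   # point estimates and standard errors of the ELPD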
ArviZ.ArviZStats.waic — Function
waic(log_likelihood::AbstractArray) -> WAICResult{<:NamedTuple,<:NamedTuple}
Compute the widely applicable information criterion (WAIC). [Watanabe2010][Vehtari2017][LOOFAQ]
log_likelihood must be an array of log-likelihood values with shape (chains, draws[, params...]).
See also: WAICResult, loo
waic(data::Dataset; [var_name::Symbol]) -> WAICResult{<:NamedTuple,<:Dataset}
waic(data::InferenceData; [var_name::Symbol]) -> WAICResult{<:NamedTuple,<:Dataset}
Compute WAIC from log-likelihood values in data.
If more than one log-likelihood variable is present, then var_name must be provided.
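A minimal usage sketch, mirroring the loo example above:
using ArviZ
idata = load_example_data("centered_eight")
waic(idata)   # WAICResult with the ELPD estimate and effective number of parameters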
Others
ArviZ.ArviZStats.compare — Function
Compare models based on their expected log pointwise predictive density (ELPD).
This function is forwarded to Python's arviz.compare. The docstring of that function is included below.
The ELPD is estimated either by Pareto smoothed importance sampling leave-one-out
cross-validation (LOO) or using the widely applicable information criterion (WAIC).
We recommend LOO. For more of the theory, see this paper by leading authorities
on model comparison: dx.doi.org/10.1111/1467-9868.00353
Parameters
----------
compare_dict: dict of {str: InferenceData or ELPDData}
A dictionary of model names and :class:`arviz.InferenceData` or ``ELPDData``.
ic: str, optional
Method to estimate the ELPD, available options are "loo" or "waic". Defaults to
``rcParams["stats.information_criterion"]``.
method: str, optional
Method used to estimate the weights for each model. Defaults to
``rcParams["stats.ic_compare_method"]``. Available options are:
- 'stacking' : stacking of predictive distributions.
- 'BB-pseudo-BMA' : pseudo-Bayesian Model averaging using Akaike-type
weighting. The weights are stabilized using the Bayesian bootstrap.
- 'pseudo-BMA': pseudo-Bayesian Model averaging using Akaike-type
weighting, without Bootstrap stabilization (not recommended).
For more information read https://arxiv.org/abs/1704.02030
b_samples: int, optional default = 1000
Number of samples taken by the Bayesian bootstrap estimation.
Only useful when method = 'BB-pseudo-BMA'.
alpha: float, optional
The shape parameter in the Dirichlet distribution used for the Bayesian bootstrap. Only
useful when method = 'BB-pseudo-BMA'. When alpha=1 (default), the distribution is uniform
on the simplex. A smaller alpha keeps the final weights further away from 0 and 1.
seed: int or np.random.RandomState instance, optional
If int or RandomState, use it for seeding Bayesian bootstrap. Only
useful when method = 'BB-pseudo-BMA'. If None (default), the global
:mod:`numpy.random` state is used.
scale: str, optional
Output scale for IC. Available options are:
- `log` : (default) log-score (after Vehtari et al. (2017))
- `negative_log` : -1 * (log-score)
- `deviance` : -2 * (log-score)
A higher log-score (or a lower deviance) indicates a model with better predictive
accuracy.
var_name: str, optional
If there is more than a single observed variable in the ``InferenceData``, which
should be used as the basis for comparison.
Returns
-------
A DataFrame, ordered from best to worst model (measured by the ELPD).
The index reflects the key with which the models are passed to this function. The columns are:
rank: The rank-order of the models. 0 is the best.
elpd: ELPD estimated using either PSIS-LOO-CV (`elpd_loo`) or WAIC (`elpd_waic`).
Higher ELPD indicates higher out-of-sample predictive fit ("better" model).
If `scale` is `deviance` or `negative_log`, smaller values indicate
higher out-of-sample predictive fit ("better" model).
pIC: Estimated effective number of parameters.
elpd_diff: The difference in ELPD between two models.
If more than two models are compared, the difference is computed relative to the
top-ranked model, which always has an elpd_diff of 0.
weight: Relative weight for each model.
This can be loosely interpreted as the probability of each model (among the compared models)
given the data. By default the uncertainty in the weights estimation is considered using
Bayesian bootstrap.
SE: Standard error of the ELPD estimate.
If method = BB-pseudo-BMA these values are estimated using Bayesian bootstrap.
dSE: Standard error of the difference in ELPD between each model and the top-ranked model.
It's always 0 for the top-ranked model.
warning: A value of 1 indicates that the computation of the ELPD may not be reliable.
This could be an indication of WAIC/LOO starting to fail; see
http://arxiv.org/abs/1507.04544 for details.
scale: Scale used for the ELPD.
Examples
--------
Compare the centered and non centered models of the eight school problem:
.. ipython::
In [1]: import arviz as az
...: data1 = az.load_arviz_data("non_centered_eight")
...: data2 = az.load_arviz_data("centered_eight")
...: compare_dict = {"non centered": data1, "centered": data2}
...: az.compare(compare_dict)
Compare the models using PSIS-LOO-CV, returning the ELPD in log scale and calculating the
weights using the stacking method.
.. ipython::
In [1]: az.compare(compare_dict, ic="loo", method="stacking", scale="log")
See Also
--------
loo :
Compute the ELPD using the Pareto smoothed importance sampling Leave-one-out
cross-validation method.
waic : Compute the ELPD using the widely applicable information criterion.
plot_compare : Summary plot for model comparison.
References
----------
.. [1] Vehtari, A., Gelman, A. & Gabry, J. Practical Bayesian model evaluation using
leave-one-out cross-validation and WAIC. Stat Comput 27, 1413–1432 (2017)
see https://doi.org/10.1007/s11222-016-9696-4
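From Julia, the first Python example above translates directly; a minimal sketch, assuming the Dict is forwarded to arviz.compare unchanged:
using ArviZ
data1 = load_example_data("non_centered_eight")
data2 = load_example_data("centered_eight")
compare(Dict("non centered" => data1, "centered" => data2))   # DataFrame, ordered best to worst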
ArviZ.ArviZStats.loo_pit — Function
Compute leave-one-out (PSIS-LOO) probability integral transform (PIT) values.
This function is forwarded to Python's arviz.loo_pit. The docstring of that function is included below.
Parameters
----------
idata: InferenceData
:class:`arviz.InferenceData` object.
y: array, DataArray or str
Observed data. If str, ``idata`` must be present and contain the observed data group.
y_hat: array, DataArray or str
Posterior predictive samples for ``y``. It must have the same shape as y plus an
extra dimension at the end of size n_samples (chains and draws stacked). If str or
None, ``idata`` must contain the posterior predictive group. If None, y_hat is taken
to be equal to y, in which case y must be a str too.
log_weights: array or DataArray
Smoothed log_weights. It must have the same shape as ``y_hat``
dask_kwargs : dict, optional
Dask related kwargs passed to :func:`~arviz.wrap_xarray_ufunc`.
Returns
-------
loo_pit: array or DataArray
Value of the LOO-PIT at each observed data point.
See Also
--------
plot_loo_pit : Plot Leave-One-Out probability integral transformation (PIT) predictive checks.
loo : Compute Pareto-smoothed importance sampling leave-one-out
cross-validation (PSIS-LOO-CV).
plot_elpd : Plot pointwise elpd differences between two or more models.
plot_khat : Plot Pareto tail indices for diagnosing convergence.
Examples
--------
Calculate LOO-PIT values using the observed values themselves as the test quantity.
.. ipython::
In [1]: import arviz as az
...: data = az.load_arviz_data("centered_eight")
...: az.loo_pit(idata=data, y="obs")
Calculate LOO-PIT values using the square of the difference between
each observation and `mu` as the test quantity. Both ``y`` and ``y_hat`` inputs will be array-like,
but ``idata`` will still be passed in order to calculate the ``log_weights`` from
there.
.. ipython::
In [1]: T = data.observed_data.obs - data.posterior.mu.median(dim=("chain", "draw"))
...: T_hat = data.posterior_predictive.obs - data.posterior.mu
...: T_hat = T_hat.stack(__sample__=("chain", "draw"))
...: az.loo_pit(idata=data, y=T**2, y_hat=T_hat**2)
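From Julia, the first Python example above becomes the following sketch, assuming the y keyword is forwarded unchanged:
using ArviZ
data = load_example_data("centered_eight")
loo_pit(data; y="obs")   # LOO-PIT value at each observed data point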
- VehtariSimpson2021: Vehtari A, Simpson D, Gelman A, Yao Y, Gabry J. (2021). Pareto smoothed importance sampling. arXiv:1507.02646v7 [stat.CO]
- Vehtari2017: Vehtari, A., Gelman, A. & Gabry, J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput 27, 1413–1432 (2017). doi: 10.1007/s11222-016-9696-4. arXiv: 1507.04544
- LOOFAQ: Aki Vehtari. Cross-validation FAQ. https://mc-stan.org/loo/articles/online-only/faq.html
- Watanabe2010: Watanabe, S. Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research 11(116):3571–3594, 2010. https://jmlr.csail.mit.edu/papers/v11/watanabe10a.html