arviz.compare

arviz.compare(dataset_dict, ic='waic', method='stacking', b_samples=1000, alpha=1, seed=None, scale='deviance')

Compare models based on WAIC or LOO cross-validation.

WAIC is the widely applicable information criterion, and LOO is leave-one-out cross-validation. For more on the theory, see this paper on model selection: dx.doi.org/10.1111/1467-9868.00353

Parameters:
dataset_dict : dict[str] -> InferenceData

A dictionary of model names and InferenceData objects

ic : str

Information criterion (WAIC or LOO) used to compare models. Defaults to WAIC.

method : str

Method used to estimate the weights for each model. Available options are:

  • ‘stacking’ : (default) stacking of predictive distributions.
  • ‘BB-pseudo-BMA’ : pseudo-Bayesian model averaging using Akaike-type
    weighting. The weights are stabilized using the Bayesian bootstrap.
  • ‘pseudo-BMA’ : pseudo-Bayesian model averaging using Akaike-type
    weighting, without bootstrap stabilization (not recommended).

For more information read https://arxiv.org/abs/1704.02030
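As a rough illustration of what Akaike-type weighting means for the ‘pseudo-BMA’ option, the sketch below normalizes exp(elpd) across models, where elpd is each model's estimated log predictive score (log scale, higher is better). The helper name `pseudo_bma_weights` and the numbers are made up for illustration; this is not ArviZ's implementation, and it omits the Bayesian bootstrap used by ‘BB-pseudo-BMA’.

```python
import math

def pseudo_bma_weights(elpd):
    """Akaike-type weights from per-model elpd estimates (log scale).

    Hypothetical helper for illustration only, not part of ArviZ.
    """
    best = max(elpd.values())
    # Subtract the best value before exponentiating for numerical stability;
    # this does not change the normalized weights.
    unnormalized = {name: math.exp(value - best) for name, value in elpd.items()}
    total = sum(unnormalized.values())
    return {name: u / total for name, u in unnormalized.items()}

# Made-up elpd estimates for two models.
weights = pseudo_bma_weights({"model_a": -30.8, "model_b": -32.1})
```

The weights sum to 1, and the model with the higher elpd receives the larger weight.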

b_samples : int

Number of samples taken by the Bayesian bootstrap estimation. Only useful when method = ‘BB-pseudo-BMA’.

alpha : float

The shape parameter in the Dirichlet distribution used for the Bayesian bootstrap. Only useful when method = ‘BB-pseudo-BMA’. When alpha=1 (default), the distribution is uniform on the simplex. A smaller alpha will keep the final weights further away from 0 and 1.
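To make the role of alpha concrete, the following sketch draws one set of Bayesian bootstrap weights from a symmetric Dirichlet distribution by normalizing independent Gamma(alpha, 1) draws. The function name `dirichlet_weights` is hypothetical and the code uses only the standard library; ArviZ's own sampling may differ.

```python
import random

def dirichlet_weights(n, alpha=1.0, rng=None):
    """Draw n weights from a symmetric Dirichlet(alpha, ..., alpha).

    Hypothetical helper for illustration only, not part of ArviZ.
    """
    rng = rng or random.Random()
    # Normalizing independent Gamma(alpha, 1) draws yields a Dirichlet sample.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)  # fixed seed so the draw is reproducible
w = dirichlet_weights(5, alpha=1.0, rng=rng)
```

Each draw is a point on the simplex: the weights are non-negative and sum to 1. In a Bayesian bootstrap, one such draw would reweight the observations for a single replicate.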

seed : int or np.random.RandomState instance

If an int or RandomState, it is used to seed the Bayesian bootstrap. Only useful when method = ‘BB-pseudo-BMA’. If None (default), the global np.random state is used.

scale : str

Output scale for IC. Available options are:

  • deviance : (default) -2 * (log-score)
  • log : 1 * log-score (after Vehtari et al. (2017))
  • negative_log : -1 * (log-score)
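The three scales differ only by a constant multiplier applied to the log-score, which a small lookup table makes explicit. The names `SCALE_FACTORS` and `to_scale` are illustrative, not part of ArviZ, and the log-score is a made-up value.

```python
# Multiplier applied to the log-score for each output scale.
SCALE_FACTORS = {"deviance": -2.0, "log": 1.0, "negative_log": -1.0}

def to_scale(log_score, scale="deviance"):
    """Convert a log-score to the requested output scale (illustrative helper)."""
    return SCALE_FACTORS[scale] * log_score

log_score = -30.8  # made-up log-score
values = {scale: to_scale(log_score, scale) for scale in SCALE_FACTORS}
```

Because the deviance and negative_log multipliers are negative, smaller values are better on those scales, while larger values are better on the log scale.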
Returns:
A DataFrame, ordered from lowest to highest IC. The index reflects the order in which the models are passed to this function. The columns are:
IC : Information Criteria (WAIC or LOO).

A smaller IC indicates better out-of-sample predictive fit (a “better” model). If scale is log, a higher IC indicates better out-of-sample predictive fit instead. Defaults to WAIC.
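The direction of "better" therefore depends on the scale. A minimal sketch of picking the top-ranked model from a mapping of model names to IC values follows; `best_model` and the numbers are hypothetical, not ArviZ code.

```python
def best_model(ic_values, scale="deviance"):
    """Return the name of the top-ranked model for the given scale."""
    # On the log scale higher is better; on deviance and negative_log,
    # lower is better.
    pick = max if scale == "log" else min
    return pick(ic_values, key=ic_values.get)

deviance_ic = {"model_a": 61.6, "model_b": 64.2}  # made-up values
# The same values on the log scale (deviance = -2 * log-score).
log_ic = {name: value / -2.0 for name, value in deviance_ic.items()}

top_deviance = best_model(deviance_ic, scale="deviance")
top_log = best_model(log_ic, scale="log")
```

Both calls select the same model, since changing the scale only rescales and possibly flips the ordering.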

pIC : Estimated effective number of parameters.
dIC : Relative difference between each IC (WAIC or LOO) and the lowest IC (WAIC or LOO).

It’s always 0 for the top-ranked model.

weight: Relative weight for each model.

This can be loosely interpreted as the probability of each model (among the compared models) given the data. When method = BB-pseudo-BMA, the uncertainty in the weight estimation is accounted for using the Bayesian bootstrap.

SE : Standard error of the IC estimate.

If method = BB-pseudo-BMA these values are estimated using Bayesian bootstrap.

dSE : Standard error of the difference in IC between each model and the top-ranked model.

It’s always 0 for the top-ranked model.
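One common way to obtain such a difference and its standard error, following Vehtari et al. (2017), works from the pointwise contributions of each observation: sum the pointwise differences, and take the square root of n times their sample variance. The sketch below assumes pointwise values already on the deviance scale; `difference_and_se` and the data are hypothetical, and ArviZ's exact computation may differ (for example in the variance normalization).

```python
import math
import statistics

def difference_and_se(pointwise_top, pointwise_other):
    """Difference in IC and its standard error from pointwise values.

    Illustrative helper, not part of ArviZ. Both inputs hold one value
    per observation, on the same scale.
    """
    diffs = [other - top for top, other in zip(pointwise_top, pointwise_other)]
    d_ic = sum(diffs)
    # Standard error of a sum of n terms: sqrt(n * sample variance).
    d_se = math.sqrt(len(diffs) * statistics.variance(diffs))
    return d_ic, d_se

# Made-up pointwise deviance contributions for four observations.
top = [1.9, 2.4, 2.1, 2.6]
other = [2.3, 2.5, 2.8, 2.7]

d_ic, d_se = difference_and_se(top, other)
self_d_ic, self_d_se = difference_and_se(top, top)
```

Comparing the top-ranked model with itself gives all-zero pointwise differences, which is why both dIC and dSE are 0 in its row.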

warning : A value of 1 indicates that the computation of the IC may not be reliable. This could be an indication of WAIC/LOO starting to fail; see http://arxiv.org/abs/1507.04544 for details.

scale : Scale used for the IC.