Inference Data Cookbook

InferenceData is the central data format for ArviZ. An InferenceData object is itself just a container that maintains references to one or more xarray.Dataset objects. Below are various ways to generate an InferenceData object. See the xarray documentation for more on xarray.
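Conceptually, then, InferenceData is little more than a named collection of datasets, one attribute per group. A toy sketch of that container idea (this class is purely illustrative, not the real implementation; the real groups hold xarray.Dataset objects, not plain dicts):

```python
class ToyInferenceData:
    """Toy container: each group name maps to a dataset-like object."""

    def __init__(self, **groups):
        self._groups = list(groups)
        for name, dataset in groups.items():
            # expose each group as an attribute, e.g. data.posterior
            setattr(self, name, dataset)

    def __repr__(self):
        lines = "\n".join(f"        > {g}" for g in self._groups)
        return f"Inference data with groups:\n{lines}"


data = ToyInferenceData(posterior={"x": [0.1, -0.2]})
print(data)            # Inference data with groups: / > posterior
print(data.posterior)  # {'x': [0.1, -0.2]}
```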

In [3]:
import arviz as az
import numpy as np

From 1d numpy array

In [6]:
size = 100
dataset = az.convert_to_inference_data(np.random.randn(size))
print(dataset)
dataset.posterior
Inference data with groups:
        > posterior
Out[6]:
<xarray.Dataset>
Dimensions:  (chain: 1, draw: 100)
Coordinates:
  * chain    (chain) int64 0
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
Data variables:
    x        (chain, draw) float64 -0.8813 0.3384 1.384 1.241 0.5025 -0.564 ...

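The (chain: 1, draw: 100) layout above reflects the conversion convention: a 1d array is treated as the draws of a single chain, so a chain axis of length one is prepended. A numpy-only sketch of that expansion (the variable names are illustrative, not part of the ArviZ API):

```python
import numpy as np

# 100 draws from a single chain, as a flat 1d array
samples = np.random.randn(100)

# prepend a chain axis of length one, matching the (chain, draw) layout
chain_draw = samples[np.newaxis, :]

print(chain_draw.shape)  # (1, 100)
```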
From nd numpy array

In [21]:
shape = (1, 2, 3, 4, 5)
dataset = az.convert_to_inference_data(np.random.randn(*shape))
print(dataset)
dataset.posterior
Inference data with groups:
        > posterior
Out[21]:
<xarray.Dataset>
Dimensions:  (chain: 1, draw: 2, x_dim_0: 3, x_dim_1: 4, x_dim_2: 5)
Coordinates:
  * chain    (chain) int64 0
  * draw     (draw) int64 0 1
  * x_dim_0  (x_dim_0) int64 0 1 2
  * x_dim_1  (x_dim_1) int64 0 1 2 3
  * x_dim_2  (x_dim_2) int64 0 1 2 3 4
Data variables:
    x        (chain, draw, x_dim_0, x_dim_1, x_dim_2) float64 0.2366 -2.062 ...

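For an nd array, the first two axes are interpreted as chain and draw, and any remaining axes receive auto-generated names of the form x_dim_0, x_dim_1, and so on, as the output above shows. A numpy-only sketch of how those names line up with the shape (the list comprehension here just mimics the naming pattern):

```python
import numpy as np

arr = np.random.randn(1, 2, 3, 4, 5)

# first two axes are chain and draw; trailing axes get generated names
dims = ["chain", "draw"] + [f"x_dim_{i}" for i in range(arr.ndim - 2)]

print(dict(zip(dims, arr.shape)))
# {'chain': 1, 'draw': 2, 'x_dim_0': 3, 'x_dim_1': 4, 'x_dim_2': 5}
```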
From a dictionary

In [25]:
datadict = {
    'a': np.random.randn(100),
    'b': np.random.randn(1, 100, 10),
    'c': np.random.randn(1, 100, 3, 4),
}
dataset = az.convert_to_inference_data(datadict)
print(dataset)
dataset.posterior
Inference data with groups:
        > posterior
Out[25]:
<xarray.Dataset>
Dimensions:  (b_dim_0: 10, c_dim_0: 3, c_dim_1: 4, chain: 1, draw: 100)
Coordinates:
  * chain    (chain) int64 0
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
  * b_dim_0  (b_dim_0) int64 0 1 2 3 4 5 6 7 8 9
  * c_dim_0  (c_dim_0) int64 0 1 2
  * c_dim_1  (c_dim_1) int64 0 1 2 3
Data variables:
    a        (chain, draw) float64 -3.45 -0.9621 0.5625 -0.9328 -0.1887 ...
    b        (chain, draw, b_dim_0) float64 0.7919 0.2132 0.6567 0.3348 ...
    c        (chain, draw, c_dim_0, c_dim_1) float64 0.2097 -0.4977 0.8485 ...

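Mixed shapes in the dictionary are normalized the same way: a 1d entry like 'a' is treated as bare draws and gains a chain axis, while entries whose leading axes are already (chain, draw) keep their trailing dimensions, named <var>_dim_i. A numpy-only sketch of that normalization (the normalize helper is hypothetical, not an ArviZ function):

```python
import numpy as np

def normalize(name, arr):
    """Return (dim_names, array) with a leading (chain, draw) layout."""
    if arr.ndim == 1:                 # bare draws -> add a chain axis
        arr = arr[np.newaxis, :]
    extra = [f"{name}_dim_{i}" for i in range(arr.ndim - 2)]
    return ["chain", "draw"] + extra, arr

dims_b, b = normalize("b", np.random.randn(1, 100, 10))
print(dims_b)   # ['chain', 'draw', 'b_dim_0']
print(b.shape)  # (1, 100, 10)
```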
From dictionary with coords and dims

In [24]:
datadict = {
    'a': np.random.randn(100),
    'b': np.random.randn(1, 100, 10),
    'c': np.random.randn(1, 100, 3, 4),
}
coords = {'c1' : np.arange(3), 'c2' : np.arange(4), 'b1' : np.arange(10)}
dims = {'b' : ['b1'], 'c' : ['c1', 'c2']}

dataset = az.convert_to_inference_data(datadict, coords=coords, dims=dims)
print(dataset)
dataset.posterior
Inference data with groups:
        > posterior
Out[24]:
<xarray.Dataset>
Dimensions:  (b1: 10, c1: 3, c2: 4, chain: 1, draw: 100)
Coordinates:
  * chain    (chain) int64 0
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
  * b1       (b1) int64 0 1 2 3 4 5 6 7 8 9
  * c1       (c1) int64 0 1 2
  * c2       (c2) int64 0 1 2 3
Data variables:
    a        (chain, draw) float64 -1.477 0.7551 0.2976 -0.5388 -0.05706 ...
    b        (chain, draw, b1) float64 -1.669 -0.8185 0.4427 -1.23 0.8002 ...
    c        (chain, draw, c1, c2) float64 -0.5959 -0.8583 0.5428 1.132 ...

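The dims mapping lists, per variable, the names of its trailing (non chain/draw) dimensions, and the coords mapping supplies the index values for each named dimension. A quick numpy-only consistency check one can run before converting (illustrative, not part of ArviZ):

```python
import numpy as np

datadict = {
    'b': np.random.randn(1, 100, 10),
    'c': np.random.randn(1, 100, 3, 4),
}
coords = {'c1': np.arange(3), 'c2': np.arange(4), 'b1': np.arange(10)}
dims = {'b': ['b1'], 'c': ['c1', 'c2']}

# every named dimension must match the variable's trailing shape
for var, names in dims.items():
    trailing = datadict[var].shape[2:]   # shape after (chain, draw)
    assert len(names) == len(trailing)
    for name, size in zip(names, trailing):
        assert len(coords[name]) == size

print("coords/dims consistent")
```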
From pymc3

In [41]:
import pymc3 as pm
draws = 500
chains = 2

eight_school_data = {'J': 8,
                     'y': np.array([28., 8., -3., 7., -1., 1., 18., 12.]),
                     'sigma': np.array([15., 10., 16., 11., 9., 11., 10., 18.])
                    }

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=5)
    tau = pm.HalfCauchy('tau', beta=5)
    theta_tilde = pm.Normal('theta_tilde', mu=0, sd=1, shape=eight_school_data['J'])
    theta = pm.Deterministic('theta', mu + tau * theta_tilde)
    pm.Normal('obs', mu=theta, sd=eight_school_data['sigma'], observed=eight_school_data['y'])

    trace = pm.sample(draws, chains=chains)
    prior = pm.sample_prior_predictive()
    posterior_predictive = pm.sample_posterior_predictive(trace, samples=500, model=model)

    data = az.from_pymc3(
            trace=trace,
            prior=prior,
            posterior_predictive=posterior_predictive,
            coords={'school': np.arange(eight_school_data['J'])},
            dims={'theta': ['school'], 'theta_tilde': ['school']},
        )
data
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [theta_tilde, tau, mu]
Sampling 2 chains: 100%|██████████| 2000/2000 [00:00<00:00, 2253.54draws/s]
There were 3 divergences after tuning. Increase `target_accept` or reparameterize.
100%|██████████| 500/500 [00:00<00:00, 3327.88it/s]
Out[41]:
Inference data with groups:
    > posterior
    > sample_stats
    > posterior_predictive
    > prior
    > observed_data

From pystan

In [43]:
import pystan
schools_code = '''
        data {
            int<lower=0> J;
            real y[J];
            real<lower=0> sigma[J];
        }

        parameters {
            real mu;
            real<lower=0> tau;
            real theta_tilde[J];
        }

        transformed parameters {
            real theta[J];
            for (j in 1:J)
                theta[j] = mu + tau * theta_tilde[j];
        }

        model {
            mu ~ normal(0, 5);
            tau ~ cauchy(0, 5);
            theta_tilde ~ normal(0, 1);
            y ~ normal(theta, sigma);
        }

        generated quantities {
            vector[J] log_lik;
            vector[J] y_hat;
            for (j in 1:J) {
                log_lik[j] = normal_lpdf(y[j] | theta[j], sigma[j]);
                y_hat[j] = normal_rng(theta[j], sigma[j]);
            }
        }
    '''
stan_model = pystan.StanModel(model_code=schools_code)
fit = stan_model.sampling(data=eight_school_data,
                          iter=draws,
                          warmup=0,
                          chains=chains)

data = az.from_pystan(fit=fit,
                      posterior_predictive='y_hat',
                      observed_data=['y'],
                      log_likelihood='log_lik',
                      coords={'school': np.arange(eight_school_data['J'])},
                      dims={'theta': ['school'],
                            'y': ['school'],
                            'log_lik': ['school'],
                            'y_hat': ['school'],
                            'theta_tilde': ['school']
                            }
                     )
data
WARNING:pystan:64 of 1000 iterations ended with a divergence (6.4%).
WARNING:pystan:Try running with adapt_delta larger than 0.8 to remove the divergences.
Out[43]:
Inference data with groups:
    > posterior
    > sample_stats
    > posterior_predictive
    > observed_data

From pyro

See from_pyro for details. Cookbook documentation coming soon.

From emcee

See from_emcee for details. Cookbook documentation coming soon.

From cmdstan

See from_cmdstan for details. Cookbook documentation coming soon.