Data
ArviZ.concat
ArviZ.extract
ArviZ.from_cmdstan
ArviZ.from_json
ArviZ.from_mcmcchains
ArviZ.from_samplechains
ArviZExampleData.describe_example_data
ArviZExampleData.load_example_data
InferenceObjectsNetCDF.from_netcdf
InferenceObjectsNetCDF.to_netcdf
Inference library converters
ArviZ.from_cmdstan — Function
Convert CmdStan data into an InferenceData object.
This function is forwarded to Python's arviz.from_cmdstan. The docstring of that function is included below.
For a usage example, read the Creating InferenceData section on from_cmdstan in the ArviZ documentation.
Parameters
----------
posterior : str or list of str, optional
List of paths to output.csv files.
posterior_predictive : str or list of str, optional
Posterior predictive samples for the fit. If the string ends with ".csv", it is assumed to be a file path.
predictions : str or list of str, optional
Out-of-sample prediction samples for the fit. If the string ends with ".csv", it is assumed to be a file path.
prior : str or list of str, optional
List of paths to output.csv files.
prior_predictive : str or list of str, optional
Prior predictive samples for the fit. If the string ends with ".csv", it is assumed to be a file path.
observed_data : str, optional
Observed data used in the sampling. Path to data file in Rdump or JSON format.
observed_data_var : str or list of str, optional
Variable(s) used for slicing observed_data. If not defined, all
data variables are imported.
constant_data : str, optional
Constant data used in the sampling. Path to data file in Rdump or JSON format.
constant_data_var : str or list of str, optional
Variable(s) used for slicing constant_data. If not defined, all
data variables are imported.
predictions_constant_data : str, optional
Constant data for predictions used in the sampling.
Path to data file in Rdump or JSON format.
predictions_constant_data_var : str or list of str, optional
Variable(s) used for slicing predictions_constant_data.
If not defined, all data variables are imported.
log_likelihood : dict of {str: str}, list of str or str, optional
Pointwise log_likelihood for the data. log_likelihood is extracted from the
posterior. It is recommended to use this argument as a dictionary whose keys
are observed variable names and its values are the variables storing log
likelihood arrays in the Stan code. In other cases, a dictionary with keys
equal to its values is used. By default, if a variable ``log_lik`` is
present in the Stan model, it will be retrieved as pointwise log
likelihood values. Use ``False`` to avoid this behaviour.
index_origin : int, optional
Starting value of integer coordinate values. Defaults to the value in rcParam
``data.index_origin``.
coords : dict of {str: array_like}, optional
A dictionary containing the values that are used as index. The key
is the name of the dimension, the values are the index values.
dims : dict of {str: list of str}, optional
A mapping from variables to a list of coordinate names for the variable.
disable_glob : bool
Don't use glob for string input. This means that all string input is
assumed to be variable names (samples) or a path (data).
save_warmup : bool
Save warmup iterations into InferenceData object, if found in the input files.
If not defined, use default defined by the rcParams.
dtypes : dict or str
A dictionary containing dtype information (int, float) for parameters.
If input is a string, it is assumed to be a model code or path to model code file.
Returns
-------
InferenceData object
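Putting the parameters above together, here is a hedged sketch of a typical call. The file paths, variable names, and coordinate values are all illustrative, and the keyword spelling follows the forwarded Python signature:

```julia
using ArviZ

# Hypothetical CmdStan output files; glob patterns in strings are expanded
# unless `disable_glob=true` is passed.
idata = from_cmdstan(
    ["output_1.csv", "output_2.csv"];
    observed_data="data.json",              # Rdump or JSON data file
    log_likelihood=Dict("y" => "log_lik"),  # observed var => log-lik var in the Stan code
    coords=Dict("school" => ["A", "B", "C"]),
    dims=Dict("theta" => ["school"]),
)
```

Passing `log_likelihood` as a dictionary keyed by observed variable names, as recommended above, makes the resulting group usable directly for model comparison.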
ArviZ.from_mcmcchains — Function
from_mcmcchains(posterior::MCMCChains.Chains; kwargs...) -> InferenceData
from_mcmcchains(; kwargs...) -> InferenceData
from_mcmcchains(
    posterior::MCMCChains.Chains,
    posterior_predictive,
    predictions,
    log_likelihood;
    kwargs...
) -> InferenceData
Convert data in an MCMCChains.Chains format into an InferenceData.
Any keyword argument below without an explicitly annotated type above is allowed, so long as it can be passed to convert_to_inference_data.
Arguments
posterior::MCMCChains.Chains: Draws from the posterior
Keywords
posterior_predictive::Any=nothing: Draws from the posterior predictive distribution or name(s) of predictive variables in posterior
predictions: Out-of-sample predictions for the posterior.
prior: Draws from the prior
prior_predictive: Draws from the prior predictive distribution or name(s) of predictive variables in prior
observed_data: Observed data on which the posterior is conditional. It should only contain data which is modeled as a random variable. Keys are parameter names and values.
constant_data: Model constants, data included in the model that are not modeled as random variables. Keys are parameter names.
predictions_constant_data: Constants relevant to the model predictions (i.e. new x values in a linear regression).
log_likelihood: Pointwise log-likelihood for the data. It is recommended to use this argument as a named tuple whose keys are observed variable names and whose values are log likelihood arrays. Alternatively, provide the name of variable in posterior containing log likelihoods.
library=MCMCChains: Name of library that generated the chains
coords: Map from named dimension to named indices
dims: Map from variable name to names of its dimensions
eltypes: Map from variable names to eltypes. This is primarily used to assign discrete eltypes to discrete variables that were stored in Chains as floats.
Returns
InferenceData: The data with groups corresponding to the provided data
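As a minimal, hedged sketch (the parameter names and array sizes are illustrative), a hand-built `Chains` object can be converted like so:

```julia
using ArviZ, MCMCChains

# Synthetic draws: 100 iterations, 2 parameters, 4 chains.
chn = Chains(randn(100, 2, 4), [:mu, :tau])

# Convert to InferenceData, recording which library produced the draws.
idata = from_mcmcchains(chn; library="MCMCChains")
```

In practice `chn` would come from a sampler such as Turing.jl, and keywords like `posterior_predictive` or `log_likelihood` would name variables stored in the chain.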
ArviZ.from_samplechains — Function
from_samplechains(
    posterior=nothing;
    prior=nothing,
    library=SampleChains,
    kwargs...,
) -> InferenceData
Convert SampleChains samples to an InferenceData.
Either posterior or prior may be a SampleChains.AbstractChain or SampleChains.MultiChain object.
For descriptions of remaining kwargs, see from_namedtuple.
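A short sketch, assuming `chain` is a SampleChains.AbstractChain or MultiChain produced elsewhere (the library name passed here is illustrative):

```julia
using ArviZ

# `chain` is assumed to exist, e.g. produced by SampleChainsDynamicHMC.
idata = from_samplechains(chain; library="SampleChainsDynamicHMC")
```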
IO / Conversion
ArviZ.from_json — Function
Initialize object from a json file.
This function is forwarded to Python's arviz.from_json. The docstring of that function is included below.
Will use the faster `ujson` (https://github.com/ultrajson/ultrajson) if it is available.
Parameters
----------
filename : str
location of json file
Returns
-------
InferenceData object
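A one-line sketch; the file name is illustrative and must point to a JSON serialization of an InferenceData:

```julia
using ArviZ

idata = from_json("centered_eight.json")
```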
InferenceObjectsNetCDF.from_netcdf — Function
from_netcdf(path::AbstractString; kwargs...) -> InferenceData
Load an InferenceData from an unopened NetCDF file.
Remaining kwargs are passed to NCDatasets.NCDataset. This method loads data eagerly. To instead load data lazily, pass an opened NCDataset to from_netcdf.
Examples
julia> idata = from_netcdf("centered_eight.nc")
InferenceData with groups:
> posterior
> posterior_predictive
> sample_stats
> prior
> observed_data
from_netcdf(ds::NCDatasets.NCDataset; load_mode) -> InferenceData
Load an InferenceData from an opened NetCDF file.
load_mode defaults to :lazy, which avoids reading variables into memory. Operations on these arrays will be slow. load_mode can also be :eager, which copies all variables into memory. It is then safe to close ds. If load_mode is :lazy and ds is closed after constructing InferenceData, using the variable arrays will have undefined behavior.
Examples
Here is how we might lazily load an InferenceData from a web-hosted NetCDF file.
julia> using HTTP, NCDatasets
julia> resp = HTTP.get("https://github.com/arviz-devs/arviz_example_data/blob/main/data/centered_eight.nc?raw=true");
julia> ds = NCDataset("centered_eight", "r"; memory = resp.body);
julia> idata = from_netcdf(ds)
InferenceData with groups:
> posterior
> posterior_predictive
> sample_stats
> prior
> observed_data
julia> idata_copy = copy(idata); # disconnect from the loaded dataset
julia> close(ds);
InferenceObjectsNetCDF.to_netcdf — Function
to_netcdf(data, dest::AbstractString; group::Symbol=:posterior, kwargs...)
to_netcdf(data, dest::NCDatasets.NCDataset; group::Symbol=:posterior)
Write data to a NetCDF file.
data is any type that can be converted to an InferenceData using convert_to_inference_data. If not an InferenceData, then group specifies which group the data represents.
dest specifies either the path to the NetCDF file or an opened NetCDF file. If dest is a path, remaining kwargs are passed to NCDatasets.NCDataset.
Examples
julia> using NCDatasets
julia> idata = from_namedtuple((; x = randn(4, 100, 3), z = randn(4, 100)))
InferenceData with groups:
> posterior
julia> to_netcdf(idata, "data.nc")
"data.nc"
General functions
ArviZ.concat — Function
Concatenate InferenceData objects.
This function is forwarded to Python's arviz.concat. The docstring of that function is included below.
Concatenates over `group`, `chain` or `draw`.
By default, concatenates over unique groups.
To concatenate over `chain` or `draw`, the function requires identical groups and variables.
The variables in the `data` group are merged if `dim` is not found.
Parameters
----------
*args : InferenceData
Variable length InferenceData list or
Sequence of InferenceData.
dim : str, optional
If defined, the dimension over which to concatenate. If None, concatenates over
unique groups.
copy : bool
If True, groups are copied to the new InferenceData object.
Used only if `dim` is None.
inplace : bool
If True, merge args to first object.
reset_dim : bool
Valid only if dim is not None.
Returns
-------
InferenceData
A new InferenceData object by default.
When `inplace==True`, the args are merged into the first arg and `None` is returned.
See Also
--------
add_groups : Add new groups to InferenceData object.
extend : Extend InferenceData with groups from another InferenceData.
Examples
--------
Use the ``concat`` method to concatenate InferenceData objects. This concatenates over
unique groups by default. We first create an ``InferenceData`` object:
.. ipython::
In [1]: import arviz as az
...: import numpy as np
...: data = {
...: "a": np.random.normal(size=(4, 100, 3)),
...: "b": np.random.normal(size=(4, 100)),
...: }
...: coords = {"a_dim": ["x", "y", "z"]}
...: dataA = az.from_dict(data, coords=coords, dims={"a": ["a_dim"]})
...: dataA
We have created an ``InferenceData`` object with default group 'posterior'. Now, we will
create another ``InferenceData`` object:
.. ipython::
In [1]: dataB = az.from_dict(prior=data, coords=coords, dims={"a": ["a_dim"]})
...: dataB
We have created another ``InferenceData`` object with group 'prior'. Now, we will concatenate
these two ``InferenceData`` objects:
.. ipython::
In [1]: az.concat(dataA, dataB)
Now, we will concatenate over chain (or draw). It requires identical groups and variables.
Here we are concatenating two identical ``InferenceData`` objects over dimension chain:
.. ipython::
In [1]: az.concat(dataA, dataA, dim="chain")
This creates an ``InferenceData`` with the original group 'posterior'. In a similar way,
we can also concatenate over draws.
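Since ArviZ.jl forwards this function, the ipython examples above can be sketched in Julia as well; the use of `from_namedtuple` to build the inputs is an assumption about the most convenient constructor:

```julia
using ArviZ

# Two objects with the same variables, in the posterior and prior groups.
dataA = from_namedtuple((; a=randn(4, 100, 3), b=randn(4, 100)))
dataB = from_namedtuple(; prior=(; a=randn(4, 100, 3), b=randn(4, 100)))

# Concatenate over unique groups: the result has posterior and prior groups.
idata = concat(dataA, dataB)

# Concatenate two identical objects over the chain dimension.
idata_wide = concat(dataA, dataA; dim="chain")
```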
ArviZ.extract — Function
Extract an InferenceData group or subset of it.
This function is forwarded to Python's arviz.extract. The docstring of that function is included below.
Parameters
----------
idata : InferenceData or InferenceData_like
InferenceData from which to extract the data.
group : str, optional
Which InferenceData data group to extract data from.
combined : bool, optional
Combine ``chain`` and ``draw`` dimensions into ``sample``. Won't work if
a dimension named ``sample`` already exists.
var_names : str or list of str, optional
Variables to be extracted. Prefix the variables by `~` when you want to exclude them.
filter_vars: {None, "like", "regex"}, optional
If `None` (default), interpret var_names as the real variables names. If "like",
interpret var_names as substrings of the real variables names. If "regex",
interpret var_names as regular expressions on the real variables names. A la
`pandas.filter`.
Like with plotting, sometimes it's easier to subset by specifying what to exclude
instead of what to include.
num_samples : int, optional
Extract only a subset of the samples. Only valid if ``combined=True``.
keep_dataset : bool, optional
If true, always return a Dataset. If false (default), return a DataArray
when there is a single variable.
rng : bool, int, numpy.Generator, optional
Shuffle the samples, only valid if ``combined=True``. By default,
samples are shuffled if ``num_samples`` is not ``None``, and are left
in the same order otherwise. This ensures that subsetting the samples doesn't return
only samples from a single chain and consecutive draws.
Returns
-------
xarray.DataArray or xarray.Dataset
Examples
--------
The default behaviour is to return the posterior group after stacking the chain and
draw dimensions.
.. jupyter-execute::
import arviz as az
idata = az.load_arviz_data("centered_eight")
az.extract(idata)
You can also indicate a subset to be returned, both in variables and in samples:
.. jupyter-execute::
az.extract(idata, var_names="theta", num_samples=100)
To keep the chain and draw dimensions, use ``combined=False``.
.. jupyter-execute::
az.extract(idata, group="prior", combined=False)
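The same calls can be made from Julia, since `extract` is forwarded; here is a hedged sketch using the example data loader documented below on this page:

```julia
using ArviZ

idata = load_example_data("centered_eight")

# Default: return the posterior group with chain and draw stacked into `sample`.
post = extract(idata)

# Subset variables and draw 100 shuffled samples.
theta = extract(idata; var_names="theta", num_samples=100)

# Keep the chain and draw dimensions for the prior group.
pri = extract(idata; group="prior", combined=false)
```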
Example data
ArviZExampleData.describe_example_data — Function
describe_example_data() -> String
Return a string containing descriptions of all available datasets.
Examples
julia> describe_example_data("radon") |> println
radon
=====
Radon is a radioactive gas that enters homes through contact points with the ground. It is a carcinogen that is the primary cause of lung cancer in non-smokers. Radon levels vary greatly from household to household.
This example uses an EPA study of radon levels in houses in Minnesota to construct a model with a hierarchy over households within a county. The model includes estimates (gamma) for contextual effects of the uranium per household.
See Gelman and Hill (2006) for details on the example, or https://docs.pymc.io/notebooks/multilevel_modeling.html by Chris Fonnesbeck for details on this implementation.
remote: http://ndownloader.figshare.com/files/24067472
ArviZExampleData.load_example_data — Function
load_example_data(name; kwargs...) -> InferenceObjects.InferenceData
load_example_data() -> Dict{String,AbstractFileMetadata}
Load a local or remote pre-made dataset.
kwargs are forwarded to InferenceObjects.from_netcdf.
Pass no parameters to get a Dict listing all available datasets.
Data files are handled by DataDeps.jl. A file is downloaded only when it is requested and then cached for future use.
Examples
julia> keys(load_example_data())
KeySet for a OrderedCollections.OrderedDict{String, ArviZExampleData.AbstractFileMetadata} with 9 entries. Keys:
"centered_eight"
"non_centered_eight"
"radon"
"rugby"
"regression1d"
"regression10d"
"classification1d"
"classification10d"
"glycan_torsion_angles"
julia> load_example_data("centered_eight")
InferenceData with groups:
> posterior
> posterior_predictive
> log_likelihood
> sample_stats
> prior
> prior_predictive
> observed_data
> constant_data