Data
ArviZ.concat
ArviZ.extract
ArviZ.from_cmdstan
ArviZ.from_json
ArviZ.from_mcmcchains
ArviZ.from_samplechains
ArviZExampleData.describe_example_data
ArviZExampleData.load_example_data
InferenceObjectsNetCDF.from_netcdf
InferenceObjectsNetCDF.to_netcdf
Inference library converters
ArviZ.from_cmdstan — Function
Convert CmdStan data into an InferenceData object.
This function is forwarded to Python's arviz.from_cmdstan. The docstring of that function is included below.
For a usage example, read the "Creating InferenceData" section on from_cmdstan.
Parameters
----------
posterior : str or list of str, optional
List of paths to output.csv files.
posterior_predictive : str or list of str, optional
Posterior predictive samples for the fit. If it ends with ".csv", a file path is assumed.
predictions : str or list of str, optional
Out-of-sample prediction samples for the fit. If it ends with ".csv", a file path is assumed.
prior : str or list of str, optional
List of paths to output.csv files.
prior_predictive : str or list of str, optional
Prior predictive samples for the fit. If it ends with ".csv", a file path is assumed.
observed_data : str, optional
Observed data used in the sampling. Path to data file in Rdump or JSON format.
observed_data_var : str or list of str, optional
Variable(s) used for slicing observed_data. If not defined, all
data variables are imported.
constant_data : str, optional
Constant data used in the sampling. Path to data file in Rdump or JSON format.
constant_data_var : str or list of str, optional
Variable(s) used for slicing constant_data. If not defined, all
data variables are imported.
predictions_constant_data : str, optional
Constant data for predictions used in the sampling.
Path to data file in Rdump or JSON format.
predictions_constant_data_var : str or list of str, optional
Variable(s) used for slicing predictions_constant_data.
If not defined, all data variables are imported.
log_likelihood : dict of {str: str}, list of str or str, optional
Pointwise log_likelihood for the data. log_likelihood is extracted from the
posterior. It is recommended to use this argument as a dictionary whose keys
are observed variable names and its values are the variables storing log
likelihood arrays in the Stan code. In other cases, a dictionary with keys
equal to its values is used. By default, if a variable ``log_lik`` is
present in the Stan model, it will be retrieved as pointwise log
likelihood values. Use ``False`` to avoid this behaviour.
index_origin : int, optional
Starting value of integer coordinate values. Defaults to the value in rcParam
``data.index_origin``.
coords : dict of {str: array_like}, optional
A dictionary containing the values that are used as index. The key
is the name of the dimension, the values are the index values.
dims : dict of {str: list of str}, optional
A mapping from variables to a list of coordinate names for the variable.
disable_glob : bool
Don't use glob for string input. This means that all string input is
assumed to be variable names (samples) or a path (data).
save_warmup : bool
Save warmup iterations into InferenceData object, if found in the input files.
If not defined, use default defined by the rcParams.
dtypes : dict or str
A dictionary containing dtype information (int, float) for parameters.
If input is a string, it is assumed to be a model code or path to model code file.
Returns
-------
InferenceData object
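As a minimal sketch of the call, here is how the CSV outputs of a CmdStan run might be combined with observed data. The file paths, the observed variable `y`, and the Stan array name `log_lik` are all hypothetical; substitute the names from your own model.

```julia
using ArviZ

# Hypothetical paths to CmdStan output files, one per chain.
posterior_files = ["output_chain1.csv", "output_chain2.csv"]

idata = from_cmdstan(
    posterior_files;
    observed_data="data.json",              # Rdump or JSON file with the observed data
    log_likelihood=Dict("y" => "log_lik"),  # observed variable => Stan log-likelihood array
    coords=Dict("obs_dim" => 1:10),
    dims=Dict("y" => ["obs_dim"]),
)
```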
ArviZ.from_mcmcchains — Function
from_mcmcchains(posterior::MCMCChains.Chains; kwargs...) -> InferenceData
from_mcmcchains(; kwargs...) -> InferenceData
from_mcmcchains(
posterior::MCMCChains.Chains,
posterior_predictive,
predictions,
log_likelihood;
kwargs...
) -> InferenceData
Convert data in an MCMCChains.Chains format into an InferenceData.
Any keyword argument below without an explicitly annotated type above is allowed, so long as it can be passed to convert_to_inference_data.
Arguments
posterior::MCMCChains.Chains: Draws from the posterior
Keywords
posterior_predictive::Any=nothing: Draws from the posterior predictive distribution or name(s) of predictive variables in posterior
predictions: Out-of-sample predictions for the posterior
prior: Draws from the prior
prior_predictive: Draws from the prior predictive distribution or name(s) of predictive variables in prior
observed_data: Observed data on which the posterior is conditional. It should only contain data which is modeled as a random variable. Keys are parameter names and values.
constant_data: Model constants, data included in the model that are not modeled as random variables. Keys are parameter names.
predictions_constant_data: Constants relevant to the model predictions (i.e. new x values in a linear regression)
log_likelihood: Pointwise log-likelihood for the data. It is recommended to use this argument as a named tuple whose keys are observed variable names and whose values are log likelihood arrays. Alternatively, provide the name of a variable in posterior containing log likelihoods.
library=MCMCChains: Name of library that generated the chains
coords: Map from named dimension to named indices
dims: Map from variable name to names of its dimensions
eltypes: Map from variable names to eltypes. This is primarily used to assign discrete eltypes to discrete variables that were stored in Chains as floats.
Returns
InferenceData: The data with groups corresponding to the provided data
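A minimal sketch of the basic conversion, using a synthetic `Chains` object in place of real sampler output (the parameter names `mu` and `z` are hypothetical):

```julia
using ArviZ, MCMCChains

# Hypothetical draws array with shape (iterations, parameters, chains).
vals = randn(500, 2, 4)
chn = MCMCChains.Chains(vals, [:mu, :z])

# Convert the chains to an InferenceData with a `posterior` group.
idata = from_mcmcchains(chn; library="MCMCChains")
```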
ArviZ.from_samplechains — Function
from_samplechains(
posterior=nothing;
prior=nothing,
library=SampleChains,
kwargs...,
) -> InferenceData
Convert SampleChains samples to an InferenceData.
Either posterior or prior may be a SampleChains.AbstractChain or SampleChains.MultiChain object.
For descriptions of remaining kwargs, see from_namedtuple.
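A hedged sketch of the call, assuming `chain` is a SampleChains.AbstractChain or SampleChains.MultiChain already produced by a compatible sampler (the sampler setup is not shown here):

```julia
using ArviZ, SampleChains

# `chain` is assumed to be a SampleChains.AbstractChain or MultiChain
# obtained from a sampler such as SampleChainsDynamicHMC.
idata = from_samplechains(chain; library=SampleChains)
```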
IO / Conversion
ArviZ.from_json — Function
Initialize object from a json file.
This function is forwarded to Python's arviz.from_json. The docstring of that function is included below.
Will use the faster `ujson` (https://github.com/ultrajson/ultrajson) if it is available.
Parameters
----------
filename : str
Location of the JSON file.
Returns
-------
InferenceData object
InferenceObjectsNetCDF.from_netcdf — Function
from_netcdf(path::AbstractString; kwargs...) -> InferenceData
Load an InferenceData from an unopened NetCDF file.
Remaining kwargs are passed to NCDatasets.NCDataset. This method loads data eagerly. To instead load data lazily, pass an opened NCDataset to from_netcdf.
Examples
julia> idata = from_netcdf("centered_eight.nc")
InferenceData with groups:
> posterior
> posterior_predictive
> sample_stats
> prior
> observed_data
from_netcdf(ds::NCDatasets.NCDataset; load_mode) -> InferenceData
Load an InferenceData from an opened NetCDF file.
load_mode defaults to :lazy, which avoids reading variables into memory. Operations on these arrays will be slow. load_mode can also be :eager, which copies all variables into memory. It is then safe to close ds. If load_mode is :lazy and ds is closed after constructing InferenceData, using the variable arrays will have undefined behavior.
Examples
Here is how we might lazily load an InferenceData from a web-hosted NetCDF file.
julia> using HTTP, NCDatasets
julia> resp = HTTP.get("https://github.com/arviz-devs/arviz_example_data/blob/main/data/centered_eight.nc?raw=true");
julia> ds = NCDataset("centered_eight", "r"; memory = resp.body);
julia> idata = from_netcdf(ds)
InferenceData with groups:
> posterior
> posterior_predictive
> sample_stats
> prior
> observed_data
julia> idata_copy = copy(idata); # disconnect from the loaded dataset
julia> close(ds);
InferenceObjectsNetCDF.to_netcdf — Function
to_netcdf(data, dest::AbstractString; group::Symbol=:posterior, kwargs...)
to_netcdf(data, dest::NCDatasets.NCDataset; group::Symbol=:posterior)
Write data to a NetCDF file.
data is any type that can be converted to an InferenceData using convert_to_inference_data. If not an InferenceData, then group specifies which group the data represents.
dest specifies either the path to the NetCDF file or an opened NetCDF file. If dest is a path, remaining kwargs are passed to NCDatasets.NCDataset.
Examples
julia> using NCDatasets
julia> idata = from_namedtuple((; x = randn(4, 100, 3), z = randn(4, 100)))
InferenceData with groups:
> posterior
julia> to_netcdf(idata, "data.nc")
"data.nc"
General functions
ArviZ.concat — Function
Concatenate InferenceData objects.
This function is forwarded to Python's arviz.concat. The docstring of that function is included below.
Concatenates over `group`, `chain` or `draw`.
By default concatenates over unique groups.
To concatenate over `chain` or `draw`, the function
needs identical groups and variables.
The `variables` in the `data` group are merged if `dim` is not found.
Parameters
----------
*args : InferenceData
Variable length InferenceData list or
Sequence of InferenceData.
dim : str, optional
Dimension over which to concatenate. If None, concatenates over
unique groups.
copy : bool
If True, groups are copied to the new InferenceData object.
Used only if `dim` is None.
inplace : bool
If True, merge args to first object.
reset_dim : bool
Valid only if dim is not None.
Returns
-------
InferenceData
A new InferenceData object by default.
When `inplace==True`, merges args into the first arg and returns `None`.
See Also
--------
add_groups : Add new groups to InferenceData object.
extend : Extend InferenceData with groups from another InferenceData.
Examples
--------
Use the ``concat`` method to concatenate InferenceData objects. This will concatenate over
unique groups by default. We first create an ``InferenceData`` object:
.. ipython::
In [1]: import arviz as az
...: import numpy as np
...: data = {
...: "a": np.random.normal(size=(4, 100, 3)),
...: "b": np.random.normal(size=(4, 100)),
...: }
...: coords = {"a_dim": ["x", "y", "z"]}
...: dataA = az.from_dict(data, coords=coords, dims={"a": ["a_dim"]})
...: dataA
We have created an ``InferenceData`` object with default group 'posterior'. Now, we will
create another ``InferenceData`` object:
.. ipython::
In [1]: dataB = az.from_dict(prior=data, coords=coords, dims={"a": ["a_dim"]})
...: dataB
We have created another ``InferenceData`` object with group 'prior'. Now, we will concatenate
these two ``InferenceData`` objects:
.. ipython::
In [1]: az.concat(dataA, dataB)
Now, we will concatenate over chain (or draw). It requires identical groups and variables.
Here we are concatenating two identical ``InferenceData`` objects over dimension chain:
.. ipython::
In [1]: az.concat(dataA, dataA, dim="chain")
It will create an ``InferenceData`` with the original group 'posterior'. In a similar way,
we can also concatenate over draws.
ArviZ.extract — Function
Extract an InferenceData group or subset of it.
This function is forwarded to Python's arviz.extract. The docstring of that function is included below.
Parameters
----------
idata : InferenceData or InferenceData_like
InferenceData from which to extract the data.
group : str, optional
Which InferenceData data group to extract data from.
combined : bool, optional
Combine ``chain`` and ``draw`` dimensions into ``sample``. Won't work if
a dimension named ``sample`` already exists.
var_names : str or list of str, optional
Variables to be extracted. Prefix the variables by `~` when you want to exclude them.
filter_vars: {None, "like", "regex"}, optional
If `None` (default), interpret var_names as the real variables names. If "like",
interpret var_names as substrings of the real variables names. If "regex",
interpret var_names as regular expressions on the real variables names. A la
`pandas.filter`.
Like with plotting, sometimes it's easier to subset by saying what to exclude
instead of what to include.
num_samples : int, optional
Extract only a subset of the samples. Only valid if ``combined=True``
keep_dataset : bool, optional
If true, always return a Dataset. If false (default), return a DataArray
when there is a single variable.
rng : bool, int, numpy.Generator, optional
Shuffle the samples, only valid if ``combined=True``. By default,
samples are shuffled if ``num_samples`` is not ``None``, and are left
in the same order otherwise. This ensures that subsetting the samples doesn't return
only samples from a single chain and consecutive draws.
Returns
-------
xarray.DataArray or xarray.Dataset
Examples
--------
The default behaviour is to return the posterior group after stacking the chain and
draw dimensions.
.. jupyter-execute::
import arviz as az
idata = az.load_arviz_data("centered_eight")
az.extract(idata)
You can also indicate a subset to be returned, both in variables and in samples:
.. jupyter-execute::
az.extract(idata, var_names="theta", num_samples=100)
To keep the chain and draw dimensions, use ``combined=False``.
.. jupyter-execute::
az.extract(idata, group="prior", combined=False)
Example data
ArviZExampleData.describe_example_data — Function
describe_example_data() -> String
Return a string containing descriptions of all available datasets.
Examples
julia> describe_example_data("radon") |> println
radon
=====
Radon is a radioactive gas that enters homes through contact points with the ground. It is a carcinogen that is the primary cause of lung cancer in non-smokers. Radon levels vary greatly from household to household.
This example uses an EPA study of radon levels in houses in Minnesota to construct a model with a hierarchy over households within a county. The model includes estimates (gamma) for contextual effects of the uranium per household.
See Gelman and Hill (2006) for details on the example, or https://docs.pymc.io/notebooks/multilevel_modeling.html by Chris Fonnesbeck for details on this implementation.
remote: http://ndownloader.figshare.com/files/24067472
ArviZExampleData.load_example_data — Function
load_example_data(name; kwargs...) -> InferenceObjects.InferenceData
load_example_data() -> Dict{String,AbstractFileMetadata}
Load a local or remote pre-made dataset.
kwargs are forwarded to InferenceObjects.from_netcdf.
Pass no parameters to get a Dict listing all available datasets.
Data files are handled by DataDeps.jl. A file is downloaded only when it is requested and then cached for future use.
Examples
julia> keys(load_example_data())
KeySet for a OrderedCollections.OrderedDict{String, ArviZExampleData.AbstractFileMetadata} with 9 entries. Keys:
"centered_eight"
"non_centered_eight"
"radon"
"rugby"
"regression1d"
"regression10d"
"classification1d"
"classification10d"
"glycan_torsion_angles"
julia> load_example_data("centered_eight")
InferenceData with groups:
> posterior
> posterior_predictive
> log_likelihood
> sample_stats
> prior
> prior_predictive
> observed_data
> constant_data