Label guide

Basic labelling

All ArviZ plotting functions and some stats functions can take an optional labeller argument. By default, labels show the variable name. Multidimensional variables also show the coordinate value.

Example: Default labelling

In [1]: import arviz as az
   ...: schools = az.load_arviz_data("centered_eight")
   ...: az.summary(schools)
   ...: 
Out[1]: 
                          mean     sd  hdi_3%  ...  ess_bulk  ess_tail  r_hat
mu                       4.093  3.372  -2.118  ...     250.0     643.0   1.03
theta[Choate]            6.026  5.782  -3.707  ...     348.0     743.0   1.02
theta[Deerfield]         4.724  4.736  -4.039  ...     471.0    1018.0   1.02
theta[Phillips Andover]  3.576  5.559  -6.779  ...     463.0     674.0   1.01
theta[Phillips Exeter]   4.478  4.939  -5.528  ...     503.0     666.0   1.01
theta[Hotchkiss]         3.064  4.642  -5.972  ...     380.0     833.0   1.02
theta[Lawrenceville]     3.821  4.979  -5.507  ...     516.0    1104.0   1.02
theta[St. Paul's]        6.250  5.436  -3.412  ...     402.0    1026.0   1.02
theta[Mt. Hermon]        4.544  5.521  -5.665  ...     449.0    1084.0   1.01
tau                      4.089  3.001   0.569  ...      79.0      54.0   1.07

[10 rows x 9 columns]
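The row labels above are built by a labeller object. The sketch below assumes a recent ArviZ release in which `arviz.labels.BaseLabeller` exposes `make_label_flat` and `make_label_vert`. It calls the default labeller directly to show how a variable name and a selection turn into a label. The `sel` and `isel` dictionaries are written by hand for illustration.

```python
import arviz.labels as azl

labeller = azl.BaseLabeller()

# Scalar variable: empty selection, so the label is just the variable name
scalar_label = labeller.make_label_flat("mu", {}, {})

# Multidimensional variable: sel maps dims to coordinate values,
# isel maps the same dims to positional indexes (hand-written here)
flat_label = labeller.make_label_flat("theta", {"school": "Choate"}, {"school": 0})
vert_label = labeller.make_label_vert("theta", {"school": "Choate"}, {"school": 0})

print(scalar_label)      # mu
print(flat_label)        # theta[Choate]
print(repr(vert_label))  # 'theta\nChoate'
```

The flat form is the one used in tables such as az.summary, while the vertical form suits plot titles.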

ArviZ supports label-based indexing powered by xarray. With label-based indexing, you can use coordinate labels to plot only a selected subset of variables and coordinate values.

Example: Label based indexing

In this example, the coordinate values shown for the theta variable correspond to the school dimension. To inspect tau, whose r_hat value of 1.07 is the highest in the summary above, include it in the var_names argument. To inspect the theta values for the Choate and St. Paul's coordinates, include theta in var_names as well and use the coords argument to select only these two coordinate values. You can generate this plot with the following command:

In [2]: az.plot_trace(schools, var_names=["tau", "theta"], coords={"school": ["Choate", "St. Paul's"]}, compact=False);
[Figure: trace plots of tau and of theta for Choate and St. Paul's]

With this plot, you can look for sampling issues at low tau values.
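The same var_names and coords selection also works for az.summary, which makes the effect easy to check without a plot. This is a sketch on a small hypothetical posterior built with az.from_dict (random numbers, three schools only), not the centered_eight data; kind="stats" is used only to skip the diagnostic columns.

```python
import numpy as np
import arviz as az

# Hypothetical posterior: 4 chains x 100 draws, scalar tau and one theta per school
rng = np.random.default_rng(0)
schools = ["Choate", "Deerfield", "St. Paul's"]
idata = az.from_dict(
    posterior={
        "tau": rng.gamma(2.0, 2.0, size=(4, 100)),
        "theta": rng.normal(size=(4, 100, len(schools))),
    },
    dims={"theta": ["school"]},
    coords={"school": schools},
)

# Select tau plus two theta coordinate values by label
summary = az.summary(
    idata,
    var_names=["tau", "theta"],
    coords={"school": ["Choate", "St. Paul's"]},
    kind="stats",
)
print(summary.index.tolist())
```

Only the selected labels appear as rows, in the order given by var_names and coords.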

Example: Using the labeller argument

You can use the labeller argument to customize labels. The default labels show the plain variable name theta rather than the symbol θ, which LaTeX renders from $\theta$. With the labeller argument, you can display labels in proper math notation.

You can use MapLabeller to rename the variable theta to $\theta$, as shown in the following example:

In [3]: import arviz.labels as azl
   ...: labeller = azl.MapLabeller(var_name_map={"theta": r"$\theta$"})
   ...: coords = {"school": ["Deerfield", "Hotchkiss", "Lawrenceville"]}
   ...: 

In [4]: az.plot_posterior(schools, var_names="theta", coords=coords, labeller=labeller, ref_val=5);
[Figure: posterior plots of θ for Deerfield, Hotchkiss and Lawrenceville]
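To see exactly what MapLabeller changes, you can call its label methods directly instead of plotting. A minimal sketch, assuming MapLabeller accepts the var_name_map keyword used above and inherits make_label_flat and make_label_vert from BaseLabeller; the selection dictionaries are hand-written.

```python
import arviz.labels as azl

labeller = azl.MapLabeller(var_name_map={"theta": r"$\theta$"})

# Only the variable name is remapped; the coordinate value passes through unchanged
vert = labeller.make_label_vert("theta", {"school": "Deerfield"}, {"school": 1})
flat = labeller.make_label_flat("theta", {"school": "Deerfield"}, {"school": 1})

# Variables missing from var_name_map keep their original name
untouched = labeller.make_label_flat("tau", {}, {})

print(repr(vert))  # '$\\theta$\nDeerfield'
print(flat)        # $\theta$[Deerfield]
print(untouched)   # tau
```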

See also

For a list of labellers available in ArviZ, see the API reference page.

Sorting labels

ArviZ allows labels to be sorted in two ways:

  1. Using the arguments passed to ArviZ plotting functions

  2. Sorting the underlying xarray.Dataset

The first option is better suited to one-off ordering, whereas the second is better suited to keeping the order consistent across several plots.

Note

Both ways have a limitation: the elements of a multidimensional variable cannot be separated. For example, it is possible to sort theta, mu and tau in any order, and to sort the schools within theta in any order, but it is not possible to show half of the schools, then mu and tau, and then the rest of the schools.

Sorting variable names

In [5]: var_order = ["theta", "mu", "tau"]

For variable names to appear in a given order when calling ArviZ functions, pass the list of variable names in that order to var_names.

In [6]: az.summary(schools, var_names=var_order)
Out[6]: 
                          mean     sd  hdi_3%  ...  ess_bulk  ess_tail  r_hat
theta[Choate]            6.026  5.782  -3.707  ...     348.0     743.0   1.02
theta[Deerfield]         4.724  4.736  -4.039  ...     471.0    1018.0   1.02
theta[Phillips Andover]  3.576  5.559  -6.779  ...     463.0     674.0   1.01
theta[Phillips Exeter]   4.478  4.939  -5.528  ...     503.0     666.0   1.01
theta[Hotchkiss]         3.064  4.642  -5.972  ...     380.0     833.0   1.02
theta[Lawrenceville]     3.821  4.979  -5.507  ...     516.0    1104.0   1.02
theta[St. Paul's]        6.250  5.436  -3.412  ...     402.0    1026.0   1.02
theta[Mt. Hermon]        4.544  5.521  -5.665  ...     449.0    1084.0   1.01
mu                       4.093  3.372  -2.118  ...     250.0     643.0   1.03
tau                      4.089  3.001   0.569  ...      79.0      54.0   1.07

[10 rows x 9 columns]

Subsetting an xarray Dataset with a list of variable names returns a Dataset whose variables follow the order of the list.

In [7]: schools.posterior = schools.posterior[var_order]
   ...: az.summary(schools)
   ...: 
Out[7]: 
                          mean     sd  hdi_3%  ...  ess_bulk  ess_tail  r_hat
theta[Choate]            6.026  5.782  -3.707  ...     348.0     743.0   1.02
theta[Deerfield]         4.724  4.736  -4.039  ...     471.0    1018.0   1.02
theta[Phillips Andover]  3.576  5.559  -6.779  ...     463.0     674.0   1.01
theta[Phillips Exeter]   4.478  4.939  -5.528  ...     503.0     666.0   1.01
theta[Hotchkiss]         3.064  4.642  -5.972  ...     380.0     833.0   1.02
theta[Lawrenceville]     3.821  4.979  -5.507  ...     516.0    1104.0   1.02
theta[St. Paul's]        6.250  5.436  -3.412  ...     402.0    1026.0   1.02
theta[Mt. Hermon]        4.544  5.521  -5.665  ...     449.0    1084.0   1.01
mu                       4.093  3.372  -2.118  ...     250.0     643.0   1.03
tau                      4.089  3.001   0.569  ...      79.0      54.0   1.07

[10 rows x 9 columns]
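The reordering relies purely on xarray, so it can be checked on any Dataset. A minimal sketch with a small hypothetical Dataset holding zero-valued data:

```python
import numpy as np
import xarray as xr

# Hypothetical Dataset whose variables are created in the order mu, tau, theta
ds = xr.Dataset(
    {
        "mu": (("chain", "draw"), np.zeros((2, 5))),
        "tau": (("chain", "draw"), np.ones((2, 5))),
        "theta": (("chain", "draw", "school"), np.zeros((2, 5, 3))),
    }
)

# Indexing with a list returns a new Dataset whose variables follow the list order
reordered = ds[["theta", "mu", "tau"]]

print(list(ds.data_vars))         # ['mu', 'tau', 'theta']
print(list(reordered.data_vars))  # ['theta', 'mu', 'tau']
```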

Sorting coordinate values

To sort coordinate values, first define the order, store it, and then use it to sort the coordinate values. You can define the order by writing a list manually or by computing it with xarray objects, as illustrated in the example “Sorting the schools by mean” below.

Example: Sorting the schools by mean

  • Compute the posterior mean of each school with the following command:

In [8]: school_means = schools.posterior["theta"].mean(("chain", "draw"))
   ...: school_means
   ...: 
Out[8]: 
<xarray.DataArray 'theta' (school: 8)>
array([6.02582947, 4.72414999, 3.57636428, 4.47778158, 3.06403605,
       3.82103202, 6.25017863, 4.54440944])
Coordinates:
  * school   (school) object 'Choate' 'Deerfield' ... "St. Paul's" 'Mt. Hermon'
  • You can use the DataArray result to sort the coordinate values for theta.

There are two ways of sorting:

  1. Using ArviZ arguments

  2. Using xarray

To use ArviZ arguments, sort the coordinate values and pass them in the coords argument to choose the order of the rows.

In [9]: sorted_schools = schools.posterior["school"].sortby(school_means)
   ...: az.summary(schools, var_names="theta", coords={"school": sorted_schools})
   ...: 
Out[9]: 
                          mean     sd  hdi_3%  ...  ess_bulk  ess_tail  r_hat
theta[Hotchkiss]         3.064  4.642  -5.972  ...     380.0     833.0   1.02
theta[Phillips Andover]  3.576  5.559  -6.779  ...     463.0     674.0   1.01
theta[Lawrenceville]     3.821  4.979  -5.507  ...     516.0    1104.0   1.02
theta[Phillips Exeter]   4.478  4.939  -5.528  ...     503.0     666.0   1.01
theta[Mt. Hermon]        4.544  5.521  -5.665  ...     449.0    1084.0   1.01
theta[Deerfield]         4.724  4.736  -4.039  ...     471.0    1018.0   1.02
theta[Choate]            6.026  5.782  -3.707  ...     348.0     743.0   1.02
theta[St. Paul's]        6.250  5.436  -3.412  ...     402.0    1026.0   1.02

[8 rows x 9 columns]

To use xarray instead, call the sortby() method to order the coordinate values directly in the source Dataset.

In [10]: schools.posterior = schools.posterior.sortby(school_means)
   ....: az.summary(schools, var_names="theta")
   ....: 
Out[10]: 
                          mean     sd  hdi_3%  ...  ess_bulk  ess_tail  r_hat
theta[Hotchkiss]         3.064  4.642  -5.972  ...     380.0     833.0   1.02
theta[Phillips Andover]  3.576  5.559  -6.779  ...     463.0     674.0   1.01
theta[Lawrenceville]     3.821  4.979  -5.507  ...     516.0    1104.0   1.02
theta[Phillips Exeter]   4.478  4.939  -5.528  ...     503.0     666.0   1.01
theta[Mt. Hermon]        4.544  5.521  -5.665  ...     449.0    1084.0   1.01
theta[Deerfield]         4.724  4.736  -4.039  ...     471.0    1018.0   1.02
theta[Choate]            6.026  5.782  -3.707  ...     348.0     743.0   1.02
theta[St. Paul's]        6.250  5.436  -3.412  ...     402.0    1026.0   1.02

[8 rows x 9 columns]
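Both approaches sort in ascending order by default. xarray's sortby() also takes an ascending argument, so the schools can be listed from the highest mean to the lowest. A sketch using hypothetical means for three schools in place of the real school_means:

```python
import xarray as xr

# Hypothetical per-school means standing in for school_means above
school_means = xr.DataArray(
    [4.7, 6.0, 3.1],
    dims=["school"],
    coords={"school": ["Choate", "Deerfield", "Hotchkiss"]},
)

# Order the school coordinate values by the means; default is ascending
ascending = school_means["school"].sortby(school_means)
# ascending=False puts the largest mean first
descending = school_means["school"].sortby(school_means, ascending=False)

print(ascending.values.tolist())   # ['Hotchkiss', 'Choate', 'Deerfield']
print(descending.values.tolist())  # ['Deerfield', 'Choate', 'Hotchkiss']
```

The resulting DataArray can be passed as coords={"school": descending} in the same way as sorted_schools in In [9].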

Sorting dimensions

In some cases, a multidimensional variable has more than one extra dimension besides chain and draw. Imagine we have performed a fixed set of experiments on multiple subjects over several days: three data dimensions overall.

We will create fake InferenceData mimicking this situation to show how to sort dimensions. To keep the guide short and avoid cluttering it with unnecessary output lines, we will stick to a posterior with a single variable, and the dimension sizes will be 2, 3 and 4.

In [11]: from numpy.random import default_rng
   ....: import pandas as pd
   ....: rng = default_rng()
   ....: samples = rng.normal(size=(4, 500, 2, 3, 4))
   ....: coords = {
   ....:     "subject": ["ecoli", "pseudomonas", "clostridium"],
   ....:     "date": ["1-3-2020", "2-4-2020", "1-5-2020", "1-6-2020"],
   ....:     "experiment": [1, 2]
   ....: }
   ....: experiments = az.from_dict(
   ....:     posterior={"b": samples}, dims={"b": ["experiment", "subject", "date"]}, coords=coords
   ....: )
   ....: experiments.posterior
   ....: 
Out[11]: 
<xarray.Dataset>
Dimensions:     (chain: 4, draw: 500, experiment: 2, subject: 3, date: 4)
Coordinates:
  * chain       (chain) int64 0 1 2 3
  * draw        (draw) int64 0 1 2 3 4 5 6 7 ... 492 493 494 495 496 497 498 499
  * experiment  (experiment) int64 1 2
  * subject     (subject) <U11 'ecoli' 'pseudomonas' 'clostridium'
  * date        (date) <U8 '1-3-2020' '2-4-2020' '1-5-2020' '1-6-2020'
Data variables:
    b           (chain, draw, experiment, subject, date) float64 0.6785 ... -...
Attributes:
    created_at:     2021-11-15T19:19:37.819793
    arviz_version:  0.11.4

Given how we have constructed our dataset, the default order is experiment, subject, date.

In [12]: az.summary(experiments)
Out[12]: 
                              mean     sd  hdi_3%  ...  ess_bulk  ess_tail  r_hat
b[1, ecoli, 1-3-2020]       -0.006  0.978  -1.913  ...    1935.0    1811.0    1.0
b[1, ecoli, 2-4-2020]        0.013  1.002  -1.806  ...    1826.0    1876.0    1.0
b[1, ecoli, 1-5-2020]        0.007  1.007  -1.891  ...    1991.0    1949.0    1.0
b[1, ecoli, 1-6-2020]        0.033  1.020  -1.930  ...    1904.0    1884.0    1.0
b[1, pseudomonas, 1-3-2020]  0.037  0.998  -1.843  ...    2099.0    1753.0    1.0
b[1, pseudomonas, 2-4-2020] -0.001  1.001  -1.937  ...    1833.0    1822.0    1.0
b[1, pseudomonas, 1-5-2020] -0.024  1.005  -1.820  ...    1959.0    1881.0    1.0
b[1, pseudomonas, 1-6-2020] -0.015  0.996  -1.875  ...    2186.0    2059.0    1.0
b[1, clostridium, 1-3-2020]  0.002  1.012  -1.853  ...    1910.0    2022.0    1.0
b[1, clostridium, 2-4-2020] -0.034  1.001  -1.964  ...    1969.0    1932.0    1.0
b[1, clostridium, 1-5-2020] -0.020  1.011  -1.845  ...    1960.0    2053.0    1.0
b[1, clostridium, 1-6-2020] -0.009  0.988  -1.795  ...    2071.0    1927.0    1.0
b[2, ecoli, 1-3-2020]        0.027  0.994  -1.736  ...    1849.0    1912.0    1.0
b[2, ecoli, 2-4-2020]       -0.017  1.022  -1.996  ...    1955.0    1882.0    1.0
b[2, ecoli, 1-5-2020]        0.030  1.009  -1.820  ...    1804.0    1929.0    1.0
b[2, ecoli, 1-6-2020]        0.042  0.993  -1.839  ...    2046.0    1917.0    1.0
b[2, pseudomonas, 1-3-2020]  0.007  0.993  -1.703  ...    1872.0    1820.0    1.0
b[2, pseudomonas, 2-4-2020] -0.025  1.005  -1.851  ...    1923.0    1750.0    1.0
b[2, pseudomonas, 1-5-2020] -0.037  1.023  -1.957  ...    2023.0    2015.0    1.0
b[2, pseudomonas, 1-6-2020] -0.023  0.996  -1.839  ...    1821.0    1883.0    1.0
b[2, clostridium, 1-3-2020]  0.009  0.984  -1.892  ...    1981.0    2015.0    1.0
b[2, clostridium, 2-4-2020] -0.014  1.038  -1.997  ...    1928.0    1847.0    1.0
b[2, clostridium, 1-5-2020] -0.016  1.014  -1.841  ...    2012.0    2060.0    1.0
b[2, clostridium, 1-6-2020] -0.006  1.001  -1.936  ...    1883.0    1895.0    1.0

[24 rows x 9 columns]

However, the order we want is subject, date, experiment. To get this result, we need to modify the underlying xarray object.

In [13]: dim_order = ("chain", "draw", "subject", "date", "experiment")

In [14]: experiments.posterior = experiments.posterior.transpose(*dim_order)

In [15]: az.summary(experiments)
Out[15]: 
                              mean     sd  hdi_3%  ...  ess_bulk  ess_tail  r_hat
b[ecoli, 1-3-2020, 1]       -0.006  0.978  -1.913  ...    1935.0    1811.0    1.0
b[ecoli, 1-3-2020, 2]        0.027  0.994  -1.736  ...    1849.0    1912.0    1.0
b[ecoli, 2-4-2020, 1]        0.013  1.002  -1.806  ...    1826.0    1876.0    1.0
b[ecoli, 2-4-2020, 2]       -0.017  1.022  -1.996  ...    1955.0    1882.0    1.0
b[ecoli, 1-5-2020, 1]        0.007  1.007  -1.891  ...    1991.0    1949.0    1.0
b[ecoli, 1-5-2020, 2]        0.030  1.009  -1.820  ...    1804.0    1929.0    1.0
b[ecoli, 1-6-2020, 1]        0.033  1.020  -1.930  ...    1904.0    1884.0    1.0
b[ecoli, 1-6-2020, 2]        0.042  0.993  -1.839  ...    2046.0    1917.0    1.0
b[pseudomonas, 1-3-2020, 1]  0.037  0.998  -1.843  ...    2099.0    1753.0    1.0
b[pseudomonas, 1-3-2020, 2]  0.007  0.993  -1.703  ...    1872.0    1820.0    1.0
b[pseudomonas, 2-4-2020, 1] -0.001  1.001  -1.937  ...    1833.0    1822.0    1.0
b[pseudomonas, 2-4-2020, 2] -0.025  1.005  -1.851  ...    1923.0    1750.0    1.0
b[pseudomonas, 1-5-2020, 1] -0.024  1.005  -1.820  ...    1959.0    1881.0    1.0
b[pseudomonas, 1-5-2020, 2] -0.037  1.023  -1.957  ...    2023.0    2015.0    1.0
b[pseudomonas, 1-6-2020, 1] -0.015  0.996  -1.875  ...    2186.0    2059.0    1.0
b[pseudomonas, 1-6-2020, 2] -0.023  0.996  -1.839  ...    1821.0    1883.0    1.0
b[clostridium, 1-3-2020, 1]  0.002  1.012  -1.853  ...    1910.0    2022.0    1.0
b[clostridium, 1-3-2020, 2]  0.009  0.984  -1.892  ...    1981.0    2015.0    1.0
b[clostridium, 2-4-2020, 1] -0.034  1.001  -1.964  ...    1969.0    1932.0    1.0
b[clostridium, 2-4-2020, 2] -0.014  1.038  -1.997  ...    1928.0    1847.0    1.0
b[clostridium, 1-5-2020, 1] -0.020  1.011  -1.845  ...    1960.0    2053.0    1.0
b[clostridium, 1-5-2020, 2] -0.016  1.014  -1.841  ...    2012.0    2060.0    1.0
b[clostridium, 1-6-2020, 1] -0.009  0.988  -1.795  ...    2071.0    1927.0    1.0
b[clostridium, 1-6-2020, 2] -0.006  1.001  -1.936  ...    1883.0    1895.0    1.0

[24 rows x 9 columns]

Note

We don't need to overwrite or store the modified xarray object, though. If we only want to use this order once, az.summary(experiments.posterior.transpose(*dim_order)) works just the same.
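If you prefer not to spell out every dimension, xarray's transpose() also accepts an ellipsis (...) standing for all dimensions that are not listed. A minimal sketch on a hypothetical Dataset with the same dimension layout as experiments.posterior (zero-valued data, assuming a reasonably recent xarray version):

```python
import numpy as np
import xarray as xr

# Hypothetical posterior with dims chain, draw, experiment, subject, date
posterior = xr.Dataset(
    {"b": (("chain", "draw", "experiment", "subject", "date"), np.zeros((2, 10, 2, 3, 4)))}
)

# The ellipsis stands in for every dimension not listed explicitly (here: experiment)
reordered = posterior.transpose("chain", "draw", "subject", "date", ...)

print(reordered["b"].dims)
```

The labels produced by ArviZ then follow the new dimension order, as in Out[15].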

Labeling with indexes

As you may have seen, there are some labellers with Idx in their name: IdxLabeller and DimIdxLabeller. They show the positional index of the values instead of their corresponding coordinate value.
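The difference is easy to see by calling both labellers on the same selection. A sketch with hand-written sel and isel dictionaries, assuming IdxLabeller keeps the BaseLabeller label methods:

```python
import arviz.labels as azl

sel = {"school": "Hotchkiss"}  # coordinate value
isel = {"school": 0}           # positional index of that value

coord_label = azl.BaseLabeller().make_label_flat("theta", sel, isel)
idx_label = azl.IdxLabeller().make_label_flat("theta", sel, isel)

print(coord_label)  # theta[Hotchkiss]
print(idx_label)    # theta[0]
```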

We have seen before that we can use the coords argument or the sel() method to select data based on the coordinate values. Similarly, we can use the isel() method to select data based on positional indexes.

In [16]: az.summary(schools, labeller=azl.IdxLabeller())
Out[16]: 
           mean     sd  hdi_3%  hdi_97%  ...  mcse_sd  ess_bulk  ess_tail  r_hat
theta[0]  3.064  4.642  -5.972   11.547  ...    0.166     380.0     833.0   1.02
theta[1]  3.576  5.559  -6.779   13.838  ...    0.175     463.0     674.0   1.01
theta[2]  3.821  4.979  -5.507   13.232  ...    0.150     516.0    1104.0   1.02
theta[3]  4.478  4.939  -5.528   13.392  ...    0.141     503.0     666.0   1.01
theta[4]  4.544  5.521  -5.665   15.266  ...    0.163     449.0    1084.0   1.01
theta[5]  4.724  4.736  -4.039   13.999  ...    0.142     471.0    1018.0   1.02
theta[6]  6.026  5.782  -3.707   17.337  ...    0.206     348.0     743.0   1.02
theta[7]  6.250  5.436  -3.412   16.920  ...    0.168     402.0    1026.0   1.02
mu        4.093  3.372  -2.118   10.403  ...    0.152     250.0     643.0   1.03
tau       4.089  3.001   0.569    9.386  ...    0.178      79.0      54.0   1.07

[10 rows x 9 columns]

After seeing the above summary, let's use the isel() method to generate a summary of a subset only.

In [17]: az.summary(schools.isel(school=[2, 5, 7]), labeller=azl.IdxLabeller())
Out[17]: 
           mean     sd  hdi_3%  hdi_97%  ...  mcse_sd  ess_bulk  ess_tail  r_hat
theta[0]  3.821  4.979  -5.507   13.232  ...    0.150     516.0    1104.0   1.02
theta[1]  4.724  4.736  -4.039   13.999  ...    0.142     471.0    1018.0   1.02
theta[2]  6.250  5.436  -3.412   16.920  ...    0.168     402.0    1026.0   1.02
mu        4.093  3.372  -2.118   10.403  ...    0.152     250.0     643.0   1.03
tau       4.089  3.001   0.569    9.386  ...    0.178      79.0      54.0   1.07

[5 rows x 9 columns]

Warning

Positional indexing is NOT label based indexing with numbers!

The positional indexes shown correspond to the ordinal position in the subsetted object. If you are not subsetting the object, you can use these indexes with isel without problems. However, if you are subsetting the data (either directly or with the coords argument) and want to use the positional indexes shown, you need to apply them to the corresponding subset.

Example: If you use a dict named coords when calling a plotting function, for isel to work it has to be called on original_idata.sel(**coords).isel(<desired positional idxs>) and not on original_idata.isel(<desired positional idxs>).
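A minimal xarray sketch of this pitfall, using a hypothetical theta DataArray with four schools: position 1 refers to a different school depending on whether it is applied before or after the label-based selection.

```python
import numpy as np
import xarray as xr

schools = ["Choate", "Deerfield", "Hotchkiss", "Lawrenceville"]
theta = xr.DataArray(np.arange(4.0), dims=["school"], coords={"school": schools})

coords = {"school": ["Deerfield", "Lawrenceville"]}

# Position 1 within the subset, which is what a labeller would show after using coords
subset_then_pos = theta.sel(**coords).isel(school=1)
# Position 1 of the original, unsubsetted object
pos_on_original = theta.isel(school=1)

print(subset_then_pos["school"].item())  # Lawrenceville
print(pos_on_original["school"].item())  # Deerfield
```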

Labeller mixtures

In some cases, none of the available labellers does the right job. One case where this is bound to happen is plot_forest(): when legend=True, it does not make sense to also add the model name to the tick labels. plot_forest knows that, and if no labeller is passed, it uses either BaseLabeller or NoModelLabeller depending on the value of legend. However, if we do want to use the labeller argument, we have to enforce this default ourselves:

In [18]: schools2 = az.load_arviz_data("non_centered_eight")

In [19]: az.plot_forest(
   ....:     (schools, schools2),
   ....:     model_names=("centered", "non_centered"),
   ....:     coords={"school": ["Deerfield", "Lawrenceville", "Mt. Hermon"]},
   ....:     figsize=(10,7),
   ....:     labeller=azl.DimCoordLabeller(),
   ....:     legend=True
   ....: );
   ....: 
[Figure: forest plot of the centered and non_centered models using DimCoordLabeller]

There is a lot of repeated information now. The variable names, dims and coords are shown for both models. Moreover, the models are labeled both in the legend and in the labels of the y axis. For such cases, ArviZ provides a convenience function mix_labellers() that combines labeller classes for some extra customization.

Labeller classes split labeling into atomic tasks, with one method per task, to maximize extensibility. Thus, many new labellers can be created with this mixer function alone, without writing a new class from scratch. More usage examples are available in the mix_labellers() docstring.
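Because mix_labellers() builds a new class from the classes it receives, the mixture inherits each atomic method from the first class in the tuple that defines it. A sketch calling the mixed labeller directly on a hand-written selection; the expected label assumes DimCoordLabeller formats each coordinate as "dim: value":

```python
import arviz.labels as azl

MixtureLabeller = azl.mix_labellers((azl.DimCoordLabeller, azl.NoModelLabeller))

labeller = MixtureLabeller()
# The coordinate formatting comes from DimCoordLabeller
label = labeller.make_label_flat("theta", {"school": "Deerfield"}, {"school": 1})

print(label)  # theta[school: Deerfield]
print([cls.__name__ for cls in MixtureLabeller.__mro__])
```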

In [20]: MixtureLabeller = azl.mix_labellers((azl.DimCoordLabeller, azl.NoModelLabeller))

In [21]: az.plot_forest(
   ....:     (schools, schools2),
   ....:     model_names=("centered", "non_centered"),
   ....:     coords={"school": ["Deerfield", "Lawrenceville", "Mt. Hermon"]},
   ....:     figsize=(10,7),
   ....:     labeller=MixtureLabeller(),
   ....:     legend=True
   ....: );
   ....: 
[Figure: forest plot of the same models using MixtureLabeller]

Custom labellers

So far we have customized the labels in the plots without writing a new class from scratch. However, there could be cases where we have to customize our labels further than the available labellers allow. In such cases, we have to subclass one of the labellers in arviz.labels and override some of its methods.
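As a minimal illustration of the subclassing approach, the hypothetical labeller below overrides a single atomic method, dim_coord_to_str (assumed to take dim, coord_val and coord_idx as in BaseLabeller). Both the flat and the vertical labels change, because both are built from that method.

```python
import arviz.labels as azl


class UpperCoordLabeller(azl.BaseLabeller):
    """Hypothetical labeller that upper-cases coordinate values."""

    def dim_coord_to_str(self, dim, coord_val, coord_idx):
        # Only the formatting of a single coordinate value changes
        return str(coord_val).upper()


labeller = UpperCoordLabeller()
flat = labeller.make_label_flat("theta", {"school": "Choate"}, {"school": 0})
vert = labeller.make_label_vert("theta", {"school": "Choate"}, {"school": 0})

print(flat)        # theta[CHOATE]
print(repr(vert))  # 'theta\nCHOATE'
```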

One case where we might need this approach is when non-indexing coordinates are present. This happens, for example, after pointwise selection on multiple dimensions, but we can also add extra dimensions to our models manually, as shown in TBD. For this example, let's use pointwise selection. Let's say one of the variables in the posterior represents a covariance matrix, and we want to keep it as is for other post-processing tasks instead of extracting the sub-diagonal triangular matrix, which has no repeated information, as a flattened array. Any other pointwise selection works the same way.

Here is our data:

In [22]: from numpy.random import default_rng

In [23]: import numpy as np

In [24]: import xarray as xr

In [25]: rng = default_rng()

In [26]: cov = rng.normal(size=(4, 500, 3, 3))

In [27]: cov = np.einsum("...ij,...kj", cov, cov)

In [28]: cov[:, :, [0, 1, 2], [0, 1, 2]] = 1

In [29]: subjects = ["ecoli", "pseudomonas", "clostridium"]

In [30]: idata = az.from_dict(
   ....:     {"cov": cov},
   ....:     dims={"cov": ["subject", "subject bis"]},
   ....:     coords={"subject": subjects, "subject bis": subjects}
   ....: )
   ....: 

In [31]: idata.posterior
Out[31]: 
<xarray.Dataset>
Dimensions:      (chain: 4, draw: 500, subject: 3, subject bis: 3)
Coordinates:
  * chain        (chain) int64 0 1 2 3
  * draw         (draw) int64 0 1 2 3 4 5 6 7 ... 493 494 495 496 497 498 499
  * subject      (subject) <U11 'ecoli' 'pseudomonas' 'clostridium'
  * subject bis  (subject bis) <U11 'ecoli' 'pseudomonas' 'clostridium'
Data variables:
    cov          (chain, draw, subject, subject bis) float64 1.0 ... 1.0
Attributes:
    created_at:     2021-11-15T19:19:40.031040
    arviz_version:  0.11.4

To select a non-rectangular slice with xarray and get the result flattened and without NaNs, we can use DataArrays indexed with a dimension that is not present in our current dataset:

In [32]: coords = {
   ....:     'subject': xr.DataArray(
   ....:         ["ecoli", "ecoli", "pseudomonas"], dims=['pointwise_sel']
   ....:     ),
   ....:     'subject bis': xr.DataArray(
   ....:         ["pseudomonas", "clostridium", "clostridium"], dims=['pointwise_sel']
   ....:     )
   ....: }
   ....: 

In [33]: idata.posterior.sel(coords)
Out[33]: 
<xarray.Dataset>
Dimensions:      (chain: 4, draw: 500, pointwise_sel: 3)
Coordinates:
  * chain        (chain) int64 0 1 2 3
  * draw         (draw) int64 0 1 2 3 4 5 6 7 ... 493 494 495 496 497 498 499
    subject      (pointwise_sel) <U11 'ecoli' 'ecoli' 'pseudomonas'
    subject bis  (pointwise_sel) <U11 'pseudomonas' 'clostridium' 'clostridium'
Dimensions without coordinates: pointwise_sel
Data variables:
    cov          (chain, draw, pointwise_sel) float64 -0.06554 1.949 ... 2.689
Attributes:
    created_at:     2021-11-15T19:19:40.031040
    arviz_version:  0.11.4

We see now that subject and subject bis are no longer indexing coordinates, and therefore won’t be available to the labeller:

In [34]: az.plot_posterior(idata, coords=coords);
[Figure: default posterior plots of the pointwise-selected cov values]
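You can confirm which coordinates remain indexing coordinates after pointwise selection without plotting. A sketch on a hypothetical 3x3 DataArray with the same coordinates as cov, minus the chain and draw dimensions:

```python
import numpy as np
import xarray as xr

subjects = ["ecoli", "pseudomonas", "clostridium"]
cov = xr.DataArray(
    np.zeros((3, 3)),
    dims=["subject", "subject bis"],
    coords={"subject": subjects, "subject bis": subjects},
)

# Indexers sharing the new dimension pointwise_sel trigger pointwise selection
coords = {
    "subject": xr.DataArray(["ecoli", "ecoli", "pseudomonas"], dims=["pointwise_sel"]),
    "subject bis": xr.DataArray(
        ["pseudomonas", "clostridium", "clostridium"], dims=["pointwise_sel"]
    ),
}
selected = cov.sel(coords)

print(selected.dims)                        # ('pointwise_sel',)
print("subject" in selected.coords)         # True: still a coordinate
print("subject" in selected.indexes)        # False: no longer an index
```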

To get around this limitation, we will store the coords used for pointwise selection as a Dataset and pass it to the labeller. The labeller can then use the information it has available (pointwise_sel and its position, in this case) to subset this coords Dataset and build the labels from it. One option is to format these non-indexing coordinates as a dictionary whose keys are dimension names and whose values are coordinate labels, and pass that to the parent's sel_to_str method:

In [35]: coords_ds = xr.Dataset(coords)

In [36]: class NonIdxCoordLabeller(azl.BaseLabeller):
   ....:     """Use non indexing coordinates as labels."""
   ....:     def __init__(self, coords_ds):
   ....:         self.coords_ds = coords_ds
   ....:     def sel_to_str(self, sel, isel):
   ....:         new_sel = {k: v.values for k, v in self.coords_ds.sel(sel).items()}
   ....:         return super().sel_to_str(new_sel, new_sel)
   ....: 

In [37]: labeller = NonIdxCoordLabeller(coords_ds)

In [38]: az.plot_posterior(idata, coords=coords, labeller=labeller);
[Figure: posterior plots labelled with the subject and subject bis values]

This has the following advantages:

  • It requires very little extra code.

  • It allows us to combine our newly created NonIdxCoordLabeller with other labellers, as we did in the previous section.
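The second advantage can be sketched by mixing NonIdxCoordLabeller with DimCoordLabeller. The class is repeated so the block runs on its own, and the labels are produced by calling make_label_flat with a hand-written selection instead of plotting. NonIdxCoordLabeller goes first in the tuple so that its __init__ and sel_to_str take precedence; its super() call then reaches the DimCoordLabeller formatting. The expected strings assume that xarray's sel() falls back to positional selection on a dimension without coordinates.

```python
import xarray as xr
import arviz.labels as azl

coords_ds = xr.Dataset(
    {
        "subject": xr.DataArray(["ecoli", "ecoli", "pseudomonas"], dims=["pointwise_sel"]),
        "subject bis": xr.DataArray(
            ["pseudomonas", "clostridium", "clostridium"], dims=["pointwise_sel"]
        ),
    }
)


class NonIdxCoordLabeller(azl.BaseLabeller):
    """Use non indexing coordinates as labels."""

    def __init__(self, coords_ds):
        self.coords_ds = coords_ds

    def sel_to_str(self, sel, isel):
        new_sel = {k: v.values for k, v in self.coords_ds.sel(sel).items()}
        return super().sel_to_str(new_sel, new_sel)


DimNonIdxLabeller = azl.mix_labellers((NonIdxCoordLabeller, azl.DimCoordLabeller))

sel = {"pointwise_sel": 0}
plain = NonIdxCoordLabeller(coords_ds).make_label_flat("cov", sel, sel)
mixed = DimNonIdxLabeller(coords_ds).make_label_flat("cov", sel, sel)

print(plain)  # cov[ecoli, pseudomonas]
print(mixed)  # cov[subject: ecoli, subject bis: pseudomonas]
```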

Another option is to go for a much more customized look and handle everything in make_label_vert() to get labels like “Correlation between subjects x and y”.

In [39]: class NonIdxCoordLabeller(azl.BaseLabeller):
   ....:     """Use non indexing coordinates as labels."""
   ....:     def __init__(self, coords_ds):
   ....:         self.coords_ds = coords_ds
   ....:     def make_label_vert(self, var_name, sel, isel):
   ....:         coords_ds_subset = self.coords_ds.sel(sel)
   ....:         subj = coords_ds_subset["subject"].values
   ....:         subj_bis = coords_ds_subset["subject bis"].values
   ....:         return f"Correlation between subjects\n{subj} & {subj_bis}"
   ....: 

In [40]: labeller = NonIdxCoordLabeller(coords_ds)

In [41]: az.plot_posterior(idata, coords=coords, labeller=labeller);
[Figure: posterior plots labelled “Correlation between subjects” followed by each subject pair]

This won't combine properly with other labellers, but it serves its purpose and achieves complete customization of the labels, so we probably won't want to combine it with other labellers anyway. The main drawback is that we have only overridden make_label_vert, so functions like plot_forest or summary, which use make_label_flat(), will still fall back to the methods defined by BaseLabeller.
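One way to remove that drawback is to override make_label_flat as well, so that the one-line labels used by summary and plot_forest get the same custom text. A sketch of this, with a hypothetical helper method _subject_pair shared by both overrides; it is exercised by calling the methods directly on a hand-written selection.

```python
import xarray as xr
import arviz.labels as azl

coords_ds = xr.Dataset(
    {
        "subject": xr.DataArray(["ecoli", "ecoli", "pseudomonas"], dims=["pointwise_sel"]),
        "subject bis": xr.DataArray(
            ["pseudomonas", "clostridium", "clostridium"], dims=["pointwise_sel"]
        ),
    }
)


class CorrelationLabeller(azl.BaseLabeller):
    """Hypothetical labeller writing the same custom text in both label layouts."""

    def __init__(self, coords_ds):
        self.coords_ds = coords_ds

    def _subject_pair(self, sel):
        # Hypothetical helper: look up the non-indexing coordinates for this selection
        subset = self.coords_ds.sel(sel)
        return subset["subject"].item(), subset["subject bis"].item()

    def make_label_vert(self, var_name, sel, isel):
        subj, subj_bis = self._subject_pair(sel)
        return f"Correlation between subjects\n{subj} & {subj_bis}"

    def make_label_flat(self, var_name, sel, isel):
        subj, subj_bis = self._subject_pair(sel)
        return f"Correlation between subjects {subj} & {subj_bis}"


labeller = CorrelationLabeller(coords_ds)
sel = {"pointwise_sel": 2}
flat = labeller.make_label_flat("cov", sel, sel)
vert = labeller.make_label_vert("cov", sel, sel)
print(flat)  # Correlation between subjects pseudomonas & clostridium
```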