1.9 Communicating a Bayesian analysis
Creating reports and communicating results is central to the practice of statistics and data science. In this section, we will briefly discuss some of the peculiarities of this task when working with Bayesian models. We will keep returning to examples of this important task in future chapters.
1.9.1 Model notation and visualization
If you want to communicate the results of an analysis, you should also communicate the model you used. A common notation to succinctly represent probabilistic models is:
θ ∼ Beta(α, β)
y ∼ Bin(n = 1, p = θ)
This is just the model we used for the coin-flipping example. As you may remember, the ∼ symbol indicates that the variable on its left is a random variable distributed according to the distribution on its right. In many contexts, this symbol is used to indicate that a variable takes approximately some value, but when talking about probabilistic models, we read it out loud as "is distributed as". Thus, we can say θ is distributed as a Beta with parameters α and β, and y is distributed as a Binomial with parameters n = 1 and p = θ. The very same model can be represented graphically using a Kruschke diagram, as in Figure 1.14.
Figure 1.14: A Kruschke diagram of a BetaBinomial model
On the first level, we have the prior that generates the values for θ; then the likelihood; and on the last level, the data, y. Arrows indicate the relationships between variables, and the ∼ symbol indicates the stochastic nature of the variables. All Kruschke diagrams in the book were made using the templates provided by Rasmus Bååth (http://www.sumsar.net/blog/2013/10/diy-kruschke-style-diagrams/).
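Reading the notation generatively also tells us how to simulate from the model: first draw θ from the prior, then draw y given that θ. The following is a minimal sketch of this reading, using PreliZ (imported as pz, as elsewhere in the book); the choice α = β = 2 is just for illustration:

import preliz as pz

# θ ∼ Beta(α, β): draw one value of θ from the prior (α = β = 2 is an arbitrary illustrative choice)
theta = pz.Beta(alpha=2, beta=2).rvs()
# y ∼ Bin(n=1, p=θ): draw one coin flip given that θ
y = pz.Binomial(n=1, p=theta).rvs()
print(theta, y)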
1.9.2 Summarizing the posterior
The result of a Bayesian analysis is a posterior distribution, and all the information about the parameters (given a model and dataset) is contained in it. Thus, by summarizing the posterior, we are summarizing the logical consequences of a model and data. A common practice is to report, for each parameter, the mean (or mode or median) to convey the location of the distribution, together with some measure of dispersion, such as the standard deviation, to convey the uncertainty in our estimates. The standard deviation works well for Normal-like distributions but can be misleading for other types of distributions, such as skewed ones.
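Given an array of posterior samples, these point summaries can be computed directly. Here is a minimal sketch, using draws from a Beta distribution as a stand-in for a posterior (the variable name is illustrative):

import numpy as np
import preliz as pz

# Draws from a Beta distribution, standing in for posterior samples of θ
posterior_sample = pz.Beta(4, 12).rvs(1000)
# Location and dispersion summaries
print(np.mean(posterior_sample), np.median(posterior_sample), np.std(posterior_sample))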
A commonly used device to summarize the spread of a posterior distribution is the Highest-Density Interval (HDI). An HDI is the shortest interval containing a given portion of the probability density. If we say that the 95% HDI for some analysis is [2, 5], we mean that according to our data and model, the parameter in question is between 2 and 5 with a probability of 0.95. There is nothing special about choosing 95%, 50%, or any other value; we are free to choose the 82% HDI if we like. Ideally, such choices should be justified by the context rather than made automatically, but it is fine to settle on a common value like 95%. As a friendly reminder of the arbitrary nature of this choice, the ArviZ default is 94%.
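ArviZ can compute an HDI directly from an array of samples with az.hdi. A short sketch, again using a Beta sample as a stand-in for a posterior:

import arviz as az
import preliz as pz

sample = pz.Beta(4, 12).rvs(1000)
print(az.hdi(sample, hdi_prob=0.94))   # the ArviZ default, a 94% HDI
print(az.hdi(sample, hdi_prob=0.82))   # any other probability works just as well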
ArviZ is a Python package for exploratory analysis of Bayesian models, and it has many functions to help us summarize the posterior. One of those functions is az.plot_posterior, which we can use to generate a plot with the mean and HDI of θ. The distribution does not need to be a posterior distribution; any distribution will work. Figure 1.15 shows the result for a random sample from a Beta distribution:
Code 1.8
np.random.seed(1)
# KDE of 1,000 draws from a Beta(4, 12), annotated with the mean and 94% HDI
az.plot_posterior({'θ': pz.Beta(4, 12).rvs(1000)})
Figure 1.15: A KDE of a sample from a Beta distribution with its mean and 94% HDI
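If we prefer a different probability for the HDI or a different point estimate, az.plot_posterior accepts the hdi_prob and point_estimate arguments; for example (the values here are illustrative):

az.plot_posterior({'θ': pz.Beta(4, 12).rvs(1000)},
                  hdi_prob=0.82, point_estimate='median')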
Not Confidence Intervals
If you are familiar with the frequentist paradigm, please note that HDIs are not the same as confidence intervals. In the frequentist framework, parameters are fixed by design; a frequentist confidence interval either contains or does not contain the true value of a parameter, and the 95% refers to the procedure: if we repeated the experiment many times, 95% of the intervals computed this way would contain the true value. In the Bayesian framework, parameters are random variables, and thus we can talk about the probability of a parameter having specific values or being inside some interval. The unintuitive nature of confidence intervals makes them easy to misinterpret, and people often talk about frequentist confidence intervals as if they were Bayesian credible intervals.