Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Bayesian Analysis with Python

You're reading from   Bayesian Analysis with Python A practical guide to probabilistic modeling

Arrow left icon
Product type Paperback
Published in Jan 2024
Publisher Packt
ISBN-13 9781805127161
Length 394 pages
Edition 3rd Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Osvaldo Martin Osvaldo Martin
Author Profile Icon Osvaldo Martin
Osvaldo Martin
Arrow right icon
View More author details
Toc

Table of Contents (15) Chapters Close

Preface
1. Chapter 1 Thinking Probabilistically 2. Chapter 2 Programming Probabilistically FREE CHAPTER 3. Chapter 3 Hierarchical Models 4. Chapter 4 Modeling with Lines 5. Chapter 5 Comparing Models 6. Chapter 6 Modeling with Bambi 7. Chapter 7 Mixture Models 8. Chapter 8 Gaussian Processes 9. Chapter 9 Bayesian Additive Regression Trees 10. Chapter 10 Inference Engines 11. Chapter 11 Where to Go Next 12. Bibliography
13. Other Books You May Enjoy
14. Index

1.9 Communicating a Bayesian analysis

Creating reports and communicating results is central to the practice of statistics and data science. In this section, we will briefly discuss some of the peculiarities of this task when working with Bayesian models. In future chapters, we will keep looking at examples of this important matter.

1.9.1 Model notation and visualization

If you want to communicate the results of an analysis, you should also communicate the model you used. A common notation to succinctly represent probabilistic models is:

θ Beta(α,β)
y Bin(n = 1,p = θ)

This is just the model we use for the coin-flip example. As you may remember, the symbol indicates that the variable on the left of it is a random variable distributed according to the distribution on the right. In many contexts, this symbol is used to indicate that a variable takes approximately some value, but when talking about probabilistic models, we will read this symbol out loud, saying is distributed as. Thus, we can say θ is distributed as a Beta with parameters α and β, and y is distributed as a Binomial with parameters n = 1 and p = θ. The very same model can be represented graphically using Kruschke diagrams as in Figure 1.14.

PIC

Figure 1.14: A Kruschke diagram of a BetaBinomial model

On the first level, we have the prior that generates the values for θ, then the likelihood, and on the last line, the data, y. Arrows indicate the relationship between variables and the symbol indicates the stochastic nature of the variables. All Kruschke diagrams in the book were made using the templates provided by Rasmus Bååth ( http://www.sumsar.net/blog/2013/10/diy-kruschke-style-diagrams/).

1.9.2 Summarizing the posterior

The result of a Bayesian analysis is a posterior distribution, and all the information about the parameters (given a model and dataset) is contained in the posterior distribution. Thus, by summarizing the posterior, we are summarizing the logical consequences of a model and data. A common practice is to report, for each parameter, the mean (or mode or median) to have an idea of the location of the distribution and some measure of dispersion, such as the standard deviation, to have an idea of uncertainty in our estimates. The standard deviation works well for Normal-like distributions but can be misleading for other types of distributions, such as skewed ones.

A commonly used device to summarize the spread of a posterior distribution is to use a Highest-Density Interval (HDI). An HDI is the shortest interval containing a given portion of the probability density. If we say that the 95% HDI for some analysis is [2,5], we mean that according to our data and model, the parameter in question is between 2 and 5 with a probability of 0.95. There is nothing special about choosing 95%, 50%, or any other value. We are free to choose the 82% HDI interval if we like. Ideally, justifications should be context-dependent and not automatic, but it is okay to settle on some common value like 95%. As a friendly reminder of the arbitrary nature of this choice, the ArviZ default is 94%.

ArviZ is a Python package for exploratory analysis of Bayesian models, and it has many functions to help us summarize the posterior. One of those functions is az.plot_posterior, which we can use to generate a plot with the mean and HDI of θ. The distribution does not need to be a posterior distribution; any distribution will work. Figure 1.15 shows the result for a random sample from a Beta distribution:

Code 1.8

np.random.seed(1) 
az.plot_posterior({'θ':pz.Beta(4, 12).rvs(1000)})
PIC

Figure 1.15: A KDE of a sample from a Beta distribution with its mean and 94% HDI

Not Confidence Intervals

If you are familiar with the frequentist paradigm, please note that HDIs are not the same as confidence intervals. In the frequentist framework, parameters are fixed by design; a frequentist confidence interval either contains or does not contain the true value of a parameter. In the Bayesian framework, parameters are random variables, and thus we can talk about the probability of a parameter having specific values or being inside some interval. The unintuitive nature of confident intervals makes them easily misinterpreted and people often talk about frequentist confidence intervals as if they were Bayesian credible intervals.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime