Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Bayesian Analysis with Python

You're reading from   Bayesian Analysis with Python Introduction to statistical modeling and probabilistic programming using PyMC3 and ArviZ

Arrow left icon
Product type Paperback
Published in Dec 2018
Publisher Packt
ISBN-13 9781789341652
Length 356 pages
Edition 2nd Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Osvaldo Martin Osvaldo Martin
Author Profile Icon Osvaldo Martin
Osvaldo Martin
Arrow right icon
View More author details
Toc

Table of Contents (11) Chapters Close

Preface 1. Thinking Probabilistically 2. Programming Probabilistically FREE CHAPTER 3. Modeling with Linear Regression 4. Generalizing Linear Models 5. Model Comparison 6. Mixture Models 7. Gaussian Processes 8. Inference Engines 9. Where To Go Next?
10. Other Books You May Enjoy

Communicating a Bayesian analysis

Creating reports and communicating results is central to the practice of statistics and data science. In this section, we will briefly discuss some of the peculiarities of this task when working with Bayesian models. In future chapters, we will keep looking at examples about this important matter.

Model notation and visualization

If you want to communicate the results of an analysis, you should also communicate the model you used. A common notation to succinctly represent probabilistic models is:

This is just the model we use for the coin-flip example. As you may remember, the symbol indicates that the variable, on the left of it, is a random variable distributed according to the distribution on the right. In many contexts, this symbol is used to indicate that a variable takes approximately some value, but when talking about probabilistic models, we will read this symbol out loud saying is distributed as. Thus, we can say is distributed as a beta distribution with and parameters, and is distributed as a binomial with and parameters. The very same model can be represented graphically using Kruschke's diagrams:

Figure 1.6

On the first level, we have the prior that generates the values for , then the likelihood, and on the last line the data, . Arrows indicate the relationship between variables, and the symbol indicates the stochastic nature of the variables. All Kruschke's diagrams in the book were made using the templates provided by Rasmus Bååth (http://www.sumsar.net/blog/2013/10/diy-kruschke-style-diagrams/).

Summarizing the posterior

The result of a Bayesian analysis is a posterior distribution, and all the information about the parameters given a dataset and a model is contained in the posterior distribution. Thus, by summarizing the posterior, we are summarizing the logical consequences of a model and data. A common practice is to report, for each parameter, the mean (or mode or median) to have an idea of the location of the distribution and some measure, such as the standard deviation, to have an idea of the dispersion and hence the uncertainty in our estimate. The standard deviation works well for normal-like distributions but can be misleading for other type of distributions, such as skewed ones. So, an alternative is to use the following measure.

Highest-posterior density

A commonly-used device to summarize the spread of a posterior distribution is to use a Highest-Posterior Density (HPD) interval. An HPD is the shortest interval containing a given portion of the probability density. One of the most commonly-used is the 95% HPD, often accompanied by the 50% HPD. If we say that the 95% HPD for some analysis is [2-5], we mean that according to our data and model, we think the parameter in question is between 2 and 5 with a probability of 0.95.

There is nothing special about choosing 95%, 50%, or any other value. They are just arbitrary commonly-used values; we are free to choose the 91.37% HPD interval if we like. If you want to use the 95% value, that's OK; just remember it is a default value. Ideally, justifications should be context-dependent and not automatic.

ArviZ is a Python package for exploratory data analysis for Bayesian models. ArviZ has many functions to help us summarize the posterior, for example, az.plot_posterior can be used to generate a plot with the mean and HPD of a distribution. In the following example, instead of a posterior from a real analysis, we are generating a random sample from a beta distribution:

np.random.seed(1)
az.plot_posterior({'θ':stats.beta.rvs(5, 11, size=1000)})
Figure 1.7

Note that in Figure 1.7, the reported HPD is 94%. This is a friendly remainder of the arbitrary nature of the 95% value. Every time ArviZ computes and reports a HPD, it will use, by default, a value of 0.94 (corresponding to 94%). You can change this by passing a different value to the credible_interval argument.

If you are familiar with the frequentist paradigm, please note that HPD intervals are not the same as confidence intervals. The HPD has a very intuitive interpretation, to the point that people often misinterpret frequentist confidence intervals as if they were Bayesian credible intervals. Performing a fully-Bayesian analysis enables us to talk about the probability of a parameter having some value. This is not possible in the frequentist framework since parameters are fixed by design; a frequentist confidence interval contains or does not contain the true value of a parameter.
lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image