You're reading from Bayesian Analysis with Python A practical guide to probabilistic modeling

Product type Paperback

Published in Jan 2024

Publisher Packt

ISBN-13 9781805127161

Length 394 pages

Edition 3rd Edition

Languages

Python

Tools

PyCharm

Concepts

Machine Learning

Author (1):

Osvaldo Martin

Table of Contents (15) Chapters

Preface

1. Chapter 1 Thinking Probabilistically

2. Chapter 2 Programming Probabilistically

3. Chapter 3 Hierarchical Models

4. Chapter 4 Modeling with Lines

5. Chapter 5 Comparing Models

6. Chapter 6 Modeling with Bambi

7. Chapter 7 Mixture Models

8. Chapter 8 Gaussian Processes

9. Chapter 9 Bayesian Additive Regression Trees

10. Chapter 10 Inference Engines

11. Chapter 11 Where to Go Next

12. Bibliography

13. Other Books You May Enjoy

14. Index

1.2 Working with data

Data is an essential ingredient in statistics and data science. Data comes from several sources, such as experiments, computer simulations, surveys, and field observations. If we are the ones in charge of generating or gathering the data, it is always a good idea to first think carefully about the questions we want to answer and which methods we will use, and only then proceed to get the data. There is a whole branch of statistics dealing with data collection, known as experimental design. In the era of the data deluge, we can sometimes forget that gathering data is not always cheap. For example, while it is true that the Large Hadron Collider (LHC) produces hundreds of terabytes a day, its construction took years of manual and intellectual labor.

As a general rule, we can think of the process that generates the data as stochastic, because there is ontological, technical, and/or epistemic uncertainty. That is, the system is intrinsically stochastic, technical issues add noise or prevent us from measuring with arbitrary precision, and/or conceptual limitations veil details from us. For all these reasons, we always need to interpret data in the context of models, both mental and formal. Data does not speak except through models.
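To make these three sources of uncertainty concrete, here is a minimal sketch of a stochastic data-generating process. It is only an illustration, not a model used later in the book: the function name `simulate_observations`, the Gaussian assumptions, and all parameter values are hypothetical choices made for this example.

```python
import random
import statistics


def simulate_observations(n, true_mean=10.0, intrinsic_sd=1.0,
                          noise_sd=0.5, resolution=0.5, seed=42):
    """Return n recorded values from a toy stochastic process.

    All names and default values are assumptions made for illustration.
    """
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    recorded = []
    for _ in range(n):
        # Ontological uncertainty: the system itself varies from draw to draw.
        latent = rng.gauss(true_mean, intrinsic_sd)
        # Technical uncertainty: the instrument adds noise...
        measured = latent + rng.gauss(0.0, noise_sd)
        # ...and can only report values on a grid of limited precision.
        recorded.append(round(measured / resolution) * resolution)
    # Epistemic uncertainty: the analyst sees only `recorded`,
    # never `latent` or the parameters used to generate it.
    return recorded


data = simulate_observations(1000)
print(f"sample mean: {statistics.mean(data):.2f}")
print(f"sample sd:   {statistics.stdev(data):.2f}")
```

Looking only at `data`, we cannot tell how much of the spread comes from the system and how much from the instrument. Separating the two requires a model of how the data was generated.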

In this book, we will assume that we have already collected the data. Our data will also be clean and tidy, something that is rarely true in the real world. We make these assumptions to stay focused on the subject of this book. I just want to emphasize, especially for newcomers to data analysis, that even though they are not covered in this book, there are important skills that you should learn and practice to work with data successfully.

A very useful skill when analyzing data is knowing how to write code in a programming language, such as Python. Manipulating data is usually necessary given that we live in a messy world with even messier data, and coding helps to get things done. Even if you are lucky and your data is very clean and tidy, coding will still be very useful since modern Bayesian statistics is done mostly through programming languages such as Python or R. If you want to learn how to use Python for cleaning and manipulating data, you can find a good introduction in Python for Data Analysis by McKinney [2022].
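As a small taste of what manipulating messy data can look like, the following sketch tidies a handful of made-up records using only the Python standard library. The records, the column names `group` and `weight_kg`, and the helper functions are all hypothetical. In practice you would more likely reach for a dedicated library such as pandas.

```python
# Made-up raw records with typical problems: stray whitespace,
# inconsistent capitalization, a decimal comma, and missing values.
raw_rows = [
    {"group": " Control ", "weight_kg": "70.5"},
    {"group": "treatment", "weight_kg": "NA"},
    {"group": "TREATMENT", "weight_kg": " 68,2"},
    {"group": "control", "weight_kg": ""},
]

# Strings treated as "missing" in this example (an assumption).
MISSING = {"", "na", "n/a", "nan"}


def parse_weight(text):
    """Convert a raw string to float, or return None if it is missing."""
    text = text.strip().replace(",", ".")
    if text.lower() in MISSING:
        return None
    return float(text)


def tidy(rows):
    """Normalize group labels and keep only rows with a usable weight."""
    cleaned = []
    for row in rows:
        weight = parse_weight(row["weight_kg"])
        if weight is None:
            continue  # dropped here for brevity only
        cleaned.append({"group": row["group"].strip().lower(),
                        "weight_kg": weight})
    return cleaned


clean_rows = tidy(raw_rows)
for row in clean_rows:
    print(row)
```

Dropping incomplete rows keeps the example short, but it is a choice with consequences and should be made deliberately for each dataset.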
