You're reading from Data Exploration and Preparation with BigQuery: A practical guide to cleaning, transforming, and analyzing data for business insights

Product type Paperback

Published in Nov 2023

Publisher Packt

ISBN-13 9781805125266

Length 264 pages

Edition 1st Edition

Languages

SQL

Tools

BigQuery

Concepts

Data Analysis

Author (1):

Mike Kahn

Table of Contents (21) Chapters

Preface

1. Part 1: Introduction to BigQuery

2. Chapter 1: Introducing BigQuery and Its Components

3. Chapter 2: BigQuery Organization and Design

4. Part 2: Data Exploration with BigQuery

5. Chapter 3: Exploring Data in BigQuery

6. Chapter 4: Loading and Transforming Data

7. Chapter 5: Querying BigQuery Data

8. Chapter 6: Exploring Data with Notebooks

9. Chapter 7: Further Exploring and Visualizing Data

10. Part 3: Data Preparation with BigQuery

11. Chapter 8: An Overview of Data Preparation Tools

12. Chapter 9: Cleansing and Transforming Data

13. Chapter 10: Best Practices for Data Preparation, Optimization, and Cost Control

14. Part 4: Hands-On and Conclusion

15. Chapter 11: Hands-On Exercise – Analyzing Advertising Data

16. Chapter 12: Hands-On Exercise – Analyzing Transportation Data

17. Chapter 13: Hands-On Exercise – Analyzing Customer Support Data

18. Chapter 14: Summary and Future Directions

19. Index

20. Other Books You May Enjoy

Understanding data distributions

The distribution of a dataset describes how its values are spread out or clustered around certain values or ranges. Understanding the distribution reveals the characteristics and patterns of the data, which helps us identify outliers, decide how to explore the data further, and make informed decisions and predictions.
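One way to see how values cluster is to count rows per fixed-width range. The query below is a minimal sketch, not a method prescribed by the book: the `sample` values and the bucket width of 10 are hypothetical and chosen only for illustration, and the data is inlined with `UNNEST` so that the query runs in BigQuery without any table.

```sql
-- Hypothetical sample values, inlined so the query needs no existing table.
WITH sample AS (
  SELECT value
  FROM UNNEST([12, 15, 17, 21, 22, 24, 25, 28, 31, 95]) AS value
)
SELECT
  -- Lower bound of the 10-wide bucket each value falls into (assumed width).
  CAST(FLOOR(value / 10) * 10 AS INT64) AS bucket_start,
  -- Number of values in that bucket.
  COUNT(*) AS row_count
FROM sample
GROUP BY bucket_start
ORDER BY bucket_start;
```

With this sample, most values land in the buckets starting at 10 and 20, while 95 sits alone in a distant bucket, which is the kind of gap that suggests an outlier worth investigating. On real data, the bucket width would need to be tuned to the range of the column being explored.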

Data distributions can be examined through descriptive statistics, which provide summary measures that help us understand the central tendency, variability, and shape of the data. Measures such as mean, median, mode, range, and standard deviation provide a snapshot of the dataset’s overall characteristics. BigQuery offers SQL functions that allow us to compute these descriptive statistics efficiently. Some of the most commonly used functions are as follows: