Data Exploration and Preparation with BigQuery

A practical guide to cleaning, transforming, and analyzing data for business insights

Product type: Paperback
Published in: Nov 2023
Publisher: Packt
ISBN-13: 9781805125266
Length: 264 pages
Edition: 1st Edition
Author: Mike Kahn

Table of Contents

Preface
Part 1: Introduction to BigQuery
  Chapter 1: Introducing BigQuery and Its Components
  Chapter 2: BigQuery Organization and Design
Part 2: Data Exploration with BigQuery
  Chapter 3: Exploring Data in BigQuery
  Chapter 4: Loading and Transforming Data
  Chapter 5: Querying BigQuery Data
  Chapter 6: Exploring Data with Notebooks
  Chapter 7: Further Exploring and Visualizing Data
Part 3: Data Preparation with BigQuery
  Chapter 8: An Overview of Data Preparation Tools
  Chapter 9: Cleansing and Transforming Data
  Chapter 10: Best Practices for Data Preparation, Optimization, and Cost Control
Part 4: Hands-On and Conclusion
  Chapter 11: Hands-On Exercise – Analyzing Advertising Data
  Chapter 12: Hands-On Exercise – Analyzing Transportation Data
  Chapter 13: Hands-On Exercise – Analyzing Customer Support Data
  Chapter 14: Summary and Future Directions
Index
Other Books You May Enjoy

Understanding data distributions

The distribution of a dataset describes how its values are spread out or clustered around certain values or ranges. Understanding the distribution reveals the characteristics and patterns of the data, which helps in detecting outliers, making predictions, and deciding how to explore the data further.

Data distributions can be examined through descriptive statistics, which provide summary measures that help us understand the central tendency, variability, and shape of the data. Measures such as mean, median, mode, range, and standard deviation provide a snapshot of the dataset’s overall characteristics. BigQuery offers SQL functions that allow us to compute these descriptive statistics efficiently. Some of the most commonly used functions are as follows:

  • COUNT(*): This function returns the...
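
To make the measures named above concrete, here is a minimal sketch in Python rather than BigQuery SQL, using only the standard library. The sample values and the variable names (order_values, summary) are hypothetical and chosen for illustration; they do not come from the book. In BigQuery, aggregate functions such as COUNT(*), AVG, MIN, MAX, and STDDEV play the corresponding roles.

```python
import statistics

# Hypothetical sample: seven order values, one of them unusually large.
order_values = [12, 15, 15, 18, 20, 22, 95]

summary = {
    "count": len(order_values),                       # number of rows
    "mean": statistics.mean(order_values),            # central tendency
    "median": statistics.median(order_values),        # middle value, robust to outliers
    "mode": statistics.mode(order_values),            # most frequent value
    "range": max(order_values) - min(order_values),   # spread between extremes
    "stddev": statistics.stdev(order_values),         # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

In this sample, the single large value pulls the mean well above the median, which is the kind of signal that suggests a skewed distribution or an outlier worth inspecting.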