Data Exploration and Preparation with BigQuery

You're reading from Data Exploration and Preparation with BigQuery: A practical guide to cleaning, transforming, and analyzing data for business insights

Product type Paperback
Published in Nov 2023
Publisher Packt
ISBN-13 9781805125266
Length 264 pages
Edition 1st Edition
Author Mike Kahn

Data preparation

By examining Figure 12.7, you can see that many of the columns in our dataset are of the data type FLOAT. FLOAT is the legacy SQL name for this type; the GoogleSQL name is FLOAT64. Both names refer to the same 64-bit double-precision floating-point type, so a column shown as FLOAT in the schema has no less precision than one declared as FLOAT64.
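
To see why a 64-bit width matters for large numeric values, the following is a minimal sketch in Python (not part of the chapter's BigQuery workflow). It uses the standard `struct` module to round-trip a number through 32-bit and 64-bit floating-point storage. The sample value is hypothetical, chosen to be a 14-digit number, and the helper name `round_trip` is an assumption made for illustration.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Pack a number into the given binary float format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# Hypothetical 14-digit sample value (not taken from the book's dataset).
sample = 20230115083045.0

as_32 = round_trip(sample, "<f")  # 32-bit single precision
as_64 = round_trip(sample, "<d")  # 64-bit double precision

print(as_64 == sample)  # True: 64 bits store the value exactly
print(as_32 == sample)  # False: 32 bits round the value
```

A 64-bit float stores every integer up to 2^53 exactly, which comfortably covers 14-digit values; a 32-bit float has only a 24-bit significand and cannot.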

For this hands-on example, we will leave most of the FLOAT columns and the other data types as they are and modify only the TIME column. The existing data types are sufficient for the insights we need from our dataset.

Figure 12.8 – Previewing our GPS data set to examine the TIME column

Note in Figures 12.7 and 12.8 that the TIME column is of type FLOAT, has a decimal point in its values, and is not very readable. Upon loading, the TIME column is formatted as YYYYMMDDHHMMSS (14 digits). To prepare our dataset, we will convert this column into YYYY-MM-DD format (8 digits plus two hyphens), keeping only the date.
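
The intended transformation can be sketched outside BigQuery. The following Python example is only an illustration of the logic, not the chapter's BigQuery statement: it takes a float holding a YYYYMMDDHHMMSS value and returns the YYYY-MM-DD date. The function name `time_float_to_date` and the sample value are hypothetical.

```python
from datetime import datetime


def time_float_to_date(raw: float) -> str:
    """Convert a YYYYMMDDHHMMSS value stored as a float to 'YYYY-MM-DD'."""
    # Format with zero decimals to drop the trailing ".0" and avoid
    # scientific notation.
    digits = f"{raw:.0f}"
    if len(digits) != 14:
        raise ValueError(f"expected 14 digits, got {len(digits)}: {digits}")
    # Parse the full timestamp, then keep only the date part.
    return datetime.strptime(digits, "%Y%m%d%H%M%S").date().isoformat()


# Hypothetical sample value (not taken from the book's dataset).
sample_time = 20230115083045.0
print(time_float_to_date(sample_time))  # 2023-01-15
```

Parsing the full timestamp before discarding the time part also validates the value: an impossible month or hour raises an error instead of silently producing a malformed date.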

To convert the TIME column, we will use BigQuery Data Definition Language...
