Data Engineering with Google Cloud Platform

Data Engineering with Google Cloud Platform: A guide to leveling up as a data engineer by building a scalable data platform with Google Cloud, Second Edition

By Adi Wijaya
Rating: 4.5 out of 5 (6 ratings)
Paperback, Apr 2024, 476 pages, 2nd Edition
eBook: zł39.99 (regular price zł135.99)
Paperback: zł169.99
Subscription: Free Trial

What do you get with a Packt Subscription?

Free for the first 7 days, then $19.99 per month. Cancel any time!
  • Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
  • 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
  • Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
  • Thousands of reference materials covering every tech concept you need to stay up to date.

Data Engineering with Google Cloud Platform

Fundamentals of Data Engineering

Years ago, when I first entered the world of data analytics, I thought data was clean: ready to use and neatly organized. I was excited to experiment with machine learning models, find unusual patterns in data, and play around with clean data. But after years of working with data, I realized that data analytics in big organizations isn’t straightforward.

Most of the effort goes into collecting, cleaning, and transforming the data. If you have any experience working with data, I am sure you’ve noticed something similar. The good news is that these processes can be automated with proper planning, design, and engineering skills. That was the point where I realized that data engineering would be the most critical role in the future of the data science world.

To develop a successful data ecosystem in any organization, the most crucial part is how they design the...

Understanding the data life cycle

Understanding the data life cycle is the first principle of becoming a data engineer. If you’ve worked with data, you know that data doesn’t stay in one place; it moves from one storage system to another and from one database to another. Understanding the data life cycle means being able to answer questions like these before you display information to your end users:

  • Who will consume the data?
  • What data sources should I use?
  • Where should I store the data?
  • When should the data arrive?
  • Why does the data need to be stored in this place?
  • How should the data be processed?

To answer all those questions, we’ll start by looking back a little bit at the history of data technologies.

Understanding the need for a data warehouse

The data warehouse is not a new concept; you’ve probably at least heard of it. In fact, the term no longer sounds appealing. In my experience, no...

Start with knowing the roles of a data engineer

In the later chapters, we will spend much of our time doing practical exercises to understand data engineering concepts. But before that, let’s quickly take a look at the data engineer role.

The role is becoming increasingly popular, but the term itself is relatively new compared to well-established roles such as accountant, lawyer, and doctor. As a result, there is still some debate about what a data engineer should and shouldn’t do.

For example, if you went to a hospital and met a doctor, you would know for sure that the doctor would do the following:

  1. Examine your condition.
  2. Make a diagnosis of your health issues.
  3. Prescribe medicine.

The doctor wouldn’t do the following:

  1. Clean the hospital.
  2. Make the medicine.
  3. Manage hospital administration.

This is clear, and it applies to most well-established roles. But what about data engineers...

Going through the foundational concepts for data engineering

We will learn many data engineering concepts throughout this book using Google Cloud Platform (GCP), but there are some basic concepts that every data engineer needs to know. In my experience interviewing at data companies, these foundational concepts are often asked about to assess how much you know about data engineering. Take the following examples:

  • What is Extract, Transform, and Load (ETL)?
  • What’s the difference between ETL and Extract, Load, and Transform (ELT)?
  • What is big data?
  • How do you handle large volumes of data?

These questions are common, but it is important to understand the concepts behind them deeply, because they affect our decisions when architecting data life cycles.
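To make the ETL-versus-ELT question concrete, here is a minimal sketch in Python. It is not taken from the book: it uses the standard library's sqlite3 in-memory database as a stand-in for a data warehouse, and the RAW_ORDERS records and table names are hypothetical. Both functions produce the same revenue table; the difference is where the transformation runs and whether the raw data survives in the target.

```python
import sqlite3

# Hypothetical raw records extracted from an upstream source system:
# (order_id, book_id, amount). Amounts arrive as strings, as in many exports.
RAW_ORDERS = [
    ("o1", "b1", "12.50"),
    ("o2", "b2", "7.00"),
    ("o3", "b1", "12.50"),
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in pipeline code first, then load only the clean result."""
    revenue_per_book: dict[str, float] = {}
    for _order_id, book_id, amount in RAW_ORDERS:
        revenue_per_book[book_id] = revenue_per_book.get(book_id, 0.0) + float(amount)
    conn.execute("CREATE TABLE etl_revenue (book_id TEXT, revenue REAL)")
    conn.executemany("INSERT INTO etl_revenue VALUES (?, ?)", revenue_per_book.items())


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw data as-is, then transform inside the target with SQL."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, book_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE elt_revenue AS
        SELECT book_id, SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        GROUP BY book_id
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
    etl(conn)
    elt(conn)
    etl_rows = conn.execute("SELECT * FROM etl_revenue ORDER BY book_id").fetchall()
    elt_rows = conn.execute("SELECT * FROM elt_revenue ORDER BY book_id").fetchall()
    print(etl_rows)  # [('b1', 25.0), ('b2', 7.0)]
    print(elt_rows == etl_rows)  # True
```

With ETL, only the aggregated result reaches the target, so a new question about the orders would mean extracting from the source again. With ELT, raw_orders stays in the target and new transformations can be written in SQL later; the trade-off is that the warehouse has to store and process the raw data.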

ETL concept in data engineering

ETL is the key foundation of data engineering. Everything in the data life cycle is ETL; any step that happens from upstream to downstream is ETL. Let’s...
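The claim that every step from upstream to downstream is ETL can be illustrated with a small sketch. This is an assumption for illustration, not the book's code: each hypothetical hop (source to staging, staging to warehouse, warehouse to data mart) extracts from the previous store, transforms, and loads into the next, with plain Python lists standing in for the stores. The field names (BookID, Amount, book_id, amount, revenue) and the quality rule are invented.

```python
from typing import Any, Callable

Record = dict[str, Any]
# A hop reads rows from one store and returns the rows to load into the next one.
Hop = Callable[[list[Record]], list[Record]]


def to_staging(source_rows: list[Record]) -> list[Record]:
    """Hop 1: extract from the source system and normalize names and types."""
    return [{"book_id": r["BookID"], "amount": float(r["Amount"])} for r in source_rows]


def to_warehouse(staging_rows: list[Record]) -> list[Record]:
    """Hop 2: apply a hypothetical data-quality rule (keep only positive amounts)."""
    return [r for r in staging_rows if r["amount"] > 0]


def to_data_mart(warehouse_rows: list[Record]) -> list[Record]:
    """Hop 3: aggregate into the shape a downstream dashboard needs."""
    totals: dict[str, float] = {}
    for r in warehouse_rows:
        totals[r["book_id"]] = totals.get(r["book_id"], 0.0) + r["amount"]
    return [{"book_id": k, "revenue": v} for k, v in sorted(totals.items())]


def run_life_cycle(source_rows: list[Record], hops: list[Hop]) -> list[Record]:
    """Run each hop in order: what one hop loads is what the next one extracts."""
    data = source_rows
    for hop in hops:
        data = hop(data)
    return data


if __name__ == "__main__":
    source = [
        {"BookID": "b1", "Amount": "12.50"},
        {"BookID": "b2", "Amount": "-1"},  # fails the hypothetical quality rule
        {"BookID": "b1", "Amount": "7.50"},
    ]
    print(run_life_cycle(source, [to_staging, to_warehouse, to_data_mart]))
    # [{'book_id': 'b1', 'revenue': 20.0}]
```

Keeping each hop as a separate function mirrors how the stores are separate in practice: any hop can be rerun, tested, or moved to a different tool without changing the others.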

Summary

To summarize this first chapter, we’ve covered the fundamental knowledge we need as data engineers. Here are the key takeaways. First, data doesn’t stay in one place: it moves from one place to another, and this movement is called the data life cycle. We also saw that data in a big organization is mostly in silos, and that we can solve these data silos using the concepts of a data warehouse and a data lake.

If you have just started looking into data engineering roles, you may feel a little lost, because the role varies between organizations. The key takeaway is not to be confused by the broad expectations in the market. Focus on the core first, then expand as you gain experience. In this chapter, we learned what the core of a data engineer is. At the end of the chapter, we covered three key concepts that every data engineer needs to be familiar with: ETL, big data...

Exercise

You are a data engineer at a book publishing company, and your product manager has asked you to build a single dashboard showing total revenue and a customer satisfaction index.

Your company doesn’t have any data infrastructure yet, but you know it has these three applications, which contain terabytes (TBs) of data:

  • The company website
  • A book sales application using MongoDB to store sales transactions, including transactions, book IDs, and author IDs
  • An author portal application using a MySQL database to store authors’ personal information, including age

Do the following:

  1. List the important follow-up questions for your manager.
  2. Describe, at a high level, your technical thought process for building the dashboard.
  3. Draw a data pipeline architecture.

There is no right or wrong answer to this exercise. The important thing is that you can imagine how the data flows from upstream to downstream, how it should be processed...
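As one possible starting point for the exercise, not a model answer, the following sketch covers only the transform step for the revenue part of the dashboard: joining sales records from the MongoDB application with author rows from the MySQL portal. It assumes hypothetical field names, including a price field that the exercise does not mention (whether sales transactions carry a price is exactly the kind of follow-up question to ask your manager). The customer satisfaction index is left out because the exercise does not say which application holds it. At TB scale this join would run in a warehouse or a distributed engine rather than in Python memory; the in-memory version only shows how the data flows.

```python
from collections import defaultdict
from typing import Any

# Hypothetical extract from the MongoDB sales application.
# "price" is an assumption: the exercise lists only transactions, book IDs, and author IDs.
SALES_DOCS: list[dict[str, Any]] = [
    {"transaction_id": "t1", "book_id": "b1", "author_id": "a1", "price": 20.0},
    {"transaction_id": "t2", "book_id": "b2", "author_id": "a2", "price": 15.0},
    {"transaction_id": "t3", "book_id": "b1", "author_id": "a1", "price": 20.0},
]

# Hypothetical extract from the MySQL author portal: (author_id, name, age).
AUTHOR_ROWS: list[tuple[str, str, int]] = [
    ("a1", "Author One", 34),
    ("a2", "Author Two", 52),
]


def build_revenue_view(
    sales: list[dict[str, Any]], authors: list[tuple[str, str, int]]
) -> dict[str, Any]:
    """Join sales to authors on author_id and aggregate for the dashboard."""
    # Build a lookup table from the relational side of the join.
    author_name = {author_id: name for author_id, name, _age in authors}
    total = 0.0
    by_author: defaultdict[str, float] = defaultdict(float)
    for doc in sales:
        total += doc["price"]
        # Sales that reference an unknown author are kept, not silently dropped.
        name = author_name.get(doc["author_id"], "unknown author")
        by_author[name] += doc["price"]
    return {"total_revenue": total, "revenue_by_author": dict(by_author)}


if __name__ == "__main__":
    print(build_revenue_view(SALES_DOCS, AUTHOR_ROWS))
    # {'total_revenue': 55.0, 'revenue_by_author': {'Author One': 40.0, 'Author Two': 15.0}}
```

Note that the authors' age is never copied into the output: the dashboard only needs names and totals, which keeps personal information out of the downstream layer.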

Further Reading

You can visit the following links to explore more about the topics discussed in this chapter:


Key benefits

  • Get up to speed with data governance on Google Cloud
  • Learn how to use various Google Cloud products like Dataform, DLP, Dataplex, Dataproc Serverless, and Datastream
  • Boost your confidence by getting Google Cloud data engineering certification guidance from real exam experiences
  • Purchase of the print or Kindle book includes a free PDF eBook

Description

The second edition of Data Engineering with Google Cloud Platform builds on the first edition, offering greater clarity and depth to data professionals navigating the complex landscape of data engineering. Beyond its foundational lessons, this new edition covers data governance within Google Cloud, providing insights into managing and optimizing data resources effectively. Written by a Data Strategic Cloud Engineer at Google, this book helps you stay ahead of the curve by guiding you through the latest advancements in the Google Cloud ecosystem, from Cloud Composer 2 to Airflow 2.5. You’ll also explore how to work with tools such as Dataform, DLP, Dataplex, Dataproc Serverless, and Datastream to perform data governance on datasets. By the end of this book, you'll be equipped to navigate the ever-evolving world of data engineering on Google Cloud, from foundational principles to cutting-edge practices.

Who is this book for?

Data analysts, IT practitioners, software engineers, or any data enthusiasts looking to have a successful data engineering career will find this book invaluable. Additionally, experienced data professionals who want to start using Google Cloud to build data platforms will get clear insights on how to navigate the path. Whether you're a beginner who wants to explore the fundamentals or a seasoned professional seeking to learn the latest data engineering concepts, this book is for you.

What you will learn

  • Load data into BigQuery and materialize its output
  • Focus on data pipeline orchestration using Cloud Composer
  • Formulate Airflow jobs to orchestrate and automate a data warehouse
  • Establish a Hadoop data lake, generate ephemeral clusters, and execute jobs on the Dataproc cluster
  • Harness Pub/Sub for messaging and ingestion for event-driven systems
  • Apply Dataflow to conduct ETL on streaming data
  • Implement data governance services on Google Cloud

Product Details

Publication date: Apr 30, 2024
Length: 476 pages
Edition: 2nd
Language: English
ISBN-13: 9781835080115
Vendor: Google

Frequently bought together


  • Database Design and Modeling with Google Cloud: zł141.99
  • Google Machine Learning and Generative AI for Solutions Architects: zł201.99
  • Data Engineering with Google Cloud Platform: zł169.99

Total: zł513.97

Table of Contents

18 Chapters
Part 1: Getting Started with Data Engineering with GCP
Chapter 1: Fundamentals of Data Engineering
Chapter 2: Big Data Capabilities on GCP
Part 2: Build Solutions with GCP Components
Chapter 3: Building a Data Warehouse in BigQuery
Chapter 4: Building Workflows for Batch Data Loading Using Cloud Composer
Chapter 5: Building a Data Lake Using Dataproc
Chapter 6: Processing Streaming Data with Pub/Sub and Dataflow
Chapter 7: Visualizing Data to Make Data-Driven Decisions with Looker Studio
Chapter 8: Building Machine Learning Solutions on GCP
Part 3: Key Strategies for Architecting Top-Notch Solutions
Chapter 9: User and Project Management in GCP
Chapter 10: Data Governance in GCP
Chapter 11: Cost Strategy in GCP
Chapter 12: CI/CD on GCP for Data Engineers
Chapter 13: Boosting Your Confidence as a Data Engineer
Index
Other Books You May Enjoy

Customer reviews

Rating distribution: 4.5 out of 5 (6 ratings)
  • 5 star: 66.7%
  • 4 star: 16.7%
  • 3 star: 16.7%
  • 2 star: 0%
  • 1 star: 0%

Top Reviews

Steve Young, Jun 21, 2024: 5/5
Google Cloud Platform can be a very broad topic and contains many different products and services, yet this author was able to articulate on data engineering within the platform in a way that was much less dense than other books I’ve read. I am a data engineer and work with GCP on a daily basis. It was a pretty easy read while containing a lot of insightful and useful information about building data pipelines and other necessary activities of data engineering. The author also provided good color on how the platform is often used in particular industries which I found both useful and interesting. This book is a must-read if you have an interest in becoming a more functional and knowledgeable data engineer using GCP.
Verified Amazon review

SHASHI ANANTH, Jun 12, 2024: 5/5
One of the best books I have ever read, author is incredible in articulating Data warehousing, Data Engineering concepts, scenario are nicely explained , GCP data engineering tools are well explained with detailed steps. overall very nice book to learn Data engineering on Google cloud platform.
Verified Amazon review

Daniel J. Hampton III, Jun 17, 2024: 5/5
I have been struggling to find a book that covered comprehensive big picture concepts as well as technical details and I think this book balances it quite well.
Verified Amazon review

Johnnie, Sep 15, 2024: 5/5
Many concepts are covered (batch and streaming data pipeline creation, job orchestration, data governance and cost strategies) - as well as GCP cloud data storage options (with discussions in data warehouse design using BigQuery). The book went into more complex data engineering concepts in GCP such as ephemeral clusters, Dataproc (examining Hadoop, Spark and Dataframe concepts) and CI/CD practices.
Note: For the curious minded engineer who ask, “dude … you mentioned ephemeral clusters. What’s the difference in ephemeral and persistent clusters???”
Good question! With Persistent clusters there always is some infrastructure running. But with ephemeral clusters the clusters are created, exist for the time it takes for jobs to complete, and then cease to exist when they are brought down. How about transient clusters? I’ll leave that research up to you!
“Lots of examples and exercises” are provided that enable a “hands on experience” for the reader to engage for greater understanding. This book provides data engineers with the concepts, hands on activities and guidance necessary to navigate the Google Cloud Platform (GCP).
Verified Amazon review

mayanktripathi4u, Sep 27, 2024: 4/5
This book offers a comprehensive exploration of data engineering principles, specifically in the context of Google Cloud Platform (GCP). Aimed at both beginners and intermediate data engineers, it serves as an excellent resource for those looking to understand the fundamentals of building scalable data pipelines using GCP services. The book is particularly well-suited for data engineers, cloud architects, and IT professionals seeking to build robust, scalable data pipelines using Google Cloud’s services.
What I liked:
One of the most valuable aspects of this book is its structured approach. Adi Wijaya begins by laying a solid foundation, introducing readers to essential tools such as BigQuery, Cloud Storage, and Cloud Dataflow. From there, he builds upon that knowledge with more advanced topics like real-time data processing and machine learning integration, making it accessible for readers with varying levels of experience.
The hands-on tutorials are another highlight, offering step-by-step instructions that allow readers to practice and implement what they've learned. This practical emphasis makes complex topics easier to grasp, particularly for those who prefer learning by doing. The author also includes command-line tools like gcloud and gsutil for interacting with Google Cloud services, providing readers with real-world experience in managing cloud resources. Additionally, the author does an excellent job showcasing real-world use cases, allowing readers to understand how these tools are applied in professional data engineering settings.
Things which are missing as per my opinion:
Although the book is packed with useful information, it may feel fast-paced for absolute beginners to cloud computing. Some prior understanding of cloud concepts would be beneficial to fully grasp the more advanced sections. Additionally, while the book provides a detailed look into GCP, readers looking for cross-platform comparisons (e.g., AWS or Azure) won’t find such insights here.
Final Thoughts:
Overall, "Data Engineering with Google Cloud Platform" is a highly valuable resource for anyone looking to master data engineering within GCP. Adi Wijaya delivers a balanced mix of theory and practical application, making it an ideal read for aspiring and practicing data engineers. Whether you're developing pipelines, optimizing workflows, or integrating machine learning, this book provides the knowledge you need to excel in GCP’s data ecosystem.
Verified Amazon review

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online, including exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use for owning content.

How can I cancel my subscription?

To cancel your subscription, go to the account page, found at the top right of the page or at https://subscription.packtpub.com/my-account/subscription. From there, you will see the ‘cancel subscription’ button in the grey box with your subscription information.

What are credits?

Credits can be earned by reading 40 sections of any title within the payment cycle, which is a month starting from the day of your subscription payment. You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy DRM-free books, the same way that you would pay for a book. You can find your credits on the subscription homepage (subscription.packtpub.com) by clicking the ‘my library’ dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often-changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and the more chapters are delivered, the more accurate the estimate becomes.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You need a paid or active trial subscription to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way for us to get our content to you more quickly, but the method of buying an Early Access course is still the same. Just find the course you want to buy, go through the checkout steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult: new versions, new frameworks, new techniques. Early Access gives you a head start on our content as it's being created. You'll receive each chapter as it's written, get regular updates throughout the product's development, and get the final course as soon as it's ready.

We created Early Access as a means of giving you the information you need as soon as it's available. As we develop a course, 99% of it can be ready but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.