You're reading from Journey to Become a Google Cloud Machine Learning Engineer Build the mind and hand of a Google Certified ML professional

Product type Paperback

Published in Sep 2022

Publisher Packt

ISBN-13 9781803233727

Length 330 pages

Edition 1st Edition

Languages

Python

Tools

BigQuery

Concepts

Machine Learning

Author (1):

Dr. Logan Song

Splitting the dataset

Through the data preparation process, we have obtained a dataset that is ready to be used for model development. To detect and limit model underfitting and overfitting, it is a best practice to split the dataset randomly yet proportionally into three independent subsets, one for each stage of the model development process:

Training dataset: The subset of data used to train the model. The model will learn from the training dataset.
Validation dataset: The subset of data used to validate the trained model. Model hyperparameters will be tuned for optimization based on validation.
Testing dataset: The subset of data used to evaluate a final model before its deployment to production.
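
The three subsets described above can be produced by shuffling the data once and cutting it at two points. The following is a minimal sketch using only the Python standard library, not the book's own code: the helper name `split_dataset`, its parameters, the 80/10/10 default fractions, and the seed value are all assumptions chosen for illustration.

```python
import random


def split_dataset(records, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle records and cut them into training, validation, and testing subsets.

    Hypothetical helper for illustration; the fractions and seed are assumptions.
    Whatever remains after the training and validation shares becomes the test set.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for testing")

    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle makes the split reproducible

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # remainder, so no record is dropped
    return train, val, test


# Toy data: 1,000 integer "records" standing in for real rows.
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```

Because the shuffle happens once before slicing, the subsets are random and do not overlap. This sketch does not preserve class proportions within each subset; if that matters for your data, a stratified split (for example, the `stratify` argument of scikit-learn's `train_test_split`) is a common alternative.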

A common practice is to use 80 percent of the data for the training subset, 10 percent for validation, and 10 percent for testing. When you have a large amount of data, you can split it into 70 percent training, 15 percent validation...