Journey to Become a Google Cloud Machine Learning Engineer

You're reading from Journey to Become a Google Cloud Machine Learning Engineer: Build the mind and hand of a Google Certified ML professional

Product type Paperback
Published in Sep 2022
Publisher Packt
ISBN-13 9781803233727
Length 330 pages
Edition 1st Edition
Author Dr. Logan Song

Splitting the dataset

The data preparation process leaves us with a dataset that is ready for model development. To detect underfitting and overfitting, and to measure how well the model generalizes to unseen data, it is a best practice to split the dataset randomly, yet proportionally, into independent subsets that match the stages of model development: a training dataset, a validation dataset, and a testing dataset:

  • Training dataset: The subset of data used to train the model. The model will learn from the training dataset.
  • Validation dataset: The subset of data used to validate the trained model. Model hyperparameters will be tuned for optimization based on validation.
  • Testing dataset: The subset of data used to evaluate a final model before its deployment to production.

A common practice is to use 80 percent of the data for the training subset, 10 percent for validation, and 10 percent for testing. When you have a large amount of data, you can split it into 70 percent training, 15 percent validation...
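
The 80/10/10 split described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the book's own code. The function name `split_dataset`, its parameters, and the toy `data` list are assumptions made for the example. The sketch shuffles randomly and does not stratify by class, so a proportional split on labeled data would need extra handling.

```python
import random


def split_dataset(rows, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle rows and split them into training, validation, and testing subsets.

    Hypothetical helper for illustration: whatever is left after the
    training and validation fractions becomes the testing subset.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty testing subset")

    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Toy dataset of 1,000 records, assumed for the example.
data = list(range(1000))
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))  # 800 100 100
```

Because the three subsets are slices of one shuffled copy, every record lands in exactly one subset, which keeps the subsets independent of one another.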
