You're reading from Machine Learning at Scale with H2O A practical guide to building and deploying machine learning models on enterprise systems

Product type Paperback

Published in Jul 2022

Publisher Packt

ISBN-13 9781800566019

Length 396 pages

Edition 1st Edition

Tools

H2O

Concepts

Machine Learning

Authors (2):

Gregory Keys

David Whiting


Table of Contents (22) Chapters

Preface

1. Section 1 – Introduction to the H2O Machine Learning Platform for Data at Scale

2. Chapter 1: Opportunities and Challenges

3. Chapter 2: Platform Components and Key Concepts

4. Chapter 3: Fundamental Workflow – Data to Deployable Model

5. Section 2 – Building State-of-the-Art Models on Large Data Volumes Using H2O

6. Chapter 4: H2O Model Building at Scale – Capability Articulation

7. Chapter 5: Advanced Model Building – Part I

8. Chapter 6: Advanced Model Building – Part II

9. Chapter 7: Understanding ML Models

10. Chapter 8: Putting It All Together

11. Section 3 – Deploying Your Models to Production Environments

12. Chapter 9: Production Scoring and the H2O MOJO

13. Chapter 10: H2O Model Deployment Patterns

14. Section 4 – Enterprise Stakeholder Perspectives

15. Chapter 11: The Administrator and Operations Views

16. Chapter 12: The Enterprise Architect and Security Views

17. Section 5 – Broadening the View – Data to AI Applications with the H2O AI Cloud Platform

18. Chapter 13: Introducing H2O AI Cloud

19. Chapter 14: H2O at Scale in a Larger Platform Context

20. Other Books You May Enjoy

Appendix : Alternative Methods to Launch H2O Clusters

Splitting data for validation or cross-validation and testing

Splitting data into training, validation, and test sets is the accepted standard for model building when the dataset is sufficiently large. The idea behind validation is simple: most algorithms naturally overfit the training data. Here, overfitting means that some of what the model learns is idiosyncratic to that specific dataset (for instance, noise) rather than representative of the population as a whole. To correct for this, you create a holdout sample, called a validation set, which is scored during the model-building process to determine whether the model is capturing signal or noise. This holdout is what enables hyperparameter tuning, model regularization, early stopping, and more.
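To make the three-way split concrete, here is a minimal sketch in plain Python that shuffles row indices and cuts them into training, validation, and test parts. The function name `split_indices`, the 70/15/15 ratios, and the seed are illustrative assumptions, not values taken from the book. In H2O itself, the analogous operation is typically done with the frame's `split_frame` method rather than with hand-written code like this.

```python
import random


def split_indices(n_rows, ratios=(0.7, 0.15), seed=42):
    """Shuffle row indices and cut them into train/validation/test parts.

    `ratios` holds the train and validation fractions (assumed values);
    the test set receives whatever rows remain.
    """
    if any(r <= 0 for r in ratios) or sum(ratios) >= 1.0:
        raise ValueError("ratios must be positive and sum to less than 1")

    indices = list(range(n_rows))
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)

    n_train = round(n_rows * ratios[0])
    n_valid = round(n_rows * ratios[1])

    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


# Hypothetical dataset of 1,000 rows.
train_idx, valid_idx, test_idx = split_indices(1000)
print(len(train_idx), len(valid_idx), len(test_idx))
```

The three index lists are disjoint and together cover every row, so no observation used to fit the model can leak into the validation or test holdouts.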

The test dataset is an additional holdout that is used at the end of model building to determine true model performance. Having holdout test data is critical for...
