You're reading from Machine Learning at Scale with H2O A practical guide to building and deploying machine learning models on enterprise systems

Product type Paperback

Published in Jul 2022

Publisher Packt

ISBN-13 9781800566019

Length 396 pages

Edition 1st Edition

Tools

H2O

Concepts

Machine Learning

Authors (2):

Gregory Keys

David Whiting


Table of Contents (22) Chapters

Preface

1. Section 1 – Introduction to the H2O Machine Learning Platform for Data at Scale

2. Chapter 1: Opportunities and Challenges

3. Chapter 2: Platform Components and Key Concepts

4. Chapter 3: Fundamental Workflow – Data to Deployable Model

5. Section 2 – Building State-of-the-Art Models on Large Data Volumes Using H2O

6. Chapter 4: H2O Model Building at Scale – Capability Articulation

7. Chapter 5: Advanced Model Building – Part I

8. Chapter 6: Advanced Model Building – Part II

9. Chapter 7: Understanding ML Models

10. Chapter 8: Putting It All Together

11. Section 3 – Deploying Your Models to Production Environments

12. Chapter 9: Production Scoring and the H2O MOJO

13. Chapter 10: H2O Model Deployment Patterns

14. Section 4 – Enterprise Stakeholder Perspectives

15. Chapter 11: The Administrator and Operations Views

16. Chapter 12: The Enterprise Architect and Security Views

17. Section 5 – Broadening the View – Data to AI Applications with the H2O AI Cloud Platform

18. Chapter 13: Introducing H2O AI Cloud

19. Chapter 14: H2O at Scale in a Larger Platform Context

20. Other Books You May Enjoy

Appendix: Alternative Methods to Launch H2O Clusters

Feature engineering

In Chapter 5, Advanced Model Building – Part I, we introduced some feature engineering concepts and discussed target encoding at length. In this section, we will delve into feature engineering in a bit more depth. We can organize feature engineering as follows:

Algebraic transformations
Features engineered from dates
Simplifying categorical variables by combining categories
Missing value indicator functions
Target encoding categorical columns
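
As a rough illustration of the first four categories, the following plain-Python sketch applies one example of each to a few made-up records. It does not use the H2O-3 API, and every column name, value, and the `PURPOSE_GROUPS` mapping is a hypothetical stand-in rather than something taken from the book's dataset.

```python
import math
from datetime import date

# Hypothetical records; column names and values are illustrative only.
rows = [
    {"income": 52000.0, "debt": 13000.0, "opened": date(2021, 3, 14),
     "purpose": "car", "emp_length": 4},
    {"income": 81000.0, "debt": 8100.0, "opened": date(2020, 11, 2),
     "purpose": "wedding", "emp_length": None},
    {"income": 39000.0, "debt": 19500.0, "opened": date(2022, 1, 30),
     "purpose": "vacation", "emp_length": 10},
]

# Hypothetical mapping that folds sparse categories into a broader one.
PURPOSE_GROUPS = {"wedding": "life_event", "vacation": "life_event"}


def engineer(row):
    """Return a copy of the row with the engineered columns added."""
    out = dict(row)
    # Algebraic transformations: a ratio and a log of numeric columns.
    out["debt_to_income"] = row["debt"] / row["income"]
    out["log_income"] = math.log(row["income"])
    # Features engineered from dates: pull out calendar components.
    out["opened_year"] = row["opened"].year
    out["opened_month"] = row["opened"].month
    # Simplifying a categorical variable by combining categories.
    out["purpose_grouped"] = PURPOSE_GROUPS.get(row["purpose"], row["purpose"])
    # Missing value indicator: 1 if the original value is absent, else 0.
    out["emp_length_missing"] = 1 if row["emp_length"] is None else 0
    return out


engineered = [engineer(r) for r in rows]
```

None of these steps looks at the target column, which is why they can be applied to all rows in a single pass.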

The order of these transformations does not matter, except that target encoding must come last. Because target encoding is computed from the response, it is the only transformation that requires the data to be split into train and test sets first. By saving it for the end, we can apply the other transformations to the entire dataset at once rather than separately to the training and test splits. We also introduce stratified sampling for splitting data in H2O-3. This has very little impact on our current use case but is important when data is highly imbalanced...
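
To make the ordering argument concrete, here is a minimal plain-Python sketch of a stratified split followed by a target encoding fitted on the training rows only. It is not the H2O-3 implementation: the helper names `stratified_split` and `fit_target_encoding` are defined here for illustration, the data is synthetic, and the encoding is a bare per-category mean with none of the blending or noise options a production target encoder typically offers.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=42):
    """Split row indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def fit_target_encoding(categories, targets):
    """Return (mean target per category, overall mean) from the rows given."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    prior = sum(targets) / len(targets)
    return {c: sums[c] / counts[c] for c in sums}, prior


# Synthetic, imbalanced data: 10 positives out of 100 rows.
labels = [1] * 10 + [0] * 90
cats = ["a" if i % 2 == 0 else "b" for i in range(100)]

train_idx, test_idx = stratified_split(labels)

# Fit the encoding on training rows only, then apply it to the test rows.
encoding, prior = fit_target_encoding(
    [cats[i] for i in train_idx], [labels[i] for i in train_idx]
)
# Categories unseen in training fall back to the overall training mean.
test_encoded = [encoding.get(cats[i], prior) for i in test_idx]
```

With a purely random split, a class that makes up 10% of the rows could end up over- or under-represented in the test set by chance. Splitting within each class keeps the proportion stable, and fitting the encoding on the training indices alone keeps test-set targets out of the engineered feature.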
