Machine Learning at Scale with H2O: A practical guide to building and deploying machine learning models on enterprise systems

Product type Paperback

Published in Jul 2022

Publisher Packt

ISBN-13 9781800566019

Length 396 pages

Edition 1st Edition

Tools

H2O

Concepts

Machine Learning

Authors (2):

Gregory Keys

David Whiting


Data wrangling

It is frequently said that 80–90% of a data scientist's job is dealing with data. At a minimum, you should understand the data granularity (that is, what each row represents) and know what each column in the dataset means. Turning a raw data source into a modeling-ready dataset takes multiple steps of cleaning, organizing, and transforming.
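One quick way to confirm what a row represents is to check which column, if any, is unique per row. The following is a minimal sketch in plain Python rather than H2O, so that it runs on its own. The sample data, the column names (`loan_id`, `customer_id`, `loan_amnt`), and the helper `is_row_key` are all hypothetical and are not taken from the book's dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the column names are illustrative assumptions.
SAMPLE = """loan_id,customer_id,loan_amnt
L1,C1,5000
L2,C1,12000
L3,C2,7500
"""


def is_row_key(rows, column):
    """Return True if every row has a distinct value in `column`."""
    counts = Counter(row[column] for row in rows)
    return all(n == 1 for n in counts.values())


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# If loan_id is unique, each row represents one loan.
print(is_row_key(rows, "loan_id"))
# customer_id repeats, so the rows are not at customer granularity.
print(is_row_key(rows, "customer_id"))
```

If no single column is unique, the same check can be repeated on a combination of columns to find the level at which the rows are distinct.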

The dataset used for the Lending Club example in Chapters 3, 5, and 7 was derived from a raw data file that we begin with here. In this section, we will illustrate the following steps:

1. Import the raw data and determine which columns to keep.
2. Define the problem, and create a response variable.
3. Convert the implied numeric data from strings into numeric values.
4. Clean up any messy categorical columns.
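The four steps above can be sketched on a tiny made-up sample. This is not the book's H2O code: it uses only the Python standard library so that it is self-contained. The column names, the sample values, the `bad_loan` response definition, and the helpers `to_number` and `wrangle` are assumptions chosen for illustration.

```python
import csv
import io

# Hypothetical raw data: numbers stored as strings with units, plus an
# inconsistently formatted categorical column and a column we do not need.
RAW = """loan_amnt,term,int_rate,loan_status,emp_title,url
5000, 36 months,10.65%,Fully Paid, Teacher ,http://example.com/1
2500, 60 months,15.27%,Charged Off,teacher,http://example.com/2
"""

# Step 1: columns we decide to keep (an assumption for this sketch).
KEEP = ["loan_amnt", "term", "int_rate", "loan_status", "emp_title"]


def to_number(text):
    """Strip a unit suffix such as '%' or 'months' and parse the rest."""
    cleaned = text.strip().replace("%", "").replace("months", "")
    return float(cleaned)


def wrangle(raw_text):
    """Apply the four wrangling steps to CSV text and return a list of dicts."""
    rows = []
    for raw in csv.DictReader(io.StringIO(raw_text)):
        # Step 1: keep only the selected columns.
        row = {name: raw[name] for name in KEEP}
        # Step 2: create a binary response (assumed definition of a bad loan).
        status = row.pop("loan_status").strip()
        row["bad_loan"] = int(status == "Charged Off")
        # Step 3: convert implied numeric strings into numeric values.
        for name in ("loan_amnt", "term", "int_rate"):
            row[name] = to_number(row[name])
        # Step 4: normalize a messy categorical column.
        row["emp_title"] = row["emp_title"].strip().lower()
        rows.append(row)
    return rows


clean = wrangle(RAW)
for row in clean:
    print(row)
```

On a real dataset, each of these steps usually needs inspection first, for example listing the distinct values of a categorical column before deciding how to normalize it.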

Let's begin with the first step: importing the data.

Importing the raw data

We import the raw data file using the following code:

input_csv = "rawloans...
