Building Data Science Solutions with Anaconda: A comprehensive starter guide to building robust and complete models

Product type Paperback

Published in May 2022

Publisher Packt

ISBN-13 9781800568785

Length 330 pages

Edition 1st Edition

Tools

Anaconda

Concepts

Data Science

Author (1):

Dan Meador

Evaluating potential models using MSE and R2 scores

There will always be many potential models that you can attempt to train, and you can spend a great deal of time tweaking each one. It's valuable to understand which ones are likely to give you the best outcome before you invest that time in any single option. We're going to use k-fold cross-validation to estimate how well each model performs. This takes our training data and splits it into k sections, or folds. You can think of this as dividing a piece of paper into k equal strips, and then taking turns using one of the k sections as the testing data and the rest as the training data, until every section has been used for testing exactly once.
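
To make the splitting concrete, here is a minimal sketch of how scikit-learn's `KFold` divides a dataset. The ten-sample toy array and the choice of k = 5 are assumptions made purely for illustration; they are not the chapter's dataset or settings.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy samples, purely for illustration (not the chapter's dataset).
X = np.arange(10).reshape(-1, 1)

# k = 5 folds; shuffle=False keeps each fold contiguous so the output is easy to read.
kfold = KFold(n_splits=5, shuffle=False)

folds = []
for fold_number, (train_idx, test_idx) in enumerate(kfold.split(X), start=1):
    # Each iteration holds out one fold for testing and trains on the other four.
    folds.append((train_idx, test_idx))
    print(f"Fold {fold_number}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Each sample lands in the test set exactly once across the five iterations, which is what lets the averaged score use all of the data for evaluation.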

The code that follows will do the training so we can see which model would be a nice fit. We'll start, as usual, by importing what we need:
```
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import ...
```
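
As a rough sketch of where these imports lead, the following compares a few candidate regressors on cross-validated MSE and R2. Several things here are assumptions rather than the chapter's own choices: the data is synthetic (`make_regression`), the three candidate models are picked only for illustration because the import list above is elided, and plain `KFold` is swapped in for `StratifiedKFold`, since stratification is designed for class labels rather than a continuous target.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data standing in for the chapter's dataset (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

# Candidate models chosen for illustration only; the source elides its own list.
candidates = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
}

# Plain KFold (assumption): five shuffled folds with a fixed seed for repeatability.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for name, model in candidates.items():
    # scikit-learn treats higher scores as better, so MSE is exposed as its negative.
    neg_mse = cross_val_score(model, X, y, cv=kfold, scoring="neg_mean_squared_error")
    r2 = cross_val_score(model, X, y, cv=kfold, scoring="r2")
    results[name] = {"mse": -neg_mse.mean(), "r2": r2.mean()}
    print(f"{name}: MSE={results[name]['mse']:.2f}, R2={results[name]['r2']:.3f}")
```

A lower mean MSE and an R2 closer to 1 both point to a better fit, so looking at the two together gives a quick way to shortlist models before spending time on tuning.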