
You're reading from Machine Learning Solutions: Expert techniques to tackle complex machine learning problems using Python

Product type: Paperback
Published in: Apr 2018
Publisher: Packt
ISBN-13: 9781788390040
Length: 566 pages
Edition: 1st Edition
Languages: Python
Tools: OpenCV
Concepts: Machine Learning
Author: Jalaj Thanaki



Testing the baseline model

In this section, we will implement code that tells us how well (or how poorly) our trained ML models perform on a validation set. We use two metrics: the mean accuracy score and the ROC-AUC score.

Here, we have trained five different classifiers. We test each of them on the validation dataset, which consists of the 25% of the training data that we held out, to find out which ML model works well and gives us a reasonable baseline score. Let's look at the code:

Figure 1.55: Code snippet to obtain a test score for the trained ML model

In the preceding code snippet, you can see the scores for three classifiers.

Refer to the code snippet in the following figure:

Figure 1.56: Code snippet to obtain the test score for the trained ML model

In this code snippet, you can see the scores of the remaining two classifiers.
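Because both snippets appear only as images, here is a minimal, self-contained sketch of what such a baseline test loop can look like. It is not the author's code. It assumes synthetic, imbalanced data from scikit-learn's make_classification in place of the chapter's credit dataset, default hyperparameters, and a stratified 75/25 split; the five model classes are the ones named in this section. Note that roc_auc_score() is given the positive-class probability from predict_proba(), not the 0/1 labels from predict(), because ROC-AUC needs continuous scores to rank the examples.

```python
# Minimal sketch of baseline testing on a 25% held-out validation split.
# Assumption: synthetic data stands in for the book's credit dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Imbalanced binary target, loosely mimicking a default/no-default label.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           random_state=42)

# Hold out 25% of the training data as the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Five baseline classifiers with (mostly) default hyperparameters.
classifiers = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
}

results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    # score() returns the mean accuracy of the hard predictions from predict().
    accuracy = clf.score(X_val, y_val)
    # roc_auc_score() needs scores, so pass the positive-class probability.
    auc = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])
    results[name] = (accuracy, auc)
    print(f"{name:20s} accuracy={accuracy:.4f}  ROC-AUC={auc:.4f}")
```

The numbers depend entirely on the data, so this sketch will not reproduce the scores shown in the figures; what carries over is the pattern of fitting each model on the 75% split and reporting both metrics on the 25% held-out split.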

The score() method of a scikit-learn classifier returns the mean accuracy, whereas the roc_auc_score() function returns the ROC-AUC score. The ROC-AUC score is more meaningful for us: mean accuracy is computed from hard predictions made at a single decision threshold (for a binary classifier, typically a predicted probability of 0.5), whereas ROC-AUC takes all possible threshold values into account and summarizes them in one score.
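To make the threshold argument concrete, the following small sketch uses six made-up labels and predicted probabilities (illustrative values only, not results from the book). It shows accuracy changing as the threshold moves, while ROC-AUC is a single threshold-free number. It also checks ROC-AUC against its pairwise interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score.

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Made-up labels and positive-class probabilities, for illustration only.
y_true = [0, 0, 0, 1, 1, 1]
y_prob = [0.2, 0.3, 0.6, 0.4, 0.7, 0.9]

# Accuracy depends on the threshold used to turn probabilities into labels.
for threshold in (0.35, 0.5, 0.65):
    y_pred = [int(p >= threshold) for p in y_prob]
    print(f"threshold={threshold}: accuracy={accuracy_score(y_true, y_pred):.3f}")

# ROC-AUC works on the raw scores and does not fix any single threshold.
auc = roc_auc_score(y_true, y_prob)

# Pairwise view: share of (positive, negative) pairs ranked correctly,
# counting ties as half a correct pair.
pos = [p for p, t in zip(y_prob, y_true) if t == 1]
neg = [p for p, t in zip(y_prob, y_true) if t == 0]
pairs = [1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg]
pairwise_auc = sum(pairs) / len(pairs)
print(f"ROC-AUC={auc:.3f}, pairwise estimate={pairwise_auc:.3f}")
```

Here accuracy is 0.833 at thresholds 0.35 and 0.65 but drops to 0.667 at 0.5, while ROC-AUC stays at 0.889 (8 of the 9 positive-negative pairs are ranked correctly).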

As the preceding code snippets show, the AdaBoost and GradientBoosting classifiers achieve good ROC-AUC scores on the validation dataset. The other classifiers, namely logistic regression, KNN, and RandomForest, do not perform as well on the validation set. From this stage onward, we will work with the AdaBoost and GradientBoosting classifiers and try to improve their accuracy.

In the next section, we will look at what we can do to increase classification accuracy. To do that, we need to identify the current problems with the classifiers and list the possible fixes. So let's analyze the problems with the existing classifiers and look at their solutions.
