Summary
This chapter presented several of the most common measures and techniques for evaluating the performance of machine learning classification models. Although accuracy provides a simple method for examining how often a model is correct, this can be misleading in the case of rare events because the real-life importance of such events may be inversely proportional to how frequently they appear in the data.
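The following is a minimal sketch, not code from the chapter, illustrating this accuracy paradox in Python with scikit-learn; the 1 percent positive rate and the label arrays are purely illustrative assumptions.

```python
# A naive model that always predicts the majority class scores 99% accuracy
# on data where only 1% of cases are positive, yet it detects no rare events.
from sklearn.metrics import accuracy_score, recall_score

y_true = [1] * 10 + [0] * 990   # 1% rare positive events
y_pred = [0] * 1000             # naive model: always predict "negative"

print(accuracy_score(y_true, y_pred))                  # 0.99 -- looks excellent
print(recall_score(y_true, y_pred, zero_division=0))   # 0.0  -- misses every rare event
```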
Measures based on the confusion matrix better capture a model's performance along with the balance between the costs of different types of errors. The kappa statistic and the Matthews correlation coefficient are two more sophisticated performance measures that work well even for severely imbalanced datasets. Additionally, closely examining the tradeoff between sensitivity and specificity, or between precision and recall, can be a useful way to think about the real-world implications of errors. Visualizations such as the ROC curve are also helpful toward this end.
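As a hedged sketch rather than the chapter's own code, the measures named above can be computed with scikit-learn roughly as follows; y_true, y_pred, and y_score are hypothetical labels, hard predictions, and predicted probabilities.

```python
from sklearn.metrics import (confusion_matrix, cohen_kappa_score,
                             matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)

# Illustrative data only: 6 negatives and 4 positives.
y_true  = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred  = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]
y_score = [0.1, 0.2, 0.2, 0.3, 0.6, 0.7, 0.4, 0.8, 0.9, 0.9]

# Unpack the 2x2 confusion matrix: [[tn, fp], [fn, tp]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("kappa:      ", cohen_kappa_score(y_true, y_pred))  # chance-corrected agreement
print("MCC:        ", matthews_corrcoef(y_true, y_pred))  # robust to class imbalance
print("sensitivity:", recall_score(y_true, y_pred))       # tp / (tp + fn)
print("specificity:", tn / (tn + fp))                     # true negative rate
print("precision:  ", precision_score(y_true, y_pred))    # tp / (tp + fp)
print("ROC AUC:    ", roc_auc_score(y_true, y_score))     # area under the ROC curve
```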