You're reading from The Statistics and Machine Learning with R Workshop Unlock the power of efficient data science modeling with this hands-on guide

Product type Paperback

Published in Oct 2023

Publisher Packt

ISBN-13 9781803240305

Length 516 pages

Edition 1st Edition

Concepts

Data Science

Author (1):

Liu Peng

Table of Contents (20) Chapters

Preface

1. Part 1: Statistics Essentials

2. Chapter 1: Getting Started with R

3. Chapter 2: Data Processing with dplyr

4. Chapter 3: Intermediate Data Processing

5. Chapter 4: Data Visualization with ggplot2

6. Chapter 5: Exploratory Data Analysis

7. Chapter 6: Effective Reporting with R Markdown

8. Part 2: Fundamentals of Linear Algebra and Calculus in R

9. Chapter 7: Linear Algebra in R

10. Chapter 8: Intermediate Linear Algebra in R

11. Chapter 9: Calculus in R

12. Part 3: Fundamentals of Mathematical Statistics in R

13. Chapter 10: Probability Basics

14. Chapter 11: Statistical Estimation

15. Chapter 12: Linear Regression in R

16. Chapter 13: Logistic Regression in R

17. Chapter 14: Bayesian Statistics

18. Index

19. Other Books You May Enjoy

Dealing with an imbalanced dataset

When building a logistic regression model on a dataset with a binary target, the target values may not be equally distributed. Typically, we observe far more non-events (y = 0) than events (y = 1), as is often the case in applications such as detecting fraudulent bank transactions, filtering spam or phishing emails sent to corporate employees, identifying diseases such as cancer, and predicting natural disasters such as earthquakes. In these situations, the classification performance may be dominated by the majority class.

Such domination can produce misleadingly high accuracy scores that mask poor predictive performance. To see this, suppose we are developing a default prediction model using a dataset of 1,000 observations, of which only 10 (1%) are default cases. A naive model that predicts every observation as non-default would achieve 99% accuracy while failing to identify a single default.
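The arithmetic behind this example can be checked with a few lines of base R. The following is a minimal sketch, not code from the book: the vectors `y_true` and `y_pred` are hypothetical stand-ins for the 1,000 observations and the naive model's predictions, and recall (the share of actual defaults that are caught) is added as an assumed companion metric to show what accuracy hides.

```r
# Hypothetical labels mirroring the example: 1,000 observations, 10 defaults (y = 1).
y_true <- c(rep(1, 10), rep(0, 990))

# Naive model: predict non-default (0) for every observation.
y_pred <- rep(0, length(y_true))

# Accuracy: share of predictions that match the true label.
accuracy <- mean(y_pred == y_true)

# Recall (sensitivity): share of actual defaults that the model identifies.
recall <- sum(y_pred == 1 & y_true == 1) / sum(y_true == 1)

# Confusion matrix; fixing the factor levels keeps the empty "predicted 1" column.
conf_mat <- table(
  actual    = factor(y_true, levels = c(0, 1)),
  predicted = factor(y_pred, levels = c(0, 1))
)

print(conf_mat)
cat("Accuracy:", accuracy, "\n")
cat("Recall:  ", recall, "\n")
```

Here the accuracy is 0.99 while the recall is 0: all 10 defaults fall in the "actual 1, predicted 0" cell of the confusion matrix, which is why accuracy alone is a poor yardstick on imbalanced data.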

When we encounter an imbalanced...
