You're reading from Machine Learning for Streaming Data with Python Rapidly build practical online machine learning solutions using River and other top key frameworks

Product type Paperback

Published in Jul 2022

Publisher Packt

ISBN-13 9781803248363

Length 258 pages

Edition 1st Edition

Languages

Python

Tools

River

Concepts

Machine Learning

Author (1):

Joos Korstanje

Comparing anomaly detection and imbalanced classification

To separate positive cases from negative cases, classification is the standard go-to family of methods. For the problems described, you can use classification algorithms as long as you have historical data on at least a few positive and negative cases. However, you will run into a very common problem: only very few observations are anomalies. This is generally known as the problem of imbalanced data.

The problem of imbalanced data

Imbalanced datasets are datasets in which the classes of the target variable occur with very unequal frequencies. A common example is website sales: among 1,000 visitors, you often have at least 900 who are just browsing, as opposed to maybe 100 who actually buy something.
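
To make the imbalance concrete, the following minimal sketch counts the class frequencies for the website example. It assumes exactly 900 browsing visitors and 100 buyers, which is only an illustration of the "at least 900" figure above, and the label encoding (0 for browsing, 1 for buying) is hypothetical.

```python
from collections import Counter

# Hypothetical labels for 1,000 visitors (assumed split, for illustration only):
# 0 = visitor only browsed, 1 = visitor bought something.
labels = [0] * 900 + [1] * 100

# Count how often each class occurs.
counts = Counter(labels)
total = sum(counts.values())

# Share of each class in the dataset.
shares = {cls: n / total for cls, n in counts.items()}

# Ratio between the most frequent and the least frequent class.
imbalance_ratio = max(counts.values()) / min(counts.values())

print("class counts:", dict(counts))
print("class shares:", shares)
print("imbalance ratio:", imbalance_ratio)
```

Under these assumed numbers, there are nine browsing visitors for every buyer. Checking the class distribution like this before fitting any model is a cheap way to see whether you are dealing with imbalanced data.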

Applying classification methods carelessly to imbalanced data is error-prone. Imagine that you fit a classification model that needs to predict for each website visitor...
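
One way such carelessness can show up is sketched below, as an illustration rather than the author's own example. It assumes the 900/100 split from the website example and a hypothetical "model" that always predicts the majority class (no purchase), and then compares accuracy with recall on the buyers.

```python
# Hypothetical ground truth (assumed split): 0 = only browsed, 1 = bought something.
labels = [0] * 900 + [1] * 100

# A trivial stand-in for a classifier: always predict the majority class.
predictions = [0] * len(labels)

# Accuracy: share of visitors for which the prediction matches the label.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall on the positive class: share of actual buyers that were detected.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print("accuracy:", accuracy)
print("recall on buyers:", recall)
```

With these assumed numbers, the accuracy looks high even though not a single buyer is identified. This is why accuracy alone can be misleading on imbalanced data, and why a metric that focuses on the rare class is worth checking as well.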