Machine Learning for Streaming Data with Python: Rapidly build practical online machine learning solutions using River and other top key frameworks

Chapter 1: An Introduction to Streaming Data

Streaming analytics is one of the hottest new topics in data science. It offers an alternative to the more standard batch processing framework: instead of working on datasets at fixed processing times, we handle every individual data point directly upon reception.

This new paradigm has important consequences for data engineering, as it requires much more robust and, in particular, much faster data ingestion pipelines. It also imposes big changes on data analytics and machine learning.

Until recently, machine learning and data analytics methods and algorithms were mainly designed to work on entire datasets. Now that streaming has become a hot topic, it is becoming more and more common to see use cases in which an entire dataset simply does not exist anymore. When a continuous stream of data is being ingested into a data store, there is no natural moment to relaunch an analytics batch job.

Streaming analytics and streaming machine learning models are designed to work specifically with streaming data sources. Part of the solution, for example, lies in updating: streaming analytics and machine learning models need to be updated continuously as new data is received. When updating, you may also want to forget much older data.
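
To give a feel for what updating while forgetting can look like, here is a minimal sketch of my own (not taken from the book's code): an exponentially weighted running average that is updated one data point at a time, where the alpha value is an arbitrary choice that controls how quickly older data fades out.

class RunningAverage:
    def __init__(self, alpha=0.1):
        # alpha controls how quickly older data is forgotten (higher = faster)
        self.alpha = alpha
        self.value = None

    def update(self, x):
        if self.value is None:
            self.value = x  # the first observation initializes the estimate
        else:
            self.value = self.alpha * x + (1 - self.alpha) * self.value

avg = RunningAverage(alpha=0.3)
for temperature in [10, 11, 10, 12, 9]:
    avg.update(temperature)
    print(round(avg.value, 2))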

These and other problems introduced by moving from batch analytics to streaming analytics require a different approach to analytics and machine learning. This book will lay the groundwork for getting you started with data analytics and machine learning on data that is received as a continuous stream.

In this first chapter, you'll get a more solid understanding of the differences between streaming and batch data. You'll see some example use cases that showcase the importance of working with streaming rather than converting back into batch. You'll also start working with a first Python example to get a feel for the type of work that you'll be doing throughout this book.

In later chapters, you'll cover some more background on architecture, and then you'll go into a number of data science and analytics use cases and how they can be adapted to the new streaming paradigm.

In this chapter, you will discover the following topics:

  • A short history of data science
  • Working with streaming data
  • Real-time data formats and importing an example dataset in Python

Technical requirements

You can find all the code for this book on GitHub at the following link: https://github.com/PacktPublishing/Machine-Learning-for-Streaming-Data-with-Python. If you are not yet familiar with Git and GitHub, the easiest way to download the notebooks and code samples is the following:

  1. Go to the link of the repository.
  2. Click the green Code button.
  3. Select Download ZIP:
Figure 1.1 – GitHub interface example

After downloading the ZIP file, unzip it in your local environment, and you will be able to access the code through your preferred Python editor.

Setting up a Python environment

To follow along with this book, you can download the code in the repository and execute it using your preferred Python editor.

If you are not yet familiar with Python environments, I would advise you to check out Anaconda (https://www.anaconda.com/products/individual), which comes with Jupyter Notebook and JupyterLab, both great for executing notebooks, as well as Spyder and VS Code for editing scripts and programs.

If you have difficulty installing Python or the associated programs on your machine, you can check out Google Colab (https://colab.research.google.com/) or Kaggle Notebooks (https://www.kaggle.com/code), which both allow you to run Python code in online notebooks for free, without any setup to do.

Note

The code in this book generally uses Colab and Kaggle Notebooks with Python version 3.7.13; you can set up your own environment to mimic this.
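
If you want to check which Python version your own environment runs (a quick check of my own, not from the book), you can do so from within a notebook:

import sys
print(sys.version)  # compare against the 3.7.13 used in the book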

A short history of data science

Over the last few years, new technology domains have quickly taken hold in many parts of the world. Machine learning, artificial intelligence, and data science are new fields that have entered our daily lives, both personal and professional.

The topics that data scientists work on today are not new. The absolute foundation of the field lies in mathematics and statistics, two fields that have existed for centuries. As an example, least squares regression was first published in 1805. Over time, mathematicians and statisticians have continued to develop other methods and models.

The following timeline shows how the recent boom in technology came about. In the 1600s and 1700s, brilliant minds were already laying the foundations for what we still do in statistics and mathematics today. However, it was not until the invention and popularization of computing power that the field began to boom.

Figure 1.2 – A timeline of the history of data

Personal computer and internet accessibility are important reasons for data science's popularity today. Almost everyone has a computer that is powerful enough for fairly complex machine learning. This strongly helps computer literacy, and easy access to online documentation is a big booster for learning.

The availability of big data tools such as Hadoop and Spark is also an important part of the popularization of data science, as they allow practitioners to work with datasets far larger than anyone could have imagined before.

Lastly, cloud computing allows data scientists from all over the world to access very powerful hardware at low prices. Especially for big data tools, the required hardware is still priced beyond what most students could buy for training purposes. Cloud computing gives many of them access to those use cases.

In this book, you will learn how to work with streaming data. It is important to keep this short history of data science in mind, as streaming data is one of those technologies that has been held back by demanding hardware and setup requirements. Streaming data is currently gaining popularity quickly in many domains and has the potential to become a big hit in the coming years. Let's now have a deeper look at the definition of streaming data.

Working with streaming data

Streaming data is, quite simply, data that is streamed. You may know the term streaming from online video services: while you are already watching the first part of a video, the streaming service keeps sending you the next parts.

The concept is the same when working with streaming data. The data format is not necessarily video; it can be any data type that is useful for your use case. One of the most intuitive examples is an industrial production line, in which you have continuous measurements from sensors. As long as your production line doesn't pause, it will keep generating measurements. The following overview shows the data streaming process:

Figure 1.3 – The data streaming process

The important notion is that you have a continuous flow of data that you need to treat in real time. You cannot wait until the production line stops to do your analysis, as you would need to detect potential problems right away.

Streaming data versus batch data

Streaming data is generally not among the first use cases that new data scientists start with; batch use cases are usually introduced first. Batch data is the opposite of streaming data, as it works in phases: you collect a batch of data, and then you process that batch.

If you see streaming data as streaming a video online, you can see batch data as downloading the entire video first and watching it once the download is finished. For analytical purposes, this means that you get the analysis of a batch of data once the data-generating process is finished, rather than whenever a problem occurs.

For some use cases, this is not a problem. Yet, streaming can deliver great added value in those use cases where fast analytics can have an impact. It also adds value in use cases where data is ingested in a streaming manner, which is becoming more and more common. In practice, many use cases that would gain from streaming are still solved with batch treatment, simply because batch methods are better known and more widespread.

The following overview shows the batch treatment process:

Figure 1.4 – The batch process

Advantages of streaming data

Let's now look at some advantages of using streaming analytics rather than other approaches in the following subsections.

Data generating processes are in real time

The first advantage of building streaming data analytics rather than batch systems is that many data-generating processes are themselves real-time processes. You will discover a number of use cases later, but in general, it is rare that data collection itself happens in batches.

Although most of us are used to building batch systems around real-time data generating systems, it often makes more sense to build streaming analytics directly.

Of course, batch analytics and streaming analytics can co-exist. Yet, adding a batch treatment to a streaming analytics service is often much easier than adding streaming functionality into a system that is designed for batches. It simply makes the most sense to start with streaming.

Real-time insights have value

When designing data science solutions, streaming does not always come to mind first. However, when solutions or tools are built in real time, it is rare that the real-time functionality is not appreciated.

Many of today's analytical solutions are built in real time, and the tools to do so are available. In many problems, real-time information will be used at some point. Maybe it will not be used from the start, but the day that anomalies happen, you will find a great competitive advantage in having the analytics straight away rather than waiting until the next hour or the next morning.

Examples of successful implementation of streaming analytics

Let's talk about some examples of companies that have implemented real-time analytics successfully. The first example is Shell, which has implemented real-time analytics of the security cameras at its gas stations: an automated, real-time machine learning pipeline detects whether people are smoking.

Another example is the use of sensor data in connected sports equipment. By measuring heart rate and other KPIs in real time, these devices can alert you when something is wrong with your body.

Of course, big players such as Facebook and Twitter also analyze a lot of data in real time, for example, when detecting fake news or harmful content. There are many successful use cases of streaming analytics, yet at the same time, there are some common challenges that streaming data brings with it. Let's have a look at them now.

Challenges of streaming data

Streaming data analytics is currently less widespread than batch data analytics. Although this is slowly changing, it is good to understand where the challenges lie when working with streaming data.

Knowledge of streaming analytics

One simple reason why streaming analytics is less widespread is a question of knowledge and know-how. Setting up streaming analytics is often not taught in schools and is definitely not taught as the go-to method. There are also fewer resources available on the internet to get started with it. As there are many more resources on machine learning and analytics for batch treatment, and batch methods do not directly apply to streaming data, people tend to start with batch applications for data science.

Understanding the architecture

A second difficulty when working on streaming data is architecture. Although some data science practitioners have knowledge of architecture, data engineering, and DevOps, this is not always the case. To set up a streaming analytics proof of concept or a minimum viable product (MVP), all those skills are needed. For batch treatment, it is often enough to work with scripts.

Architectural difficulties are inherent to streaming, as it is necessary to work with real-time processes that send individually collected records to an analytical treatment process that will update in real time. If there is no architecture that can handle this, it does not make much sense to start with streaming analytics.
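
As a minimal illustration of this kind of setup (a sketch of my own, using only Python's standard library rather than a real streaming platform), a producer thread can send individually collected records to a consumer that treats them as soon as they arrive:

import queue
import threading
import time

records = queue.Queue()

def producer():
    # simulate a real-time process that emits individually collected records
    for temperature in [10, 11, 9, 12, 8]:
        records.put(temperature)
        time.sleep(0.1)
    records.put(None)  # signal the end of the stream

def consumer():
    # the analytical treatment process: handle each record as soon as it arrives
    while True:
        record = records.get()
        if record is None:
            break
        print('received', record, '- updating analytics in real time')

threading.Thread(target=producer).start()
consumer()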

Financial hurdles

Another challenge when working with streaming data is the financial aspect. Although working with streaming is not necessarily more expensive in the long run, it can be more expensive to set up the infrastructure needed to get started. Working on a local developer PC for an MVP is unlikely to succeed as the data needs to be treated in real time.

Risks of runtime problems

Real-time processes also have a larger risk of runtime problems. When building software, bugs and failures happen. If you are on a daily batch process, you may be able to repair the process, rerun the failed batch, and solve the problem.

If a streaming tool is down, there is a risk of losing data. As the data must be ingested in real time, the data generated during an outage of your process may not be recoverable. If your process is very important, you will need to set up extensive monitoring day and night and add more quality checks before pushing your solutions to production. Of course, this also matters for batch processes, but even more so for streaming.

Smaller analytics (fewer methods easily available)

The last challenge of streaming analytics is that common methods are generally developed for batch data first. There are currently many solutions out there for analytics on real-time and streaming data, but still not as many as for batch data.

Also, since streaming analysis has to be done very quickly to respect real-time delivery, streaming use cases tend to end up with less sophisticated analytical methodologies and stay at the level of descriptive or basic analyses.

How to get started with streaming data

For companies getting started with streaming data, the first step is often to put in place simple applications that collect real-time data and make it accessible in real time. Common use cases to start with are log data, website visit data, or sensor data.

A next step is often to build reporting tools on top of the real-time data source. Think of KPI dashboards that update in real time, or small and simple alerting tools based on high or low threshold values derived from business rules.

Once such systems are in place, the way is open to replace those business rules, or build on top of them, with more advanced analytics tools, including real-time machine learning for anomaly detection and more.

The most complex step is to add automated feedback loops between your real-time machine learning and your process. After all, there is no reason to stop at analytics for business insights if there is potential to automate and improve decision-making as well.

Common use cases for streaming data

Let's look at a few of the most common use cases for streaming data so that you can get a better feel for the kinds of problems that benefit from streaming techniques. This section covers three use cases that are relatively accessible to anyone, but of course, there are many more.

Sensor data and anomaly detection

A common use case for streaming data is the analysis of sensor data. Sensor data occurs in a multitude of settings, such as industrial production lines and IoT use cases. When companies decide to collect sensor data, it is often treated in real time.

For a production line, there is great value in detecting anomalies in real time. When too many anomalies occur, the production line can be shut down or the problem can be solved before a number of faulty products are delivered.

A good example of streaming analytics for monitoring humidity for artwork can be found here: https://azure.github.io/iot-workshop-asset-tracking/step-003-anomaly-detection/.

Finance and regression forecasting

Financial data is another great use case for streaming data. For example, in the world of stock trading, timing is important. The faster you can detect upward or downward trends in the stock market, the faster a trader (or algorithm) can react by selling or buying stocks and making money.

A great example is described in the following paper by K. S. Umadevi et al. (2018): https://ieeexplore.ieee.org/document/8554561.

Clickstream for websites and classification

Websites and apps are a third common use case for real-time insights. If you can track and analyze your visitors in real time, you can propose a personalized experience for them on your website. By proposing products or services that match a visitor's interests, you can increase your online sales.

The following paper by Ramanna Hanamanthrao and S. Thejaswini (2017) presents a great use case of this technology applied to clickstream data: https://ieeexplore.ieee.org/abstract/document/8256978.

Streaming versus big data

It is important to understand the different definitions of streaming that you may encounter. One distinction to make is between streaming and big data. Some definitions consider streaming mainly in a big data (Hadoop/Spark) context, whereas others do not.

Streaming solutions often involve a large volume of data, and big data solutions can be the appropriate choice. However, other technologies, combined with a well-chosen hardware architecture, may also be able to perform the analytics in real time and, therefore, build streaming solutions without big data technologies.

Streaming versus real-time inference

Real-time inference of models is often built and made accessible via an API. As we define streaming as the analysis of data in real time without batches, such real-time predictions can be considered streaming. You will see more about real-time architectures in a later chapter.
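
To make the idea concrete, here is an illustrative sketch of my own (not the book's implementation, which is covered in a later chapter), using Flask to expose a prediction, here just a placeholder rule, as a real-time API:

from flask import Flask, request, jsonify

app = Flask(__name__)

def predict(datapoint):
    # placeholder for a trained model; here we reuse a simple business rule
    return {'alert': datapoint.get('temperature', 0) < 10}

@app.route('/predict', methods=['POST'])
def predict_endpoint():
    # every incoming request carries a single data point as JSON
    datapoint = request.get_json()
    return jsonify(predict(datapoint))

if __name__ == '__main__':
    app.run(port=5000)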

Real-time data formats and importing an example dataset in Python

To finish this chapter, let's have a look at how to represent streaming data in practice. After all, when building analytics, we will often have to implement test cases and example datasets.

The simplest way to represent streaming data in Python would be to create an iterable object that contains the data and to build your analytics function to work with an iterable.
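
For instance (a sketch of my own, not the code used in the rest of this chapter), a Python generator can act as such an iterable, yielding one data point at a time:

import random
import time

def sensor_stream(n_points=5):
    # simulate a sensor that produces one observation at a time
    for _ in range(n_points):
        yield {'temperature': random.randint(9, 12), 'pH': random.choice([4.5, 5, 5.5, 6])}
        time.sleep(0.1)  # mimic the delay between real measurements

for datapoint in sensor_stream():
    print(datapoint)  # each data point is handled as soon as it "arrives"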

The following code creates a DataFrame using pandas. There are two columns, temperature and pH:

Code block 1-1

import pandas as pd

# a small example dataset with two sensor-like columns
data_batch = pd.DataFrame({
    'temperature': [10, 11, 10, 11, 12, 11, 10, 9, 10, 11, 12, 11, 9, 12, 11],
    'pH': [5, 5.5, 6, 5, 4.5, 5, 4.5, 5, 4.5, 5, 4, 4.5, 5, 4.5, 6]
})
print(data_batch)

When displayed, the DataFrame will look as follows. The pH is around 4.5 to 5 but is sometimes higher. The temperature is generally around 10 or 11.

Figure 1.5 – The resulting DataFrame

This dataset is a batch dataset; after all, you have all the rows (observations) at the same time. Now, let's see how to convert this dataset to a streaming dataset by making it iterable.

You can do this by iterating through the data's rows. By doing so, you set up a code structure that allows you to add more building blocks to this code one by one. When your developments are done, you will be able to use your code on a real-time stream rather than on an iteration over a DataFrame.

The following code iterates through the rows of the DataFrame and converts the rows to JSON format. This is a very common format for communication between different systems. The JSON of the observation contains a value for temperature and a value for pH. Those are printed out as follows:

Code block 1-2

# iterate through the rows one by one, as if each row arrived as a new data point
data_iterable = data_batch.iterrows()
for i,new_datapoint in data_iterable:
  print(new_datapoint.to_json())

After running this code, you should obtain a print output that looks like the following:

Figure 1.6 – The resulting print output

Let's now define a super simple example of streaming data analytics. The function that is defined in the following code block will print an alert whenever the temperature gets below 10:

Code block 1-3

def super_simple_alert(datapoint):
  # business rule: alert as soon as a single observation has a too-low temperature
  if datapoint['temperature'] < 10:
    print('this is a real time alert. temp too low')

You can now add this alert into your simulated streaming process simply by calling the alerting test at every data point. You can use the following code to do this:

Code block 1-4

data_iterable = data_batch.iterrows()
for i,new_datapoint in data_iterable:
  print(new_datapoint.to_json())
  super_simple_alert(new_datapoint)

When executing this code, you will notice that alerts will be given as soon as the temperature goes below 10:

Figure 1.7 – The resulting print output with alerts on temperature

This alert works only on the temperature, but you could easily add the same type of alert for pH. The following code shows how this can be done; the alert function is updated to include a second business rule as follows:

Code block 1-5

def super_simple_alert(datapoint):
  if datapoint['temperature'] < 10:
    print('this is a real time alert. temp too low')
  if datapoint['pH'] > 5.5:
    print('this is a real time alert. pH too high')

Executing the function would still be done in exactly the same way:

Code block 1-6

data_iterable = data_batch.iterrows()
for i,new_datapoint in data_iterable:
  print(new_datapoint.to_json())
  super_simple_alert(new_datapoint)

You will see several alerts being raised throughout the execution on the example streaming data, as follows:

Figure 1.8 – The resulting print output with alerts on temperature and pH

With streaming data, you have to make decisions without seeing the complete dataset, using only the data points that have been received so far. This means that algorithms similar to their batch-processing counterparts need to be redeveloped with a different approach.

Throughout this book, you will discover methods that apply to streaming data. The difficulty, as you may understand, is that a statistical method is generally developed to compute things using all the data.
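
To make this concrete, here is a small illustration of my own (not from the book's code): the batch mean needs the full dataset, whereas a streaming version maintains the same result using only a running count and sum, updated one data point at a time:

temperatures = [10, 11, 10, 11, 12, 11, 10, 9]

# batch approach: the whole dataset must be available
batch_mean = sum(temperatures) / len(temperatures)

# streaming approach: update the mean one observation at a time
count, total = 0, 0.0
for x in temperatures:
    count += 1
    total += x
    streaming_mean = total / count  # always up to date with the data seen so far

print(batch_mean, streaming_mean)  # both give the same final result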

Summary

In this introductory chapter on streaming data and streaming analytics, you have first seen some definitions of what streaming data is and how it contrasts with batch data processing. With streaming data, you need to work with a continuous stream of data, and more traditional (batch) data science solutions need to be adapted to work with this newer and more demanding method of data treatment.

You have seen a number of example use cases, and you should now understand that there can be much added value for businesses and advanced technology use cases in having data science and analytics computed on the fly rather than waiting for a fixed moment. Real-time insights can be a game-changer, and autonomous machine learning solutions often need real-time decision capabilities.

You have seen an example in which a data stream was created and a simple real-time alerting system was developed. In the next chapter, you will get a much deeper introduction to a number of streaming solutions. In practice, data scientists and analysts will generally not be responsible for putting streaming data ingestion in place, but they will be constrained by the limits of those systems. It is, therefore, important to have a good understanding of streaming and real-time architecture: this will be the goal of the next chapter.


Key benefits

  • Work on streaming use cases that are not taught in most data science courses
  • Gain experience with state-of-the-art tools for streaming data
  • Mitigate various challenges while handling streaming data

Description

Streaming data is the new top technology to watch out for in the field of data science and machine learning. As business needs become more demanding, many use cases require real-time analysis as well as real-time machine learning. This book will help you get up to speed with data analytics for streaming data and focuses strongly on adapting machine learning and other analytics to the case of streaming data. You will first learn about the architecture for streaming and real-time machine learning. Next, you will look at state-of-the-art frameworks for streaming data, such as River. Later chapters will focus on various industrial use cases for streaming data, such as online anomaly detection. As you progress, you will discover various challenges and learn how to mitigate them. In addition to this, you will learn best practices that will help you use streaming data to generate real-time insights. By the end of this book, you will have gained the confidence you need to use streaming data in your machine learning models.

Who is this book for?

This book is for data scientists and machine learning engineers who have a background in machine learning, are practice- and technology-oriented, and want to learn how to apply machine learning to streaming data through practical examples with modern technologies. Although an understanding of basic Python and machine learning concepts is a must, no prior knowledge of streaming is required.

What you will learn

  • Understand the challenges and advantages of working with streaming data
  • Develop real-time insights from streaming data
  • Understand the implementation of streaming data with various use cases to boost your knowledge
  • Develop a PCA alternative that can work on real-time data
  • Explore best practices for handling streaming data that you absolutely need to remember
  • Develop an API for real-time machine learning inference

Product Details

Publication date: Jul 15, 2022
Length: 258 pages
Edition: 1st
Language: English
ISBN-13: 9781803242637


Table of Contents

Part 1: Introduction and Core Concepts of Streaming Data
Chapter 1: An Introduction to Streaming Data
Chapter 2: Architectures for Streaming and Real-Time Machine Learning
Chapter 3: Data Analysis on Streaming Data
Part 2: Exploring Use Cases for Data Streaming
Chapter 4: Online Learning with River
Chapter 5: Online Anomaly Detection
Chapter 6: Online Classification
Chapter 7: Online Regression
Chapter 8: Reinforcement Learning
Part 3: Advanced Concepts and Best Practices around Streaming Data
Chapter 9: Drift and Drift Detection
Chapter 10: Feature Transformation and Scaling
Chapter 11: Catastrophic Forgetting
Chapter 12: Conclusion and Best Practices
Other Books You May Enjoy
