Streaming Datasets for real-time machine learning
In this recipe, we create a streaming Dataset to demonstrate the use of Datasets with the Spark 2.0 structured streaming programming paradigm. We stream stock prices from a file using a Dataset and apply a filter to select the days on which the stock closed above $100.
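The pipeline described above can be sketched as follows. This is a minimal sketch, not the recipe's own code. It assumes the Spark 2.x Scala API, CSV input with a header row and the hypothetical columns date, open, high, low and close, and a hypothetical input directory /tmp/stock-prices. The names StockPrice, StockStreamSketch and closedAbove are illustrative.

```scala
import java.util.concurrent.TimeUnit

import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructType}

// Hypothetical record layout: one row per trading day.
case class StockPrice(date: String, open: Double, high: Double, low: Double, close: Double)

object StockStreamSketch {

  // Keeps only the days whose closing price is strictly above the threshold.
  // Works unchanged on a static or a streaming Dataset[StockPrice].
  def closedAbove(prices: Dataset[StockPrice], threshold: Double): Dataset[StockPrice] =
    prices.filter(_.close > threshold)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("StockStreamSketch")
      .getOrCreate()
    import spark.implicits._

    // By default, file-based streaming sources do not infer a schema, so supply one.
    val schema = new StructType()
      .add("date", StringType)
      .add("open", DoubleType)
      .add("high", DoubleType)
      .add("low", DoubleType)
      .add("close", DoubleType)

    // Hypothetical input directory: each new CSV file that lands here becomes new input.
    val prices: Dataset[StockPrice] = spark.readStream
      .schema(schema)
      .option("header", "true")
      .csv("/tmp/stock-prices")
      .as[StockPrice]

    // Print each micro-batch of qualifying days to the console.
    val query = closedAbove(prices, 100.0).writeStream
      .outputMode("append")
      .format("console")
      .start()

    // Let the stream run for at most one minute, then shut everything down.
    query.awaitTermination(TimeUnit.MINUTES.toMillis(1))
    query.stop()
    spark.stop()
  }
}
```

Keeping the filter in its own function that takes a Dataset[StockPrice] means the same logic can be checked against a small static Dataset before it is attached to the stream.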
The recipe shows how a stream can be filtered and acted on as data arrives, using the simple structured streaming programming model. Although the Dataset API is similar to the DataFrame API, the syntax differs in places. The recipe is written in a general way so that you can adapt it to your own Spark ML projects.
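To make the syntax difference concrete, the sketch below sets the DataFrame form of a filter and a projection beside the Dataset form. It uses plain functions so it can be run on a small batch Dataset; filter, select and map are also supported on streaming Datasets. The StockPrice record with only date and close fields, and the TypedVsUntyped object, are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoders}

// Hypothetical, trimmed-down record type for the comparison.
case class StockPrice(date: String, close: Double)

object TypedVsUntyped {

  // DataFrame (= Dataset[Row]): the predicate is a Column expression looked up by name,
  // so a misspelled column name is only caught at run time.
  def untypedFilter(df: DataFrame): DataFrame =
    df.filter(df("close") > 100.0)

  // Dataset[StockPrice]: the predicate is an ordinary Scala lambda,
  // so the field access p.close is checked by the compiler.
  def typedFilter(ds: Dataset[StockPrice]): Dataset[StockPrice] =
    ds.filter(p => p.close > 100.0)

  // Projection: select returns untyped Rows.
  def untypedCloses(df: DataFrame): DataFrame =
    df.select("close")

  // Projection: map returns typed values, but needs an Encoder for the result type.
  def typedCloses(ds: Dataset[StockPrice]): Dataset[Double] =
    ds.map(p => p.close)(Encoders.scalaDouble)
}
```

Both forms produce the same rows; the Dataset form trades the flexibility of name-based columns for compile-time checking of field names and types.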
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter13
- Import the necessary packages:
import java.util.concurrent.TimeUnit
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
...