Structured streaming for near real-time machine learning
In this recipe, we explore the structured streaming paradigm introduced in Spark 2.0. We read a real-time stream from a socket and use the structured streaming API to collect votes and tabulate them.
We also explore the newly introduced subsystem by simulating a stream of randomly generated votes to pick the most unpopular comic book villain.
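To make the socket-plus-structured-streaming idea concrete, here is a minimal sketch of a streaming vote counter. It is not the recipe's own program: the object name, the `localhost` host, and port `9999` are assumptions chosen for illustration, and it presumes Spark 2.x or later is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical name; not one of the recipe's two programs.
object SocketVoteCountSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SocketVoteCountSketch")
      .getOrCreate()

    // The socket source yields one row per line of text,
    // in a single string column named "value".
    val votes = spark.readStream
      .format("socket")
      .option("host", "localhost") // assumed host
      .option("port", 9999)        // assumed port
      .load()

    // Running count of votes per distinct line received so far.
    val tally = votes.groupBy("value").count()

    // "complete" output mode re-emits the whole tally table on every trigger.
    val query = tally.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

Something must already be listening on the chosen port before the query starts, for example `nc -lk 9999`; each line typed there is then counted as one vote. The socket source is intended for testing and offers no fault-tolerance guarantees.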
Note
There are two distinct programs (VoteCountStream.scala and CountStreamproducer.scala) that make up this recipe.
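The vote simulation itself can be sketched in plain Scala, independent of Spark and of any socket. The following is only an illustration under stated assumptions: the object name, the candidate names, and the helper functions are all hypothetical, and the step that writes votes to a socket is deliberately left out.

```scala
import scala.util.Random

// Hypothetical helper; names and candidates are placeholders, not the recipe's data.
object VoteSimulationSketch {
  val candidates: Vector[String] = Vector("Villain A", "Villain B", "Villain C")

  // Draw n random votes, one candidate name per vote.
  def randomVotes(n: Int, rng: Random): Seq[String] =
    Seq.fill(n)(candidates(rng.nextInt(candidates.length)))

  // Count how many votes each candidate received.
  def tally(votes: Seq[String]): Map[String, Int] =
    votes.groupBy(identity).map { case (name, vs) => name -> vs.size }

  // Candidate with the most votes, or None when there are no votes.
  def leader(counts: Map[String, Int]): Option[String] =
    if (counts.isEmpty) None else Some(counts.maxBy(_._2)._1)
}
```

Calling `VoteSimulationSketch.tally(VoteSimulationSketch.randomVotes(1000, new Random()))` gives a batch version of the tally that the streaming job maintains incrementally as votes arrive.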
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter13
- Import the necessary packages for the Spark context to get access to the cluster and log4j.Logger to reduce the amount of output produced by Spark:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import java.io.{BufferedOutputStream...