Stock price binary classification problem
Stock prices rise and fall. We want to use Spark ML and a Spark time-series library to explore historical stock price data going back a couple of years and come up with numbers such as the average closing price. We also want our stock price prediction model to forecast what the stock price will be over a timeframe of a few days.
This chapter presents an ML methodology to reduce the complexity associated with stock price prediction. We will obtain a smaller set of optimal financial indicators by feature selection and employ a Random Forest algorithm to build a price prediction pipeline.
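To make the shape of such a pipeline concrete, here is a minimal sketch, assuming Spark ML is on the classpath. It is not the chapter's actual pipeline: the column names `indicatorA` and `indicatorB`, the toy rows, and the tree count are hypothetical placeholders, and the feature selection step is assumed to have already produced the list of columns passed in.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object PricePipelineSketch {

  // Builds a two-stage pipeline: assemble the chosen indicator columns
  // into a feature vector, then train a Random Forest classifier on it.
  // `selectedCols` stands in for the output of a feature selection step.
  def buildPipeline(selectedCols: Array[String]): Pipeline = {
    val assembler = new VectorAssembler()
      .setInputCols(selectedCols)
      .setOutputCol("features")

    val forest = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(10) // arbitrary choice for illustration

    new Pipeline().setStages(Array(assembler, forest))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("price-pipeline-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy rows, not real market data.
    // label: 1.0 = price went up, 0.0 = price went down or stayed flat.
    val data = Seq(
      (0.2, 1.5, 1.0),
      (0.1, 1.7, 1.0),
      (0.3, 1.4, 1.0),
      (0.9, 0.3, 0.0),
      (0.8, 0.2, 0.0),
      (0.7, 0.4, 0.0)
    ).toDF("indicatorA", "indicatorB", "label")

    val model = buildPipeline(Array("indicatorA", "indicatorB")).fit(data)

    // Show each row's label next to the model's prediction.
    model.transform(data).select("label", "prediction").show()

    spark.stop()
  }
}
```

Keeping the pipeline construction in its own function means the list of selected indicators can change without touching the training code.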
We must first download the dataset from the ModernScalaProjects_Code folder.
Stock price prediction dataset at a glance
We will use data from two sources:
- Reddit worldnews
- Dow Jones Industrial Average (DJIA)
The Getting started section that follows has two clear goals:
- Moving our development environment into a virtual appliance from a previous local Spark shell-centered...