This section gets started on developing a data pipeline that uses the Random Forests algorithm.
Implementation objectives
List of implementation goals
The following implementation objectives apply to both the Random Forests pipeline and linear regression. Preliminary steps such as Exploratory Data Analysis (EDA) are performed once; only the implementation code differs, since it is specific to each pipeline. The objectives are:
- Get the stock price dataset.
- Carry out preliminary EDA in the Sandbox Zeppelin Notebook environment (or Spark shell), and run a statistical analysis.
- Develop the pipeline incrementally in Zeppelin, and port...