In this chapter, we will install, configure, and deploy a local analytical development environment by provisioning a self-contained single-node cluster that will allow us to do the following:
- Prototype and develop machine learning models and pipelines in Python
- Demonstrate the functionality and usage of Apache Spark's machine learning library, MLlib, via the Spark Python API (PySpark)
- Develop and test machine learning models on a single-node cluster using small sample datasets, and thereafter scale up to multi-node clusters processing much larger datasets with little or no code changes required
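The last point, moving from a single node to a multi-node cluster with little or no code change, can be sketched as follows. This is a minimal illustration rather than the chapter's own code: the environment variable `SPARK_MASTER_URL`, the helper names `resolve_master` and `build_session`, and the application name are all hypothetical. It assumes PySpark is installed wherever `build_session` is actually called.

```python
import os


def resolve_master(default="local[*]"):
    """Return the Spark master URL to connect to.

    SPARK_MASTER_URL is a hypothetical variable chosen for this sketch.
    When it is unset we fall back to local mode, which runs Spark inside
    a single JVM on this machine using all available cores.
    """
    return os.environ.get("SPARK_MASTER_URL", default)


def build_session(app_name="mllib-prototype"):
    """Create (or reuse) a SparkSession pointed at the resolved master."""
    # Imported lazily so the module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(resolve_master())
        .appName(app_name)
        .getOrCreate()
    )


if __name__ == "__main__":
    spark = build_session()
    # A tiny in-memory sample standing in for a small development dataset.
    df = spark.createDataFrame([(1, 2.0), (2, 3.5)], ["id", "value"])
    df.show()
    spark.stop()
```

With the variable unset, the script runs against the local single-node environment. Setting it to a cluster's master URL (for example, a standalone master of the form `spark://<host>:7077`) would point the same pipeline code at a multi-node cluster, leaving only the data source paths to adjust.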
Our single-node cluster will host the following technologies:
- Operating system: CentOS Linux 7
https://www.centos.org/download/
- General-purpose programming languages:
- Java SE Development Kit (JDK) 8 (8u181)
https://www.oracle.com/technetwork...