All the examples in this book have been implemented in Scala using open source libraries, including Apache Spark MLlib/ML and Deeplearning4j. To get the most out of them, however, you will need a reasonably powerful computer and a complete software stack.
A Linux distribution is preferable (for example, Debian, Ubuntu, or CentOS). For Ubuntu, a complete 64-bit installation of at least 14.04 (LTS) is recommended, either natively or on VMware Workstation Player 12 or VirtualBox. You can also run Spark jobs on Windows (7/8/10) or Mac OS X (10.4.7+).
A computer with a Core i5 processor, sufficient storage (for example, running Spark jobs requires at least 50 GB of free disk space for a standalone cluster and for the SQL warehouse), and at least 16 GB of RAM is recommended. Optionally, if you want to perform neural network training on the GPU (for the last chapter only), the NVIDIA GPU driver has to be installed, with CUDA and cuDNN configured.
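As a quick sanity check of the GPU setup, the short Scala snippet below prints the ND4J backend that gets resolved at runtime. This is only a sketch and assumes the `Nd4j.getBackend` call is available in the ND4J version you use; with the CUDA backend on the classpath and the driver, CUDA, and cuDNN correctly installed, a CUDA backend class should be reported, otherwise the native (CPU) backend is used.

```scala
import org.nd4j.linalg.factory.Nd4j

// Minimal sketch: print which ND4J backend was picked up from the classpath.
// Assumes Nd4j.getBackend is available in the ND4J version in use
// (an assumption, not taken from this book's code).
object BackendCheck {
  def main(args: Array[String]): Unit = {
    println(s"ND4J backend: ${Nd4j.getBackend.getClass.getName}")
  }
}
```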
The following APIs and tools are required in order to execute the source code in this book (a rough sketch of the corresponding dependency declarations follows the list):
- Java/JDK, version 1.8
- Scala, version 2.11.8
- Spark, version 2.2.0 or higher
- spark-csv_2.11, version 1.3.0
- ND4j backend: nd4j-cuda-9.0-platform for GPU; otherwise, nd4j-native
- ND4j, version 1.0.0-alpha
- DL4j, version 1.0.0-alpha
- Datavec, version 1.0.0-alpha
- Arbiter, version 1.0.0-alpha
- Eclipse Mars or Luna (latest version) or IntelliJ IDEA
- Maven Eclipse plugin (2.9 or higher)
- Maven compiler plugin for Eclipse (2.3.2 or higher)
- Maven assembly plugin for Eclipse (2.4.1 or higher)
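If you manage the build with sbt rather than Maven, a minimal build.sbt roughly equivalent to the dependencies above might look like the following. The group/artifact coordinates are assumptions inferred from the versions listed (they are not taken from the book's own build files), so adjust them to match your setup:

```scala
// build.sbt -- minimal sketch; coordinates are assumptions based on the list above
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.2.0",
  "org.apache.spark" %% "spark-sql"   % "2.2.0",
  "org.apache.spark" %% "spark-mllib" % "2.2.0",
  "com.databricks"   %% "spark-csv"   % "1.3.0",
  // Swap nd4j-native-platform for nd4j-cuda-9.0-platform to train on the GPU
  "org.nd4j"           % "nd4j-native-platform"   % "1.0.0-alpha",
  "org.deeplearning4j" % "deeplearning4j-core"    % "1.0.0-alpha",
  "org.datavec"        % "datavec-api"            % "1.0.0-alpha",
  "org.deeplearning4j" % "arbiter-deeplearning4j" % "1.0.0-alpha"
)
```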