Training Multilayer Neural Networks (MNNs) is computationally expensive: it involves huge datasets, and the training process needs to be completed as quickly as possible. In Chapter 1, The Apache Spark Ecosystem, we learned how Apache Spark achieves high performance in large-scale data processing, which makes it a perfect candidate for performing training by taking advantage of its parallelism features. But Spark alone isn't enough. Its performance is excellent, in particular for ETL and streaming, but in an MNN training context some data transformations and aggregations need to be pushed down to a low-level language (such as C++).
Here's where the ND4J (https://nd4j.org/index.html) module of DL4J comes into play. There's...
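To give a feel for what this looks like in practice, here is a minimal sketch of working with ND4J's n-dimensional arrays (INDArray). It assumes an ND4J backend dependency (for example, nd4j-native-platform) is on the classpath; the class name Nd4jSketch and the sample values are illustrative only. The point is that the numerical operations are delegated to the native backend rather than being executed on the JVM heap:

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class Nd4jSketch {
    public static void main(String[] args) {
        // Create a 2 x 2 matrix from a flat array (row-major order)
        INDArray a = Nd4j.create(new double[]{1, 2, 3, 4}, new int[]{2, 2});
        // Create a 2 x 2 matrix filled with ones
        INDArray b = Nd4j.ones(2, 2);

        // Element-wise addition and matrix multiplication; the actual
        // computation runs in the native (C++) backend
        INDArray sum = a.add(b);
        INDArray product = a.mmul(b);

        System.out.println(sum);
        System.out.println(product);
    }
}
```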