Introducing deep learning in Spark
In this section, we review some of the popular deep learning libraries that work with Spark: CaffeOnSpark, DL4J, TensorFrames, and BigDL.
Introducing CaffeOnSpark
CaffeOnSpark was developed by Yahoo for large-scale distributed deep learning on Hadoop clusters. By combining features of the deep learning framework Caffe with Apache Spark (and Apache Hadoop), CaffeOnSpark enables distributed deep learning on a cluster of GPU and CPU servers.
Note
For more details on CaffeOnSpark, refer to https://github.com/yahoo/CaffeOnSpark.
CaffeOnSpark supports neural network model training, testing, and feature extraction. It complements the non-deep-learning libraries Spark MLlib and Spark SQL. CaffeOnSpark's Scala API gives Spark applications an easy mechanism for invoking deep learning algorithms over distributed datasets. Here, deep learning is typically conducted in the same cluster as the existing data processing pipelines to support feature engineering...