Deep learning is the study of machine learning algorithms that use neural networks as their primary method of learning. Deep learning has exploded onto the scene within just the last few years. Microsoft, Google, Facebook, Amazon, Apple, Tesla, and many other companies are using deep learning models in their apps, websites, and products. At the same time, Spark, an in-memory compute engine that runs on top of big data sources, has made it easy to process large volumes of information at record speed. In fact, Spark has become the leading big data development tool for data engineers, machine learning engineers, and data scientists.
Because deep learning models perform better with more data, Spark and deep learning make a natural pairing. The work environment that enables effective development is almost as important as the code used to execute deep learning algorithms. Many talented minds are eager to develop neural networks to help answer important questions in their research. Unfortunately, one of the greatest barriers to developing deep learning models is access to the technical resources required to train on big data. The purpose of this chapter is to create an ideal virtual development environment for deep learning on Spark.