Advances in distributed computing environments and the wide availability of the internet have led to the emergence of Distributed Artificial Intelligence (DAI). In this chapter, we will learn about two frameworks: Apache Spark's machine learning library (MLlib) and H2O.ai. Both provide distributed and scalable machine learning (ML) for large and streaming data. The chapter starts with an introduction to Apache Spark, the de facto distributed data processing system. This chapter will cover the following topics:
- Spark and its importance in distributed data processing
- Understanding the Spark architecture
- Learning about MLlib
- Using MLlib in your deep learning pipeline
- Delving deep into the H2O.ai platform