As the volume and complexity of data sources increase, deriving value from data becomes increasingly difficult. Since its inception, Hadoop has provided a massively scalable filesystem, HDFS, and has adopted the MapReduce model from functional programming to tackle large-scale data processing challenges. As technology constantly evolves to overcome the challenges posed by data mining, enterprises are also finding ways to embrace these changes and stay ahead.
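The functional-programming roots of MapReduce can be illustrated with a minimal sketch (in plain Python, not Hadoop itself): a *map* phase emits key-value pairs and a *reduce* phase aggregates them, here for a hypothetical word-count over a few lines of text:

```python
from functools import reduce

# Sample input: a small collection of text lines (stand-in for HDFS blocks).
lines = ["big data", "data processing", "big data processing"]

# Map phase: emit a (word, 1) pair for every word in every line.
pairs = [(word, 1) for line in lines for word in line.split()]

# Reduce phase: merge the pairs, summing the counts per word.
def merge(counts, pair):
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts

word_counts = reduce(merge, pairs, {})
print(word_counts)  # {'big': 2, 'data': 3, 'processing': 2}
```

In a real Hadoop job, the map and reduce functions run in parallel across the cluster, with the framework shuffling the intermediate pairs between them.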
In this chapter, we will focus on these data processing solutions:
- MapReduce
- Apache Spark
- Spark SQL
- Spark Streaming