MapReduce is a common data processing pattern made famous by Google and now implemented in various systems and frameworks, most notably Apache Hadoop. Nowadays, the core of this pattern is familiar and easy to understand, but running large-scale systems such as Hadoop comes with its own set of challenges and a significant cost of ownership. In this chapter, we'll show how you can implement this pattern on your own using serverless technologies.
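To make the core of the pattern concrete before any serverless pieces come into play, here is a minimal, single-process sketch of a word count in Python. The function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are illustrative assumptions and do not come from Hadoop or any other framework. It only shows the three phases (map, shuffle, reduce), not how they would be distributed.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_words(document: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # Each document is mapped independently of the others. In a distributed
    # setting, these map calls could run on separate workers.
    mapped = [pair for doc in documents for pair in map_words(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())
```

For example, `word_count(["the cat", "the dog"])` returns `{"the": 2, "cat": 1, "dog": 1}`. Because no map call depends on another, the mapping work can be split across many workers, which is the property a distributed implementation relies on.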
Implementing big data applications in a serverless environment may seem counter-intuitive due to the computing limitations of FaaS. However, certain types of problems fit very well into a serverless ecosystem, especially considering that we have practically unlimited file storage with distributed object stores such as AWS S3. Additionally, MapReduce's magic is not so much in the application of an algorithm, but in the distribution...