Understanding the limitations of serverless MapReduce
MapReduce on a serverless platform can work very well. However, there are limitations that you need to keep in mind. First and foremost, memory, storage, and time limits will ultimately determine whether this pattern is possible for your dataset. Additionally, systems such as Hadoop are general-purpose frameworks that can be used for any analysis, whereas a MapReduce implementation in a serverless context will likely be a system built to solve one particular problem.
I find that a serverless MapReduce implementation is viable when your final dataset is relatively small (a few hundred megabytes), so that your reducer can process all of the data without exceeding the memory limits of your FaaS provider. The following sections walk through the details behind that rule of thumb.
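Before getting into those details, the rule of thumb can be turned into a rough feasibility check. The following is a minimal sketch rather than part of the example application: the function names, the overhead_factor of 3.0 (a guess at how much larger data becomes once it is deserialized into in-memory objects), and the memory limits in the usage lines are all assumptions chosen for illustration. Measure your own workload and check your provider's actual limits before relying on numbers like these.

```python
def estimate_reducer_memory_mb(mapper_output_bytes, overhead_factor=3.0):
    """Roughly estimate the memory (in MB) a single reducer would need.

    mapper_output_bytes: sizes, in bytes, of each mapper's result file.
    overhead_factor: assumed ratio of in-memory size to serialized size.
        The default of 3.0 is an illustrative guess, not a measured value.
    """
    total_bytes = sum(mapper_output_bytes)
    return total_bytes * overhead_factor / (1024 * 1024)


def fits_in_reducer(mapper_output_bytes, memory_limit_mb, overhead_factor=3.0):
    """Return True if the estimated reducer memory is within the limit."""
    needed_mb = estimate_reducer_memory_mb(mapper_output_bytes, overhead_factor)
    return needed_mb <= memory_limit_mb


# Hypothetical job: 20 mappers, each writing a 10 MB result file.
sizes = [10 * 1024 * 1024] * 20

print(estimate_reducer_memory_mb(sizes))            # 600.0
print(fits_in_reducer(sizes, memory_limit_mb=1024)) # True
print(fits_in_reducer(sizes, memory_limit_mb=512))  # False
```

In this hypothetical job, 200 MB of mapper output is estimated to need about 600 MB once loaded, so it would fit in a function configured with 1,024 MB of memory but not in one configured with 512 MB.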
Memory limits
In the reducer phase, all of the data produced from the mappers must, at some point, be read and stored in memory. In our example application, the reducer...