Summary
This chapter presents some of the ideas and algorithms involved in the analysis of very large datasets. The two main algorithms are Google's PageRank algorithm and the MapReduce framework.
To illustrate how MapReduce works, we have implemented the WordCount example, which counts the frequencies of the words in a collection of text files. The more realistic implementation would be with MongoDB, which is presented in Chapter 10, NoSQL Databases, or in Apache Hadoop, which is briefly described in this chapter.