Matrix multiplication with MapReduce
If A is an m Ă— p matrix and B is an p Ă— n matrix, then the product of A and B is the m Ă— n matrix C = AB, where the (i, j)th element of C is computed as the inner product of the ith row of A with the jth column of B:
This is a dot product—simple arithmetic if m, p, and n are small. But not so simple if we're working with big data.
The formula for cij requires p multiplications and p – 1 additions, and there are m· n of these to do. So, that implementation runs in O(mnp) time. That is slow. Furthermore, if A and B are dense matrices (that is, most elements are nonzero), then storage requirements can also be overwhelming. This looks like a job for MapReduce.
For MapReduce, think key-value pairs. We assume that each matrix is stored as a sequence of key-value pairs, one for each non-zero element of the matrix. The key is the subscript pair (i, j), and the value is the (i, j)th element of the matrix. For example, this...