Matrix multiplication with MapReduce
If A is an m × p matrix and B is an p × n matrix, then the product of A and B is the m × n matrix C = AB, where the (i, j)th element of C is computed as the inner product of the ith row of A with the jth column of B:
This is a dot product—simple arithmetic if m, p, and n are small. But not so simple if we're working with big data.
The formula for cij requires p multiplications and p – 1 additions, and there are m· n of these to do. So, that implementation runs in O(mnp) time. That is slow. Furthermore, if A and B are dense matrices (that is, most elements are nonzero), then storage requirements can also be overwhelming. This looks like a job for MapReduce.
For MapReduce, think key-value pairs. We assume that each matrix is stored as a sequence of key-value pairs, one for each non-zero element of the matrix. The key is the subscript pair (i, j), and the value is the (i, j)th element of the matrix. For example, this matrix
would be represented by the list shown...