Dimensionality reduction
Imagine a large matrix with many rows and columns. In many matrix applications, this large matrix can be approximated by a few narrow matrices, with far fewer columns, that still capture most of the information in the original. Processing these smaller matrices may then yield results similar to those obtained from the original matrix, at a much lower computational cost.
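To make the idea concrete, here is a small plain-Scala sketch that does not use Spark. The object and method names are hypothetical, chosen only for illustration. It builds a 1000 x 50 matrix as the product of a tall 1000 x 2 factor and a short 2 x 50 factor, so each of the 1000 rows is fully described by just two numbers. Real data is rarely exactly low-rank; algorithms such as SVD and PCA look for the best narrow approximation rather than an exact factorization.

```scala
import scala.util.Random

// Hypothetical helper object, for illustration only
object LowRankSketch {
  type Mat = Array[Array[Double]]

  // Dense matrix product: (n x k) * (k x m) = (n x m)
  def multiply(a: Mat, b: Mat): Mat = {
    val k = b.length
    val m = b(0).length
    Array.tabulate(a.length, m) { (i, j) =>
      var sum = 0.0
      var p = 0
      while (p < k) { sum += a(i)(p) * b(p)(j); p += 1 }
      sum
    }
  }

  // Number of stored values: the full matrix vs. its two narrow factors
  def storage(rows: Int, cols: Int, k: Int): (Int, Int) =
    (rows * cols, rows * k + k * cols)

  def main(args: Array[String]): Unit = {
    val rnd = new Random(42)
    val (rows, cols, k) = (1000, 50, 2)
    // Tall narrow factor: each row of the big matrix is described by k = 2 numbers
    val left: Mat = Array.fill(rows, k)(rnd.nextDouble())
    // Short factor that maps those k numbers back onto all 50 columns
    val right: Mat = Array.fill(k, cols)(rnd.nextDouble())
    // The large matrix is completely determined by the two factors
    val big = multiply(left, right)
    val (full, factors) = storage(rows, cols, k)
    println(s"big: ${big.length} x ${big(0).length}, $full values; factors: $factors values")
  }
}
```

Running `main` prints `big: 1000 x 50, 50000 values; factors: 2100 values`, showing that the two narrow factors hold about 4% as many values as the matrix they reproduce.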
Dimensionality reduction is about finding that smaller matrix. MLlib supports two algorithms for dimensionality reduction on the RowMatrix class: singular value decomposition (SVD) and principal component analysis (PCA). Both algorithms let us specify the number of dimensions we want to retain. Let us look at an example first and then delve into the underlying theory.
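A minimal sketch of both calls follows, assuming Spark MLlib (the RDD-based `org.apache.spark.mllib` API) is on the classpath and a local `SparkContext` is acceptable. The object name `ReduceDimensions`, the 100 x 10 random input and the choice of k = 3 are illustrative assumptions, not part of the example below. `computeSVD(k, computeU = true)` keeps the top k singular values and also returns U as a distributed `RowMatrix`; `computePrincipalComponents(k)` returns the top k principal components as a small local matrix, and multiplying the original matrix by it projects every row into k dimensions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrix, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import scala.util.Random

// Hypothetical object name, for illustration only
object ReduceDimensions {
  // Reduces a RowMatrix to k dimensions with both SVD and PCA and
  // returns the (rows, cols) shape of each reduced matrix.
  def reduce(sc: SparkContext, k: Int): ((Long, Long), (Long, Long)) = {
    val rnd = new Random(7)
    // Assumed input: 100 rows, each a dense vector of 10 random features
    val rows = sc.parallelize(Seq.fill(100)(Vectors.dense(Array.fill(10)(rnd.nextDouble()))))
    val mat = new RowMatrix(rows)

    // SVD: keep the top k singular values; computeU = true also materializes U (100 x k)
    val svd = mat.computeSVD(k, computeU = true)
    val u: RowMatrix = svd.U

    // PCA: top k principal components as a local 10 x k matrix,
    // then project every row onto them to get a 100 x k RowMatrix
    val pc: Matrix = mat.computePrincipalComponents(k)
    val projected: RowMatrix = mat.multiply(pc)

    ((u.numRows(), u.numCols()), (projected.numRows(), projected.numCols()))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dimred-sketch").setMaster("local[*]"))
    try {
      val (svdShape, pcaShape) = reduce(sc, 3)
      println(s"SVD U shape: $svdShape, PCA projection shape: $pcaShape")
    } finally sc.stop()
  }
}
```

Both reduced matrices keep all 100 rows but only 3 columns. Note that U can end up with fewer than k columns if some singular values fall below the `rCond` threshold of `computeSVD`.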
Example 9: Dimensionality reduction
Scala:
scala> import scala.util.Random
import scala.util.Random
scala> import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
scala> import org.apache.spark.mllib.linalg.distributed.RowMatrix
import...