Dimensionality reduction models in MLlib require vectors as inputs. However, unlike clustering, which operated directly on an RDD[Vector], PCA and SVD computations are provided as methods on a distributed RowMatrix (the difference is largely syntactic, as a RowMatrix is simply a wrapper around an RDD[Vector]).
Training a dimensionality reduction model
Running PCA on the LFW dataset
Now that we have extracted our image pixel data into vectors, we can instantiate a new RowMatrix.
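As a minimal sketch of this step, the following wraps an RDD[Vector] in a RowMatrix and checks its dimensions. It assumes Spark MLlib is on the classpath and is written as a script or REPL session; the local master setting, the application name, and the small hand-written `vectors` RDD are illustrative stand-ins for the image pixel vectors, not the LFW data itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD

// Assumption: a local context for illustration; getOrCreate reuses an
// existing context (for example, the one provided by spark-shell).
val conf = new SparkConf().setMaster("local[2]").setAppName("RowMatrixSketch")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical stand-in for the pixel vectors: 4 observations, 3 variables.
val vectors: RDD[Vector] = sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0, 3.0),
  Vectors.dense(4.0, 5.0, 6.0),
  Vectors.dense(7.0, 8.0, 9.5),
  Vectors.dense(2.0, 0.5, 1.0)
))

// A RowMatrix wraps the RDD[Vector]; each vector becomes one row.
val matrix = new RowMatrix(vectors)

val numRows = matrix.numRows() // number of observations
val numCols = matrix.numCols() // number of variables per observation
println(s"RowMatrix dimensions: $numRows x $numCols")
```

With real image data, each row would be one image and the number of columns would equal the number of pixels per image.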
def computePrincipalComponents(k: Int): Matrix
Computes the top k principal components. Rows correspond to observations, and columns correspond to variables. The principal components are stored as a local matrix of size n-by-k. Each column corresponds to one principal component, and the columns...
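To make the shape described in this method documentation concrete, here is a hedged sketch that calls computePrincipalComponents on a small RowMatrix and inspects the returned local matrix. It assumes Spark MLlib is on the classpath and is written as a script or REPL session; the synthetic `vectors` data, the local master setting, and the choice of K = 2 are illustrative assumptions rather than values from the LFW example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD

// Assumption: a local context for illustration; getOrCreate reuses an
// existing context (for example, the one provided by spark-shell).
val conf = new SparkConf().setMaster("local[2]").setAppName("PcaSketch")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical data: 5 observations (rows), 4 variables (columns).
val vectors: RDD[Vector] = sc.parallelize(Seq(
  Vectors.dense(2.0, 0.5, 1.0, 3.0),
  Vectors.dense(1.5, 1.0, 0.0, 2.5),
  Vectors.dense(3.0, 2.0, 4.0, 1.0),
  Vectors.dense(0.5, 3.5, 2.0, 0.0),
  Vectors.dense(4.0, 1.0, 3.0, 2.0)
))
val matrix = new RowMatrix(vectors)

// Illustrative choice: keep the top 2 principal components.
val K = 2
val pc: Matrix = matrix.computePrincipalComponents(K)

// The result is a local n-by-k matrix, where n is the number of
// variables (columns of the RowMatrix) and k is the requested K.
val pcRows = pc.numRows
val pcCols = pc.numCols
println(s"Principal components matrix: $pcRows x $pcCols")
```

Because the result is a local matrix held on the driver, n-by-k needs to fit comfortably in driver memory when the number of variables is large.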