PCA using H2O
One of the greatest difficulties in multivariate statistical analysis is displaying a dataset with many variables. Fortunately, in such datasets some variables are often closely related to each other, typically because they measure the same underlying quantity that governs the behavior of the system and therefore carry the same information. These redundant variables add nothing to the model we want to build. We can then simplify the problem by replacing a group of such variables with a single new variable that captures their shared information content.
PCA generates a new set of mutually uncorrelated variables, called principal components; each principal component is a linear combination of the original variables. All principal components are orthogonal to each other, so there is no redundant information among them. Taken together, the principal components form an orthogonal basis for the data space. The goal of PCA is to explain the maximum amount...
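To make these properties concrete, here is a minimal sketch of PCA via eigendecomposition of the covariance matrix. It uses NumPy rather than H2O and is only meant to illustrate the idea, not H2O's own implementation. The function name `pca`, the synthetic variables `x1`, `x2`, `x3`, and the random seed are hypothetical choices for illustration. `x2` is built as a near-copy of `x1`, mimicking the redundant variables described above.

```python
import numpy as np

def pca(X, k):
    """Hypothetical helper: return top-k component scores, loadings, and explained-variance ratios."""
    # Center each column so the components describe variation around the mean
    Xc = X - X.mean(axis=0)
    # Covariance matrix of the original variables (columns are variables)
    cov = np.cov(Xc, rowvar=False)
    # eigh handles symmetric matrices; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each column of 'loadings' holds the coefficients of one linear combination
    loadings = eigvecs[:, :k]
    # Project the centered data onto the components to get the new variables
    scores = Xc @ loadings
    explained = eigvals[:k] / eigvals.sum()
    return scores, loadings, explained

# Synthetic data (hypothetical): x2 is almost a rescaled copy of x1, x3 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=500)
x3 = rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

scores, loadings, explained = pca(X, k=3)
# Loadings are orthonormal: their Gram matrix is (numerically) the identity
print(np.round(loadings.T @ loadings, 6))
# Component scores are mutually uncorrelated: correlation matrix is ~identity
print(np.round(np.corrcoef(scores, rowvar=False), 6))
# Fraction of total variance explained by each component
print(np.round(explained, 3))
```

Because `x1` and `x2` carry essentially the same information, the first component typically weights both of them and accounts for most of the total variance, while the last component explains almost nothing and could be dropped.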