What Is Dimensionality Reduction?
Dimensionality reduction is an important tool in any data scientist's toolkit and, because of its wide variety of use cases, is essentially assumed knowledge within the field. Before we consider how to reduce dimensionality and why we would want to, we must first understand what dimensionality is. Put simply, dimensionality is the number of dimensions, features, or variables associated with a sample of data. It can often be thought of as the number of columns in a spreadsheet, where each sample occupies its own row and each column describes an attribute of the sample. The following table is an example:
In the preceding table, we have two samples of data, each with three independent features or dimensions. Depending on the problem being solved, or the origin of this dataset, we may want to reduce the number of dimensions per...