Performing Multivariate Analysis in Python
A common problem we will typically face with large datasets is analyzing multiple variables at once. While techniques covered under univariate and bivariate analysis are useful, they typically fall short when we are required to analyze five or more variables at once. The problem with working with high-dimensional data (data with several variables) is a well-known one, and it is commonly referred to as the curse of dimensionality. Having many variables can be a good thing because we can glean more insights from more data. However, it can also be a challenge because there aren’t many techniques that can analyze or visualize several variables at once.
In this chapter, we will cover multivariate analysis techniques that can be used to analyze several variables at once. We will cover the following:
- Implementing cluster analysis on multiple variables using Kmeans
- Choosing the optimal number of K clusters in Kmeans
- Profiling...