Visualizing high-dimensional data
One of the first steps when working with a new dataset should be to explore it systematically, manually inspecting the data to find patterns, form hypotheses, and gain insights. While this advice may sound reasonable at first, it is hard to follow when your dataset consists of thousands of numerical values in a spreadsheet. How should you navigate the data? What should you look for? And what insights can you get?
A great way to get quick insights and a good understanding of your data is to visualize it. Visualization also helps you identify clusters, irregularities, and anomalies in your data, all of which need to be considered in any further data processing. But how can you visualize a dataset with 10, 100, or 1,000 feature dimensions? And where should you store the resulting analysis?
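Projecting the data onto two dimensions is one common way to get a first visual impression of a dataset with many features. The following is a minimal sketch, assuming NumPy is available. It uses principal component analysis (PCA) computed through a singular value decomposition. The function name `pca_project` and the synthetic two-cluster data are hypothetical illustrations, not part of this section's workflow or of any Azure Machine Learning API.

```python
import numpy as np


def pca_project(X: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Hypothetical helper: project the rows of X onto the top principal components."""
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # The rows of Vt are the principal axes, sorted by decreasing singular
    # value, i.e. by the amount of variance each axis explains.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Express every sample in the coordinates of the first n_components axes.
    return X_centered @ Vt[:n_components].T


# Synthetic example (an assumption for illustration): 300 samples with
# 100 features, drawn from two clusters with different means.
rng = np.random.default_rng(seed=0)
cluster_a = rng.normal(loc=0.0, scale=1.0, size=(150, 100))
cluster_b = rng.normal(loc=3.0, scale=1.0, size=(150, 100))
X = np.vstack([cluster_a, cluster_b])

X_2d = pca_project(X, n_components=2)
print(X_2d.shape)  # (300, 2)
```

Passing the two columns of `X_2d` to Matplotlib's `plt.scatter` would show the two synthetic clusters as clearly separated groups along the first axis. Keep in mind that PCA is a linear projection, so nonlinear structure in the data may not survive it.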
In this section, we will answer all these questions. First, we will explore Azure Machine Learning functionality to register Matplotlib figures with your...