Data exploration, visualization, and discovery
Data visualization is not something that is done at the end of a long, costly Business Intelligence (BI) project. It is not the cute dashboard that we create to justify the investment in a new data warehouse and several Online Analytical Processing (OLAP) cubes. Data visualization is an integral part of a data exploration process that begins on the first day that we start extracting raw data.
Anscombe's quartet highlights how important and effective data visualization is when we explore data. Each of the following scatterplots shows the relationship between two variables. Correlation can also be summarized numerically by the correlation coefficient, r, or by its square, R-squared. If we were to summarize each of the following scatterplots this way, we would discover that the number is the same for each one: r is approximately .816, which corresponds to an R-squared of about .67. It is only by visualizing the data in a two-dimensional space that we notice how different each correlation...
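To check the numeric claim ourselves, we can compute the correlation for each of the four datasets. The following is a minimal Python sketch rather than part of any BI tool's workflow: the values are the published Anscombe (1973) data, while the names `pearson_r`, `ANSCOMBE`, and `results` are our own, chosen for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Datasets I, II, and III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

ANSCOMBE = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Correlation coefficient for each dataset, keyed by dataset name.
results = {name: pearson_r(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}

for name, r in results.items():
    print(f"Dataset {name}: r = {r:.2f}, R-squared = {r * r:.2f}")
```

All four datasets produce nearly the same r and R-squared, so the summary statistic alone cannot tell them apart. Only plotting the points shows that one dataset is roughly linear, one is curved, and two are dominated by a single outlier.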