Reducing Rows and Columns in Your Result Sets
The sources of data available to a data analyst have grown to the point that the amount of data within reach can seem practically unlimited. The challenge today is to generate a result set that is manageable yet still contains the information needed for reports and analysis. With too much data, the result set becomes unmanageable and unusable due to information overload; with too little, the data will have gaps, and the end user will lose trust in it.
In this chapter, we will review how to determine how much data, and which data, to keep in your result set and how to filter that data efficiently. We will also review how to decide which columns to keep and how to select them efficiently. The chapter wraps up with a short discussion of how these choices affect later data aggregations.
By the end of this chapter, you will understand how to identify the rows and columns that you need and the most efficient method for getting them into a usable result set that can easily be recreated.
In this chapter, we will cover the following main topics:
- Identifying data to be removed from the dataset
- Understanding the value of creating views versus removing data
- Exploring the impact of row and column reductions on aggregations