Data exploration is an extremely important task in every data science or machine learning project. Without a good understanding of the data, we cannot expect our predictive models to succeed. In this section, we will show you how to explore data using T-SQL queries, the SSIS Data Profiling Task, and a simple R function.
Data exploration
Exploring data using T-SQL
For simple data exploration, we can use T-SQL queries. Here, we will check three things: the uniqueness of values in columns where we expect them to be unique, the referential integrity between the SourceData.Contracts and SourceData.Actions tables (that is, whether every reference from one table finds a matching row in the other), and the rate of NULLs in several columns.
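Each of the three checks maps onto a plain SQL aggregate. The script below is a minimal sketch, not the queries used later in this chapter. It assumes hypothetical columns: ContractId (the key of SourceData.Contracts, referenced from SourceData.Actions) and Amount (a nullable column in SourceData.Actions). It runs against a tiny in-memory SQLite stand-in with made-up rows. The SQL uses only COUNT, COUNT(DISTINCT ...), LEFT JOIN, and NULLIF, all of which are also valid T-SQL, so the statements should carry over to SQL Server once the real column names are substituted.

```python
import sqlite3

# Portable SQL: runs on SQLite here, and the same statements are valid T-SQL.
# ContractId and Amount are hypothetical column names used for illustration.
UNIQUENESS = """
SELECT COUNT(ContractId), COUNT(DISTINCT ContractId)
FROM SourceData.Contracts
"""
ORPHANS = """
SELECT COUNT(*)
FROM SourceData.Actions AS a
LEFT JOIN SourceData.Contracts AS c ON a.ContractId = c.ContractId
WHERE a.ContractId IS NOT NULL AND c.ContractId IS NULL
"""
NULL_RATE = """
SELECT 1.0 * (COUNT(*) - COUNT(Amount)) / NULLIF(COUNT(*), 0)
FROM SourceData.Actions
"""


def explore(conn: sqlite3.Connection) -> dict:
    """Run the uniqueness, referential-integrity and NULL-rate checks."""
    non_null, distinct = conn.execute(UNIQUENESS).fetchone()
    orphans = conn.execute(ORPHANS).fetchone()[0]
    null_rate = conn.execute(NULL_RATE).fetchone()[0]
    return {
        # Unique when every non-NULL value is distinct.
        "contract_id_unique": non_null == distinct,
        # Actions rows whose non-NULL ContractId has no matching contract.
        "orphan_actions": orphans,
        # Share of Actions rows where Amount is NULL.
        "amount_null_rate": null_rate,
    }


def build_demo() -> sqlite3.Connection:
    """Tiny in-memory stand-in for the two source tables (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    # Attaching a second database named SourceData lets the queries use
    # the schema-qualified names SourceData.Contracts and SourceData.Actions.
    conn.execute("ATTACH DATABASE ':memory:' AS SourceData")
    conn.execute("CREATE TABLE SourceData.Contracts (ContractId INTEGER)")
    conn.execute(
        "CREATE TABLE SourceData.Actions "
        "(ActionId INTEGER, ContractId INTEGER, Amount REAL)"
    )
    conn.executemany(
        "INSERT INTO SourceData.Contracts VALUES (?)", [(1,), (2,), (3,)]
    )
    conn.executemany(
        "INSERT INTO SourceData.Actions VALUES (?, ?, ?)",
        [(10, 1, 100.0), (11, 2, None), (12, 4, 50.0), (13, 1, None)],
    )
    return conn


if __name__ == "__main__":
    print(explore(build_demo()))
```

The LEFT JOIN keeps every Actions row and fills the Contracts side with NULL when no match exists, so filtering on `c.ContractId IS NULL` isolates dangling references. The extra `a.ContractId IS NOT NULL` condition keeps rows that simply have no reference from being counted as broken ones. With the made-up rows, the key is unique, one action (ContractId 4) is orphaned, and half of the Amount values are NULL.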
First, let's query both tables to get a sample of the data and see their structures...