Understanding your data
In this first phase, it is important to understand what each variable means in the context of the problem the dataset represents. Once it is clear which measurable business entities the variables are associated with, it is easier to infer how they interact with each other.
Knowing the size of the dataset, that is, the number of variables (columns) and the number of observations (rows), gives you a first idea of how much data you will be dealing with. Next, it is crucial to identify the type of each variable (numerical or categorical) early on, so that you can visualize it in the most appropriate way.
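As a minimal sketch, assuming the data has already been loaded into a pandas DataFrame, the size of the dataset and the split between numerical and categorical variables can be checked as follows. The column names and values are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical toy dataset used only for illustration.
df = pd.DataFrame(
    {
        "customer_id": [101, 102, 103, 104, 105],
        "age": [34, 45, 29, 52, 41],
        "monthly_spend": [120.5, 89.0, 230.25, 64.75, 150.0],
        "segment": ["retail", "retail", "enterprise", "smb", "smb"],
    }
)

# Size of the dataset: (number of observations, number of variables).
n_rows, n_cols = df.shape
print(f"{n_rows} observations x {n_cols} variables")

# An identifier stored as an integer has no numerical meaning,
# so mark it explicitly as categorical before splitting by type.
df["customer_id"] = df["customer_id"].astype("category")

# Split the columns by type so each group can be summarized and plotted appropriately.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
```

The cast of `customer_id` illustrates why the storage type inferred by pandas is only a starting point: the meaning of each variable, established in the previous step, decides whether it is treated as numerical or categorical.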
Then, knowing the descriptive statistics of the numerical variables helps you develop a feel for their typical values and ranges. In addition, when you are examining them and trying to work out how they are distributed, several types of graphs can help...
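A minimal sketch of this step, again assuming a pandas DataFrame with hypothetical columns: `describe()` reports the count, mean, standard deviation, minimum, quartiles and maximum of each numerical column. The skewness check and the frequency counts for the categorical column are additions for illustration, not steps prescribed by the text above.

```python
import pandas as pd

# Hypothetical toy dataset used only for illustration.
df = pd.DataFrame(
    {
        "age": [34, 45, 29, 52, 41],
        "monthly_spend": [120.5, 89.0, 230.25, 64.75, 150.0],
        "segment": ["retail", "retail", "enterprise", "smb", "smb"],
    }
)

# By default describe() summarizes only the numerical columns:
# count, mean, std, min, 25%, 50% (median), 75% and max.
summary = df.describe()
print(summary)

# Skewness gives a first hint of asymmetric distributions worth plotting.
skewness = df.select_dtypes(include="number").skew()
print(skewness)

# For a categorical variable, frequency counts play a similar role.
segment_counts = df["segment"].value_counts()
print(segment_counts)
```

Comparing the mean with the median (the `50%` row) is a quick way to spot skewed variables before choosing how to plot them.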