Datasets are the backbone of any data science project. With a good, well-structured dataset, we have a better chance of exploring the data and discovering important insights; conversely, a bad dataset can lead to erroneous and harmful conclusions and decisions. This is why we need to pay extra attention to the kind of data we are working with, well before we start developing code to analyze it.
In this section, we will go over some things to keep in mind about the data for our projects, as well as some hands-on practice in working with datasets. These practices will help us form good habits that give us a solid starting point when working on a data-related project.
Now, the first step we need to take to start a data science pipeline is to actually determine what question and/or problem we are trying to address. After that,...