Exploring Your Data
If you are running your project according to the CRISP-DM methodology, the first step is to discuss the project with the stakeholders and clearly define their requirements and expectations. Only once these are clear can you start looking at the data and assess whether it will let you achieve those objectives.
After receiving a dataset, you will want to make sure that it contains the information you need for your project. For instance, if you are working on a supervised learning project, you will check whether the dataset contains the target variable you need and whether this field has any missing or incorrect values. You may also check how many observations (rows) and variables (columns) there are. These are the kinds of questions you will typically have when you first encounter a new dataset. This section will introduce you to some techniques you can use to answer them.
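As a minimal sketch of these first checks, the following example uses the pandas library on a small, made-up DataFrame. The column names (age, income, purchased) and the choice of purchased as the target variable are assumptions made purely for illustration; they do not come from any real dataset.

```python
import pandas as pd

# Hypothetical toy dataset: the columns and values are illustrative only
df = pd.DataFrame({
    "age": [25, 32, None, 41],
    "income": [30000, 45000, 52000, None],
    "purchased": ["yes", "no", None, "yes"],
})

# Assumed name of the target variable for a supervised project
target = "purchased"

# Number of observations (rows) and variables (columns)
n_rows, n_cols = df.shape

# Is the target variable present in the dataset?
has_target = target in df.columns

# Count of missing values in every column, and in the target specifically
missing_per_column = df.isna().sum()
missing_target = int(df[target].isna().sum())

# Distinct target values (including missing ones) help spot incorrect entries
target_values = df[target].value_counts(dropna=False)

print(f"Rows: {n_rows}, columns: {n_cols}")
print(f"Target '{target}' present: {has_target}")
print(f"Missing values in target: {missing_target}")
print(missing_per_column)
print(target_values)
```

Listing the distinct values of the target is a quick way to spot entries that are present but incorrect, such as an unexpected label or inconsistent spelling, which a simple count of missing values would not reveal.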
For the rest of this section, we will be working with a dataset...