A workflow for data exploration
Now that you are familiar with the different ways to acquire and load data into RStudio, let’s go over a basic workflow that I regularly use for data exploration. Naturally, the steps presented here are flexible and should be understood as a guide for beginning to understand a dataset. They can and should be adapted to your project’s needs.
When you start a data exploration, it is important to keep your final goal in mind: what problem are you trying to solve? From there, you work to understand the variables, look for errors and missing data, examine the distributions, and create a few visualizations that help you extract useful insights along the way. Let’s explore the steps that can be performed:
- Load and view: Every data science project starts with data. Load a dataset into RStudio and take a first look at it, making sure the data types are correctly inferred and that the dataset is completely...