Understanding the data
As already mentioned, understanding data is interleaved with data preparation. To know which variables need recoding, which variables have missing values, and how to combine variables into new ones, you need to understand the data you are dealing with in depth. A simple overview of the data is the starting point. It can be enough for a small dataset, or for checking a small subset of a large one.
You can learn more about the distribution of the variables by plotting them. Basic statistical methods are also useful for a data overview. Finally, these basic statistical results and graphs are sometimes exactly what you need for a report.
R is an extremely powerful language and environment for both visualizations and statistics. You will learn how to:
- Create simple graphs
- Show plots and histograms
- Calculate frequency distributions
- Use descriptive statistics methods
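
As a first taste of these tasks, here is a minimal sketch in base R. It assumes the built-in `mtcars` data frame as a stand-in for your own data; the dataset and the variable names `cars`, `cyl_counts`, `cyl_share`, and `mpg_stats` are illustrative choices, not part of the original text.

```r
# mtcars ships with base R; it only stands in for your own data frame here.
cars <- mtcars

# Quick overview: structure of the data frame and its first few rows.
str(cars)
print(head(cars))

# Frequency distribution of a discrete variable (number of cylinders).
cyl_counts <- table(cars$cyl)          # absolute frequencies
cyl_share <- prop.table(cyl_counts)    # relative frequencies
print(cyl_counts)
print(round(cyl_share, 3))

# Descriptive statistics for a continuous variable (miles per gallon).
print(summary(cars$mpg))
mpg_stats <- c(mean = mean(cars$mpg),
               median = median(cars$mpg),
               sd = sd(cars$mpg))
print(mpg_stats)

# Simple graphs: bar chart of the frequencies, a histogram, and a scatter plot.
barplot(cyl_counts, main = "Cars by number of cylinders",
        xlab = "Cylinders", ylab = "Count")
hist(cars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")
plot(cars$wt, cars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```

In an interactive session the three plots appear in the graphics window. When the script runs non-interactively, R typically writes them to a file such as `Rplots.pdf` instead.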