Exploring the employment data
Now that the data is imported into R and we have learned some strategies for importing larger datasets, we will do some preliminary analysis of the data. The purpose is to see what the data looks like, identify idiosyncrasies, and ensure that the rest of the analysis plan can move forward.
Getting ready
If you completed the last recipe, you should be ready to go.
How to do it...
The following steps will walk you through this recipe to explore the data:
- First, let's see how large this data is:
> dim(ann2012)
[1] 3556289      15
Good, it's only 15 columns.
- Let's take a peek at the first few rows so that we can see what the data looks like:
> head(ann2012)
You can refer to the following screenshot:
What are the variables own_code, industry_code, and so on, and what do they mean? We might need more data to understand this data.
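Besides dim and head, a couple of other base R functions give a quick overview of each column. The following is only a minimal sketch: toy is a hypothetical stand-in for ann2012 with a few made-up rows, and only the column names are taken from the data above. With the real data loaded, the same calls can be run on ann2012 directly.

```r
# Hypothetical stand-in for ann2012: a few made-up rows, not real data.
toy <- data.frame(
  own_code = c(0L, 1L, 5L),
  industry_code = c("10", "10", "111"),
  total_annual_wages = c(1000, 2500, 0),
  taxable_annual_wages = c(800, 2000, 0),
  annual_contributions = c(10, 25, 0),
  stringsAsFactors = FALSE
)

dim(toy)                          # number of rows and columns
str(toy)                          # column names, types, and first few values
col_types <- sapply(toy, class)   # class of each column as a named vector
col_types
summary(toy$total_annual_wages)   # five-number summary plus the mean
```

The output of str is often more compact than head for wide tables, because it prints one line per column rather than one line per row.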
- There is also a weird idiosyncrasy in this data. Some of the values for total_annual_wages, taxable_annual_wages, and annual_contributions...