Understanding the data
Once the data is loaded in RStudio, we should look at it. The first checks are to confirm that the data was loaded completely and without errors. For that, we can use RStudio's built-in viewer by typing View(df), or we can use the head() function to look at just the first few rows, remembering that the default is to show the first six observations:
df %>% head()
It displays the following result:
Figure 9.1 – Head of the College Majors dataset
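Beyond the first rows, a few base R calls can help confirm that the load was complete. The following is a minimal sketch, assuming a small made-up data frame in place of the real df (its column names and values are hypothetical, not those of the College Majors dataset):

```r
# Hypothetical stand-in for the loaded dataset; the real df has other columns.
df <- data.frame(
  major  = c("Major A", "Major B", "Major C", "Major D"),
  total  = c(120L, 80L, 45L, 300L),
  median = c(50000, 42000, 38000, 61000),
  stringsAsFactors = FALSE
)

# Rows and columns actually loaded; compare with what the source file should hold.
shape <- dim(df)

# head() and tail() accept n to override the default of six rows.
first_rows <- head(df, n = 2)
last_rows  <- tail(df, n = 2)

# Compact overview of column names, types, and first values.
str(df)
```

Looking at the last rows with tail() as well as the first ones is a cheap way to notice a file that was cut off partway through.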
After a first look, the data looks good. In general, at this stage we are checking the following:
- Whether the data is rectangular—in other words, divided into rows and columns.
- Whether we see any problems with language encoding, which show up as stray symbols amid the words.
- Whether the CSV was read successfully for all columns. If the file's separator is a semicolon or tab, for example, the columns can appear all merged...
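These checks can also be done programmatically. The sketch below is only an illustration under stated assumptions: it writes a tiny semicolon-separated file to a temporary path (the file contents and the names tmp, merged, and df_ok are hypothetical) and then applies one possible test for each point in the list.

```r
# Toy semicolon-separated file standing in for the real CSV (made-up values).
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("major;total;median", "Major A;120;50000", "Major B;80;42000"),
  tmp
)

# Separator check: the default comma separator merges everything into one column.
merged <- read.csv(tmp, stringsAsFactors = FALSE)

# Rereading with the right separator recovers the individual columns.
df_ok <- read.csv(tmp, sep = ";", stringsAsFactors = FALSE)

# Rectangular check: every line should contain the same number of fields.
fields <- count.fields(tmp, sep = ";")
is_rectangular <- length(unique(fields)) == 1

# Encoding check: flag character columns that contain invalid UTF-8 strings.
chr_cols <- vapply(df_ok, is.character, logical(1))
bad_encoding <- vapply(
  df_ok[chr_cols],
  function(x) any(!validUTF8(x)),
  logical(1)
)
```

A single column where several were expected, as in merged here, is the usual sign of a wrong separator. Note that validUTF8() only detects invalid byte sequences; text that was decoded with the wrong encoding but is still valid UTF-8 has to be spotted by eye.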