Reading another CSV file
We can look at another CSV file in the same dataset to see what kinds of issues we run across. Using the yearly batting records for all Major League Baseball players that we previously downloaded from the same site, we can use code like the following to start analyzing the data:
players <- read.csv(file="Documents/baseball.csv", header=TRUE, sep=",")
head(players)
This produces the following head display:
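There are many statistics for baseball players in this dataset. There are also many NA values. R handles NA values reasonably well: summary() counts them separately for each column, and many functions can skip them when asked (for example, with na.rm = TRUE). Let us first look at the statistics for the data using: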
summary(players)
This generates statistics on all the fields involved (there are several more that are not in this display):
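How summary() treats NA values can also be seen without the downloaded file. The following is a minimal sketch on a small made-up data frame. The column names (playerID, yearID, HR) and all values are assumptions for illustration and are not taken from the dataset:

```r
# Toy data frame standing in for the batting records (hypothetical values)
toy <- data.frame(
  playerID = c("a01", "a01", "b02", "b02"),
  yearID   = c(1871, 1872, 1871, 1872),
  HR       = c(2, NA, 5, NA)
)

# summary() lists the NA count for a column alongside its quantiles
summary(toy)

# Aggregates such as mean() return NA unless NA values are dropped explicitly
mean_with_na    <- mean(toy$HR)                # NA
mean_without_na <- mean(toy$HR, na.rm = TRUE)  # 3.5

# Count NA values per column
na_counts <- colSums(is.na(toy))

# Count records per player, the kind of tally behind "data points per player"
per_player <- table(toy$playerID)
```

The same calls should apply to the real players data frame once its actual column names are substituted.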
Several points in the summary output are worth noting:
- We have about 30 data points per player
- The player data goes back to 1871
- There are about 1,000 data points per team
- American League and National League...