Checking the time intervals
Earlier, we mentioned that the time intervals need to be equally sized. In addition, before performing any time series analysis, we need to check how many non-missing time intervals we have. So, let's count the number of enrollment years for each category.
Using the dplyr package, we can combine group_by() with summarise(n = n()) to count the number of entries for each category:
# -- summarize and sort by the number of years
yr.count <- x2 %>%
  group_by(cat) %>%
  summarise(n = n()) %>%
  arrange(n)

# - we can see that there are 14 years for all of the groups. That is good!
print(yr.count, n = 10)
> Source: local data frame [24 x 2]
>
>                  cat     n
>               (fctr) (int)
> 1     18 to 24 YEARS    14
> 2     25 to 34 YEARS    14
> 3     35 to 44 YEARS    14
> 4     45 to 54 YEARS    14
> 5     55 to 64 YEARS    14
> 6  65 YEARS AND OVER    14
> 7           ALL AGES    14
> 8 ...
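A row count alone does not prove that the intervals are equally sized: a category can have the expected number of rows and still skip a year. The following is a minimal sketch of one way to check the spacing as well. It runs on a small made-up data frame rather than x2, and the column name year and the toy values are assumptions made for illustration only.

```r
library(dplyr)

# Toy stand-in for x2. The `year` column and all values are hypothetical.
toy <- data.frame(
  cat  = rep(c("A", "B"), each = 4),
  year = c(2000, 2001, 2002, 2003,   # A: consecutive years
           2000, 2001, 2003, 2004)   # B: 2002 is missing
)

# For each category, count the rows and test whether every
# year-to-year step has the same length
gap.check <- toy %>%
  arrange(cat, year) %>%
  group_by(cat) %>%
  summarise(n        = n(),
            max.step = max(diff(year)),
            regular  = n_distinct(diff(year)) == 1)

print(gap.check)
```

In this toy data both categories have four rows, yet only A is regular; B is flagged because one of its steps spans two years. The same pattern could be applied to x2 once the name of its year column is substituted.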