In this chapter, we looked at how data.table and dplyr can be used. We covered the basics of loading data from various sources and then performed subsetting, grouping, pivoting, and other operations from both the data.table and the dplyr perspective. Both packages are highly versatile, but they make different trade-offs. data.table is generally faster than dplyr, which makes it especially useful for large datasets, but that speed comes at the cost of learning a new syntax. dplyr, on the other hand, is generally slower than data.table, but it offers a simpler syntax and makes downstream analysis easier.
In the next chapter, we will discuss data mining techniques for both structured data, which conforms to a clearly defined schema, and unstructured data, which exists in the form of natural language text. Specific topics include pattern discovery, clustering, text retrieval...