Working with relational databases
In the previous chapters, we used a family of built-in functions such as read.csv() and read.table() to import data from delimiter-separated text files, such as those in the csv format. Storing data in text formats is handy and portable. When the data file is large, however, a text file may not be the best way to store it.
There are three main reasons why text formats can become difficult to use:
- Functions such as read.csv() are mostly used to load the whole file into memory, that is, into a data frame in R. If the data is too large to fit into the computer's memory, we simply cannot load it.
- Even if the dataset is large, we usually don't have to load all of it into memory when we work on a task. Instead, we often need to extract only the subset of the dataset that meets a certain condition. The built-in data-import functions simply do not support querying a csv file.
- The dataset is still updating, that is, we need to insert records...
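
The first two points can be illustrated with a minimal sketch in base R. The file, the column names (id, class, price), and the values are hypothetical and exist only for this illustration. The sketch shows that read.csv() has to parse the whole file before any condition can be applied, and that the nrows argument restricts rows by position rather than by a condition.

```r
# Hypothetical example data, written to a temporary csv file
path <- tempfile(fileext = ".csv")
products <- data.frame(
  id = 1:6,
  class = c("toy", "toy", "book", "book", "toy", "book"),
  price = c(5.5, 12, 8, 20, 3, 15),
  stringsAsFactors = FALSE
)
write.csv(products, path, row.names = FALSE)

# read.csv() parses the entire file into a data frame first ...
all_rows <- read.csv(path, stringsAsFactors = FALSE)

# ... and only then can the rows be filtered, in memory
cheap_toys <- all_rows[all_rows$class == "toy" & all_rows$price < 10, ]

# nrows limits how many rows are read, but only by position:
# there is no argument for selecting rows by a condition
first_two <- read.csv(path, nrows = 2, stringsAsFactors = FALSE)

# Clean up the temporary file
unlink(path)
```

With six rows this costs nothing, but the same pattern applied to a file larger than the available memory would fail at the read.csv() call, before the filter is ever reached.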