Merging datasets
Besides the previously described elementary actions on a single dataset, joining multiple data sources is one of the most common tasks in everyday work. The most often used solution for such a task is to simply call the merge S3 method, which can perform the traditional SQL inner and left/right/full outer joins; these join types were summarized briefly by C.L. Moffatt (2008).
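As a minimal sketch of how merge maps onto these SQL join types, consider the following example. The customers and orders data frames and their columns are made up purely for illustration; only the merge arguments (by, all, all.x, all.y) come from base R:

```r
# Two small, hypothetical data frames sharing the key column "id"
customers <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cid"))
orders    <- data.frame(id = c(2, 3, 4), amount = c(10, 20, 30))

# Inner join: only the ids present in both data frames
inner <- merge(customers, orders, by = "id")

# Left outer join: every row of customers, NA where no order matches
left <- merge(customers, orders, by = "id", all.x = TRUE)

# Right outer join: every row of orders, NA where no customer matches
right <- merge(customers, orders, by = "id", all.y = TRUE)

# Full outer join: every id found in either data frame
full <- merge(customers, orders, by = "id", all = TRUE)
```

By default, merge sorts the result on the key column, so the row order of the output may differ from the row order of the inputs.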
The dplyr package provides straightforward functions for performing the previously presented join operations directly in R:
inner_join: This joins the variables of all the rows that are found in both datasets
left_join: This includes all the rows from the first dataset and joins the variables from the other table
semi_join: This includes only those rows from the first dataset that are found in the other one as well
anti_join: This is similar to semi_join, but includes only those rows from the first dataset that are not found in the other one
Note
For more examples, take a look at the Two-table...
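The four dplyr joins listed above can be sketched on a pair of small data frames. The customers and orders objects are hypothetical and exist only for this illustration; the example assumes that the dplyr package is installed:

```r
library(dplyr)

# Two small, hypothetical data frames sharing the key column "id"
customers <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cid"))
orders    <- data.frame(id = c(2, 3, 4), amount = c(10, 20, 30))

# Rows found in both datasets, with the variables of both
inner <- inner_join(customers, orders, by = "id")

# All rows of customers, with amount set to NA where no order matches
left <- left_join(customers, orders, by = "id")

# Rows of customers that have a match in orders; no columns are added
semi <- semi_join(customers, orders, by = "id")

# Rows of customers that have no match in orders
anti <- anti_join(customers, orders, by = "id")
```

Note that semi_join and anti_join only filter the first dataset, so their results contain the columns of customers alone.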