Data merging with dplyr
In practical data analysis, the information we need is often not confined to a single table but spread across several. Storing data in separate tables is memory-efficient but not analysis-friendly. Data merging is the process of combining different datasets into one table to facilitate analysis. To join two tables, there must be one or more columns, called keys, that exist in both tables and serve as the common ground for joining.
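To make the idea of a key concrete, here is a minimal sketch with two small tables. The tables `customers` and `orders`, their column names, and their values are all invented for illustration. The shared column `customer_id` plays the role of the key, and `inner_join()` from dplyr is used only as one example of a joining verb.

```r
library(dplyr)

# Hypothetical example tables; names and values are illustrative only.
customers <- data.frame(
  customer_id = c(1, 2, 3),
  name = c("Ana", "Ben", "Cho")
)
orders <- data.frame(
  order_id = c(101, 102, 103),
  customer_id = c(1, 2, 4),
  amount = c(20, 35, 50)
)

# The key is the column that exists in both tables.
key <- intersect(names(customers), names(orders))
key

# Join the two tables on that key.
merged <- inner_join(customers, orders, by = key)
merged
```

Passing the key explicitly through `by` avoids relying on dplyr to guess the common columns, which makes the join easier to read and check.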
This section will cover different ways to join tables and analyze them in combination, including inner join, left join, right join, and full join. The following list shows the verbs and their definitions for these four types of joining:
inner_join()
: Returns common observations in both tables according to the matching key.
left_join()
: Returns all observations from the left table and matched observations from the right table. Note that in the case of a duplicate key value in the right table, an additional...
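The difference between the first two verbs can be sketched with two toy tables. The tables `left_tbl` and `right_tbl` and their contents are hypothetical, chosen so that each table has one key value the other lacks.

```r
library(dplyr)

# Hypothetical tables, invented for illustration.
left_tbl <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
right_tbl <- data.frame(id = c(2, 3, 4), y = c("B", "C", "D"))

# inner_join() keeps only the key values found in both tables.
inner_res <- inner_join(left_tbl, right_tbl, by = "id")
inner_res

# left_join() keeps every row of the left table; columns coming from
# the right table are filled with NA where no match exists.
left_res <- left_join(left_tbl, right_tbl, by = "id")
left_res
```

Comparing `nrow(inner_res)` with `nrow(left_res)` is a quick way to see how many rows of the left table found no match in the right table.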