Joining DataFrames with pd.DataFrame.join
While pd.merge is the most common approach for merging two different pd.DataFrame objects, the lesser-used yet functionally similar pd.DataFrame.join method is another viable option. Stylistically, you can think of pd.DataFrame.join as a shortcut for when you want to augment an existing pd.DataFrame with a few more columns: by default it aligns on the index and performs a left join, so every row of the calling pd.DataFrame is kept. By contrast, pd.merge defaults to treating both pd.DataFrame objects with equal importance (its default how="inner" keeps only the keys found in both).
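To make the contrast concrete, here is a minimal sketch using two hypothetical frames. The names left and right and the region column are made up for illustration. pd.DataFrame.join needs no arguments to match index-on-index as a left join, while pd.merge has to spell out left_index=True, right_index=True, and how="left" to produce the same result.

```python
import pandas as pd

# Hypothetical "base" table, indexed by a salesperson key (illustration only)
left = pd.DataFrame(
    {"sales": [1000, 2000, 4000]},
    index=pd.Index([42, 555, 9000], name="salesperson_id"),
)

# Hypothetical extra attributes for only some of those keys
right = pd.DataFrame(
    {"region": ["East", "West"]},
    index=pd.Index([42, 9000], name="salesperson_id"),
)

# join: matches on the index and does a left join by default,
# so every row of `left` survives
joined = left.join(right)

# The equivalent pd.merge call must state each of those choices explicitly
merged = pd.merge(left, right, left_index=True, right_index=True, how="left")

print(joined)
print(joined.equals(merged))
```

Because salesperson 555 has no row in right, its region comes back as missing, and the final comparison should print True. The join call reads as "add these columns to my table," which is exactly the augmenting use case described above.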
How to do it
To drive home the point about pd.DataFrame.join being a shortcut to augment an existing pd.DataFrame, let’s imagine a sales table where the row index corresponds to a salesperson but uses a surrogate key (an arbitrary numeric ID) instead of a natural key such as the salesperson's name:
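import pandas as pd

# Sales figures indexed by a surrogate key, salesperson_id
sales = pd.DataFrame(
    [[1000], [2000], [4000]],
    columns=["sales"],
    index=pd.Index([42, 555, 9000], name="salesperson_id"),
)
# Convert columns to pandas' nullable extension types (int64 becomes Int64)
sales = sales.convert_dtypes(dtype_backend="numpy_nullable")
sales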
sales
salesperson_id
42 ...