Adding columns from different DataFrames
Any DataFrame can have new columns added to it. As usual, though, when the new column comes from another DataFrame or a Series, pandas aligns the indexes first and only then creates the column.
This recipe uses the employee dataset to append a new column containing the maximum salary of each employee's department.
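To make the alignment behavior concrete, here is a minimal sketch on hypothetical toy data (the names `df`, `s`, and the labels `x`, `y`, `z` are illustrative and not part of the employee dataset). The Series values are matched to rows by index label, not by position, and labels missing from the Series become missing values:

```python
import pandas as pd

# Hypothetical toy data, not the employee dataset
df = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])

# The Series is in a different order and has no entry for "y"
s = pd.Series([10, 20], index=["z", "x"])

# Assignment aligns on index labels before creating the column
df["b"] = s

print(df)
```

Row `x` receives 20 and row `z` receives 10, regardless of the order in which the Series lists them, while row `y` is left as `NaN` because the Series has no matching label.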
How to do it…
- Import the employee data and select the DEPARTMENT and BASE_SALARY columns in a new DataFrame:
>>> import pandas as pd
>>> employee = pd.read_csv("data/employee.csv")
>>> dept_sal = employee[["DEPARTMENT", "BASE_SALARY"]]
- Sort this smaller DataFrame by salary within each department:
>>> dept_sal = dept_sal.sort_values(
...     ["DEPARTMENT", "BASE_SALARY"],
...     ascending=[True, False],
... )
- Use the .drop_duplicates method to keep the first row of each DEPARTMENT: ...
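The steps above can be sketched end to end on a small, hypothetical stand-in for the employee data (the rows below are made up, and `max_dept_sal`, `max_by_dept`, and `MAX_DEPT_SALARY` are illustrative names). The final assignment is one plausible way to finish, assuming the goal stated earlier of attaching each department's maximum salary through index alignment; it is not necessarily how the recipe itself proceeds.

```python
import pandas as pd

# Hypothetical stand-in for data/employee.csv
employee = pd.DataFrame(
    {
        "DEPARTMENT": ["Fire", "Police", "Fire", "Police", "Library"],
        "BASE_SALARY": [50000, 62000, 71000, 58000, 40000],
    }
)

# Step 1: keep only the two columns of interest
dept_sal = employee[["DEPARTMENT", "BASE_SALARY"]]

# Step 2: sort by department, then by salary from highest to lowest
dept_sal = dept_sal.sort_values(
    ["DEPARTMENT", "BASE_SALARY"],
    ascending=[True, False],
)

# Step 3: the first row per department is now its highest salary
max_dept_sal = dept_sal.drop_duplicates(subset="DEPARTMENT")

# Assumed final step: index both objects by DEPARTMENT so that
# the assignment aligns each employee with their department's maximum
max_by_dept = max_dept_sal.set_index("DEPARTMENT")["BASE_SALARY"]
employee = employee.set_index("DEPARTMENT")
employee["MAX_DEPT_SALARY"] = max_by_dept

print(employee)
```

Because `max_by_dept` has one unique label per department, pandas can broadcast each maximum to every employee row that shares that label.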