Summary
In this chapter, we discussed how to join dataframes, how to use set operations to determine the data we will lose with each type of join, and how to query dataframes as we would a database. We then went over some more involved transformations on our columns, such as binning and ranking, and how to perform them concisely with the apply() method. We also learned the importance of vectorized operations in writing efficient pandas code. Next, we explored window calculations and how to use pipes to write cleaner code. Our discussion of window calculations served as a primer for aggregating across whole dataframes and by groups. We also went over how to generate pivot tables and crosstabs. Finally, we looked at some time series-specific functionality in pandas, covering everything from selection and aggregation to merging.
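As a quick refresher on the set-operation idea, here is a minimal sketch that compares the join keys of two small dataframes to predict which rows each type of join would drop. The dataframes, their "station" key column, and the values are made up for illustration. The merge() call with indicator=True is one way to confirm the prediction.

```python
import pandas as pd

# Hypothetical data: two dataframes sharing a "station" key column
left = pd.DataFrame({"station": ["A", "B", "C"], "temp": [10, 12, 9]})
right = pd.DataFrame({"station": ["B", "C", "D"], "elevation": [100, 250, 80]})

left_keys = set(left["station"])
right_keys = set(right["station"])

# Keys only on the left: lost in an inner or right join
lost_from_left = left_keys - right_keys
# Keys only on the right: lost in an inner or left join
lost_from_right = right_keys - left_keys
# Keys on both sides: the only ones an inner join keeps
kept_by_inner = left_keys & right_keys

inner = left.merge(right, on="station", how="inner")

# An outer join with indicator=True labels each row's origin in a "_merge" column
outer = left.merge(right, on="station", how="outer", indicator=True)
left_only = set(outer.loc[outer["_merge"] == "left_only", "station"])
right_only = set(outer.loc[outer["_merge"] == "right_only", "station"])

print(sorted(lost_from_left), sorted(lost_from_right), sorted(kept_by_inner))
print(inner)
```

Here the set differences predict that station A would be lost from the left dataframe and station D from the right one in an inner join, which matches the left_only and right_only rows of the outer join.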
In the next chapter, we will cover visualization, which pandas implements by providing a wrapper around matplotlib. Data wrangling will play a key role in prepping our data for visualization...