Summary
In this chapter, we covered the basics of handling pandas DataFrames to format them as inputs for different visualization functions in libraries such as pandas, seaborn, and more, and we covered some essential concepts in generating and modifying plots to create pleasing figures.
The pandas library contains functions such as read_csv(), read_excel(), and read_json() to read structured data files. Functions such as describe() and info() are useful to get information on the summary statistics and memory usage of the features in a DataFrame. Other important operations on pandas DataFrames include subsetting based on user-specified conditions/constraints, adding new columns to a DataFrame, transforming existing columns with built-in Python functions as well as user-defined functions, deleting specific columns in a DataFrame, and writing a modified DataFrame to a file on the local system.
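As a quick recap, the operations listed above can be sketched in a few lines. This is a minimal illustration rather than the chapter's own code: the inline CSV text, the column names (name, carat, price), the size_label() helper, and the output filename are all hypothetical and chosen only to keep the example self-contained.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical inline data standing in for a CSV file on disk.
csv_text = """name,carat,price
A,0.5,1500
B,1.2,6000
C,0.9,4000
"""

# read_csv() accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# info() prints column types and memory usage; describe() returns summary statistics.
df.info()
summary = df.describe()

# Subset rows based on a user-specified condition.
large = df[df["carat"] > 0.8]

# Add a new column computed from existing ones.
df["price_per_carat"] = df["price"] / df["carat"]

# Transform a column with a built-in Python function...
df["price_rounded"] = df["price_per_carat"].apply(round)


# ...and with a user-defined function (hypothetical helper).
def size_label(carat):
    return "large" if carat > 0.8 else "small"


df["size"] = df["carat"].apply(size_label)

# Delete a specific column.
df = df.drop(columns=["price_rounded"])

# Write the modified DataFrame to a file on the local system
# (here, the system's temporary directory).
out_path = os.path.join(tempfile.gettempdir(), "diamonds_modified.csv")
df.to_csv(out_path, index=False)
```

Passing index=False to to_csv() leaves the row index out of the written file, which is usually what you want when the index carries no information of its own.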
Once equipped with knowledge of these common operations on pandas DataFrames, we went over the basics of visualization and learned how to refine the visual appeal of the plots. We illustrated these concepts with the plotting of histograms and bar plots. Specifically, we learned about different ways of presenting labels and legends, changing the properties of tick labels, and adding annotations.
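The refinements mentioned above can be recapped with a short sketch. It uses matplotlib directly and assumes a small made-up DataFrame with hypothetical cut and price columns; the labels, rotation angle, and annotation text are illustrative choices, not values taken from the chapter.

```python
import matplotlib

# Non-interactive backend so the script runs without a display;
# remove this line to view the figure interactively.
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "cut": ["Ideal", "Premium", "Good", "Ideal", "Premium", "Good"],
    "price": [3000, 4500, 2500, 3400, 5100, 2700],
})

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram with axis labels, a title, and a legend.
ax_hist.hist(df["price"], bins=5, label="price")
ax_hist.set_xlabel("Price (USD)")
ax_hist.set_ylabel("Count")
ax_hist.set_title("Distribution of price")
ax_hist.legend(loc="upper right")

# Bar plot of the mean price per cut.
mean_price = df.groupby("cut")["price"].mean()
positions = list(range(len(mean_price)))
ax_bar.bar(positions, mean_price.values, label="mean price")
ax_bar.set_xlabel("Cut")
ax_bar.set_ylabel("Mean price (USD)")
ax_bar.legend(loc="upper left")

# Change tick label properties: custom text, rotation, and font size.
ax_bar.set_xticks(positions)
ax_bar.set_xticklabels(mean_price.index, rotation=45, fontsize=9)

# Annotate the tallest bar with an arrow.
top = mean_price.index.get_loc(mean_price.idxmax())
ax_bar.set_ylim(0, mean_price.max() * 1.3)  # headroom for the annotation text
ax_bar.annotate(
    "highest mean",
    xy=(top, mean_price.max()),
    xytext=(top - 1, mean_price.max() * 1.15),
    arrowprops=dict(arrowstyle="->"),
)

fig.tight_layout()
```

To keep the result, call fig.savefig() with a filename of your choice, or drop the backend line and call plt.show().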
In the next chapter, we will learn about some popular visualization techniques and understand the interpretation, strengths, and limitations of each.