Example 1 – unpacking columns and reformulating the table
In this example, we will use the level I cleaned speech_df dataset to create the following bar chart. We cleaned this DataFrame in the Example 1 – unwise data collection section of Chapter 9, Data Cleaning Level I – Cleaning Up the Table. The level I cleaned speech_df dataset has only two columns: FileName and Content. To create the following visual, we need columns such as the month of the speech and the number of times each of the four words (vote, tax, campaign, and economy) is repeated in each speech. While the level I cleaned speech_df dataset contains all this information, it is somewhat buried inside the two columns.
The following is a list of the information we need and the column of speech_df that this information is stored in:
- The month of the speech: This information is in the FileName column.
- The number of times the words vote, tax, campaign, and economy have been repeated...
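
The unpacking described in the list above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the chapter's own solution: the toy rows, the file-name layout (a three-letter month abbreviation followed by the day and year), and the new column names Month and vote_count, tax_count, campaign_count, and economy_count are all hypothetical. It also assumes that the word counts are taken from the Content column, the only other column in the dataset.

```python
import re
import pandas as pd

# Toy stand-in for the level I cleaned speech_df; file names and text are made up.
speech_df = pd.DataFrame({
    "FileName": ["CityASep12_2020.txt", "CityBOct03_2020.txt"],
    "Content": [
        "Vote early. The economy is strong and the tax cut helped.",
        "Our campaign will win. Vote, vote, vote!",
    ],
})

# Assumed file-name layout: <City><Mon><day>_<year>.txt
# The month abbreviation must be followed by a day and year, so a city name
# that happens to contain "Dec" or "Mar" is not mistaken for the month.
MONTH_PATTERN = r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\d{1,2}_\d{4}"
speech_df["Month"] = speech_df["FileName"].str.extract(MONTH_PATTERN, expand=False)

# Count whole-word, case-insensitive occurrences of each word in Content.
words = ["vote", "tax", "campaign", "economy"]
for word in words:
    speech_df[f"{word}_count"] = speech_df["Content"].str.count(
        rf"\b{word}\b", flags=re.IGNORECASE
    )

print(speech_df.drop(columns="Content"))
```

With the new columns in place, the table has one row per speech and one column per piece of information, so grouping by Month and aggregating the count columns becomes straightforward. Whether to match whole words only (as done here) or to also count variants such as votes or taxes is a design choice that depends on what the chart is meant to show.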