Chapter 2: Data Exploration and Visualization
Activity 2: Analyzing Advertisements
Import pandas and seaborn using the following code:
import pandas as pd
import seaborn as sns
sns.set()
Read the Advertising.csv file and look at the first few rows:
ads = pd.read_csv("Advertising.csv", index_col='Date')
ads.head()
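Because the index is a date column, it can help to have pandas parse it into real dates on load, so that later time-based plots and shifts behave as expected. The following is a minimal sketch of that idea. The inline CSV text and the name ads_demo are hypothetical stand-ins with made-up values, and whether the Date column in Advertising.csv parses cleanly is an assumption that depends on how the dates are stored in the file.

```python
import io

import pandas as pd

# Hypothetical stand-in for Advertising.csv; the values are made up for illustration.
csv_text = """Date,TV,radio,newspaper,sales
2018-01-01,100.0,20.0,30.0,12.0
2018-01-02,50.0,10.0,40.0,8.0
2018-01-03,150.0,30.0,10.0,18.0
"""

# parse_dates=True asks pandas to try parsing the index column as dates.
ads_demo = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

print(type(ads_demo.index).__name__)
print(ads_demo.head())
```

With the real file, the same parse_dates=True argument could be passed alongside index_col = 'Date'.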
Look at the memory and other internal information about the DataFrame:
ads.info()
This gives the following output:
As all the attributes are numeric, it is enough to understand the distribution of the data with describe():
ads.describe()
This gives the following output:
See how the values in each column are spread across the deciles:
ads.quantile([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
As all values are within a reasonable range, we don't need to filter out any data and can directly proceed.
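One way to back up a judgment such as "within a reasonable range" is to count the values that fall far outside the quartiles. The sketch below uses the common 1.5 × IQR rule of thumb, which is an assumption introduced here and not the check used in this activity. The demo DataFrame contains made-up values, not rows from Advertising.csv.

```python
import pandas as pd

# Made-up values for illustration only; not taken from Advertising.csv.
demo = pd.DataFrame({
    "TV": [10.0, 20.0, 30.0, 40.0, 50.0],
    "radio": [1.0, 2.0, 3.0, 4.0, 100.0],
})

# Per-column quartiles and interquartile range.
q1 = demo.quantile(0.25)
q3 = demo.quantile(0.75)
iqr = q3 - q1

# Flag values more than 1.5 * IQR below Q1 or above Q3 (a rule of thumb).
outside = (demo < q1 - 1.5 * iqr) | (demo > q3 + 1.5 * iqr)

# Number of flagged values in each column.
print(outside.sum())
```

A column with no flagged values supports proceeding without filtering. A column with flagged values deserves a closer look before any rows are dropped.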
Look at the histograms of individual features and understand the values better:
sns.distplot(ads['TV'], kde = False)
sns.distplot(ads['newspaper'], kde = False)
sns.distplot(ads['radio'], kde = False)
sns.distplot(ads['sales'], kde = False)
Looking at the data, it is clear that we are interested in analyzing behaviors that drive an increase in sales. Therefore, sales is the KPI we need to look at.
Understand the relationships between columns with this command:
sns.pairplot(ads)
This should give the following output:
You can derive the following insights from the data: both TV and radio show a clear positive correlation with sales. The relationship with newspaper is less direct, but because most newspaper values are low, we can't claim that the correlation is absent or negative.
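The visual impressions from the pair plot can be cross-checked numerically with Pearson correlation coefficients, which measure linear association only. This is a minimal sketch on a made-up demo DataFrame, so the numbers it prints say nothing about the real Advertising.csv data. The name corr_with_sales is introduced here for illustration.

```python
import pandas as pd

# Made-up values for illustration only; not taken from Advertising.csv.
demo = pd.DataFrame({
    "TV": [10.0, 20.0, 30.0, 40.0, 50.0],
    "radio": [5.0, 3.0, 8.0, 6.0, 9.0],
    "newspaper": [30.0, 10.0, 20.0, 50.0, 40.0],
    "sales": [5.0, 7.0, 9.0, 11.0, 13.0],
})

# Pairwise Pearson correlation between all numeric columns.
corr_matrix = demo.corr()

# Keep only each feature's correlation with sales, strongest first.
corr_with_sales = corr_matrix["sales"].drop("sales").sort_values(ascending=False)
print(corr_with_sales)
```

On the real data, the same two lines applied to ads would rank the channels by the strength of their linear relationship with sales.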
You can also try to find unknown or hidden relationships in the data. Let's analyze the relationship between newspaper and sales:
ads[['newspaper', 'sales']].plot()
Changes in sales appear to precede changes in newspaper, a pattern that can be examined in more detail in further analysis. For now, this completes the initial exploration. In the data from 1st Jan 2018 to 19th July 2018, TV and radio correlate directly with sales, while the relationship between sales and newspaper can be explored further using different techniques.
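One possible way to follow up on the idea that sales moves before newspaper is to correlate newspaper with earlier values of sales at several lags. The sketch below is not the method used in this activity. It relies on a made-up demo DataFrame in which newspaper is constructed to be exactly twice the previous day's sales, so the lag of one day stands out by design. The names demo and lag_corr are hypothetical.

```python
import pandas as pd

# Made-up daily values for illustration only; not taken from Advertising.csv.
# newspaper on each day is built as 2 * sales of the previous day.
dates = pd.date_range("2018-01-01", periods=8, freq="D")
demo = pd.DataFrame(
    {
        "sales": [5.0, 9.0, 4.0, 8.0, 3.0, 7.0, 6.0, 10.0],
        "newspaper": [12.0, 10.0, 18.0, 8.0, 16.0, 6.0, 14.0, 12.0],
    },
    index=dates,
)

# For each lag, compare newspaper with sales from `lag` days earlier.
# shift(lag) moves sales forward in time; corr() ignores the resulting NaN rows.
lag_corr = {
    lag: demo["sales"].shift(lag).corr(demo["newspaper"])
    for lag in range(0, 3)
}

for lag, value in lag_corr.items():
    print(f"lag {lag} day(s): correlation {value:.3f}")
```

A correlation that peaks at a positive lag would be consistent with sales leading newspaper, although it does not by itself establish a causal link.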