Initial Data Analysis
As a rule of thumb, when starting the analysis of a new dataset, it is good practice to check the dimensionality of the data, the column types, possible missing values, and some generic statistics on the numerical columns. We can also look at the first 5 to 10 entries to get a feel for the data itself. We'll perform these steps in the following code snippets:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# import data from the GitHub page of the book
data = pd.read_csv('https://raw.githubusercontent.com'\
                   '/PacktWorkshops/The-Data-Analysis-Workshop'\
                   '/master/Chapter02/data/'\
                   ...
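The checks listed above (dimensionality, column types, missing values, summary statistics, and the first few entries) can be sketched on a tiny stand-in table. The `sample` DataFrame and its column names are hypothetical and are not the book's dataset; only standard pandas calls are used, so the same lines should apply to `data` once it is loaded.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data (assumption): not the dataset used in the book.
sample = pd.DataFrame({
    "age": [25, 32, np.nan, 41],
    "city": ["Rome", "Oslo", "Lima", None],
    "score": [3.5, 4.0, 2.5, 5.0],
})

# Dimensionality: number of rows and columns.
n_rows, n_cols = sample.shape

# Type of each column.
column_types = sample.dtypes

# Number of missing values per column.
missing_per_column = sample.isnull().sum()

# Generic statistics; by default only numerical columns are summarized.
numeric_summary = sample.describe()

# First entries (up to 10) to get a feel for the data.
first_rows = sample.head(10)

print(f"rows: {n_rows}, columns: {n_cols}")
print(column_types)
print(missing_per_column)
print(numeric_summary)
print(first_rows)
```

Keeping each result in its own variable makes it easy to inspect the checks one at a time in a notebook cell.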