Analyzing Your Dataset
Previously, we learned about the overall structure of a dataset and the kind of information it contains. Now, it is time to really dig into it and look at the values of each column.
First, we need to import the pandas package:
import pandas as pd
Then, we'll load the data into a pandas DataFrame:
file_url = 'https://github.com/PacktWorkshops/'\
           'The-Data-Science-Workshop/blob/'\
           'master/Chapter10/dataset/'\
           'Online%20Retail.xlsx?raw=true'
df = pd.read_excel(file_url)
The pandas package provides several methods so that you can display a snapshot of your dataset. The most popular ones are head(), tail(), and sample().
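To get a feel for how these three methods differ, here is a minimal sketch on a small, made-up DataFrame. The toy_df variable, its column names, and its values are assumptions for illustration only and are not part of the Online Retail dataset. We pass the number of rows explicitly, and we set random_state in sample() so that the same rows are drawn on every run:

```python
import pandas as pd

# Toy data for illustration only: these rows and column names are
# hypothetical and are not taken from the Online Retail dataset.
toy_df = pd.DataFrame({
    'item': ['pen', 'notebook', 'stapler', 'marker',
             'folder', 'eraser', 'ruler'],
    'quantity': [10, 3, 1, 6, 12, 4, 2],
})

# head(n) returns the first n rows
first_rows = toy_df.head(3)

# tail(n) returns the last n rows
last_rows = toy_df.tail(2)

# sample(n) returns n rows picked at random, without replacement;
# random_state fixes the seed so the draw is repeatable
random_rows = toy_df.sample(n=3, random_state=0)

print(first_rows)
print(last_rows)
print(random_rows)
```

The same calls should work unchanged on the df variable loaded earlier, for example df.head(3).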
The head() method will show the top rows of your dataset. By default, pandas will display...