Dealing with data
Typically, when you deal with data, this is the path you go through: you fetch it; you clean and manipulate it; and then you analyze it and present results as values, spreadsheets, graphs, and so on. We want you to be in charge of all three steps of the process without having any external dependency on a data provider, so we're going to do the following:
- We're going to create the data, simulating that it comes in a format that is not perfect or ready to be worked on.
- We're going to clean it and feed it to the main tool we'll use in the project, which is a DataFrame from the pandas library.
- We're going to manipulate the data in a DataFrame.
- We're going to save a DataFrame to a file in different formats.
- We're going to analyze the data and get some results out of it.
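To make the plan above concrete, here is a minimal sketch of the whole path on a toy scale. It is not the chapter's actual code: the records, the column names (name, spent, spent_with_tax), and the 20% markup are hypothetical and chosen only for illustration. The CSV is written to an in-memory buffer so that the example leaves no files behind.

```python
import io

import pandas as pd

# Hypothetical raw records, deliberately imperfect: inconsistent casing,
# stray whitespace, amounts stored as strings, and one missing value.
raw_records = [
    {"name": "  alice ", "spent": "120.50"},
    {"name": "BOB", "spent": "80"},
    {"name": "Carol", "spent": None},
]

# Clean: normalize the names, convert amounts to floats,
# and drop records that have no amount.
cleaned = [
    {"name": record["name"].strip().title(), "spent": float(record["spent"])}
    for record in raw_records
    if record["spent"] is not None
]

# Feed the cleaned data to a DataFrame.
df = pd.DataFrame(cleaned)

# Manipulate: add a derived column (the 20% markup is an assumption).
df["spent_with_tax"] = df["spent"] * 1.2

# Save: serialize to CSV; a file path could be passed instead of the buffer.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Analyze: compute a simple aggregate.
total_spent = df["spent"].sum()
print(df)
print("Total spent:", total_spent)
```

The same shape (create, clean, load, manipulate, save, analyze) applies at a larger scale; only the data and the cleaning rules change.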
Setting up the Notebook
First things first, let's produce the data. We start from the ch13-dataprep Notebook. Cell...