Dealing with data
Working with data typically follows three steps: you fetch it, you clean and manipulate it, and then you analyze it and present the results as values, spreadsheets, graphs, and so on. We want you to be able to perform all three steps without depending on an external data provider, so we are going to do the following:
- Create the data, simulating that it comes in a format that is not perfect or ready to be worked on.
- Clean it and feed it to the main tool we will use in the project, which is a DataFrame from the pandas library.
- Manipulate the data in a DataFrame.
- Save a DataFrame to a file in different formats.
- Analyze the data and get some results out of it.
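To make the first steps in the list above concrete, here is a minimal sketch that creates deliberately awkward records, cleans them, and computes one result. It is not the chapter's code: the function names (make_raw_records, clean) and the field names (name, payload, visits, cost) are hypothetical, chosen only for illustration. It uses only the standard library so that it runs anywhere; the cleaned rows are the kind of flat, typed records you would then hand to a pandas DataFrame.

```python
import json
import random
from datetime import date, timedelta

random.seed(42)  # fixed seed so repeated runs produce the same data


def make_raw_records(n):
    """Create n hypothetical records in a deliberately awkward format."""
    start = date(2024, 1, 1)
    records = []
    for i in range(n):
        day = start + timedelta(days=random.randint(0, 30))
        records.append({
            # Several pieces of information packed into a single string.
            "name": f"item_{i}_{day.isoformat()}",
            # Nested data stored as a JSON string, with a number kept as text.
            "payload": json.dumps({
                "visits": random.randint(0, 100),
                "cost": str(random.randint(1, 500)),
            }),
        })
    return records


def clean(record):
    """Unpack one raw record into flat, properly typed fields."""
    _, number, day = record["name"].split("_")
    payload = json.loads(record["payload"])
    return {
        "id": int(number),
        "day": date.fromisoformat(day),
        "visits": payload["visits"],
        "cost": int(payload["cost"]),
    }


raw = make_raw_records(5)
rows = [clean(record) for record in raw]

# A tiny "analysis" step: one aggregate value over the cleaned rows.
total_cost = sum(row["cost"] for row in rows)
print(rows[0])
print("total cost:", total_cost)
```

The point of the sketch is the shape of the work: the raw records are awkward on purpose, and cleaning is a separate, explicit step that produces simple rows before any analysis happens.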
Setting up the Notebook
First, let us produce the data. We start from the ch13-dataprep Notebook. Cell #1 takes care of the imports:
#1
import json
import random
from datetime import date, timedelta
import faker
The...