Data wrangling and analyzing iTunes data
The terms "data wrangling" and "data munging" have become common in data science; both generally mean cleaning and preparing data for downstream uses such as analytics and modeling. Let's dive into data wrangling with the chinook iTunes dataset.
Loading and saving data with Pandas
In this first example, we're working for Apple in the iTunes analytics department. Our first task is to find any useful information in a set of music sales data that could improve the iTunes business. We'll again be using the chinook dataset, a sample of iTunes data that we first used in Chapter 3, SQL and Built-in File Handling Modules in Python.
The first step in wrangling data is, of course, loading it. Pandas provides several functions for loading data from a variety of file types. A co-worker in the iTunes department gave us CSV, Excel, and SQLite database files that we need to load into Python for analysis...
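As a minimal sketch of what loading looks like, the snippet below writes a tiny stand-in dataset to a CSV file and a SQLite database, then reads both back with pd.read_csv and pd.read_sql_query. The file names, the table name tracks, and the sample rows are all hypothetical placeholders, not the actual files from our co-worker. Excel files follow the same pattern with pd.read_excel, but that call needs an Excel engine such as openpyxl installed, so it is left out here.

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in rows; the real iTunes files are not reproduced here.
sample = pd.DataFrame(
    {
        "Track": ["Song A", "Song B", "Song C"],
        "Genre": ["Rock", "Jazz", "Rock"],
        "UnitPrice": [0.99, 0.99, 1.99],
    }
)

# Hypothetical file names inside a temporary directory.
workdir = Path(tempfile.mkdtemp())
csv_path = workdir / "itunes_sample.csv"
db_path = workdir / "itunes_sample.sqlite"

# Create the stand-in files so the example is self-contained.
sample.to_csv(csv_path, index=False)
conn = sqlite3.connect(db_path)
sample.to_sql("tracks", conn, index=False)  # "tracks" is an assumed table name
conn.close()

# Load the CSV file into a DataFrame.
csv_df = pd.read_csv(csv_path)

# Load the SQLite table into a DataFrame with a SQL query.
conn = sqlite3.connect(db_path)
sql_df = pd.read_sql_query("SELECT * FROM tracks;", conn)
conn.close()

print(csv_df.head())
print(sql_df.head())
```

Both calls return an ordinary DataFrame, so once the data is loaded, the rest of the wrangling code does not need to know which file type it came from.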