Example 3 (challenges 1, 3, 5, and 6)
In this example, we would like to figure out what makes a song rise into the top 10 on Billboard (https://www.billboard.com/charts/hot-100) and stay there for at least 5 weeks. Billboard magazine publishes a weekly chart that ranks popular songs based on sales, radio play, and online streaming in the United States. To do this, we will integrate three CSV files, billboardHot100_1999-2019.csv, songAttributes_1999-2019.csv, and artistDf.csv, from https://www.kaggle.com/danield2255/data-on-songs-from-billboard-19992019.
This is going to be a long example with many pieces that come together. How you organize your thoughts and work in such data integration challenges is very important. So, before reading on, spend some time getting to know these three data sources and form a plan. This will be very valuable practice.
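One way to start getting to know the three files is to list each one's column names and row count. The following is a minimal sketch using only the Python standard library; pandas `read_csv` would serve equally well. It assumes the three files have been downloaded into the current working directory and are UTF-8 encoded, and the helper names `summarize_csv` and `summarize_sources` are hypothetical, introduced only for this illustration.

```python
import csv
from pathlib import Path

# File names as given in the text; their location is an assumption.
FILES = [
    "billboardHot100_1999-2019.csv",
    "songAttributes_1999-2019.csv",
    "artistDf.csv",
]


def summarize_csv(path):
    """Return (column_names, row_count) for one CSV file."""
    # Assumption: the file is UTF-8 encoded and its first record is the header.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])       # empty list if the file is empty
        n_rows = sum(1 for _ in reader)  # count the remaining data records
    return header, n_rows


def summarize_sources(data_dir="."):
    """Summarize every file from FILES that exists in data_dir."""
    summaries = {}
    for name in FILES:
        path = Path(data_dir) / name
        if path.exists():  # skip files that have not been downloaded
            summaries[name] = summarize_csv(path)
    return summaries


if __name__ == "__main__":
    for name, (columns, n_rows) in summarize_sources().items():
        print(f"{name}: {n_rows} rows")
        print("  columns:", columns)
```

Comparing the column lists side by side is a quick way to spot which columns the files might share, and therefore which ones could connect them during integration.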
Now that you've had a chance to think about how you would go about this, let's do this together. These datasets...