Connecting to an internet data source
This recipe will teach you how to connect to a data source at a public internet URL. We're going to use Statistics of Income (SOI) data, which is demographic income data published by the Internal Revenue Service (IRS). We will explore and transform this data in our Databricks workspace environment.
We will get data from the following links:
- https://www.irs.gov/pub/irs-soi/17zpallnoagi.csv: This is a CSV file that contains 2017 income data broken down by state and zip code.
- https://worldpopulationreview.com/static/states/abbr-name.csv: This CSV file contains state abbreviations (codes) and state names.
The first file contains income data and state codes. We will use the second file to augment the first one with state names. We will also aggregate the income file so that it holds one income figure per state instead of one per zip code.
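To make the plan concrete, here is a minimal sketch of the join and aggregation logic in plain Python, outside Databricks. It is not the method this recipe uses; it only illustrates the idea. The column names STATE and A02650, the assumption that the state file has no header row, and the helper names fetch_csv_text and total_income_by_state are all assumptions made for illustration, so check them against the real files before relying on them. The sample rows are made up so that the sketch runs without network access.

```python
import csv
import io
import urllib.request
from collections import defaultdict

INCOME_URL = "https://www.irs.gov/pub/irs-soi/17zpallnoagi.csv"
STATES_URL = "https://worldpopulationreview.com/static/states/abbr-name.csv"


def fetch_csv_text(url):
    """Download a CSV file over HTTPS and return its contents as text."""
    with urllib.request.urlopen(url) as response:
        # Assumption: the file is UTF-8; undecodable bytes are replaced.
        return response.read().decode("utf-8", errors="replace")


def total_income_by_state(income_csv, states_csv):
    """Sum a per-zip-code income column per state and attach state names."""
    # Assumption: the state file has no header; each row is (code, name).
    names = {
        row[0]: row[1]
        for row in csv.reader(io.StringIO(states_csv))
        if len(row) >= 2
    }

    # Assumption: STATE holds the state code and A02650 the income amount.
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(income_csv)):
        totals[row["STATE"]] += float(row["A02650"])

    # One output row per state, sorted by state code.
    return [
        {"state": code, "state_name": names.get(code), "total_income": total}
        for code, total in sorted(totals.items())
    ]


# Made-up sample data, not real IRS figures.
sample_income = "STATE,ZIPCODE,A02650\nAL,35004,100\nAL,35005,50\nAK,99501,70\n"
sample_states = "AL,Alabama\nAK,Alaska\n"

result = total_income_by_state(sample_income, sample_states)
for record in result:
    print(record)
```

With network access, the same function could be called as total_income_by_state(fetch_csv_text(INCOME_URL), fetch_csv_text(STATES_URL)), provided the column assumptions hold for the downloaded files.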
Getting ready
As with every recipe in this chapter, you will need to upgrade your trial Azure subscription to...