Copying data from the internet
This recipe shows how to load data from a public internet URL. We will use Statistics of Income (SOI) data published by the Internal Revenue Service (IRS), and then explore and transform that data in our Databricks workspace.
We will get data from the following links:
- https://www.irs.gov/pub/irs-soi/17zpallnoagi.csv: This CSV file contains 2017 income data broken down by state and zip code.
- https://worldpopulationreview.com/static/states/abbr-name.csv: This CSV file contains state abbreviations (codes) and state names.
The first file contains income data keyed by state code. We will use the second file to augment the first with state names. We will also aggregate the income file so that it holds one total per state instead of one row per zip code.
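To make the aggregate-then-augment idea concrete, here is a minimal sketch in plain Python using only the standard library. It is not the recipe's own method, and in Databricks you would more likely use Spark DataFrames. The column names `STATE` and `A00100`, the headerless two-column layout of the state lookup file, and the sample rows are all assumptions made for illustration, and the numbers are made up rather than real IRS figures.

```python
import csv
import io
from collections import defaultdict


def income_by_state(income_csv, states_csv):
    """Sum income per state code, then attach the matching state name."""
    # Lookup file: assumed to be "code,name" rows with no header line.
    names = {code: name for code, name in csv.reader(io.StringIO(states_csv))}

    # Income file: assumed to have a header row with STATE and A00100 columns.
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(income_csv)):
        totals[row["STATE"]] += int(row["A00100"])

    # One output record per state, sorted by state code.
    return [
        {"state": code, "state_name": names.get(code), "total_income": total}
        for code, total in sorted(totals.items())
    ]


# Hypothetical sample data (made-up values, not taken from the IRS file).
sample_income = "STATE,zipcode,A00100\nAL,35004,100\nAL,35005,250\nAK,99501,400\n"
sample_states = "AL,Alabama\nAK,Alaska\n"

result = income_by_state(sample_income, sample_states)
for record in result:
    print(record)
```

The two zip-code rows for AL collapse into a single state-level record, and each record gains a state name from the lookup. A state code missing from the lookup would get a name of `None` rather than being dropped, which mirrors a left join.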
Getting ready
This recipe assumes that you have access to an Azure subscription. It can be a free trial one as described in...