Collecting geodemographic data for modeling
Before you start developing models, you need to gather, clean, explore, and process your data in a way that supports effective clustering. You may recall that these are the first four steps of the data science pipeline we’ve discussed throughout this book. To begin, you’ll use the Census API to collect geodemographic data.
Extracting data using the Census API
The clustering exercise that you’ll work through later in this chapter builds geodemographic clusters for New York City (NYC). To do this, you’ll first need to collect data using the US Census Bureau API. To pull data through this API, request an API key at https://api.census.gov/data/key_signup.html. Requesting a key and pulling data from the Census Bureau are both free and open to the public. After requesting a key, you will be given a unique 40-character alphanumeric...
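To make the request pattern concrete, here is a minimal sketch of how a tract-level query for one NYC county could be assembled and parsed using only the Python standard library. The dataset path (`acs/acs5`), the year, the variable `B01003_001E` (total population), and the helper names `build_census_url`, `rows_to_records`, and `fetch_tracts` are assumptions chosen for illustration, not necessarily what the rest of this chapter uses. The Census API returns a JSON array of rows whose first row is the header, which is what the parsing helper relies on.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.census.gov/data"
NY_STATE_FIPS = "36"

# County FIPS codes for the five NYC boroughs.
NYC_COUNTY_FIPS = {
    "Bronx": "005",
    "Kings": "047",      # Brooklyn
    "New York": "061",   # Manhattan
    "Queens": "081",
    "Richmond": "085",   # Staten Island
}


def build_census_url(year, dataset, variables, county_fips, api_key):
    """Build a query URL for all census tracts in one New York county.

    Hypothetical helper: the year, dataset, and variables are whatever
    the caller chooses; nothing here is prescribed by the chapter.
    """
    params = {
        "get": ",".join(variables),
        "for": "tract:*",
        "in": f"state:{NY_STATE_FIPS} county:{county_fips}",
        "key": api_key,
    }
    # Keep ':', '*', and ',' unescaped so the URL stays readable.
    return f"{BASE_URL}/{year}/{dataset}?{urlencode(params, safe=':*,')}"


def rows_to_records(rows):
    """Convert the API's list-of-rows response (header first) to dicts."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def fetch_tracts(year, dataset, variables, county_fips, api_key):
    """Request one county's tracts and return a list of record dicts."""
    url = build_census_url(year, dataset, variables, county_fips, api_key)
    with urlopen(url, timeout=30) as response:
        return rows_to_records(json.load(response))
```

With a valid key, a call such as `fetch_tracts("2019", "acs/acs5", ["NAME", "B01003_001E"], NYC_COUNTY_FIPS["New York"], api_key)` would return one dictionary per Manhattan census tract. Looping over `NYC_COUNTY_FIPS` and concatenating the results would cover all five boroughs. Reading the key from an environment variable rather than hardcoding it keeps it out of version control.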