Building a robust ETL pipeline with US construction data in AWS
In this section, we’ll work through a real-world scenario by building an ETL pipeline from US construction market data, which is available through the AWS Marketplace: https://aws.amazon.com/marketplace/pp/prodview-6dxonc3cvfpeq#dataSets. The Construction Marketing Data Warehouse (CMDW) contains data on residential, commercial, and solar construction projects, as well as on businesses operating within the US, which gives you plenty of material to experiment with. As in the previous sections of this chapter, we will start with a simple approach to developing an AWS data pipeline for the CMDW data; we highly encourage you to spend some time building this pipeline out to a professional level.
Prerequisites
This pipeline will read data from and write data to an AWS S3 bucket. As you may recall from Chapter 10, we must use the boto3 Python module to connect to S3. We’ve listed boto3 and the other...