Chapter 9: Serverless ETL Pipelines
In the previous chapter, you learned how to tame unstructured or loosely structured data by using Athena to manipulate logs, JavaScript Object Notation (JSON), and other types of machine-generated data. In this chapter, we'll continue the theme of controlling chaos by using automation to normalize newly arrived data through a process known as extract, transform, load (ETL). We start with a brief explanation of ETL. Once we've established a basic understanding of ETL processes, we'll move on to best practices and common pitfalls of using Athena for ETL.
As with most of the chapters in this book, we'll then get hands-on by designing and implementing a serverless ETL pipeline. More precisely, we'll implement the serverless ETL pipeline discussed in Chapter 2, Introduction to Amazon Athena. In that chapter, we described a fictional hedge fund with a propensity for trading widely shorted meme stocks. Their equally fictional...