Running a highly selective query on a big fact table using AWS Glue
We will start with a common data processing use case: a query that scans a large volume of data but returns only a small, selected result. For example, to find the most populous city in the US, a query would scan the data for more than 19,000 cities and return just one city as the result. Working with a large volume of data brings high processing costs and can mean spending a lot of time scaling the resources that process it. You should know the right data filtering techniques to avoid these processing bottlenecks.
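To make the shape of a highly selective query concrete, here is a minimal sketch in plain Python. It is not AWS Glue code. The rows, the rounded population figures, and the function name `most_populous` are hypothetical and used only for illustration. The point it shows is that every row is read even though only one row is returned.

```python
# Hypothetical sample rows with illustrative, rounded populations.
# A real dataset would hold more than 19,000 cities.
cities = [
    {"city": "New York", "state": "NY", "population": 8_800_000},
    {"city": "Los Angeles", "state": "CA", "population": 3_900_000},
    {"city": "Chicago", "state": "IL", "population": 2_700_000},
    {"city": "Houston", "state": "TX", "population": 2_300_000},
]


def most_populous(rows):
    """Return the row with the largest population and the number of rows scanned."""
    scanned = 0
    best = None
    for row in rows:
        scanned += 1  # every row is read, even though only one is returned
        if best is None or row["population"] > best["population"]:
            best = row
    return best, scanned


best, scanned = most_populous(cities)
print(f"Scanned {scanned} rows, returned 1: {best['city']}")
```

Without filtering, the cost of such a query grows with the number of rows scanned rather than with the size of the result, which is why the filtering techniques matter at the scale of billions of records.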
In this section, you will learn how to handle highly selective queries with AWS Glue. Suppose your use case is to query a big fact table of clickstream data that is stored in Amazon S3 and contains billions of records. The clickstream data stores information that’s been collected about...