Optimizing analytics
If data is the new gold, we want to ensure we’re mining it without incurring waste. Data analytics and ML each deserve their own books, but in this section we’ll summarize the cost-optimization considerations for running these types of workloads. Broadly, we can categorize the steps involved as data ingestion, data exploration, model training, and model deployment.
We already know Amazon S3 as an object store that functions nicely as a data lake. With data in S3, we can use a managed service such as Amazon Athena to run Structured Query Language (SQL) queries directly against that data. Athena is serverless, meaning you don’t have to manage any infrastructure to run SQL queries on your data. Additionally, it scales automatically and parallelizes queries on large datasets without requiring you to specify any configuration. It also requires no maintenance because the underlying servers powering Athena are managed...
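To make the idea of querying S3 data through Athena concrete, here is a minimal Python sketch of submitting a query and waiting for it to finish. It assumes an Athena client created with boto3 and passed in by the caller; the helper name run_athena_query and the database, table, and bucket names in the usage comment are hypothetical placeholders, not values from this book. The helper also returns the number of bytes the query scanned, which is worth tracking because Athena's SQL query pricing is based on the amount of data scanned.

```python
import time


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Submit a SQL query to Athena and wait until it reaches a final state.

    `client` is expected to behave like boto3.client("athena").
    Returns a tuple of (final_state, bytes_scanned).
    """
    # Start the query. Athena writes the result files to the S3 location given.
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # Poll until the query succeeds, fails, or is cancelled.
    for _ in range(max_polls):
        execution = client.get_query_execution(
            QueryExecutionId=query_id
        )["QueryExecution"]
        state = execution["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            # Statistics may be absent for some failures, so default to 0.
            scanned = execution.get("Statistics", {}).get("DataScannedInBytes", 0)
            return state, scanned
        time.sleep(poll_seconds)

    raise TimeoutError(f"Query {query_id} did not finish after {max_polls} polls")


# Hypothetical usage (requires AWS credentials and an existing Athena database):
#
#   import boto3
#   athena = boto3.client("athena")
#   state, scanned = run_athena_query(
#       athena,
#       "SELECT * FROM example_table LIMIT 10",   # hypothetical table
#       database="example_db",                    # hypothetical database
#       output_location="s3://example-bucket/athena-results/",  # hypothetical bucket
#   )
#   print(state, scanned)
```

Passing the client in as a parameter, rather than creating it inside the function, keeps the sketch easy to exercise with a stand-in object and leaves region and credential choices to the caller.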