Summary
In this chapter, we learned how to analyze a problem and identify whether it is a big data problem. We also learned how to choose a platform and technology stack that is performant, optimized, and cost-effective, and how to weigh these factors judiciously to develop a big data batch processing solution in the cloud. Then, we learned how to analyze, profile, and draw inferences from big data files using AWS Glue DataBrew. After that, we learned how to develop, deploy, and run a Spark Java application in the AWS cloud to process a huge volume of data and store it in an ODL. We also discussed how to write an AWS Lambda trigger function in Java to automate the Spark jobs. Finally, we learned how to expose the processed ODL data through an Amazon Athena table so that downstream systems can easily query and use it.
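As a brief recap of the Lambda trigger pattern covered in this chapter, the following is a minimal sketch of an S3-triggered Lambda handler written in Java. It assumes the Spark job is deployed as an AWS Glue job; the job name "odl-batch-spark-job" and the argument keys are illustrative placeholders, not names from the chapter.

```java
// Minimal sketch: an S3-triggered Lambda that starts a Glue Spark job run.
// Assumption: the chapter's Spark job is deployed as an AWS Glue job; the
// job name and argument keys below are hypothetical.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import software.amazon.awssdk.services.glue.GlueClient;
import software.amazon.awssdk.services.glue.model.StartJobRunRequest;
import software.amazon.awssdk.services.glue.model.StartJobRunResponse;

import java.util.Map;

public class SparkJobTriggerHandler implements RequestHandler<S3Event, String> {

    private final GlueClient glueClient = GlueClient.create();

    @Override
    public String handleRequest(S3Event event, Context context) {
        // Each record describes one object that landed in the source bucket.
        var record = event.getRecords().get(0);
        String bucket = record.getS3().getBucket().getName();
        String key = record.getS3().getObject().getKey();

        // Start the Glue Spark job, passing the new object's location as job arguments.
        StartJobRunRequest request = StartJobRunRequest.builder()
                .jobName("odl-batch-spark-job")          // hypothetical job name
                .arguments(Map.of(
                        "--source_bucket", bucket,
                        "--source_key", key))
                .build();

        StartJobRunResponse response = glueClient.startJobRun(request);
        context.getLogger().log("Started Glue job run: " + response.jobRunId());
        return response.jobRunId();
    }
}
```

In this pattern, the Lambda function stays small and stateless: it only translates the S3 event into job parameters, while the heavy lifting remains in the Spark job itself.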
Now that we have learned how to develop optimized and cost-effective batch-based data processing solutions for different kinds of data volumes...