Summary
In this chapter, we discussed the methods and optimization features available in AWS Glue ETL for ingesting data from file/object stores, JDBC-compatible data stores, and streaming data stores. We also explored serialization and deserialization, which AWS Glue Schema Registry (GSR) uses to handle evolving schemas. Then, we introduced Glue Studio Marketplace connectors, which let us ingest data from SaaS applications. Finally, we briefly discussed how users can build custom JDBC, Spark, or Athena Federated Query connectors to ingest data from data stores that AWS Glue does not support directly and for which no connector is readily available in AWS Marketplace.
In the next chapter, we will discuss data preparation strategies. We'll explore the factors to consider when choosing the right service or tool. We will also discuss the available options: visual data preparation versus source code-/SQL-based data preparation and the different transformation...