Creating and updating Delta Lake tables using Glue
Delta Lake is another open source framework, initially developed by Databricks. Like Hudi, it is supported by Spark, Presto, and Hive, among many others.
We will now run the 04 - DeltaLake Init load for Data Analysis Chapter job, which was created by the CloudFormation template executed earlier, to create a Delta Lake table:
- Run the Glue job 04 - DeltaLake Init load for Data Analysis Chapter. Notice that the job script uses Spark SQL to create a table definition for the Delta table in the Glue Catalog. Here is the Spark SQL statement from the job's code:
spark.sql("CREATE TABLE `chapter-data-analysis-glue-database`.employees_deltalake (emp_no int, name string, department string, city string, salary int) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe...
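The database name in the statement above contains hyphens, which is why it is wrapped in backticks. As a minimal sketch of that detail, the opening portion of such a statement could be assembled in plain Python before being handed to spark.sql. The helper names quote_identifier and build_create_table_prefix are hypothetical and do not appear in the job script. The sketch stops at the column list and deliberately leaves out the storage clauses.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def build_create_table_prefix(database: str, table: str, columns: dict) -> str:
    """Return 'CREATE TABLE db.table (col type, ...)' without storage clauses.

    Hypothetical helper for illustration only; only the database name is
    quoted, mirroring the statement in the job script.
    """
    column_list = ", ".join(f"{col} {dtype}" for col, dtype in columns.items())
    return f"CREATE TABLE {quote_identifier(database)}.{table} ({column_list})"


# Column names and types as they appear in the job's Spark SQL statement.
EMPLOYEE_COLUMNS = {
    "emp_no": "int",
    "name": "string",
    "department": "string",
    "city": "string",
    "salary": "int",
}

ddl_prefix = build_create_table_prefix(
    "chapter-data-analysis-glue-database",
    "employees_deltalake",
    EMPLOYEE_COLUMNS,
)
print(ddl_prefix)
```

Without the backticks, the hyphens in the database name would likely be parsed as minus signs, so quoting the identifier is the safer choice whenever a Glue database name is not a plain alphanumeric word.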