Transforming the data
Our Silver layer code processes the device data from our Bronze layer by flattening, transforming, and deduplicating it.
Refer to the following code:
val reprocess: Boolean = args(0).toBoolean
val bronzeSource: String = "./src/main/scala/com/packt/dewithscala/chapter13/data/bronze/data/"
val target: String = "./src/main/scala/com/packt/dewithscala/chapter13/data/silver/"
This code defines a reprocess Boolean variable from the first command-line argument and sets the file paths for the Bronze source and the Silver target. Next, we read the Bronze data:
val bronzeData: DataFrame = spark.read.format("delta").load(bronzeSource)
Here, the process loads data from the Delta Lake Bronze layer located at the bronzeSource path into a DataFrame named bronzeData.
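The reprocess flag defined earlier is read with args(0).toBoolean, which throws an exception if the argument is missing or is not "true" or "false". The following is a minimal sketch of a more defensive parse. The ReprocessFlag object, its parse method, and the default parameter are hypothetical names introduced for illustration and are not part of the chapter's code:

```scala
import scala.util.Try

// Hypothetical helper, not part of the chapter's code.
object ReprocessFlag {

  // Reads the flag from the first command-line argument.
  // Falls back to `default` when the argument is missing
  // or cannot be parsed as "true"/"false" (case-insensitive).
  def parse(args: Array[String], default: Boolean = false): Boolean =
    args.headOption
      .flatMap(arg => Try(arg.trim.toBoolean).toOption)
      .getOrElse(default)
}
```

With this helper, the first line of the job could read val reprocess: Boolean = ReprocessFlag.parse(args). Falling back to a default is a design choice; if running without an explicit flag should be treated as an error, failing fast as the original line does may be preferable.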
Next, the following code defines a schema for parsing JSON data from the value column of bronzeData:
val jsonSchema: StructType = StructType( Seq( ...
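To illustrate how a schema like this is typically applied, here is a minimal sketch, assuming the value column holds a JSON string. The field names (device_id, temperature, event_time), the exampleSchema value, the sample rows, and the local SparkSession are all hypothetical stand-ins and do not reproduce the chapter's actual schema or data:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

object JsonFlattenSketch {

  // Hypothetical schema: these fields are illustrative only.
  val exampleSchema: StructType = StructType(
    Seq(
      StructField("device_id", StringType, nullable = true),
      StructField("temperature", DoubleType, nullable = true),
      StructField("event_time", StringType, nullable = true)
    )
  )

  // Parses the JSON in `value`, promotes the parsed fields to
  // top-level columns, and drops rows that are exact duplicates.
  def flatten(raw: DataFrame): DataFrame =
    raw
      .select(from_json(col("value").cast("string"), exampleSchema).as("parsed"))
      .select("parsed.*")
      .dropDuplicates()

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; a real job would reuse its own session.
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("json-flatten-sketch")
      .getOrCreate()
    import spark.implicits._

    // Two identical hypothetical records, to exercise deduplication.
    val raw = Seq(
      """{"device_id":"d1","temperature":21.5,"event_time":"2024-01-01T00:00:00Z"}""",
      """{"device_id":"d1","temperature":21.5,"event_time":"2024-01-01T00:00:00Z"}"""
    ).toDF("value")

    flatten(raw).show(truncate = false)
    spark.stop()
  }
}
```

The cast to string is there in case value arrives as binary, as it does when the Bronze data originates from a Kafka source; if it is already a string, the cast is a no-op. Records that do not match the schema produce null fields rather than an error, so a real pipeline may want to filter or quarantine those rows.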