Storage
Storing and managing large volumes of data is central to a batch pipeline. Consider Azure Data Lake Storage Gen2 (ADLS Gen2) as your data lake storage: it offers a secure and scalable solution designed specifically for big data analytics.
You can create the following folder structure to handle your batch pipeline:
- The raw trip data can be stored here: iac/raw/trips/2024/01/01.
- The cleaned-up data can be copied over to the transform/in folder: iac/transform/in/2024/01/01.
- The output of the transformed data can be moved into the transform/out folder: iac/transform/out/2024/01/01.
- Finally, you can import the data from transform/out into a Synapse SQL dedicated pool using PolyBase.
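The folder layout above follows a fixed pattern (stage folder, then year/month/day), so the paths can be generated per run date rather than typed by hand. The following is a minimal sketch using only the Python standard library. The names ROOT, STAGE_FOLDERS, stage_path, and batch_paths are hypothetical helpers introduced for illustration, and the sketch assumes iac is the root prefix, as in the paths listed above. It only builds path strings; it does not connect to ADLS Gen2.

```python
from datetime import date
from pathlib import PurePosixPath

# Assumption: "iac" is the root prefix, as in the example paths above.
ROOT = PurePosixPath("iac")

# Hypothetical stage names mapped to the folders from the list above.
STAGE_FOLDERS = {
    "raw": "raw/trips",
    "transform_in": "transform/in",
    "transform_out": "transform/out",
}


def stage_path(stage: str, run_date: date) -> str:
    """Return the date-partitioned folder for one pipeline stage."""
    if stage not in STAGE_FOLDERS:
        raise ValueError(f"unknown stage: {stage!r}")
    # Zero-padded year/month/day partition, e.g. 2024/01/01.
    partition = run_date.strftime("%Y/%m/%d")
    return str(ROOT / STAGE_FOLDERS[stage] / partition)


def batch_paths(run_date: date) -> dict[str, str]:
    """Return the folder for every stage of a single batch run."""
    return {stage: stage_path(stage, run_date) for stage in STAGE_FOLDERS}


if __name__ == "__main__":
    for stage, path in batch_paths(date(2024, 1, 1)).items():
        print(f"{stage}: {path}")
```

Keeping the stage folders in one mapping means each step of the batch pipeline derives its input and output locations from the same run date, which helps avoid mismatched partitions between the raw, transform/in, and transform/out folders.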
Note that tools such as Azure Data Factory (ADF) and PolyBase can also move data directly between Spark and Synapse SQL dedicated pools. You can choose this direct approach instead of storing the intermediate data in the data lake if that works...