Using different file formats
Storage is said to be cheap nowadays, but that does not mean we should waste money. When you store a lot of data and pay for the volume of data stored, it pays to compress that data.
When you use the on-demand options of Azure Databricks or Azure Synapse Analytics to process data in a data lake, it also pays to reduce the total duration of the processing. Both storage and processing costs are good reasons to look at the big data file formats that originated on the Hadoop platform.
PolyBase in Synapse Analytics can work with delimited text files, ORC files, and Parquet files. Azure Data Factory can also work with Avro files. Other processing platforms may support further file types. Avro, Parquet, and ORC evolved in the Hadoop ecosystem to reduce the cost of storage and compute. With the right file format, you can do the following (a short sketch follows the list):
- Increase read performance
- Increase write performance
- Split files to get more parallelism
- Add support...
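To make these benefits concrete, here is a minimal PySpark sketch, the kind of code you would run on Azure Databricks, that converts a delimited text file into snappy-compressed Parquet. The file paths and the `region`/`amount` column names are illustrative assumptions, not taken from the text.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("file-format-demo").getOrCreate()

# Read a delimited text file (hypothetical path and layout);
# CSV carries no schema, so Spark has to infer one by scanning the data.
df = spark.read.csv("/data/raw/sales.csv", header=True, inferSchema=True)

# Write the same data as snappy-compressed Parquet: a columnar layout
# with an embedded schema, stored in splittable files for parallel reads.
(df.write
   .mode("overwrite")
   .option("compression", "snappy")
   .parquet("/data/curated/sales"))

# Reading Parquet back needs no schema inference, and a query that
# selects only some columns scans only those columns on disk.
parquet_df = spark.read.parquet("/data/curated/sales")
parquet_df.select("region", "amount").show()
```

Because Parquet stores data column by column with an embedded schema, readers can scan just the columns a query needs; much of the read-performance gain over delimited text comes from that, on top of the compression savings.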