6. Big Data File Formats
Overview
This chapter introduces three popular big data file formats, Avro, ORC, and Parquet, and surveys their advantages and disadvantages. It walks through the code snippets required to convert data into the desired file format, and it covers attributes such as compression and read-write strategy. It also executes queries against the formats to highlight their operational performance.
By the end of the chapter, you will be able to select the most suitable file format for a given use case. You will reinforce these concepts by applying them to a real-world scenario and gain first-hand experience of running the necessary queries.