Serialization and deserialization formats and data types
Serialization and deserialization formats are popularly known as SerDes. Hive uses SerDes to read or write data in a particular format. A SerDe parses the structured or unstructured data bytes stored in HDFS in accordance with the schema definition of the Hive table. Hive provides a set of built-in SerDes and also allows users to create custom SerDes based on their data definition. The built-in SerDes are as follows:
LazySimpleSerDe
RegexSerDe
AvroSerDe
OrcSerde
ParquetHiveSerDe
JSONSerDe
CSVSerDe
How to do it…
You can use different types of SerDes for reading or writing data in a particular format.
LazySimpleSerDe
This is the default SerDe of Hive. When a user creates a table in Hive without an explicit SerDe definition, LazySimpleSerDe gets associated with the table. LazySimpleSerDe takes line feed (\n) as the record separator and, by default, Ctrl-A ('\001') as the attribute (column) delimiter; a different delimiter, such as tab ('\t'), can be specified when the table is created. It parses the data bytes it receives from HDFS and...
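The idea of splitting records on line feed and reading attributes only when they are needed can be illustrated with a small Python sketch. This is a conceptual model only, not Hive's actual implementation: the SCHEMA list, the LazyRow class, and the deserialize function are hypothetical names introduced for illustration, and the field delimiter is passed explicitly (a tab here, for readability).

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical table schema: ordered (column name, converter) pairs.
SCHEMA: List[Tuple[str, Callable[[str], object]]] = [
    ("id", int),
    ("name", str),
    ("salary", float),
]


class LazyRow:
    """One record whose fields are split and converted only when first read."""

    def __init__(self, raw: bytes, field_delim: bytes) -> None:
        self._raw = raw
        self._field_delim = field_delim
        self._fields: Optional[List[bytes]] = None  # filled on first access
        self._cache: Dict[str, object] = {}

    def get(self, column: str) -> object:
        if column in self._cache:
            return self._cache[column]
        if self._fields is None:
            # The record is split into attributes only now, not at load time.
            self._fields = self._raw.split(self._field_delim)
        names = [name for name, _ in SCHEMA]
        index = names.index(column)
        convert = SCHEMA[index][1]
        try:
            value = convert(self._fields[index].decode("utf-8"))
        except (IndexError, ValueError):
            # Assumption for this sketch: a missing or malformed field is
            # treated as NULL (None) instead of raising an error.
            value = None
        self._cache[column] = value
        return value


def deserialize(data: bytes, field_delim: bytes) -> List[LazyRow]:
    """Split a block of bytes into records on line feed; no field is parsed yet."""
    return [LazyRow(line, field_delim) for line in data.split(b"\n") if line]


rows = deserialize(b"1\tAlice\t5000.5\n2\tBob\tnot_a_number\n", b"\t")
print(rows[0].get("name"))
print(rows[1].get("salary"))
```

In this sketch, the second record's salary cannot be converted to a number, so it is returned as None, while the other attributes of the same record remain readable.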