HCatalog
HCatalog (see https://cwiki.apache.org/confluence/display/Hive/HCatalog) is a metadata management system for Hadoop data. It stores consistent schema information for Hadoop ecosystem tools, such as Pig, Hive, and MapReduce. By default, HCatalog supports data in the RCFile, CSV, JSON, SequenceFile, and ORC file formats, as well as any custom format for which InputFormat, OutputFormat, and SerDe are implemented. With HCatalog, users can directly create, edit, and expose (via its REST API) metadata, and changes take effect immediately in all tools sharing that metadata. HCatalog was originally a separate project from Hive in the Apache Incubator, where most Apache projects start. It became part of the Hive project in 2013, starting with Hive 0.11.0.
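As a rough illustration of how metadata is exposed over the REST API, the following minimal sketch only builds the URL that a client would request to describe a table. It assumes the WebHCat server is on its default port 50111 with the templeton/v1 path prefix; the host, database, table, and user names are hypothetical, and the helper table_metadata_url is introduced here purely for illustration.

```python
from urllib.parse import quote, urlencode


def table_metadata_url(host, database, table, user, port=50111):
    """Build the WebHCat URL that describes one table's metadata.

    Assumptions: WebHCat's default port (50111) and its "templeton/v1"
    path prefix. All argument values used below are hypothetical.
    """
    # Percent-encode the names so unusual characters cannot break the path.
    path = "/templeton/v1/ddl/database/{}/table/{}".format(
        quote(database, safe=""), quote(table, safe="")
    )
    # WebHCat identifies the caller through the user.name query parameter.
    query = urlencode({"user.name": user})
    return "http://{}:{}{}?{}".format(host, port, path, query)


# Hypothetical server and table; nothing is sent over the network here.
url = table_metadata_url("localhost", "default", "employee", "hive")
print(url)
```

Sending a GET request to such a URL against a running WebHCat server should return the table description as JSON. The sketch stops at URL construction so that it runs without a cluster.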
HCatalog is built on top of the Hive metastore and incorporates support for Hive DDL. It provides read and write interfaces, HCatLoader and HCatStorer, for Pig, by implementing Pig...