So far, we've learned how to create a Hadoop cluster and how to load data into it. In the previous chapter, we learned about various data ingestion tools and techniques. As we know by now, there are various open source tools available in the market, but there is no single silver-bullet tool that can take on all our use cases. Each data ingestion tool has certain unique features, and each can prove very productive and useful in its typical use cases. For example, Sqoop is most useful for importing data into Hadoop from an RDBMS and exporting it back out.
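As a sketch of that Sqoop use case, the following command imports a relational table into HDFS. The connection string, credentials, table name, and target directory are all hypothetical placeholders; adjust them to your own database and cluster:

```shell
# Import the "customers" table from a MySQL database into HDFS.
# -P prompts for the password interactively instead of putting it on the command line.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username dbuser -P \
  --table customers \
  --target-dir /user/hadoop/customers \
  --num-mappers 4
```

The `--num-mappers` option controls how many parallel map tasks split the import, which is the main lever for import throughput.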
In this chapter, we will learn how to store and model data in Hadoop clusters. As with data ingestion tools, there are various data stores available. These data stores support different data models, such as columnar storage and key-value pairs, and they support various file formats, such as ORC...