Chapter 4: Storing and Serving Data in a Data Lakehouse
The journey so far has covered a lot of ground in presenting the data lakehouse as a new paradigm for data architecture. The first chapter covered trends in big data and discussed the need for a new paradigm. The second chapter provided an overview of the data lakehouse architecture and discussed its seven layers. The third chapter focused on the methods by which data can be ingested and processed in a data lakehouse. In this chapter, we will focus on the data lake layer and the data serving layer of the data lakehouse architecture, which store the data and serve it to consumers.
How data is stored is critical from both a storage-efficiency and a performance perspective. This chapter will begin by describing how data is stored in the data lake layer. Next, we will discuss the different data stores within a data lake, along with the need for each and the benefits it provides. We will then explore the standard data formats used for storing data in a data lake...