Summary
We started this chapter with the problem of storing huge amounts of data, processing it in bulk, and accessing it randomly. The problem arose from our ambition to store every web page on earth and process those pages to extract results from them. We introduced a solution called BigTable and examined its data model. We saw that in BigTable we can define multiple tables, each with multiple column families that are defined when the table is created. We learned that column families are logical groupings of columns, and that new columns can be added to a column family as needed. We also learned that the data stored in BigTable has no meaning to BigTable itself: it is kept as plain, uninterpreted bytes, and its interpretation depends on the user of the data. Each row in BigTable is identified by a unique row key, which can be at most 64 KB long.
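To make the logical model concrete, here is a minimal in-memory sketch of it. The `Table` class, its `put`/`get` methods, and the `"family:qualifier"` column naming are illustrative assumptions, not BigTable's actual API. The sketch shows three points from the summary. Column families are fixed when the table is created. New columns inside a family appear simply by writing to them. Values and row keys are plain bytes, with row keys capped at 64 KB. Cell versioning and physical storage are deliberately left out.

```python
from typing import Dict, Iterable, Optional

# Upper bound on row key length described above (64 KB).
MAX_ROW_KEY_BYTES = 64 * 1024


class Table:
    """Toy model of BigTable's logical data model (hypothetical API, not the real one)."""

    def __init__(self, name: str, column_families: Iterable[str]) -> None:
        self.name = name
        # Column families are declared once, at table-creation time.
        self.column_families = frozenset(column_families)
        # row key -> "family:qualifier" -> uninterpreted value bytes
        self._rows: Dict[bytes, Dict[str, bytes]] = {}

    def put(self, row_key: bytes, column: str, value: bytes) -> None:
        if not isinstance(row_key, bytes) or len(row_key) > MAX_ROW_KEY_BYTES:
            raise ValueError("row key must be bytes and at most 64 KB")
        family, sep, _qualifier = column.partition(":")
        if not sep or family not in self.column_families:
            raise KeyError(f"unknown column family in {column!r}")
        if not isinstance(value, bytes):
            # The store does not interpret values; callers encode/decode them.
            raise TypeError("values are stored as plain bytes")
        # A new qualifier within an existing family needs no schema change.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key: bytes, column: str) -> Optional[bytes]:
        # Random access by (row key, column); None if the cell was never written.
        return self._rows.get(row_key, {}).get(column)
```

For example, with a hypothetical table of web pages, `t = Table("webtable", {"contents", "anchor"})` accepts `t.put(b"com.example.www", "anchor:other.example", b"Example")` even though the column `anchor:other.example` was never declared. A write to an undeclared family such as `"links:x"` is rejected.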
Having covered the logical model of BigTable, we turned our attention to how data is actually stored...