Logical and physical data models
An HBase cluster is divided into namespaces. A namespace is a logical collection of tables, representing an application or organizational unit.
A table in HBase is made up of rows and columns, like a table in any other database. The table is divided into regions, each of which holds a contiguous range of rows sorted by row key. A row cannot be split across regions.
Values are identified by the combination of the row key, a version timestamp, the column family (for example, Metrics) and the column qualifier (for example, Temperature).
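The addressing scheme described above can be pictured as a nested map. The following is only a toy in-memory sketch of that idea, not the HBase client API: the class ToyTable, its put and get methods, and the row key "sensor-001" are hypothetical names introduced for illustration, while Metrics and Temperature are taken from the text.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of HBase cell addressing (illustrative, not the HBase API)."""

    def __init__(self):
        # row key -> column family -> column qualifier -> {timestamp: value}
        self._rows = defaultdict(
            lambda: defaultdict(lambda: defaultdict(dict))
        )

    def put(self, row, family, qualifier, timestamp, value):
        # A value is stored under all four coordinates.
        self._rows[row][family][qualifier][timestamp] = value

    def get(self, row, family, qualifier, timestamp=None):
        # .get() is used at every level so that reads never create entries.
        versions = self._rows.get(row, {}).get(family, {}).get(qualifier, {})
        if not versions:
            return None
        if timestamp is None:
            # With no timestamp given, return the most recent version.
            timestamp = max(versions)
        return versions.get(timestamp)


table = ToyTable()
table.put("sensor-001", "Metrics", "Temperature", 1000, "21.5")
table.put("sensor-001", "Metrics", "Temperature", 2000, "22.1")

latest = table.get("sensor-001", "Metrics", "Temperature")
older = table.get("sensor-001", "Metrics", "Temperature", timestamp=1000)
```

In this sketch, two writes to the same row, column family and qualifier do not overwrite each other, because the timestamp is part of the key. Reading without a timestamp picks the newest version.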
In addition to rows and columns, HBase has another construct called a ColumnFamily. A ColumnFamily, as the name suggests, represents a set of columns. For a given set of rows, all data for the columns in a column family is stored physically together on disk. So, if a table has a single region with 100 rows and two column families with 10 columns each, then the data is stored in two separate sets of HFiles, one set per column family, with at least one HFile in each once the data has been flushed to disk.
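To make the arithmetic in this example concrete, the sketch below groups cells by region and column family, which is the unit of physical storage described above. It is a simplified model under stated assumptions, not HBase code: the function group_by_store, the names cf1, cf2 and region-1, and the generated row and column names are all hypothetical.

```python
from collections import defaultdict


def group_by_store(cells, region_of):
    """Group (row, family, qualifier, value) cells by (region, column family)."""
    stores = defaultdict(list)
    for row, family, qualifier, value in cells:
        # Cells of the same region and column family are kept together.
        stores[(region_of(row), family)].append((row, qualifier, value))
    return dict(stores)


# 100 rows, two column families, 10 columns in each family.
cells = [
    (f"row-{r:03d}", family, f"col-{c}", b"")
    for r in range(100)
    for family in ("cf1", "cf2")
    for c in range(10)
]

# Assumption: the table has a single region, so every row maps to it.
stores = group_by_store(cells, region_of=lambda row: "region-1")
store_sizes = {key: len(group) for key, group in stores.items()}
```

Here store_sizes has two entries, one per column family, each holding 1,000 cells (100 rows times 10 columns). A table with more regions would multiply the number of groups accordingly, since each region keeps its own storage for each column family.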
What should the criteria be for grouping...