Loading data into HBase
HBase is another component in the Hadoop ecosystem. It is a column-oriented database: rather than storing data row by row, it groups data by column (in HBase's case, by column family). This allows for higher compression and faster searching, which makes column-oriented databases well suited to the kinds of analytical queries that can cause significant performance issues in traditional relational databases.
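The following Python sketch is a generic illustration of column-oriented storage, not a description of how HBase lays out data on disk. It pivots a few hypothetical player records (the field names and values are made up, not taken from the Baseball Dataset) into one list of values per column. It then run-length encodes one column to show why repeated adjacent values in a column compress well, and computes an aggregate that only needs to read a single column.

```python
# Generic illustration of column-oriented storage; NOT HBase's actual on-disk format.
# The records below are hypothetical and not taken from the Baseball Dataset.
rows = [
    {"playerID": "p1", "birthYear": 1970, "birthCountry": "USA"},
    {"playerID": "p2", "birthYear": 1970, "birthCountry": "USA"},
    {"playerID": "p3", "birthYear": 1981, "birthCountry": "CAN"},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress a column into (value, run length) pairs; adjacent repeats collapse."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded


columns = to_columns(rows)

# An analytical query over one column only has to touch that column's values.
avg_birth_year = sum(columns["birthYear"]) / len(columns["birthYear"])

print(run_length_encode(columns["birthCountry"]))  # [('USA', 2), ('CAN', 1)]
```

In a row-oriented layout, computing the average birth year would mean reading every field of every record; here only the birthYear list is scanned.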
Note
For this recipe, we will be using the Baseball Dataset that was loaded into Hadoop in the recipe Loading data into Hadoop (also in this chapter). It is recommended that you complete that recipe before continuing.
Getting ready
In this recipe, we will be loading the Schools.csv, Master.csv, and SchoolsPlayers.csv files. The data relates schools (found in the Schools.csv file) to players (found in the Master.csv file) via the SchoolsPlayers.csv file. This data is designed for a relational database, so we will be tweaking it to take advantage of HBase's data store capabilities. Before...