HBase
HBase is the NoSQL datastore in the Hadoop ecosystem. Integration with a database is essential for Spark, and Spark can both read data from an HBase table and write to one. In fact, Spark supports HBase very well via its Hadoop dataset calls, such as newAPIHadoopRDD for reading and saveAsNewAPIHadoopDataset for writing.
Tip
If you want to experiment with HBase, you can install a standalone local version of HBase, as described in http://hbase.apache.org/book.html#quickstart.
Before working through the examples, let's create a table and three records in HBase. A standalone installation works from the local filesystem, so there is no need for Hadoop or HDFS. However, such a setup is suitable only for testing, not for production.
I created a test table with three records via the HBase shell, as shown in the following screenshot:
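For readers following along in text form, a shell session along these lines would produce a comparable table. This is only a sketch modeled on the HBase quickstart guide: the table name test, the column family cf, and the row keys and values are assumptions, and may differ from what the screenshot shows.

```
# Start the shell from the HBase installation directory with: bin/hbase shell
# Create a table named 'test' with a single column family 'cf' (assumed names)
create 'test', 'cf'

# Insert three records, one cell each (placeholder values)
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'

# List the contents of the table to confirm that the three rows are present
scan 'test'
```

Running scan at the end is a quick way to confirm that the rows were stored before trying to read them from Spark.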
Loading from HBase
The HBase test code in the Apache Spark examples is a good starting point for testing our HBase connectivity and loading data. The code is not that difficult, but we do need to keep track of the data types, that...