HBase pros and cons
Let's now briefly discuss HBase pros and cons.
The following are some advantages of HBase:
- Well suited to analytics when combined with Hadoop MapReduce
- It can handle very large volumes of data
- Supports scaling out on commodity hardware, in coordination with the Hadoop Distributed File System (HDFS)
- Fault tolerance
- Free and open source (Apache License 2.0)
- Flexible schema: only column families are fixed up front, and columns within a family can be added freely
- Can be integrated with Hive for SQL-like queries, which suits DBAs who are more familiar with SQL
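- Automatic sharding: tables are split into regions automatically as they grow
- Automatic failover: regions served by a failed RegionServer are reassigned to other servers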
- Simple client interface
- Row-level atomicity: a Put on a single row either succeeds completely or fails completely
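The flexible schema and row-level atomicity points above can be illustrated with a toy model. The following Python sketch is not the HBase client API: `ToyTable`, `put`, and `get` are hypothetical names, and a single lock stands in for HBase's own per-row locking. Assuming cells are named `family:qualifier` as in HBase, it shows that column families are declared when the table is created while qualifiers are free-form, and that a multi-cell write to one row is applied all-or-nothing.

```python
import threading


class ToyTable:
    """Toy in-memory model of an HBase-style table (hypothetical, not the HBase client API).

    Column families are fixed when the table is created; column qualifiers
    inside a family are free-form and can differ from row to row.
    """

    def __init__(self, families):
        self.families = frozenset(families)
        self.rows = {}  # row key -> {"family:qualifier": value}
        # One lock keeps the sketch simple; HBase itself locks individual rows.
        self._lock = threading.Lock()

    def put(self, row_key, cells):
        """Write several cells to one row: either every cell is applied or none is."""
        # Validate all cells first, so one bad cell rejects the whole put.
        for column in cells:
            family = column.split(":", 1)[0]
            if family not in self.families:
                raise ValueError(f"unknown column family in {column!r}")
        # Apply every cell while holding the lock, so readers never see half a put.
        with self._lock:
            self.rows.setdefault(row_key, {}).update(cells)

    def get(self, row_key):
        """Return a copy of the row's cells (empty dict if the row does not exist)."""
        with self._lock:
            return dict(self.rows.get(row_key, {}))


users = ToyTable(families=["info"])
users.put("user1", {"info:name": "Ann", "info:city": "Pune"})
# A new qualifier needs no schema change, because the "info" family already exists.
users.put("user2", {"info:name": "Raj", "info:twitter": "@raj"})
try:
    # "stats" is not a declared family, so neither cell below is written.
    users.put("user1", {"info:age": "30", "stats:visits": "5"})
except ValueError as err:
    print(err)
print(users.get("user1"))  # {'info:name': 'Ann', 'info:city': 'Pune'}
```

In this model the all-or-nothing behavior comes from validating every cell before touching the row; a real HBase RegionServer is far more involved, so treat this only as an illustration of the guarantee.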
The following are some disadvantages:
- Single point of failure (when only one HMaster is used)
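- No general-purpose multi-row or cross-table transactions; atomicity is guaranteed only within a single row
- No native JOINs; joins must be handled in the MapReduce layer rather than in the database itself
- Data is indexed and sorted only by row key, whereas an RDBMS can index arbitrary columns
- Authentication and permissions are not enabled by default; Kerberos authentication and access control lists must be configured explicitly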
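The indexing limitation can be sketched the same way. In this hypothetical Python model (`KeySortedStore` is an illustrative name, not part of HBase), rows are kept sorted by row key only, so a range scan between a start key and a stop key is a cheap binary search, while finding rows by a column value has no index to use and must examine every row. HBase scans likewise take a start row and an exclusive stop row, but the storage internals here are heavily simplified.

```python
import bisect


class KeySortedStore:
    """Toy model of HBase-style storage (hypothetical): rows are sorted by row key only."""

    def __init__(self):
        self._keys = []  # row keys, always kept in sorted order
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, cells):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)  # keep keys sorted on insert
            self._rows[row_key] = {}
        self._rows[row_key].update(cells)

    def scan(self, start_key, stop_key):
        """Rows with start_key <= key < stop_key, located by binary search (cheap)."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

    def find_by_value(self, column, value):
        """No secondary index on columns: every row must be examined (a full scan)."""
        return [(k, self._rows[k]) for k in self._keys if self._rows[k].get(column) == value]


store = KeySortedStore()
store.put("user#003", {"info:city": "Pune"})
store.put("user#001", {"info:city": "Delhi"})
store.put("user#002", {"info:city": "Pune"})
store.put("order#001", {"info:total": "250"})

# "$" sorts just after "#", so this range covers every key starting with "user#".
print([k for k, _ in store.scan("user#", "user$")])  # ['user#001', 'user#002', 'user#003']
print([k for k, _ in store.find_by_value("info:city", "Pune")])  # ['user#002', 'user#003']
```

Because only the row key is indexed, HBase row keys are typically designed around the most frequent query pattern.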
So overall, if we can live with these drawbacks, we can go with HBase, which provides many benefits that an RDBMS does not. HBase is still evolving along with Hadoop, and with time it should become more mature and feature-rich, making it one of the best tools for analytical workloads and for distributed, fault-tolerant storage. It is an open source Apache project, so users and developers can contribute new features.
HBase, combined with other Hadoop subprojects, can do wonders in the data analysis field. With these technologies, data that was once stored uselessly as a dump becomes a hidden treasure that can help us understand various aspects of a specific industry.