You're reading from Hbase Essentials A practical guide to realizing the seamless potential of storing and managing high-volume, high-velocity data quickly and painlessly with HBase

Product type Paperback

Published in Nov 2014

Publisher

ISBN-13 9781783987245

Length 164 pages

Edition 1st Edition

Languages

Java

Tools

HBase

Concepts

Databases

Author (1):

Nishant Garg

View More author details

Table of Contents (9) Chapters

Preface

1. Introducing HBase FREE CHAPTER

2. Defining the Schema

3. Advanced Data Modeling

4. The HBase Architecture

5. The HBase Advanced API

6. HBase Clients

7. HBase Administration

Index

The world of Big Data

Since the last decade, the amount of data being created is more than 20 terabytes per second and this size is only increasing. Not only volume and velocity but this data is also of a different variety, that is, structured and semi structured in nature, which means that data might be coming from blog posts, tweets, social network interactions, photos, videos, continuously generated log messages about what users are doing, and so on. Hence, Big Data is a combination of transactional data and interactive data. This large set of data is further used by organizations for decision making. Storing, analyzing, and summarizing these large datasets efficiently and cost effectively have become among the biggest challenges for these organizations.

In 2003, Google published a paper on the scalable distributed filesystem titled Google File System (GFS), which uses a cluster of commodity hardware to store huge amounts of data and ensure high availability by using the replication of data between nodes. Later, Google published an additional paper on processing large, distributed datasets using MapReduce (MR).

For processing Big Data, platforms such as Hadoop, which inherits the basics from both GFS and MR, were developed and contributed to the community. A Hadoop-based platform is able to store and process continuously growing data in terabytes or petabytes.

Note

The Apache Hadoop software library is a framework that allows the distributed processing of large datasets across clusters of computers.

However, Hadoop is designed to process data in the batch mode and the ability to access data randomly and near real time is completely missing. In Hadoop, processing smaller files has a larger overhead compared to big files and thus is a bad choice for low latency queries.

Later, a database solution called NoSQL evolved with multiple flavors, such as a key-value store, document-based store, column-based store, and graph-based store. NoSQL databases are suitable for different business requirements. Not only do these different flavors address scalability and availability but also take care of highly efficient read/write with data growing infinitely or, in short, Big Data.

Note

The NoSQL database provides a fail-safe mechanism for the storage and retrieval of data that is modeled in it, somewhat different from the tabular relations used in many relational databases.

You're reading from Hbase Essentials A practical guide to realizing the seamless potential of storing and managing high-volume, high-velocity data quickly and painlessly with HBase

Table of Contents (9) Chapters

The world of Big Data

Note

Note

Authors (1)

Personalised recommendations for you