
HBase High Performance Cookbook

Author: Ruchir Choudhry
Publisher: Packt
Published: Jan 2017
Edition: 1st Edition
ISBN-13: 9781783983063
Pages: 350
Concepts: Database Administration


Scaling elastically or Auto Scaling with built-in fault tolerance

Before we get into Auto Scaling, we need a close-up view of how HBase accomplishes auto-sharding and how the distributed components of the HBase architecture work together.

Let's start with regions.

An HBase region is a contiguous subset of a table's data. It is a sorted range of rows, running from a start key (inclusive) to an end key (exclusive), that are stored together. Regions are distributed across the region servers in the cluster. Regions never overlap. At any given point in time, each region is served by exactly one region server, which exposes that region to clients; this is how HBase guarantees strong consistency.
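The property that regions are sorted and non-overlapping means a row key can be mapped to its region with a single ordered lookup. The client looks for the region with the greatest start key that is less than or equal to the row key. The following is a minimal Java 16+ sketch of that idea. `Region`, `RegionLocator` and the server names `rs1` to `rs3` are hypothetical illustrations, not the HBase client API. The sketch also uses `String` keys for readability, whereas real HBase compares row keys as byte arrays.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a table split into sorted, non-overlapping regions.
// Not the HBase client API; row keys are Strings here instead of byte[].
public class RegionLookupSketch {

    // A region covers [startKey, endKey); an empty endKey means "to the end of the table".
    record Region(String startKey, String endKey, String server) {
        boolean contains(String rowKey) {
            return rowKey.compareTo(startKey) >= 0
                    && (endKey.isEmpty() || rowKey.compareTo(endKey) < 0);
        }
    }

    static class RegionLocator {
        // Regions indexed by start key; TreeMap keeps them in sorted order.
        private final TreeMap<String, Region> byStartKey = new TreeMap<>();

        void add(Region r) {
            byStartKey.put(r.startKey(), r);
        }

        // The owning region is the one with the greatest start key <= rowKey.
        Region locate(String rowKey) {
            Map.Entry<String, Region> e = byStartKey.floorEntry(rowKey);
            if (e == null || !e.getValue().contains(rowKey)) {
                throw new IllegalStateException("no region covers " + rowKey);
            }
            return e.getValue();
        }
    }

    public static void main(String[] args) {
        RegionLocator locator = new RegionLocator();
        // Three contiguous regions, each served by exactly one (hypothetical) region server.
        locator.add(new Region("", "g", "rs1"));
        locator.add(new Region("g", "p", "rs2"));
        locator.add(new Region("p", "", "rs3"));

        for (String row : new String[] {"apple", "grape", "zebra"}) {
            Region r = locator.locate(row);
            System.out.println(row + " -> [" + r.startKey() + ", " + r.endKey() + ") on " + r.server());
        }
    }
}
```

Because every key falls into exactly one range and each range has exactly one serving server, a read or write for a given row always reaches a single owner.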

Each region has one store per column family.

Each store hosts a MemStore and manages a set of store files (commonly known as HFiles). The MemStore holds the data in memory and buffers modifications to key/value pairs. For the reasons that follow, when a flush is initiated, the data residing in the MemStore...
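To make the store/MemStore/HFile relationship concrete, here is a minimal Java sketch of a single store (one column family in one region). It makes several simplifying assumptions. Writes go into a sorted in-memory map (a `ConcurrentSkipListMap`). A flush persists a sorted, immutable snapshot as a new "file" and starts a fresh MemStore. Reads check the MemStore first and then the files from newest to oldest. The `Store` class and its entry-count flush threshold are hypothetical. Real HBase flushes on a size threshold (for example `hbase.hregion.memstore.flush.size`), keeps multiple timestamped versions per cell, and writes actual HFiles to HDFS.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical model of one store: a sorted MemStore plus immutable, sorted "HFiles".
// Simplified: one value per row key, and an entry-count threshold instead of a byte size.
public class StoreSketch {

    static class Store {
        private final long flushThresholdEntries; // stand-in for a size-based flush limit (assumption)
        private ConcurrentSkipListMap<String, String> memStore = new ConcurrentSkipListMap<>();
        private final List<NavigableMap<String, String>> storeFiles = new ArrayList<>();

        Store(long flushThresholdEntries) {
            this.flushThresholdEntries = flushThresholdEntries;
        }

        // Writes land in the MemStore first, kept sorted by row key.
        void put(String rowKey, String value) {
            memStore.put(rowKey, value);
            if (memStore.size() >= flushThresholdEntries) {
                flush();
            }
        }

        // Flush: persist the sorted MemStore as a new immutable "file", then start a fresh MemStore.
        void flush() {
            if (memStore.isEmpty()) {
                return;
            }
            storeFiles.add(Collections.unmodifiableNavigableMap(new TreeMap<>(memStore)));
            memStore = new ConcurrentSkipListMap<>();
        }

        // Reads check the MemStore first, then the files from newest to oldest.
        String get(String rowKey) {
            String v = memStore.get(rowKey);
            if (v != null) {
                return v;
            }
            for (int i = storeFiles.size() - 1; i >= 0; i--) {
                v = storeFiles.get(i).get(rowKey);
                if (v != null) {
                    return v;
                }
            }
            return null;
        }

        int memStoreSize() { return memStore.size(); }

        int storeFileCount() { return storeFiles.size(); }
    }

    public static void main(String[] args) {
        Store store = new Store(3);
        store.put("row1", "a");
        store.put("row2", "b");
        store.put("row3", "c");   // third entry reaches the threshold and triggers a flush
        store.put("row1", "a2");  // newer value stays in the MemStore
        System.out.println("files=" + store.storeFileCount() + ", memstore=" + store.memStoreSize());
        System.out.println("row1=" + store.get("row1") + ", row2=" + store.get("row2"));
    }
}
```

Running `main` prints `files=1, memstore=1` followed by `row1=a2, row2=b`. The newer value for `row1` is served from memory, while `row2` is read from the flushed file.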


