Scaling Big Data with Hadoop and Solr, Second Edition
Understand, design, build, and optimize your big data search engine with Hadoop and Apache Solr

Author: Hrishikesh Vijay Karambelkar
Product type: Paperback
Published: April 2015
Publisher: Packt Publishing
ISBN-13: 9781783553396
Length: 166 pages

Chapter 1. Processing Big Data Using Hadoop and MapReduce

Continuous evolution in computer science has enabled the world to work in a faster, more reliable, and more efficient manner. Many businesses have transformed themselves to use electronic media, employing information technologies to innovate in how they communicate with customers, partners, and suppliers. This shift has also given birth to new industries such as social media and e-commerce. The resulting rapid increase in the amount of data has led to an "information explosion." To manage such huge volumes of information, computational capabilities have evolved as well, with a focus on optimizing hardware cost, giving rise to distributed systems. In today's world, the problem has multiplied: information is generated from disparate sources such as social media, sensors/embedded systems, and machine logs, in both structured and unstructured forms. Processing such large and complex data with traditional systems and methods is a challenging task. Big Data is an umbrella term for the management and processing of such data.

Big data is usually associated with high-volume, rapidly growing data with unpredictable content. The IT advisory firm Gartner defines big data using three Vs: high volume of data, high velocity of processing, and high variety of information. IBM adds a fourth V, high veracity, to emphasize that the data must be accurate enough to support business decisions. While the potential benefits of big data are real and significant, many challenges remain. Organizations that deal with such high volumes of data must address the following areas:

  • Data capture/acquisition from various sources
  • Data massaging or curating
  • Organization and storage
  • Big data processing such as search, analysis, and querying
  • Information sharing or consumption
  • Information security and privacy

Big data poses many challenges to the technologies in use today, and many organizations have started investing in these areas. Even so, as per Gartner, through 2015, 85% of Fortune 500 organizations will be unable to exploit big data for a competitive advantage.

To handle the problem of storing and processing such large and complex data, many software frameworks have been created. Among them, Apache Hadoop is one of the most widely used open source frameworks for the storage and processing of big data. In this chapter, we are going to get acquainted with Apache Hadoop. We will be covering the following topics:

  • Apache Hadoop's ecosystem
  • Configuring Apache Hadoop
  • Running Apache Hadoop
  • Setting up a Hadoop cluster
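
Before diving into those topics, it helps to see what a MapReduce program actually looks like. The following is a minimal sketch of the canonical word-count job, written against the standard org.apache.hadoop.mapreduce API and assuming a Hadoop 2.x (or later) installation; the class and variable names are illustrative, not a listing from this book.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token
    // in its input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reducer: sums the counts emitted for each distinct word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // The reducer doubles as a combiner here, since summation is
        // associative; this pre-aggregates counts on each mapper node.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a JAR, such a job would typically be launched with something like hadoop jar wordcount.jar WordCount /input /output, where the input and output paths are hypothetical HDFS locations. The framework handles splitting the input, shuffling the mapper output by key, and running the reducers in parallel, which is exactly the division of labor the rest of this chapter explores.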