
Hadoop Blueprints: Use Hadoop to solve business problems by learning from a rich set of real-life case studies

Authors: Anurag Shrivastava, Sudheesh Narayan, Deshpande
eBook, Sep 2016, 316 pages, 1st Edition

What do you get with eBook?

  • Instant access to your digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM-free: read whenever, wherever, and however you want

Chapter 2. A 360-Degree View of the Customer

In this chapter, we will take the example of a fictitious company called Cosmetica Inc. The company was founded in 1983, long before web commerce became mainstream. From its humble start as a small shop in Uden, it has grown to more than 300 shopping outlets. The company also runs a web shop where customers can buy products 24x7, and it is planning to launch a personalized shopping service in which customers get assistance in choosing the right product.

The company wants to build a 360-degree view of customers who often visit its web shop and who are also active on social media. To build this view, we will work through the following steps in this chapter:

  • Understanding the data required in the 360-degree view
  • Setting up the technology stack
  • Engineering the solution
  • Presenting the solution using a web interface

Capturing business information

Like any other mid-sized retailer, Cosmetica has seen its information technology needs grow over time. Previously, most customers visited its shopping outlets and did most of their purchasing during the weekends. During the festival seasons, sales used to be brisk. In the late 1990s, Cosmetica introduced a loyalty card to boost customer loyalty. The card allowed customers to collect loyalty points when making a purchase in a shop and to redeem those points for products that were on special offer.

Since 2005, Cosmetica has had a good presence on the World Wide Web through its web shop, where customers can browse products and buy them online. Cosmetica is now planning to offer a personalized cosmetic shopping service: a customer can call the Cosmetica call center to reach a human shopping assistant and get personalized advice.

In order to do this, Cosmetica wants to have a 360-degree view of customers...

Setting up the technology stack

In Chapter 1, Hadoop and Big Data, we covered various tools in the Hadoop ecosystem. In this chapter, we will use some of those tools to set up the technology stack for building a 360-degree view of a customer. Setting up all the tools in the Hadoop ecosystem can be a cumbersome and error-prone process, owing to their many library dependencies. The tools have evolved over time through contributions from the open source community, and as a result they lack an integrated installation and configuration approach. The pure-play Hadoop vendors have made good progress in easing the installation of Hadoop by offering Hadoop sandboxes and RPM packages. One such vendor is Hortonworks, which offers the Hortonworks Data Platform (HDP). HDP is a fully open source platform built upon open source Hadoop and several tools from the Hadoop ecosystem.

HDP is available as a CentOS-based virtual machine, for example as a VirtualBox image. We will deploy...

Test driving Hive and Sqoop

In the previous section, we verified that MySQL, Hive, and Sqoop were available on our Hadoop Sandbox. We will now test drive Hive and Sqoop.

Querying data using Hive

We run Hive queries to select data from tables. Hive has two types of tables:

  • Managed tables
  • External tables

Hive creates managed tables by default. To create an external table, we specify the EXTERNAL keyword during table creation.

In the case of managed tables, the table lifecycle is completely managed by Hive. If you drop a managed table, Hive deletes both the associated data and the metadata. An external table reads its data from a file in HDFS, and Hive does not delete this file when the table is dropped. Other tools can therefore keep accessing the HDFS file while we run Hive queries on the same data through the external table.
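
The difference in drop behavior can be sketched with a small toy model. This is only an illustration of the semantics described above, not Hive's actual implementation: the ToyWarehouse class, the table names, and the paths are all hypothetical, and a plain Python dictionary stands in for HDFS.

```python
class ToyWarehouse:
    """Toy stand-in for Hive's metastore plus HDFS (hypothetical, for illustration)."""

    def __init__(self):
        self.files = {}    # path -> rows; a dictionary standing in for HDFS
        self.tables = {}   # table name -> (path, is_external); the "metadata"

    def create_table(self, name, path, external=False):
        # A managed table owns the data at its path; an external table only points at it.
        self.files.setdefault(path, [])
        self.tables[name] = (path, external)

    def drop_table(self, name):
        # The metadata is removed in both cases.
        path, external = self.tables.pop(name)
        if not external:
            # Managed table: the data is deleted together with the metadata.
            self.files.pop(path, None)


wh = ToyWarehouse()
wh.files["/data/stocks/ibm.csv"] = ["row-1"]           # file written by some other tool
wh.create_table("stocks_ext", "/data/stocks/ibm.csv", external=True)
wh.create_table("stocks_managed", "/warehouse/stocks_managed")

wh.drop_table("stocks_ext")       # file survives
wh.drop_table("stocks_managed")   # data is removed
print(sorted(wh.files))
```

After both drops, only the file behind the external table is still present, which is why external tables suit data that other tools also need to read.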

In Chapter 1, Hadoop and Big Data, of this book, we used a dataset containing the historical stock price of IBM to run a MapReduce job that calculated...

Engineering the solution

We will engineer the solution by breaking down the problem into several parts. In each part, we will perform a step to import or transform the data. Finally, we will bring everything together to create the view. To engineer the solution, we will use Sqoop to load customer master data from the MySQL RDBMS into Hive. We will use HDFS copy commands to load the Apache access logs and tweets into Hadoop.

In the 360-degree view of the customer, we will combine the information from the following sources:

  • Full name, gender, userID, and e-mail from customer master data as the data from the system of records
  • Brand names frequently visited on Cosmetica's web shop as the data from web logs
  • Tweets on certain topics as the social media data

Figure 5: The 360-degree view combines data from various sources
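
To make the idea of combining the three sources concrete, here is a minimal in-memory sketch. It assumes every source can be keyed by the same user ID, which is a simplification. The sample records, the field names, and the build_view function are hypothetical and are not taken from the chapter's Hive-based solution.

```python
from collections import Counter

# Hypothetical sample records; all names and values are made up for illustration.
customers = {
    "u1": {"full_name": "Jane Doe", "gender": "F", "email": "jane@example.com"},
}
brand_visits = [("u1", "BrandA"), ("u1", "BrandB"), ("u1", "BrandA")]  # from web logs
tweets = [("u1", "Looking for a new lipstick"), ("u2", "Unrelated tweet")]


def build_view(user_id, top_n=1):
    """Combine master data, web log visits, and tweets for one customer."""
    # Start from the system of record.
    view = {"user_id": user_id, **customers[user_id]}
    # Count brand page visits for this user and keep the most frequent ones.
    counts = Counter(brand for uid, brand in brand_visits if uid == user_id)
    view["frequent_brands"] = [brand for brand, _ in counts.most_common(top_n)]
    # Attach the user's tweets as the social media part of the view.
    view["tweets"] = [text for uid, text in tweets if uid == user_id]
    return view


view = build_view("u1")
print(view["full_name"], view["frequent_brands"], view["tweets"])
```

In the chapter's setting the same combination would be expressed as joins over Hive tables. The sketch only shows which fields each source contributes to the final record.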

You should bear in mind that we have taken a small set of data sources to create the 360-degree view. In practice, you should think of several data sources that can be used to build...


Key benefits

  • Solve real-world business problems using Hadoop and other Big Data technologies
  • Build efficient data lakes in Hadoop, and develop systems for various business cases like improving marketing campaigns, fraud detection, and more
  • Power packed with six case studies to get you going with Hadoop for Business Intelligence

Description

If you have a basic understanding of Hadoop and want to put your knowledge to use to build fantastic Big Data solutions for business, then this book is for you. Build six real-life, end-to-end solutions using the tools in the Hadoop ecosystem, and take your knowledge of Hadoop to the next level. Start off by understanding various business problems which can be solved using Hadoop. You will also get acquainted with the common architectural patterns which are used to build Hadoop-based solutions. Build a 360-degree view of the customer by working with different types of data, and build an efficient fraud detection system for a financial institution. You will also develop a system in Hadoop to improve the effectiveness of marketing campaigns. Build a churn detection system for a telecom company, develop an Internet of Things (IoT) system to monitor the environment in a factory, and build a data lake – all making use of the concepts and techniques mentioned in this book. The book covers other technologies and frameworks like Apache Spark, Hive, Sqoop, and more, and how they can be used in conjunction with Hadoop. You will be able to try out the solutions explained in the book and use the knowledge gained to extend them further in your own problem space.

Who is this book for?

If you are interested in building efficient business solutions using Hadoop, this is the book for you. This book assumes that you have basic knowledge of Hadoop, Java, and any scripting language.

What you will learn

  • Learn about the evolution of Hadoop as the big data platform
  • Understand the basics of Hadoop architecture
  • Build a 360-degree view of your customer using Sqoop and Hive
  • Build and run classification models on Hadoop using BigML
  • Use Spark and Hadoop to build a fraud detection system
  • Develop a churn detection system using Java and MapReduce
  • Build an IoT-based data collection and visualization system
  • Get to grips with building a Hadoop-based Data Lake for large enterprises
  • Learn about the coexistence of NoSQL and In-Memory databases in the Hadoop ecosystem

Product Details

Publication date: Sep 30, 2016
Length: 316 pages
Edition: 1st
Language: English
ISBN-13: 9781783980314
Vendor: Apache

Table of Contents

8 Chapters
1. Hadoop and Big Data
2. A 360-Degree View of the Customer
3. Building a Fraud Detection System
4. Marketing Campaign Planning
5. Churn Detection
6. Analyze Sensor Data Using Hadoop
7. Building a Data Lake
8. Future Directions

Customer reviews

Rating distribution: 5 out of 5 (1 rating)
5 star 100%
4 star 0%
3 star 0%
2 star 0%
1 star 0%

Wissem, Dec 22, 2016 (5 out of 5 stars)
As a technical reviewer of this book, I highly recommend reading it. It has very complete and useful real-world use cases of using Hadoop and its ecosystem. Chapters explain Big Data technology trends like IoT and Data Lakes, and how Hadoop fits with its ecosystem to solve those problems. Analyze Sensor Data Using Hadoop, Building a Data Lake, Building a Fraud Detection System, and Churn Detection are my favorite chapters, where the authors show with examples the steps of using the Hadoop ecosystem. In summary, if you want to learn Hadoop with examples, this is the right book for you. Cheers
Amazon verified review