Hadoop Blueprints: Use Hadoop to solve business problems by learning from a rich set of real-life case studies

By Anurag Shrivastava, Sudheesh Narayan, Deshpande
Paperback Sep 2016 316 pages 1st Edition

Chapter 2. A 360-Degree View of the Customer

In this chapter, we will take the example of a fictitious company called Cosmetica Inc. The company was founded in 1983, well before web commerce existed. From its humble start as a small shop in Uden, it has grown to more than 300 shopping outlets. The company also runs a web shop where customers can buy products 24x7. It is planning to launch a personalized shopping service in which customers get assistance in choosing the right product.

The company is interested in building a 360-degree view of customers who often visit its web shop and who are also active on social media. To build this view, we will work through the following steps in this chapter:

  • Understanding the data required in the 360-degree view
  • Setting up the technology stack
  • Engineering the solution
  • Presenting the solution using a web interface

Capturing business information

As with any other mid-sized retailer, the information technology needs of Cosmetica have grown over time. Previously, most customers visited the shopping outlets and did most of their purchasing at weekends. During the festival seasons, sales were brisk. In the late 1990s, Cosmetica introduced a loyalty card to boost customer loyalty. The card allowed customers to collect loyalty points when making a purchase in a shop, and to redeem those points against products that were on special offer.

Since 2005, Cosmetica has had a strong presence on the World Wide Web through its web shop, where customers can browse and buy products online. Cosmetica is now planning to offer a personalized cosmetic shopping service: a customer can call the Cosmetica call center to reach a human shopping assistant and get personalized advice.

In order to do this, Cosmetica wants to have a 360-degree view of customers...

Setting up the technology stack

In Chapter 1, Hadoop and Big Data, we covered various tools in the Hadoop ecosystem. In this chapter, we will use some of those tools to set up the technology stack for building a 360-degree view of a customer. Setting up all the tools in the Hadoop ecosystem can be a cumbersome and error-prone process, owing to the many library dependencies. The tools have evolved over time through contributions from the open source community and therefore lack an integrated installation and configuration approach. Pure-play Hadoop vendors have made good progress in easing installation by offering Hadoop sandboxes and RPM packages. One such vendor is Hortonworks, which offers the Hortonworks Data Platform (HDP). HDP is a fully open source platform built on open source Hadoop and several tools from the Hadoop ecosystem.

HDP is available as a CentOS-based virtual machine, for example as a VirtualBox image. We will deploy...

Test driving Hive and Sqoop

In the previous section, we verified that MySQL, Hive, and Sqoop were available on our Hadoop Sandbox. We will now test drive Hive and Sqoop.

Querying data using Hive

We run Hive queries to select data from tables. Hive has two types of tables:

  • Managed tables
  • External tables

Hive creates managed tables by default. To create an external table, we specify the EXTERNAL keyword during table creation.

In the case of managed tables, Hive manages the complete table lifecycle: if you drop a managed table, Hive deletes both the metadata and the associated data. An external table reads its data from a file in HDFS, and that file is not deleted when the table is dropped. Other tools can therefore keep accessing the HDFS file while we run Hive queries on it through the external table definition.
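The difference between the two table types shows up directly in the table definition. The following is a minimal sketch, not the book's own code: a small Python helper that builds the HiveQL statement for either kind of table. The table names, columns, delimiter, and HDFS path are hypothetical and chosen only for illustration.

```python
def create_table_ddl(name, columns, external=False, location=None):
    """Build a HiveQL CREATE TABLE statement as a string.

    columns: list of (column_name, hive_type) pairs.
    external: when True, emit CREATE EXTERNAL TABLE with a LOCATION clause.
    """
    if external and not location:
        raise ValueError("an external table needs an HDFS location")
    cols = ",\n  ".join(f"{col} {hive_type}" for col, hive_type in columns)
    keyword = "EXTERNAL TABLE" if external else "TABLE"
    ddl = (
        f"CREATE {keyword} {name} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE"
    )
    if external:
        # The data stays at this path even if the table is dropped.
        ddl += f"\nLOCATION '{location}'"
    return ddl + ";"


# Hypothetical table and path names, used only for illustration.
columns = [("trade_date", "STRING"), ("close_price", "DOUBLE")]
managed_ddl = create_table_ddl("stock_managed", columns)
external_ddl = create_table_ddl(
    "stock_external", columns, external=True, location="/user/hive/demo/stock"
)
print(managed_ddl)
print(external_ddl)
```

The generated statements can be pasted into the Hive shell. Dropping stock_managed would remove its data, whereas dropping stock_external would leave the files under the given location untouched.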

In Chapter 1, Hadoop and Big Data, of this book, we used a dataset containing the historical stock price of IBM to run a MapReduce job that calculated...

Engineering the solution

We will engineer the solution by breaking the problem down into several parts. In each part, we will perform a step to import or transform the data. Finally, we will bring everything together to create the view. We will use Sqoop to load the customer master data from the MySQL RDBMS into Hive, and HDFS copy commands to load the Apache access logs and the tweets into Hadoop.
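As a rough sketch of these two loading steps, the Python helpers below assemble the command lines one might run on the sandbox. The JDBC URL, user name, table names, and file paths are assumptions made for illustration and do not come from the book. The Sqoop options used (--connect, --username, -P, --table, --hive-import, --hive-table, -m) and the hdfs dfs -put command are standard, but the exact invocation in the chapter may differ.

```python
import shlex


def sqoop_hive_import_args(jdbc_url, username, table, hive_table, mappers=1):
    """Argument list for importing one RDBMS table into Hive with Sqoop."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                      # prompt for the password at run time
        "--table", table,
        "--hive-import",           # load the imported data into Hive
        "--hive-table", hive_table,
        "-m", str(mappers),        # number of parallel map tasks
    ]


def hdfs_put_args(local_path, hdfs_dir):
    """Argument list for copying a local file into an HDFS directory."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


def as_command_line(args):
    """Render an argument list as a shell-safe command string."""
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical connection details and paths, used only for illustration.
import_args = sqoop_hive_import_args(
    "jdbc:mysql://localhost/cosmetica", "demo_user", "customer", "customer"
)
put_args = hdfs_put_args("access.log", "/user/demo/weblogs")
print(as_command_line(import_args))
print(as_command_line(put_args))
```

Building the commands as argument lists keeps them easy to pass to subprocess.run later without shell quoting problems.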

In the 360-degree view of the customer, we will combine the information from the following sources:

  • Full name, gender, userID, and e-mail from customer master data as the data from the system of records
  • Brand names frequently visited on Cosmetica's web shop as the data from web logs
  • Tweets on certain topics as the social media data

Figure 5: The 360-degree view combines data from various sources
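To make the combination concrete, here is a small in-memory sketch in Python of how the three sources could be merged per customer. It only illustrates the idea and is not the Hive-based solution built in the chapter. The field names, the sample records, and the assumption that page visits and tweets have already been matched to a userID are all hypothetical.

```python
from collections import Counter, defaultdict


def build_customer_view(customers, page_visits, tweets, top_n=2):
    """Combine master data, web-log brand visits, and tweets per user ID."""
    brand_counts = defaultdict(Counter)
    for user_id, brand in page_visits:
        brand_counts[user_id][brand] += 1

    tweets_by_user = defaultdict(list)
    for user_id, text in tweets:
        tweets_by_user[user_id].append(text)

    view = {}
    for customer in customers:
        uid = customer["user_id"]
        view[uid] = {
            "full_name": customer["full_name"],
            "gender": customer["gender"],
            "email": customer["email"],
            # Most frequently visited brands, taken from the web logs.
            "top_brands": [b for b, _ in brand_counts[uid].most_common(top_n)],
            "tweets": tweets_by_user[uid],
        }
    return view


# Hypothetical sample records, used only for illustration.
customers = [
    {"user_id": "u1", "full_name": "Jane Doe", "gender": "F",
     "email": "jane@example.com"},
]
page_visits = [("u1", "BrandA"), ("u1", "BrandB"), ("u1", "BrandA"),
               ("u2", "BrandC")]
tweets = [("u1", "Loving my new lipstick")]

view = build_customer_view(customers, page_visits, tweets)
print(view["u1"])
```

Only customers present in the master data appear in the result, so the visit by the unknown user u2 is ignored. In Hive, the same effect would come from joining the other sources onto the customer table.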

You should bear in mind that we have taken a small set of data sources to create the 360-degree view. In practice, you should think of several data sources that can be used to build...

Key benefits

  • Solve real-world business problems using Hadoop and other Big Data technologies
  • Build efficient data lakes in Hadoop, and develop systems for various business cases like improving marketing campaigns, fraud detection, and more
  • Power packed with six case studies to get you going with Hadoop for Business Intelligence

Description

If you have a basic understanding of Hadoop and want to put your knowledge to use to build fantastic Big Data solutions for business, then this book is for you. Build six real-life, end-to-end solutions using the tools in the Hadoop ecosystem, and take your knowledge of Hadoop to the next level. Start off by understanding various business problems which can be solved using Hadoop. You will also get acquainted with the common architectural patterns which are used to build Hadoop-based solutions. Build a 360-degree view of the customer by working with different types of data, and build an efficient fraud detection system for a financial institution. You will also develop a system in Hadoop to improve the effectiveness of marketing campaigns. Build a churn detection system for a telecom company, develop an Internet of Things (IoT) system to monitor the environment in a factory, and build a data lake – all making use of the concepts and techniques mentioned in this book. The book covers other technologies and frameworks like Apache Spark, Hive, Sqoop, and more, and how they can be used in conjunction with Hadoop. You will be able to try out the solutions explained in the book and use the knowledge gained to extend them further in your own problem space.

Who is this book for?

If you are interested in building efficient business solutions using Hadoop, this is the book for you. It assumes that you have basic knowledge of Hadoop, Java, and any scripting language.

What you will learn

  • Learn about the evolution of Hadoop as the big data platform
  • Understand the basics of Hadoop architecture
  • Build a 360-degree view of your customer using Sqoop and Hive
  • Build and run classification models on Hadoop using BigML
  • Use Spark and Hadoop to build a fraud detection system
  • Develop a churn detection system using Java and MapReduce
  • Build an IoT-based data collection and visualization system
  • Get to grips with building a Hadoop-based Data Lake for large enterprises
  • Learn about the coexistence of NoSQL and In-Memory databases in the Hadoop ecosystem

Product Details

Publication date: Sep 30, 2016
Length: 316 pages
Edition: 1st
Language: English
ISBN-13: 9781783980307
Vendor: Apache

Table of Contents

8 Chapters
1. Hadoop and Big Data
2. A 360-Degree View of the Customer
3. Building a Fraud Detection System
4. Marketing Campaign Planning
5. Churn Detection
6. Analyze Sensor Data Using Hadoop
7. Building a Data Lake
8. Future Directions

Customer reviews

Rating distribution
5 out of 5 (1 rating)
5 star 100%
4 star 0%
3 star 0%
2 star 0%
1 star 0%

Wissem, Dec 22, 2016
Rating: 5 out of 5
As a technical reviewer of this book, I highly recommend reading it. It has very complete and useful real-world use cases of using Hadoop and its ecosystem. Chapters explain Big Data technology trends like IoT and Data Lakes, and how Hadoop fits with its ecosystem to solve those problems. Analyze Sensor Data Using Hadoop, Building a Data Lake, Building a Fraud Detection System, and Churn Detection are my favorite chapters, where the authors show with examples the steps of using the Hadoop ecosystem. In summary, if you want to learn Hadoop with examples, this is the right book for you. Cheers
Amazon verified review