You're reading from Modern Big Data Processing with Hadoop: Expert techniques for architecting end-to-end big data solutions to get valuable insights

Product type Paperback

Published in Mar 2018

Publisher Packt

ISBN-13 9781787122765

Length 394 pages

Edition 1st Edition

Authors (3):

Manoj R Patil

Prashant Shindgikar

V Naresh Kumar

Table of Contents (12) Chapters

Preface

1. Enterprise Data Architecture Principles

2. Hadoop Life Cycle Management

3. Hadoop Design Consideration

4. Data Movement Techniques

5. Data Modeling in Hadoop

6. Designing Real-Time Streaming Data Pipelines

7. Large-Scale Data Processing Frameworks

8. Building Enterprise Search Platform

9. Designing Data Visualization Solutions

10. Developing Applications Using the Cloud

11. Production Hadoop Cluster Deployment

Data masking

Businesses that deal with customer data have to make sure that the PII (personally identifiable information) of these customers does not move freely through the entire data pipeline. This requirement applies not only to customer data but also to any other type of data that is considered classified under standards such as GDPR, SOX, and so on. To protect the privacy of customers, employees, contractors, and vendors, we need to take the necessary precautions so that, as the data passes through several pipelines, users of the data see only anonymized data. The level of anonymization depends on the standards the company adheres to and on the regulations in force in the relevant country.

Data masking, then, is the process of hiding or replacing portions of the original data with other data without losing its meaning or context.
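To make this definition concrete, here is a minimal Python sketch of field-level masking. It is an illustration only, not the chapter's own method: the record layout (`customer_id`, `email`, `card_number`), the function names, and the secret key are all hypothetical. It shows two common styles: partial masking, which keeps enough of the value to preserve context, and keyed pseudonymization, which replaces an identifier with a stable token so that records can still be joined.

```python
import hashlib
import hmac


def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with 'X', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    to_mask = max(total_digits - visible, 0)
    masked = []
    for ch in card_number:
        if ch.isdigit() and to_mask > 0:
            masked.append("X")
            to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the whole domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        # Not a recognisable address: hide everything.
        return "*" * len(email)
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def pseudonymize(value: str, secret_key: str) -> str:
    """Map a value to a stable token using HMAC-SHA256 (hypothetical scheme)."""
    digest = hmac.new(
        secret_key.encode("utf-8"), value.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return digest[:16]  # shortened for readability; an assumption, not a rule


def mask_record(record: dict, secret_key: str) -> dict:
    """Return a copy of the record with the assumed PII fields masked."""
    masked = dict(record)
    masked["customer_id"] = pseudonymize(record["customer_id"], secret_key)
    masked["email"] = mask_email(record["email"])
    masked["card_number"] = mask_card_number(record["card_number"])
    return masked


if __name__ == "__main__":
    sample = {
        "customer_id": "C-1001",
        "email": "alice@example.com",
        "card_number": "4111-1111-1111-1234",
        "country": "DE",
    }
    print(mask_record(sample, secret_key="demo-key"))
```

Which fields are masked, and how aggressively, would follow from the standards the company adheres to; the choices above are placeholders.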

In this...

Authors (3)

Shindgikar

Prashant Shindgikar is an accomplished big data architect with over 20 years of experience in data analytics. He specializes in data innovation and resolving data challenges for major retail brands. He is a hands-on architect with an innovative approach to solving data problems. He provides thought leadership and pursues strategies for engaging with senior executives on innovation in data processing and analytics. He presently works for a large USA-based retail company.

R Patil

Manoj R Patil is the Chief Architect in Big Data at Compassites Software Solutions Pvt. Ltd., where he oversees the overall platform architecture related to Big Data solutions and also contributes hands-on to some assignments. He has been working in the IT industry for the last 15 years. He started as a programmer and, along the way, acquired skills in architecting and designing solutions, managing projects with each stakeholder's interest in mind, and deploying and maintaining solutions on cloud infrastructure. He has been working on the Pentaho-related stack for the last 5 years, providing solutions both for employers and as a freelancer. Manoj has extensive experience in JavaEE, MySQL, various frameworks, and Business Intelligence, and is keen to pursue his interest in predictive analysis. He was also associated with TalentBeat, Inc. and Persistent Systems, and implemented interesting solutions in logistics, data masking, and data-intensive life sciences.

Kumar

Ashish Kumar is a seasoned data science professional, a published author, and a thought leader in the field of data science and machine learning. An IIT Madras graduate and a Young India Fellow, he has around 7 years of experience in implementing and deploying data science and machine learning solutions for challenging industry problems in both hands-on and leadership roles. His core areas of expertise include Natural Language Processing, IoT Analytics, R Shiny product development, and Ensemble ML methods. He is fluent in Python and R and teaches a popular ML course at Simplilearn. When not crunching data, Ashish sneaks off to the next hip beach around and enjoys the company of his Kindle. He also trains and mentors data science aspirants and fledgling start-ups.
