Securing Hadoop: Implement robust end-to-end security for your Hadoop ecosystem
Author: Sudheesh Narayan
Publisher: Packt
Published in: Nov 2013
Edition: 1st Edition
Product type: Paperback
Length: 116 pages
ISBN-13: 9781783285259

Challenges for securing the Hadoop ecosystem


Big Data brings challenges not only for storing, processing, and analyzing data but also for managing and securing these large data assets. Hadoop was not originally built with security in mind. As enterprises started adopting Hadoop, a Kerberos-based security model evolved within it. However, given the distributed nature of the ecosystem and the wide range of applications built on top of Hadoop, securing Hadoop in an enterprise context is a big challenge.

A typical Big Data ecosystem has multiple stakeholders who interact with the system. For example, expert users within the organization (business analysts and data scientists) interact with the ecosystem using business intelligence (BI) and analytical tools, and need deep access to the data to perform various analyses. At the same time, that access must be scoped: a business analyst in the finance department should not be able to see data from the HR department, and so on. BI tools, in turn, need a wide range of system-level access to the Hadoop ecosystem, depending on the protocols and data they use to communicate with it.
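The finance/HR rule above amounts to a deny-by-default, department-scoped access check. The following Python sketch illustrates the idea only. The `User` type, the `DATASET_OWNERS` table, the `can_read` function, and the dataset paths are all hypothetical and are not part of any Hadoop API. In a real deployment, a policy like this would typically be enforced by the platform's authorization layer rather than by application code.

```python
from dataclasses import dataclass

# Hypothetical mapping of dataset paths to the department that owns them.
DATASET_OWNERS = {
    "/data/finance/ledger": "finance",
    "/data/hr/salaries": "hr",
}


@dataclass(frozen=True)
class User:
    name: str
    department: str


def can_read(user: User, dataset: str) -> bool:
    """Allow reads only when the user's department owns the dataset.

    Datasets missing from the table are denied (deny-by-default).
    """
    owner = DATASET_OWNERS.get(dataset)
    return owner is not None and owner == user.department


if __name__ == "__main__":
    analyst = User("alice", "finance")
    print(can_read(analyst, "/data/finance/ledger"))  # True
    print(can_read(analyst, "/data/hr/salaries"))     # False
```

Denying unknown datasets by default means a newly added dataset stays invisible until someone explicitly assigns it an owner.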

One of the biggest challenges for Big Data projects within enterprises today is securely integrating external data sources (social blogs, websites, existing ERP and CRM systems, and so on). This external connectivity must be established so that data extracted from these sources is available in the Hadoop ecosystem.

Hadoop ecosystem tools such as Sqoop and Flume were not built with full enterprise-grade security. Cloudera, MapR, and a few others have made significant contributions toward making these ecosystem components enterprise grade, resulting in Sqoop 2, Flume-ng, and Hive Server 2. Apart from these, there are multiple security-focused projects within the Hadoop ecosystem, such as Cloudera Sentry (http://www.cloudera.com/content/cloudera/en/products/cdh/sentry.html), Hortonworks Knox Gateway (http://hortonworks.com/hadoop/knox-gateway/), and Intel's Project Rhino (https://github.com/intel-hadoop/project-rhino/). These projects are making significant progress toward providing enterprise-grade security for Apache Hadoop. Deploying them in production requires a detailed understanding of each of these ecosystem components.

Another area of concern within enterprises is the need to integrate existing enterprise Identity and Access Management (IDAM) systems with the Hadoop ecosystem. With such integration, enterprises can extend their identity and access management to the Hadoop ecosystem. However, these integrations bring multiple challenges, as Hadoop was not originally built with such enterprise integrations in mind.

Apart from ecosystem integration, there is often a need to bring sensitive information into the Big Data ecosystem to derive patterns and inferences from these datasets. As we move these datasets into the Big Data ecosystem, we need to mask or encrypt this sensitive information. Traditional data masking and encryption tools don't scale well to Big Data volumes, so new means of encrypting large-scale datasets need to be identified.
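As a rough illustration of record-level protection, the sketch below pseudonymizes one field with a keyed hash (HMAC-SHA256, from Python's standard `hmac` and `hashlib` modules) and partially masks another. This is pseudonymization and masking, not reversible encryption. The field names (`ssn`, `card_number`), the hard-coded key, and the helper functions are hypothetical. Because each record is processed independently, a function like this could in principle be applied in parallel across a cluster, which is the property large-scale masking needs. The sketch itself uses no Hadoop API.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real deployment would fetch it
# from a key management system instead of hard-coding it.
SECRET_KEY = b"example-key-do-not-use"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a sensitive value with its keyed HMAC-SHA256 hex digest.

    Equal inputs map to equal tokens, so joins and aggregations still work.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_all_but_last(value: str, keep_last: int = 4) -> str:
    """Replace every character except the last `keep_last` with '*'."""
    if keep_last <= 0:
        return "*" * len(value)
    hidden = max(len(value) - keep_last, 0)
    return "*" * hidden + value[-keep_last:]


def mask_record(record: dict) -> dict:
    """Return a copy of `record` with the hypothetical sensitive fields protected."""
    masked = dict(record)
    if "ssn" in masked:
        masked["ssn"] = pseudonymize(masked["ssn"])
    if "card_number" in masked:
        masked["card_number"] = mask_all_but_last(masked["card_number"])
    return masked


if __name__ == "__main__":
    row = {"name": "Bob", "card_number": "4111111111111111", "ssn": "123-45-6789"}
    print(mask_record(row)["card_number"])  # ************1111
```

Keyed hashing keeps each token deterministic, so analysts can still join or count on the protected field. Because the key is kept outside the dataset, the tokens cannot simply be recomputed from guessed values by anyone who lacks the key.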

Usually, as the adoption of Big Data increases, enterprises quickly move to a multicluster/multiversion scenario, in which multiple versions of the Hadoop ecosystem operate within the enterprise. Also, sensitive data that was earlier banned from the Big Data platform slowly makes its way in. This creates additional challenges in addressing security in such a complex environment, where a small lapse in security could result in huge financial loss for the organization.
