Learning Hadoop 2

You're reading from Learning Hadoop 2: Design and implement data processing, lifecycle management, and analytic workflows with the cutting-edge toolbox of Hadoop 2

Product type Paperback
Published in Feb 2015
Publisher Packt
ISBN-13 9781783285518
Length 382 pages
Edition 1st Edition

Table of Contents (13 chapters)

Preface
1. Introduction
2. Storage
3. Processing – MapReduce and Beyond
4. Real-time Computation with Samza
5. Iterative Computation with Spark
6. Data Analysis with Apache Pig
7. Hadoop and SQL
8. Data Lifecycle Management
9. Making Development Easier
10. Running a Hadoop Cluster
11. Where to Go Next
Index

Apache Crunch


Apache Crunch (http://crunch.apache.org) is a Java and Scala library for creating pipelines of MapReduce jobs. It is based on Google's FlumeJava paper and library (http://dl.acm.org/citation.cfm?id=1806638). The project's goal is to make writing MapReduce jobs as straightforward as possible for anybody familiar with the Java programming language by exposing a number of patterns that implement operations such as aggregating, joining, filtering, and sorting records.

As with tools such as Pig, Crunch pipelines are created by composing immutable, distributed data structures and running all processing operations on those structures; the operations themselves are expressed and implemented as user-defined functions. Pipelines are compiled into a DAG of MapReduce jobs, whose execution is managed by the library's planner. Crunch allows us to write iterative code and abstracts away the complexity of thinking in terms of map and reduce operations, while at the same time avoiding the need of an ad...
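To make the idea of composing immutable collections through user-defined functions more concrete, the following is a minimal, in-memory sketch in plain Java. It is not the Crunch API: the names ToyPipeline, ToyCollection, parallelDo, and count are hypothetical and chosen only for illustration. The sketch also runs each operation eagerly on a single machine, whereas Crunch, as described above, compiles a pipeline into a DAG of MapReduce jobs and leaves execution to its planner.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Toy model only: none of these types or methods belong to Apache Crunch.
public class ToyPipeline {

    /** An immutable collection; every operation returns a new object. */
    static final class ToyCollection<T> {
        private final List<T> items;

        ToyCollection(List<T> items) {
            // Defensive copy so later changes to the caller's list are not visible here.
            this.items = Collections.unmodifiableList(new ArrayList<>(items));
        }

        /** Applies a user-defined function that may emit zero or more outputs per input. */
        <R> ToyCollection<R> parallelDo(BiConsumer<T, Consumer<R>> fn) {
            List<R> out = new ArrayList<>();
            for (T item : items) {
                fn.accept(item, out::add);
            }
            return new ToyCollection<>(out);
        }

        /** Aggregates the collection into a map of element to number of occurrences. */
        Map<T, Long> count() {
            Map<T, Long> counts = new LinkedHashMap<>();
            for (T item : items) {
                counts.merge(item, 1L, Long::sum);
            }
            return counts;
        }
    }

    /** Word count written as a composition of operations, with no explicit map or reduce step. */
    public static Map<String, Long> wordCount(List<String> lines) {
        ToyCollection<String> input = new ToyCollection<>(lines);
        ToyCollection<String> words = input.<String>parallelDo((line, emitter) -> {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    emitter.accept(word);
                }
            }
        });
        return words.count();
    }

    public static void main(String[] args) {
        // prints {hello=2, hadoop=1, crunch=1}
        System.out.println(wordCount(List.of("hello hadoop", "hello crunch")));
    }
}
```

The point of the sketch is the shape of the code: the input collection is never modified, each step yields a new collection, and the per-record logic lives in a small user-defined function rather than in a mapper or reducer class.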
