Apache Pig allows users to write custom data-flow scripts on top of the MapReduce framework. Pig was created to give non-Java programmers a flexible, higher-level way to process large data sets. A Pig script applies a series of transformations to input data and can run either locally on a single Java virtual machine or on an Apache Hadoop multi-node cluster. Pig is often used as part of ETL (Extract, Transform, Load) implementations in big data projects.
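As a sketch of such a transformation pipeline, the Pig Latin script below loads a tab-separated file, filters it, and aggregates per key before storing the result. The file paths, field names, and schema here are hypothetical, chosen only to illustrate an ETL-style flow:

```pig
-- Hypothetical ETL-style flow; paths and field names are illustrative.
raw = LOAD 'input/access_log.tsv' USING PigStorage('\t')
      AS (user:chararray, url:chararray, bytes:long);

-- Transform: keep only large responses, then aggregate per user.
big     = FILTER raw BY bytes > 1024;
by_user = GROUP big BY user;
totals  = FOREACH by_user GENERATE group AS user, SUM(big.bytes) AS total_bytes;

-- Load the result back out; each relation above is evaluated lazily
-- and only materialized when STORE (or DUMP) is reached.
STORE totals INTO 'output/user_totals';
```

Each statement defines a relation rather than executing immediately; Pig compiles the whole flow into jobs for the underlying execution engine when output is requested.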
Setting up Apache Pig in your Hadoop environment is relatively easy compared to other software; all you need to do is download a Pig release (or build the source into a pig.jar file) and use it for your programs. Pig scripts can execute on a standalone JVM, Apache Tez, Apache Spark, or MapReduce, and Pig supports six different execution modes (a local and a distributed variant for each engine). The respective...
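The execution mode is selected with the `-x` flag when launching Pig. A quick sketch of the invocations, assuming a recent Pig release on the PATH (`script.pig` is a placeholder name):

```
pig -x local script.pig        # single JVM, local filesystem
pig -x mapreduce script.pig    # Hadoop MapReduce cluster (the default)
pig -x tez script.pig          # Apache Tez on a YARN cluster
pig -x tez_local script.pig    # Tez in-process, for testing
pig -x spark script.pig        # Apache Spark cluster
pig -x spark_local script.pig  # Spark in-process, for testing
```

The local variants are convenient for developing and debugging a script on sample data before submitting it to a cluster unchanged.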