An RDD is compile-time type-safe. That means, in the case of Scala and Java, if an operation is performed on an RDD that is not applicable to the underlying data type, Spark raises a compile-time error. This helps avoid failures in production.
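For example, here is a minimal Scala sketch (the app name, master URL, and sample data are illustrative) showing the compiler rejecting an operation that does not match the RDD's element type:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("rdd-type-safety")   // illustrative app name
  .master("local[*]")
  .getOrCreate()

val nums = spark.sparkContext.parallelize(Seq(1, 2, 3))  // RDD[Int]

val doubled = nums.map(_ * 2)   // compiles: Int * Int is well-typed
// nums.map(_.toUpperCase)      // compile-time error: toUpperCase is not a member of Int
```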
There are some drawbacks to using RDDs, though:
- RDD code can sometimes be very opaque: developers may struggle to work out what exactly the code is trying to compute (see the first sketch after this list).
- RDDs cannot be optimized by Spark, because Spark cannot look inside the lambda functions to understand the computation. For example, when a filter() is chained after a wide transformation such as reduceByKey() or groupByKey(), Spark will not reorder the operations to run the filter before the shuffle, even when that would be cheaper (see the second sketch after this list).
- RDDs are slower in non-JVM languages such as Python and R. For these languages, a separate Python/R runtime is created alongside the JVM, and data must be serialized and moved between the two processes for each RDD operation, which adds significant overhead.
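To illustrate the opacity point, here is a sketch of typical RDD code, assuming a spark-shell session where `spark` is predefined and using made-up (name, age) sample data. The intent, computing the average age per name, is buried in positional tuple accessors:

```scala
// Assumed sample data for illustration
val dataRDD = spark.sparkContext.parallelize(
  Seq(("Brooke", 20), ("Denny", 31), ("Brooke", 25)))

val avgAges = dataRDD
  .map(x => (x._1, (x._2, 1)))                        // (name, (age, 1))
  .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))  // sum the ages and counts
  .map(x => (x._1, x._2._1.toDouble / x._2._2))       // (name, average age)
```

Nothing in this chain tells Spark, or a human reader, that the goal is an average; the intent is locked inside the lambdas.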
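And to illustrate the missing optimization, a sketch under the same assumed session with illustrative data: RDD lineage executes exactly as written, so moving the filter before the shuffle is the developer's job, not Spark's.

```scala
val pairs = spark.sparkContext.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// Executed as written: the full shuffle in reduceByKey() runs first,
// and filter() is applied only to its output.
val shuffleThenFilter = pairs
  .reduceByKey(_ + _)
  .filter { case (word, _) => word != "b" }

// The developer must reorder manually so that less data is shuffled.
val filterThenShuffle = pairs
  .filter { case (word, _) => word != "b" }
  .reduceByKey(_ + _)
```

Both versions produce the same result here (the filter only inspects the key), but only the second avoids shuffling the filtered-out records.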