Spark Cookbook


Product type: Book
Published: Jul 2015
ISBN-13: 9781783987061
Pages: 226
Edition: 1st Edition
Author: Rishi Yadav

Optimizing garbage collection


JVM garbage collection can become a bottleneck if you have a lot of short-lived RDDs. The JVM has to go over all the objects to find the ones it can collect, so the cost of garbage collection is proportional to the number of objects the GC has to traverse. Therefore, using fewer objects, and data structures that are made up of fewer objects (simpler data structures, such as arrays), helps.
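As a rough illustration of the point about object counts, the following sketch holds the same data in two layouts. The `Reading` type and the `ObjectCountSketch` object are hypothetical names introduced only for this example; nothing here is Spark-specific.

```scala
// Hypothetical record type, used only for illustration.
final case class Reading(sensorId: Int, value: Double)

object ObjectCountSketch {
  // Object-heavy layout: one Reading instance per element, plus the
  // collection's own internal nodes, all of which the GC has to trace.
  def asObjects(n: Int): Vector[Reading] =
    Vector.tabulate(n)(i => Reading(i, i * 0.5))

  // Array-based layout: two primitive arrays hold the same data, so the
  // number of objects the GC has to trace does not grow with n.
  def asArrays(n: Int): (Array[Int], Array[Double]) =
    (Array.tabulate(n)(i => i), Array.tabulate(n)(i => i * 0.5))

  def main(args: Array[String]): Unit = {
    val n = 1000
    val objects = asObjects(n)
    val (ids, values) = asArrays(n)
    // Both layouts carry the same information.
    println(objects.map(_.value).sum == values.sum)
    println(objects.map(_.sensorId).sameElements(ids))
  }
}
```

The array layout is less convenient to work with, so it is usually worth applying only to data that is large and long-lived enough for GC cost to matter.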

Serialization also shines here: data stored in serialized form is held as a byte array, which is a single object for the garbage collector to deal with.
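The idea can be sketched outside Spark with plain Java serialization: many records collapse into one `Array[Byte]`, and the individual objects are rebuilt only when they are read back. The `SerializedBlockSketch` object and its method names are assumptions made for this example, and Java serialization merely stands in for whichever serializer Spark is configured to use. Within Spark, the corresponding choice would be a serialized storage level such as `StorageLevel.MEMORY_ONLY_SER` passed to `persist`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object SerializedBlockSketch {
  // Serialize a whole batch of records into a single byte array.
  def toBytes(records: Array[String]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(records) finally out.close()
    buffer.toByteArray
  }

  // Rebuild the individual objects only when they are actually needed.
  def fromBytes(bytes: Array[Byte]): Array[String] = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[Array[String]] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val records = Array.tabulate(1000)(i => s"record-$i")
    val block = toBytes(records)        // one object as far as the GC is concerned
    val restored = fromBytes(block)     // deserialization cost is paid on access
    println(restored.sameElements(records))
  }
}
```

The trade-off is CPU time: every read of serialized data pays the deserialization cost in exchange for fewer live objects.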

By default, Spark uses 60 percent of the executor memory to cache RDDs and the remaining 40 percent for regular objects. If you do not need 60 percent for RDDs, you can reduce this limit so that more space is available for object creation, which reduces the need for GC.
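To make the split concrete, here is a small arithmetic sketch. It assumes a hypothetical executor with 4096 MB of memory and models only the simple two-way split described above, ignoring any safety margins or other regions Spark may reserve; `MemorySplitSketch` and `split` are names invented for this example.

```scala
object MemorySplitSketch {
  // Returns (cacheMb, objectsMb) for a given executor memory size and
  // storage fraction, rounding the cache share down to whole megabytes.
  def split(executorMemoryMb: Long, storageFraction: Double): (Long, Long) = {
    require(storageFraction >= 0.0 && storageFraction <= 1.0, "fraction must be in [0, 1]")
    val cacheMb = (executorMemoryMb * storageFraction).toLong
    (cacheMb, executorMemoryMb - cacheMb)
  }

  def main(args: Array[String]): Unit = {
    // Default 60/40 split versus a reduced 40/60 split.
    println(split(4096L, 0.6)) // (2457,1639)
    println(split(4096L, 0.4)) // (1638,2458)
  }
}
```

Under these assumptions, lowering the fraction from 0.6 to 0.4 moves roughly 800 MB from the cache to regular objects.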

How to do it…

You can set the memory allocated for the RDD cache to 40 percent by setting the memory fraction when you start the Spark shell:

$ spark-shell --conf spark.storage.memoryFraction=0.4
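For a standalone application rather than the shell, the same property can be set on a `SparkConf` before the context is created. This is a minimal sketch, assuming the Spark 1.x libraries are on the classpath and the master is supplied by `spark-submit`; the object name and application name are hypothetical. In releases that use unified memory management (Spark 1.6 onward), this property applies only to the legacy memory manager, so check the configuration documentation for the version you run.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MemoryFractionApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("memory-fraction-sketch")        // hypothetical application name
      .set("spark.storage.memoryFraction", "0.4")  // same setting as the --conf flag above

    val sc = new SparkContext(conf)
    try {
      // Confirm that the setting was picked up by the running context.
      println(sc.getConf.get("spark.storage.memoryFraction"))
    } finally {
      sc.stop()
    }
  }
}
```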