Programming MapReduce with Scalding

A practical guide to designing, testing, and implementing complex MapReduce applications in Scala

Product type Paperback
Published in Jun 2014
ISBN-13 9781783287017
Length 148 pages
Edition 1st Edition
Author (1):
Antonios Chalkiopoulos

Table of Contents

Preface
1. Introduction to MapReduce
2. Get Ready for Scalding
3. Scalding by Example
4. Intermediate Examples
5. Scalding Design Patterns
6. Testing and TDD
7. Running Scalding in Production
8. Using External Data Stores
9. Matrix Calculations and Machine Learning
Index

Other libraries


For mining massive datasets, we can use Algebird, an abstract algebra library for Scala that was also open sourced by Twitter. The code was originally developed as part of the Scalding Matrix API. Because it had broader applications in aggregation systems such as Scalding and Storm, it became a separate library.
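To illustrate why abstract algebra is useful in aggregation systems, here is a minimal sketch in plain Scala. The names `SimpleMonoid`, `intSum`, `mapMonoid`, and `sumAll` are hypothetical and are not Algebird's API; the sketch only assumes the general idea that an associative combine operation with an identity element lets partial aggregates be merged in any grouping.

```scala
// Hypothetical, illustrative trait: NOT Algebird's API.
// A monoid is an associative `plus` together with an identity element `zero`.
trait SimpleMonoid[T] {
  def zero: T
  def plus(left: T, right: T): T
}

object SimpleMonoid {
  // Integer addition: zero is 0, plus is +.
  val intSum: SimpleMonoid[Int] = new SimpleMonoid[Int] {
    val zero: Int = 0
    def plus(left: Int, right: Int): Int = left + right
  }

  // Maps are combined key by key, using the monoid of their values.
  def mapMonoid[K, V](valueMonoid: SimpleMonoid[V]): SimpleMonoid[Map[K, V]] =
    new SimpleMonoid[Map[K, V]] {
      val zero: Map[K, V] = Map.empty[K, V]
      def plus(left: Map[K, V], right: Map[K, V]): Map[K, V] =
        right.foldLeft(left) { case (acc, (key, value)) =>
          val current = acc.getOrElse(key, valueMonoid.zero)
          acc.updated(key, valueMonoid.plus(current, value))
        }
    }

  // Folds any collection of values into a single aggregate.
  def sumAll[T](values: Iterable[T], monoid: SimpleMonoid[T]): T =
    values.foldLeft(monoid.zero)(monoid.plus)
}
```

Because `plus` is associative, per-partition word counts can be merged partially on the map side and merged again on the reduce side without changing the result, which is the property an aggregation system relies on.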

Locality Sensitive Hashing is a technique that reduces the dimensionality of the data and provides an approximate measure of similarity. It is based on the idea that high-dimensional items can be hashed into a much smaller space in such a way that similar items are likely to land in the same bucket, while still producing results with high accuracy.

An implementation of the approximate Jaccard item-similarity using Locality Sensitive Hashing (LSH) is provided in the source code accompanying this book.
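The following is not the code that accompanies the book. It is a minimal, single-machine sketch of one common way to approximate Jaccard similarity with LSH, namely MinHash signatures with banding. The object name `MinHashSketch`, its function names, the fixed random seed, and the use of seeded `MurmurHash3.stringHash` as the hash family are all assumptions made for illustration.

```scala
import scala.util.Random
import scala.util.hashing.MurmurHash3

// Illustrative sketch only; names and parameters are assumptions, not the book's code.
object MinHashSketch {

  // Exact Jaccard similarity: |A intersect B| / |A union B|.
  def jaccard(a: Set[String], b: Set[String]): Double =
    if (a.isEmpty && b.isEmpty) 1.0
    else (a intersect b).size.toDouble / (a union b).size.toDouble

  // One seed per hash function; the fixed Random seed keeps runs reproducible.
  def seeds(numHashes: Int): Vector[Int] = {
    val rng = new Random(42L)
    Vector.fill(numHashes)(rng.nextInt())
  }

  // MinHash signature: for each seeded hash function, keep the smallest
  // hash value observed over all items of the set.
  def signature(items: Set[String], hashSeeds: Vector[Int]): Vector[Int] = {
    require(items.nonEmpty, "signature is undefined for an empty set")
    hashSeeds.map { seed =>
      items.iterator.map(item => MurmurHash3.stringHash(item, seed)).min
    }
  }

  // The fraction of positions where two signatures agree estimates
  // the Jaccard similarity of the underlying sets.
  def estimate(sigA: Vector[Int], sigB: Vector[Int]): Double = {
    require(sigA.nonEmpty && sigA.length == sigB.length, "signatures must match in length")
    sigA.zip(sigB).count { case (x, y) => x == y }.toDouble / sigA.length
  }

  // LSH banding: split the signature into bands of `rowsPerBand` values.
  // Two items become a candidate pair when they share at least one
  // (band index, band contents) key.
  def bandKeys(sig: Vector[Int], rowsPerBand: Int): Vector[(Int, Vector[Int])] =
    sig.grouped(rowsPerBand).zipWithIndex.map { case (band, index) => (index, band) }.toVector
}
```

In a MapReduce setting, the band keys would typically serve as grouping keys, so that only items sharing a band are compared instead of every pair. More hash functions give a tighter estimate at the cost of larger signatures.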

Another interesting open source project that integrates Mahout vectors into Scalding and provides implementations of Naive Bayes classifiers and K-Means is Ganitha, which can be found at https://github.com/tresata/ganitha. This library, among others, simplifies...
