Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Apache Mahout Essentials

You're reading from   Apache Mahout Essentials Implement top-notch machine learning algorithms for classification, clustering, and recommendations with Apache Mahout

Arrow left icon
Product type Paperback
Published in Jun 2015
Publisher
ISBN-13 9781783554997
Length 164 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Jayani Withanawasam Jayani Withanawasam
Author Profile Icon Jayani Withanawasam
Jayani Withanawasam
Arrow right icon
View More author details
Toc

Apache Mahout

In this section, we will have a quick look at Apache Mahout.

Do you know how Mahout got its name?

Apache Mahout

As you can see in the logo, a mahout is a person who drives an elephant. Hadoop's logo is an elephant. So, this is an indicator that Mahout's goal is to use Hadoop in the right manner.

The following are the features of Mahout:

  • It is a project of the Apache software foundation
  • It is a scalable machine learning library
    • The MapReduce implementation scales linearly with the data
    • Fast sequential algorithms (the runtime does not depend on the size of the dataset)
  • It mainly contains clustering, classification, and recommendation (collaborative filtering) algorithms
  • Here, machine learning algorithms can be executed in sequential (in-memory mode) or distributed mode (MapReduce is enabled)
  • Most of the algorithms are implemented using the MapReduce paradigm
  • It runs on top of the Hadoop framework for scaling
  • Data is stored in HDFS (data storage) or in memory
  • It is a Java library (no user interface!)
  • The latest released version is 0.9, and 1.0 is coming soon
  • It is not a domain-specific but a general purpose library

Note

For those of you who are curious! What are the problems that Mahout is trying to solve? The following problems that Mahout is trying to solve:

The amount of available data is growing drastically.

The computer hardware market is geared toward providing better performance in computers. Machine learning algorithms are computationally expensive algorithms. However, there was no framework sufficient to harness the power of hardware (multicore computers) to gain better performance.

The need for a parallel programming framework to speed up machine learning algorithms.

Mahout is a general parallelization for machine learning algorithms (the parallelization method is not algorithm-specific).

No specialized optimizations are required to improve the performance of each algorithm; you just need to add some more cores.

Linear speed up with number of cores.

Each algorithm, such as Naïve Bayes, K-Means, and Expectation-maximization, is expressed in the summation form. (I will explain this in detail in future chapters.)

For more information, please read Map-Reduce for Machine Learning on Multicore, which can be found at http://www.cs.stanford.edu/people/ang/papers/nips06-mapreducemulticore.pdf.

Setting up Apache Mahout

Download the latest release of Mahout from https://mahout.apache.org/general/downloads.html.

If you are referencing Mahout as a Maven project, add the following dependency in the pom.xml file:

<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-core</artifactId>
  <version>${mahout.version}</version>
</dependency>

If required, add the following Maven dependencies as well:

<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-math</artifactId>
  <version>${mahout.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-integration</artifactId>
  <version>${mahout.version}</version>
</dependency>

Tip

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

More details on setting up a Maven project can be found at http://maven.apache.org/.

Follow the instructions given at https://mahout.apache.org/developers/buildingmahout.html to build Mahout from the source.

The Mahout command-line launcher is located at bin/mahout.

You have been reading a chapter from
Apache Mahout Essentials
Published in: Jun 2015
Publisher:
ISBN-13: 9781783554997
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime