You're reading from Machine Learning with Apache Spark Quick Start Guide Uncover patterns, derive actionable insights, and learn from big data using MLlib

Product type Paperback

Published in Dec 2018

Publisher Packt

ISBN-13 9781789346565

Length 240 pages

Edition 1st Edition

Languages

Java

Tools

Apache Spark

Concepts

Big Data

Author (1):

Jillur Quddus

Table of Contents (10) Chapters

Preface

1. The Big Data Ecosystem

2. Setting Up a Local Development Environment

3. Artificial Intelligence and Machine Learning

4. Supervised Learning Using Apache Spark

5. Unsupervised Learning Using Apache Spark

6. Natural Language Processing Using Apache Spark

7. Deep Learning Using Apache Spark

8. Real-Time Machine Learning Using Apache Spark

9. Other Books You May Enjoy

Machine learning pipelines in Apache Spark

To end this chapter, we will look at how Apache Spark can be used to implement the algorithms that we have previously discussed by examining how its machine learning library, MLlib, works under the hood. MLlib provides a suite of tools designed to make machine learning accessible, scalable, and easy to deploy.

Note that as of Spark 2.0, the MLlib RDD-based API is in maintenance mode. The examples in this book will use the DataFrame-based API, which is now the primary API for MLlib. For more information, please visit https://spark.apache.org/docs/latest/ml-guide.html.

At a high level, the typical implementation of machine learning models can be thought of as an ordered pipeline of algorithms, as follows:

Extract, transform, and select features
Train a predictive model based on these feature vectors and labels
Make...
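
The idea of an ordered pipeline can be sketched in plain Python, without Spark, to show how the stages feed one another. This is a minimal illustration and not MLlib's API: the class names `WordCountFeaturizer`, `NearestCentroidTrainer`, `NearestCentroidModel`, and `SimplePipeline` are hypothetical, the nearest-centroid classifier is chosen only because it is short, and the sample texts are made up. The list above is cut off after its second step, so the prediction stage shown here is an assumption about what follows. In MLlib itself, the corresponding building blocks are called transformers, estimators, and pipelines.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


class WordCountFeaturizer:
    """Feature extraction stage: raw text -> counts over a fixed vocabulary."""

    def __init__(self, vocabulary: Sequence[str]) -> None:
        self.vocabulary = list(vocabulary)

    def transform(self, text: str) -> Vector:
        tokens = text.lower().split()
        return [float(tokens.count(word)) for word in self.vocabulary]


class NearestCentroidModel:
    """Fitted model: predicts the label whose centroid is closest."""

    def __init__(self, centroids: Dict[str, Vector]) -> None:
        self.centroids = centroids

    def predict(self, features: Vector) -> str:
        def squared_distance(label: str) -> float:
            centroid = self.centroids[label]
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        # Sorting the labels makes tie-breaking deterministic.
        return min(sorted(self.centroids), key=squared_distance)


class NearestCentroidTrainer:
    """Training stage: feature vectors and labels -> fitted model."""

    def fit(self, rows: Sequence[Tuple[Vector, str]]) -> NearestCentroidModel:
        grouped: Dict[str, List[Vector]] = {}
        for features, label in rows:
            grouped.setdefault(label, []).append(features)
        # Each centroid is the column-wise mean of that label's vectors.
        centroids = {
            label: [sum(column) / len(vectors) for column in zip(*vectors)]
            for label, vectors in grouped.items()
        }
        return NearestCentroidModel(centroids)


class SimplePipeline:
    """Runs the stages in order: featurize, then train, then predict."""

    def __init__(self, featurizer: WordCountFeaturizer,
                 trainer: NearestCentroidTrainer) -> None:
        self.featurizer = featurizer
        self.trainer = trainer
        self.model: Optional[NearestCentroidModel] = None

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "SimplePipeline":
        rows = [(self.featurizer.transform(t), y) for t, y in zip(texts, labels)]
        self.model = self.trainer.fit(rows)
        return self

    def predict(self, text: str) -> str:
        if self.model is None:
            raise RuntimeError("call fit() before predict()")
        return self.model.predict(self.featurizer.transform(text))


# Made-up training data, purely for illustration.
pipeline = SimplePipeline(WordCountFeaturizer(["good", "bad"]),
                          NearestCentroidTrainer())
pipeline.fit(
    ["good good film", "bad plot", "good cast", "bad bad ending"],
    ["pos", "neg", "pos", "neg"],
)
print(pipeline.predict("a good story"))
```

Bundling the featurizer with the trainer means the same feature extraction is applied at training time and at prediction time, which is the main practical benefit of treating the workflow as a single pipeline object.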

Authors (1)

Jillur Quddus is a lead technical architect, polyglot software engineer, and data scientist with over 10 years of hands-on experience in architecting and engineering distributed, scalable, high-performance, and secure solutions used to combat serious organized crime, cybercrime, and fraud. Jillur has extensive experience working in central government, intelligence, law enforcement, and banking, and has worked across the world, including in Japan, Singapore, Malaysia, Hong Kong, and New Zealand. Jillur is both the founder of Keisan, a UK-based company specializing in open source distributed technologies and machine learning, and the lead technical architect at Methods, the leading digital transformation partner for the UK public sector.