Reusable ML pipelines
ML pipelines were introduced in Apache Spark 1.4.0. An ML pipeline is a sequence of tasks used to cleanse and filter data, train models, classify observations, detect anomalies, generate and validate models, and predict outcomes [17:04].
Unlike the MLlib package classes that rely on RDDs, ML pipelines use data frames or datasets as the input and output of their tasks.
Note
Data frame versus Dataset
The class Dataset was introduced in Spark 2.0. Dataset instances are typed (that is, Dataset[T]), while data frames are untyped.
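The distinction is easiest to see in code. The following is a minimal sketch, assuming a local SparkSession and a hypothetical case class Sample that is not part of the original text:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, used only for illustration
case class Sample(id: Long, value: Double)

object DatasetVsDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("DatasetVsDataFrame")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Dataset[Sample]: the element type is known at compile time
    val ds: Dataset[Sample] = Seq(Sample(1L, 0.5), Sample(2L, 1.5)).toDS()

    // DataFrame is an alias for Dataset[Row]: columns are resolved at runtime
    val df: DataFrame = ds.toDF()

    ds.filter(_.value > 1.0).show()   // typed: checked by the compiler
    df.filter($"value" > 1.0).show()  // untyped: column name resolved at runtime

    spark.stop()
  }
}
```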
This section is a very brief overview of ML pipelines.
The key ingredients of an ML pipeline are [17:05]:
- Transformers are algorithms that can transform one data frame into another data frame. Transformers are stateless.
- Estimators are algorithms that can be fit on a data frame to produce a transformer (that is, Estimator.fit).
- Pipelines are estimators that weave or chain multiple transformers and estimators together to specify an ML workflow.
- Pipeline stages are the individual elements of a pipeline; each stage is either a transformer or an estimator and is executed in the order in which it is declared, as illustrated in the sketch after this list.
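The sketch below chains two transformers and one estimator into a single workflow. The column names, the sample documents, and the hyperparameter values are illustrative assumptions, not part of the original text:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

object SimplePipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("SimplePipeline")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical labeled text documents: (id, text, label)
    val training = Seq(
      (0L, "spark ml pipeline", 1.0),
      (1L, "hadoop map reduce", 0.0),
      (2L, "reusable spark workflow", 1.0),
      (3L, "legacy batch job", 0.0)
    ).toDF("id", "text", "label")

    // Transformers: stateless conversions from one data frame to another
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")

    // Estimator: fitting it produces a transformer (the fitted model)
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)

    // Pipeline: an estimator that chains the stages into one workflow
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
    val model = pipeline.fit(training)  // the PipelineModel is itself a transformer

    model.transform(training).select("id", "probability", "prediction").show()

    spark.stop()
  }
}
```

Because the fitted PipelineModel is itself a transformer, the same object can be applied to new data frames with model.transform, which is what makes the pipeline reusable.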