Before we get going on creating any kind of pipeline, we should take a minute to familiarize ourselves with what Spark is and what it offers us.
Spark is an open source engine designed for large-scale data processing, built for both speed and ease of use.
Through its advanced Directed Acyclic Graph (DAG) execution engine, which supports acyclic data flow and in-memory computing, Spark programs can run up to 100 times faster than Hadoop MapReduce when data fits in memory, or up to 10 times faster on disk.
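To get a feel for what in-memory computing looks like in practice, here is a minimal PySpark sketch (assuming the `pyspark` package is installed and a local Spark is available; the application name and the numbers are placeholders). A distributed dataset is cached the first time an action computes it, so the second action reads the results from memory instead of recomputing them.

```python
# Minimal PySpark sketch of in-memory computing: the squared values are
# cached after the first action, so the second action reuses them from
# memory rather than recomputing the whole lineage.
from pyspark import SparkContext

sc = SparkContext("local[*]", "in-memory-demo")   # hypothetical app name

numbers = sc.parallelize(range(1000000))          # a distributed collection of numbers
squares = numbers.map(lambda n: n * n).cache()    # mark the results to be kept in memory

print(squares.count())   # first action: computes the data and fills the cache
print(squares.sum())     # second action: served from the in-memory cache

sc.stop()
```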
Spark consists of the following components:
- Spark Core: This is the underlying engine of Spark, built around its fundamental programming abstraction, the Resilient Distributed Dataset (RDD). An RDD is a fault-tolerant collection of objects partitioned across the nodes of the cluster, which Spark can operate on in parallel.
- Spark SQL: This provides a new data abstraction called the DataFrame (formerly SchemaRDD), which adds support for structured and semi-structured data.