Modern Scala Projects

Modern Scala Projects: Leverage the power of Scala for building data-driven and high performance projects

eBook
€32.99
Paperback
€41.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with eBook?

  • Instant access to your digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want

Build a Breast Cancer Prognosis Pipeline with the Power of Spark and Scala

Breast cancer is one of the leading causes of cancer death among women each year, and many more women live with the disease at various stages. Lately, machine learning (ML) has shown great promise for physicians and researchers working toward better outcomes and lower treatment costs. With that in mind, the Wisconsin Breast Cancer Data Set provides a combination of features well suited to building ML models that predict a future diagnostic outcome by learning from historical breast mass tissue sample data.

Here are the datasets we refer to:

  • UCI Machine Learning Repository: Breast Cancer Wisconsin (Original) Data Set
  • UCI Machine Learning Repository: Breast Cancer Wisconsin (Diagnostic) Data Set
  • Accessed July 13, 2018
  • Website URL: https:/...

Breast cancer classification problem

At the moment, supervised learning is the most common class of ML problems in the business domain. In Chapter 1, Predict the Class of a Flower from the Iris Dataset, we approached the Iris classification task with a powerful supervised classification algorithm called Random Forests, which at its core depends on a categorical response variable. In this chapter, besides the Random Forest approach, we also turn to another popular classification technique, called logistic regression. Each approach offers its own solution to the breast cancer prognosis prediction problem, and both rely on an iterative learning process. Logistic regression takes center stage in this chapter, taking precedence over Random Forests. However, both learn from a training dataset containing...
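Both algorithms treat the class column as a categorical response with two values. In the Wisconsin (Original) data file the class column is coded 2 for benign and 4 for malignant, whereas Spark ML classifiers expect label values 0.0, 1.0, and so on. The following is a minimal plain-Scala sketch of that mapping; the object and function names are hypothetical, not the book's code, and in a Spark pipeline a StringIndexer stage or a simple column expression could do the same job.

```scala
object LabelEncoding {
  // Class codes used in the Wisconsin (Original) data file
  val Benign = 2
  val Malignant = 4

  /** Maps a raw class code to a binary label (benign = 0.0, malignant = 1.0); None for any other code. */
  def toLabel(classCode: Int): Option[Double] = classCode match {
    case Benign    => Some(0.0) // uppercase names in patterns match the constant's value
    case Malignant => Some(1.0)
    case _         => None
  }
}
```

Returning an Option rather than throwing makes unexpected codes visible without aborting a whole batch of rows.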

Getting started

The best way to get started is by understanding the bigger picture—gauging the magnitude of the work ahead of us. In this sense, we have identified two broad tasks:

  • Setting up the prerequisite software.
  • Developing two pipelines, starting with data collection and building a workflow sequence that could end with predictions. Those pipelines are as follows:
    • A Random Forests pipeline
    • A logistic regression pipeline

We will talk about setting up the prerequisite software in the next section.

Setting up prerequisite software

First, please refer back to the Setting up the prerequisite software section in Chapter 1, Predict the Class of a Flower from the Iris Dataset, to review your existing infrastructure...

Random Forest breast cancer pipeline

A good way to start this section off is to download the Skeleton SBT project archive file from the ModernScalaProjects_Code folder. Here is the structure of the Skeleton project:

Project structure

Instructions to readers: Copy the archive file into a folder of your choice before extracting it. Import this project into IntelliJ, drill down to the package "com.packt.modern.chapter", and rename it "com.packt.modern.chapter2" (or another name that suits you). The breast cancer pipeline project is already set up with build.sbt, plugins.sbt, and build.properties; you only need to make appropriate changes to the organization element in build.sbt. Once these changes are done, you are all set for development. For an explanation of dependency entries in build.sbt, please refer...
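For orientation, a build.sbt for a Spark ML project of this kind typically looks something like the sketch below. It is not the Skeleton project's actual file: the organization, the project name, and every version number are assumptions (Spark 2.3.x, which is published for Scala 2.11, is assumed here as a version current when the book came out). Keep whatever versions the supplied build.sbt already declares and change only the organization element.

```scala
// build.sbt: an illustrative sketch, not the Skeleton project's actual file.
// All values below are assumptions; keep the versions your build.sbt already declares.
organization := "com.packt.modern"   // hypothetical value: the element the instructions ask you to change
name := "chapter2"                   // hypothetical project name
version := "0.1.0"
scalaVersion := "2.11.12"            // assumed: Spark 2.3.x artifacts are built for Scala 2.11

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.3.1", // assumed version
  "org.apache.spark" %% "spark-sql"   % "2.3.1", // DataFrames
  "org.apache.spark" %% "spark-mllib" % "2.3.1"  // ML pipelines, RandomForestClassifier, LogisticRegression
)
```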

LR breast cancer pipeline

Before getting down to the implementation of a logistic regression pipeline, refer back to the earlier table in the section Breast cancer dataset at a glance, where nine breast cancer tissue sample characteristics (features) are listed along with one class column. To recap, those features are:

  • clump_thickness
  • size_uniformity
  • shape_uniformity
  • marginal_adhesion
  • epithelial_size
  • bare_nucleoli
  • bland_chromatin
  • normal_nucleoli
  • mitoses
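To make the feature list concrete, here is a plain-Scala sketch (no Spark) that parses one line of the data file into a typed record. It assumes the layout of the Original data file: 11 comma-separated fields, namely a sample ID, the nine features above in the listed order, and the class code. `TissueSample` and `SampleParser` are hypothetical names, not the book's code; in the Spark pipeline the equivalent step would read the file into a DataFrame and assemble the nine columns into a single feature vector (for example with a VectorAssembler stage).

```scala
import scala.util.Try

// One tissue sample: ID, the nine features, and the raw class code (2 = benign, 4 = malignant)
final case class TissueSample(
    id: Long,
    clumpThickness: Double,
    sizeUniformity: Double,
    shapeUniformity: Double,
    marginalAdhesion: Double,
    epithelialSize: Double,
    bareNucleoli: Double,
    blandChromatin: Double,
    normalNucleoli: Double,
    mitoses: Double,
    classCode: Int
) {
  /** The nine features in the order listed above, ready to become a feature vector. */
  def features: Array[Double] = Array(
    clumpThickness, sizeUniformity, shapeUniformity, marginalAdhesion,
    epithelialSize, bareNucleoli, blandChromatin, normalNucleoli, mitoses)
}

object SampleParser {
  /** Parses one line; None if the field count is wrong or any field is not numeric (such as "?"). */
  def parse(line: String): Option[TissueSample] = {
    val fields = line.split(",").map(_.trim)
    if (fields.length != 11) None
    else Try {
      val f = fields.slice(1, 10).map(_.toDouble) // the nine feature columns
      TissueSample(fields(0).toLong, f(0), f(1), f(2), f(3), f(4), f(5), f(6), f(7), f(8), fields(10).toInt)
    }.toOption
  }
}
```

Because a "?" field fails `toDouble`, incomplete rows come back as None instead of as records with made-up values.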

Now let's turn to a high-level formulation of the logistic regression approach, in terms of what it is meant to achieve. The following diagram represents the elements of such a formulation:

Breast cancer classification formulation

The preceding diagram represents a high-level formulation of a logistic classifier pipeline that we are aware...
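The formulation can be made concrete with a small amount of code. Logistic regression models the probability that a sample with feature vector x is malignant as p(x) = 1 / (1 + e^-(w·x + b)), where w is a weight vector and b a bias term, and predicts malignant when p(x) ≥ 0.5. The plain-Scala sketch below fits w and b with batch gradient descent on the average log loss, whose gradient for one sample is (p(x) - y)·x. It illustrates the technique only: Spark ML's LogisticRegression uses a quasi-Newton optimizer (L-BFGS, or OWL-QN when L1 regularization is enabled) and supports regularization, which this sketch omits. The object name, learning rate, and iteration count are assumptions.

```scala
object LogisticRegressionSketch {
  /** The logistic (sigmoid) function, mapping any real number into (0, 1). */
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  /** Probability of the positive class (label 1.0) for one feature vector. */
  def predictProbability(weights: Array[Double], bias: Double, x: Array[Double]): Double = {
    require(weights.length == x.length, "feature count must match weight count")
    var z = bias
    for (j <- weights.indices) z += weights(j) * x(j)
    sigmoid(z)
  }

  /**
   * Fits weights and bias by batch gradient descent on the average log loss.
   * xs: feature vectors; ys: labels, each 0.0 or 1.0.
   * learningRate and iterations are illustrative defaults, not tuned values.
   */
  def fit(xs: Array[Array[Double]], ys: Array[Double],
          learningRate: Double = 0.1, iterations: Int = 1000): (Array[Double], Double) = {
    require(xs.nonEmpty && xs.length == ys.length, "need one label per sample")
    val n = xs.length
    val d = xs.head.length
    val weights = Array.fill(d)(0.0)
    var bias = 0.0
    for (_ <- 1 to iterations) {
      val gradW = Array.fill(d)(0.0)
      var gradB = 0.0
      for (k <- 0 until n) {
        // Per-sample gradient of the log loss is (p - y) * x for the weights and (p - y) for the bias
        val error = predictProbability(weights, bias, xs(k)) - ys(k)
        for (j <- 0 until d) gradW(j) += error * xs(k)(j)
        gradB += error
      }
      for (j <- 0 until d) weights(j) -= learningRate * gradW(j) / n
      bias -= learningRate * gradB / n
    }
    (weights, bias)
  }

  /** Classifies as 1.0 (malignant) when the predicted probability is at least 0.5, else 0.0. */
  def predict(weights: Array[Double], bias: Double, x: Array[Double]): Double =
    if (predictProbability(weights, bias, x) >= 0.5) 1.0 else 0.0
}
```

Plain gradient descent converges slowly on unscaled inputs, so features are usually standardized first; Spark's LogisticRegression does this internally by default through its standardization parameter.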

Summary

In this chapter, we learned how to implement a binary classification task using two approaches: an ML pipeline built on the Random Forest algorithm and a second pipeline built on the logistic regression method.

Both pipelines combined several stages of data analysis into one workflow, and in both we calculated metrics to estimate how well our classifier performed. Early on in our data analysis task, we introduced a data preprocessing step to remove rows in which missing attribute values had been filled in with a placeholder, ?. After eliminating the 16 rows with unavailable attribute values, we constructed a new DataFrame from the 683 rows that remained.
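The preprocessing step described above amounts to dropping every row in which some field holds the ? placeholder. Here is a minimal plain-Scala sketch over raw text lines; the object and method names are hypothetical. One possible Spark equivalent (not necessarily the book's approach) is to read the CSV with the nullValue option set to "?" and then call na.drop() on the DataFrame. On the Original file, the chapter reports 16 rows dropped and 683 kept.

```scala
object MissingValueFilter {
  /** The placeholder the data file uses for a missing attribute value. */
  val Placeholder = "?"

  /** True when any comma-separated field of the line is the placeholder. */
  def hasMissingValue(line: String): Boolean =
    line.split(",").exists(_.trim == Placeholder)

  /** Keeps only complete rows and reports how many incomplete rows were dropped. */
  def dropIncomplete(lines: Seq[String]): (Seq[String], Int) = {
    // partition puts matching elements first: here, the incomplete rows
    val (incomplete, complete) = lines.partition(hasMissingValue)
    (complete, incomplete.size)
  }
}
```

Returning the dropped count alongside the kept rows makes it easy to check the result against the expected 16/683 split.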

In each pipeline, we also created training and validation datasets, followed by a training phase in which we fit the models on the training data. As with every ML task...
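The split and the evaluation metrics can be sketched in plain Scala as follows. In Spark the split would typically come from randomSplit on a DataFrame, which samples each row independently and so gives only approximately the requested sizes, and the metric from BinaryClassificationEvaluator, whose default metric is areaUnderROC. The object name, the 70/30 fraction, and the seed used below are assumptions, not the book's values.

```scala
import scala.util.Random

object SplitAndEvaluate {
  /** Shuffles with a fixed seed, then puts round(size * trainFraction) rows in the training set and the rest in the validation set. */
  def trainValidationSplit[A](rows: Seq[A], trainFraction: Double, seed: Long): (Seq[A], Seq[A]) = {
    require(trainFraction > 0.0 && trainFraction < 1.0, "trainFraction must be in (0, 1)")
    val shuffled = new Random(seed).shuffle(rows.toVector)
    shuffled.splitAt(math.round(rows.size * trainFraction).toInt)
  }

  /** Fraction of predictions that equal the true labels. */
  def accuracy(predictions: Seq[Double], labels: Seq[Double]): Double = {
    require(predictions.nonEmpty && predictions.size == labels.size, "one prediction per label")
    predictions.zip(labels).count { case (p, y) => p == y }.toDouble / labels.size
  }

  /**
   * Area under the ROC curve: the probability that a randomly chosen positive example
   * scores higher than a randomly chosen negative one (ties count 1/2). O(P * N).
   */
  def areaUnderROC(scores: Seq[Double], labels: Seq[Double]): Double = {
    require(scores.size == labels.size, "one score per label")
    val paired = scores.zip(labels)
    val pos = paired.filter(_._2 == 1.0).map(_._1)
    val neg = paired.filter(_._2 == 0.0).map(_._1)
    require(pos.nonEmpty && neg.nonEmpty, "need at least one example of each class")
    val wins = for (p <- pos; n <- neg) yield if (p > n) 1.0 else if (p == n) 0.5 else 0.0
    wins.sum / (pos.size.toDouble * neg.size)
  }
}
```

Accuracy depends on the 0.5 threshold, while the area under the ROC curve ranks probabilities and so evaluates the classifier across all thresholds; reporting both gives a fuller picture.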

Questions

We will now list a set of questions to test your knowledge of what you have learned so far:

  • What do you understand by logistic regression? Why is it important?
  • How does logistic regression differ from linear regression?
  • Name one powerful feature of BinaryClassifier.
  • What are the feature variables in relation to the breast cancer dataset?

The breast cancer dataset problem is a classification task that can be approached with other machine learning algorithms as well. Prominent among these are Support Vector Machines (SVM), k-nearest neighbors, and decision trees. When you run the pipelines developed in this chapter, compare how long each algorithm takes to build a model and how many input rows of the dataset each one classifies correctly.
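To compare model-building times, one option is to wrap each fitting call in a small timing helper such as the hypothetical one below. A single wall-clock measurement on the JVM is noisy (JIT warm-up, garbage collection, and, in Spark, cluster scheduling), so repeat each run a few times. Because Spark evaluates transformations lazily, time the call that actually triggers computation, such as fitting an estimator or pipeline.

```scala
object Timing {
  /** Runs block once and returns its result together with the elapsed wall-clock time in milliseconds. */
  def timed[A](block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block // by-name parameter: evaluated here, between the two clock readings
    val elapsedMs = (System.nanoTime() - start) / 1e6
    (result, elapsedMs)
  }
}
```

For example, wrapping a hypothetical pipeline.fit(trainingData) call in Timing.timed returns the fitted model together with the time it took.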

This concludes this chapter. The next chapter implements a new kind of pipeline, which is a stock...


Key benefits

  • Gain hands-on experience in building data science projects with Scala
  • Exploit the powerful functionalities of machine learning libraries
  • Use machine learning algorithms and decision tree models for enterprise apps

Description

Scala is both a functional programming and object-oriented programming language designed to express common programming patterns in a concise, readable, and type-safe way. Complete with step-by-step instructions, Modern Scala Projects will guide you in exploring Scala capabilities and learning best practices. Along the way, you'll build applications for professional contexts while understanding the core tasks and components. You’ll begin with a project for predicting the class of a flower by implementing a simple machine learning model. Next, you'll create a cancer diagnosis classification pipeline, followed by tackling projects delving into stock price prediction, spam filtering, fraud detection, and a recommendation engine. The focus will be on application of ML techniques that classify data and make predictions, with an emphasis on automating data workflows with the Spark ML pipeline API. The book also showcases the best of Scala’s functional libraries and other constructs to help you roll out your own scalable data processing frameworks. By the end of this Scala book, you’ll have a firm foundation in Scala programming and have built some interesting real-world projects to add to your portfolio.

Who is this book for?

If you’re a Scala developer looking to gain hands-on experience building some interesting real-world projects, this book is for you. Prior programming experience with Scala is necessary to understand the concepts covered in this book.

What you will learn

  • Create pipelines to extract data for analytics and visualizations
  • Automate your process pipeline with jobs that are reproducible
  • Extract intelligent data efficiently from large, disparate datasets
  • Automate the extraction, transformation, and loading of data
  • Develop tools that collate, model, and analyze data
  • Maintain data integrity as data flows become more complex
  • Develop tools that predict outcomes based on pattern discovery
  • Build fast and accurate machine learning models in Scala

Product Details

Publication date : Jul 30, 2018
Length: 334 pages
Edition : 1st
Language : English
ISBN-13 : 9781788625272

Packt Subscriptions

See our plans and pricing

€18.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

€189.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

€264.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

Frequently bought together

  • Scala Programming Projects: €41.99
  • Professional Scala: €32.99
  • Modern Scala Projects: €41.99

Total: €116.97

Table of Contents

8 Chapters
Predict the Class of a Flower from the Iris Dataset
Build a Breast Cancer Prognosis Pipeline with the Power of Spark and Scala
Stock Price Predictions
Building a Spam Classification Pipeline
Build a Fraud Detection System
Build Flights Performance Prediction Model
Building a Recommendation Engine
Other Books You May Enjoy

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it, we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook, or bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment can be made by credit card, debit card, or PayPal).

Where can I access support around an eBook?

  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats, such as PDF and ePub. In the future, this may change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.


What are the benefits of eBooks?

  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower in price than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.