
Modern Scala Projects: Leverage the power of Scala for building data-driven and high performance projects



Build a Breast Cancer Prognosis Pipeline with the Power of Spark and Scala

Breast cancer is one of the leading causes of cancer death among women, and many more live with the disease at various stages. Recently, machine learning (ML) has shown great promise for physicians and researchers working toward better outcomes and lower treatment costs. With that in mind, the Wisconsin Breast Cancer Data Set provides a combination of features suitable for building ML models that predict a future diagnostic outcome by learning from historical breast mass tissue sample data.

Here are the datasets we refer to:

  • UCI Machine Learning Repository: Breast Cancer Wisconsin (Original) Data Set
  • UCI Machine Learning Repository: Breast Cancer Wisconsin (Diagnostic) Data Set
  • Accessed July 13, 2018
  • Website URL: https:/...

Breast cancer classification problem

At the moment, supervised learning is the most common class of ML problems in the business domain. In Chapter 1, Predict the Class of a Flower from the Iris Dataset, we approached the Iris classification task with a powerful supervised learning classification algorithm called Random Forests, which at its core depends on a categorical response variable. In this chapter, besides the Random Forest approach, we also turn to another intriguing and popular classification technique called logistic regression. Both approaches offer a solution to the prediction problem of breast cancer prognosis, and both share an iterative learning process. Logistic regression occupies center stage in this chapter, taking precedence over Random Forests. However, both learn from a test dataset containing...

Getting started

The best way to get started is by understanding the bigger picture—gauging the magnitude of the work ahead of us. In this sense, we have identified two broad tasks:

  • Setting up the prerequisite software.
  • Developing two pipelines, starting with data collection and building a workflow sequence that could end with predictions. Those pipelines are as follows:
      • A Random Forests pipeline
      • A logistic regression pipeline

We will talk about setting up the prerequisite software in the next section.

Setting up prerequisite software

First, please refer back to the Setting up the prerequisite software section in Chapter 1, Predict the Class of a Flower from the Iris Dataset, to review your existing infrastructure...

Random Forest breast cancer pipeline

A good way to start this section off is to download the Skeleton SBT project archive file from the ModernScalaProjects_Code folder. Here is the structure of the Skeleton project:

Project structure

Instructions to readers: Copy and paste the file into a folder of your choice before extracting it. Import this project into IntelliJ, drill down to the package "com.packt.modern.chapter", and rename it "com.packt.modern.chapter2". If you would rather choose a different name, choose something appropriate. The breast cancer pipeline project is already set up with build.sbt, plugins.sbt, and build.properties. You only need to make appropriate changes to the organization element in build.sbt. Once these changes are done, you are all set for development. For an explanation of dependency entries in build.sbt, please refer...

LR breast cancer pipeline

Before getting down to the implementation of a logistic regression pipeline, refer back to the earlier table in the section Breast cancer dataset at a glance, where nine breast cancer tissue sample characteristics (features) are listed, along with one class column. To recap, those features are as follows:

  • clump_thickness
  • size_uniformity
  • shape_uniformity
  • marginal_adhesion
  • epithelial_size
  • bare_nucleoli
  • bland_chromatin
  • normal_nucleoli
  • mitoses

Now, let's get down to high-level formulation of the logistic regression approach in terms of what it is meant to achieve. The following diagram represents the elements of such a formulation at a high level:

Breast cancer classification formulation

The preceding diagram represents a high-level formulation of a logistic classifier pipeline that we are aware...
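As a rough illustration of what a logistic classifier computes, the sketch below applies the logistic (sigmoid) function to a weighted sum of feature values and thresholds the result. It is plain Scala rather than the Spark ML API used in the book, and the names `LogisticSketch`, `probability`, and `predict`, along with the default 0.5 threshold, are assumptions made for illustration only.

```scala
object LogisticSketch {
  /** Logistic (sigmoid) function: maps any real number into the interval (0, 1). */
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  /** Estimated probability of the positive class for one sample.
    * The weights and intercept would normally come from training; here they are inputs. */
  def probability(features: Vector[Double], weights: Vector[Double], intercept: Double): Double = {
    require(features.length == weights.length, "one weight per feature is required")
    val z = intercept + features.zip(weights).map { case (x, w) => x * w }.sum
    sigmoid(z)
  }

  /** Returns 1 when the probability reaches the threshold (an assumed default of 0.5), else 0. */
  def predict(features: Vector[Double], weights: Vector[Double], intercept: Double,
              threshold: Double = 0.5): Int =
    if (probability(features, weights, intercept) >= threshold) 1 else 0
}
```

With nine features per sample, `features` and `weights` would each hold nine values; a fitted model supplies the weights and intercept.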

Summary

In this chapter, we learned how to implement a binary classification task using two approaches: first, an ML pipeline using the Random Forest algorithm, and second, a pipeline using the logistic regression method.

Both pipelines combined several stages of data analysis into one workflow. In both pipelines, we calculated metrics to estimate how well our classifier performed. Early in our data analysis task, we introduced a data preprocessing step to remove rows whose missing attribute values had been filled in with a placeholder, ?. With the 16 rows containing unavailable attribute values eliminated, we constructed a new DataFrame from the remaining 683 rows.
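The preprocessing step described above can be sketched in plain Scala as a filter over raw comma-separated rows. This is not the book's Spark DataFrame code; the object name `MissingValueFilter`, the helper names, and the sample rows in the usage note are hypothetical.

```scala
object MissingValueFilter {
  /** Marker the dataset uses for an unavailable attribute value. */
  val Placeholder = "?"

  /** Splits a comma-separated row into trimmed fields, keeping empty trailing fields. */
  def fields(row: String): Vector[String] =
    row.split(",", -1).map(_.trim).toVector

  /** Keeps only the rows in which no field equals the placeholder. */
  def dropIncomplete(rows: Seq[String]): Seq[String] =
    rows.filterNot(row => fields(row).contains(Placeholder))
}
```

For example, given the two made-up rows `"1,5,1,1,1,2,1,3,1,1,2"` and `"2,8,4,5,1,2,?,7,3,1,4"`, `MissingValueFilter.dropIncomplete` keeps only the first.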

In each pipeline, we also created training, test, and validation datasets, followed by a training phase where we fit the models on the training data. As with every ML task...
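A seeded random split of the kind mentioned above could be sketched as follows. This is a plain Scala stand-in, not the Spark call used in the pipelines; the name `SplitSketch.trainTestSplit` and the 80/20 fraction in the usage note are assumptions, not values taken from the book.

```scala
import scala.util.Random

object SplitSketch {
  /** Shuffles the rows with a fixed seed, then cuts them into two parts.
    * The first part holds roughly `trainFraction` of the rows. */
  def trainTestSplit[A](rows: Seq[A], trainFraction: Double, seed: Long): (Seq[A], Seq[A]) = {
    require(trainFraction > 0.0 && trainFraction < 1.0, "fraction must lie strictly between 0 and 1")
    val shuffled = new Random(seed).shuffle(rows)
    val cut = math.round(rows.size * trainFraction).toInt
    shuffled.splitAt(cut)
  }
}
```

Calling `SplitSketch.trainTestSplit(rows, 0.8, seed = 42L)` on the 683 cleaned rows would give 546 training rows and 137 held-out rows, and the same seed reproduces the same split.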

Questions

We will now list a set of questions to test your knowledge of what you have learned so far:

  • What do you understand by logistic regression? Why is it important?
  • How does logistic regression differ from linear regression?
  • Name one powerful feature of BinaryClassifier.
  • What are the feature variables in relation to the breast cancer dataset?

The breast cancer dataset problem is a classification task that can be approached with other machine learning algorithms as well. Prominent among these are Support Vector Machines (SVMs), k-nearest neighbors, and decision trees. When you run the pipelines developed in this chapter, compare the time it took to build a model in each case and how many of the input rows of the dataset each algorithm classified correctly.
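To compare how many rows each algorithm classified correctly, a small helper along these lines may be enough. It is a plain Scala sketch; the object name `AccuracySketch` and the integer label encoding in the usage note are assumptions for illustration.

```scala
object AccuracySketch {
  /** Counts the positions at which the predicted label equals the actual label. */
  def correctCount(predicted: Seq[Int], actual: Seq[Int]): Int = {
    require(predicted.length == actual.length, "both sequences must have the same length")
    predicted.zip(actual).count { case (p, a) => p == a }
  }

  /** Fraction of rows classified correctly, between 0.0 and 1.0. */
  def accuracy(predicted: Seq[Int], actual: Seq[Int]): Double = {
    require(actual.nonEmpty, "at least one row is required")
    correctCount(predicted, actual).toDouble / actual.length
  }
}
```

With made-up predictions `Seq(2, 4, 4, 2)` against actual labels `Seq(2, 4, 2, 2)`, three of the four rows match, so the accuracy is 0.75.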

This concludes this chapter. The next chapter implements a new kind of pipeline, which is a stock...


Key benefits

  • Gain hands-on experience in building data science projects with Scala
  • Exploit the powerful functionalities of machine learning libraries
  • Use machine learning algorithms and decision tree models for enterprise apps

Description

Scala is both a functional and an object-oriented programming language designed to express common programming patterns in a concise, readable, and type-safe way. Complete with step-by-step instructions, Modern Scala Projects will guide you in exploring Scala capabilities and learning best practices. Along the way, you'll build applications for professional contexts while understanding the core tasks and components. You’ll begin with a project for predicting the class of a flower by implementing a simple machine learning model. Next, you'll create a cancer diagnosis classification pipeline, followed by projects on stock price prediction, spam filtering, fraud detection, and a recommendation engine. The focus is on applying ML techniques that classify data and make predictions, with an emphasis on automating data workflows with the Spark ML pipeline API. The book also showcases the best of Scala’s functional libraries and other constructs to help you roll out your own scalable data processing frameworks. By the end of this Scala book, you’ll have a firm foundation in Scala programming and have built some interesting real-world projects to add to your portfolio.

Who is this book for?

If you’re a Scala developer looking to gain hands-on experience building some interesting real-world projects, this book is for you. Prior programming experience with Scala is necessary to understand the concepts covered in this book.

What you will learn

  • Create pipelines to extract data for analytics and visualizations
  • Automate your process pipeline with jobs that are reproducible
  • Extract intelligent data efficiently from large, disparate datasets
  • Automate the extraction, transformation, and loading of data
  • Develop tools that collate, model, and analyze data
  • Maintain data integrity as data flows become more complex
  • Develop tools that predict outcomes based on pattern discovery
  • Build fast and accurate machine learning models in Scala

Product Details

Publication date: Jul 30, 2018
Length: 334 pages
Edition: 1st
Language: English
ISBN-13: 9781788624114



Table of Contents

8 Chapters
Predict the Class of a Flower from the Iris Dataset
Build a Breast Cancer Prognosis Pipeline with the Power of Spark and Scala
Stock Price Predictions
Building a Spam Classification Pipeline
Build a Fraud Detection System
Build Flights Performance Prediction Model
Building a Recommendation Engine
Other Books You May Enjoy
