You're reading from Hands-On Data Analysis with Scala: Perform data collection, processing, manipulation, and visualization with Scala

Product type Paperback

Published in May 2019

Publisher Packt

ISBN-13 9781789346114

Length 298 pages

Edition 1st Edition

Languages

Scala

Tools

Apache Spark

Concepts

Data Analysis

Author (1):

Rajesh Gupta

Table of Contents (14) Chapters

Preface

1. Section 1: Scala and Data Analysis Life Cycle

2. Scala Overview

3. Data Analysis Life Cycle

4. Data Ingestion

5. Data Exploration and Visualization

6. Applying Statistics and Hypothesis Testing

7. Section 2: Advanced Data Analysis and Machine Learning

8. Introduction to Spark for Distributed Data Analysis

9. Traditional Machine Learning for Data Analysis

10. Section 3: Real-Time Data Analysis and Scalability

11. Near Real-Time Data Analysis Using Streaming

12. Working with Data at Scale

13. Another Book You May Enjoy

Streaming a k-means clustering algorithm using Spark

The k-means algorithm is an unsupervised machine learning (ML) clustering algorithm. Its objective is to find k centers around which the data points group, thereby forming k clusters. It is most commonly implemented using batch-oriented processing. Streaming variants of the algorithm are also available, with the following properties:

The initial k clusters are built from the initial data
As new data arrives in minibatches, the existing k clusters are updated to produce a new set of k clusters
It is also possible to control the decay, that is, the decrease in significance, of older data
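
The three properties above can be illustrated with a small, Spark-free sketch in plain Scala. It is not the book's implementation: the names `Center`, `StreamingKMeansSketch`, `nearest`, and `update` are hypothetical, and the update rule (a weighted average of the old center and the minibatch points, with the old weight multiplied by a decay factor) is one common formulation that is assumed here for illustration.

```scala
// Illustrative sketch only; all names below are assumptions, not the book's code.
final case class Center(point: Vector[Double], weight: Double)

object StreamingKMeansSketch {
  // Squared Euclidean distance between two points of equal dimension.
  private def sqDist(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  /** Index of the center closest to point p. */
  def nearest(centers: Vector[Center], p: Vector[Double]): Int =
    centers.indices.minBy(i => sqDist(centers(i).point, p))

  /** Folds one minibatch into the existing centers.
    * decay = 1.0 weighs all history equally; decay = 0.0 keeps only the newest batch.
    */
  def update(centers: Vector[Center],
             batch: Seq[Vector[Double]],
             decay: Double): Vector[Center] = {
    // Assign every point in the minibatch to its nearest existing center.
    val assigned = batch.groupBy(p => nearest(centers, p))
    centers.zipWithIndex.map { case (c, i) =>
      val oldWeight = c.weight * decay // older data loses significance
      assigned.get(i) match {
        case None =>
          // No new points for this center: only its weight decays.
          c.copy(weight = oldWeight)
        case Some(points) =>
          val batchSum = points.reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
          val newWeight = oldWeight + points.size.toDouble
          // Weighted average of the old center and the new points.
          val newPoint = c.point.zip(batchSum).map { case (old, s) =>
            (old * oldWeight + s) / newWeight
          }
          Center(newPoint, newWeight)
      }
    }
  }
}
```

Given a sequence of minibatches, the centers can be carried forward with a fold, for example `batches.foldLeft(initialCenters)((cs, b) => StreamingKMeansSketch.update(cs, b, 0.5))`, which mirrors how a streaming job updates its state once per minibatch.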

At a high level, the preceding steps are quite similar to the word count problem that we solved using the streaming solution. The goal of the k-means algorithm is to partition the data into k clusters. If the...

Authors (1)

Gupta

Rakesh Gupta is best known as an automation champion in the Salesforce ecosystem. He has written over 150 articles on Visual Workflow and Process Builder to show how someone can use Visual Workflow and Process Builder to minimize code usage. He is one of the Visual Workflow and Process Builder experts in the industry. He has trained more than 700 individual professionals around the globe and conducted corporate training. Currently, Rakesh is working as a Salesforce solution architect consultant.
