You're reading from Hands-On Data Analysis with Scala Perform data collection, processing, manipulation, and visualization with Scala

Product type Paperback

Published in May 2019

Publisher Packt

ISBN-13 9781789346114

Length 298 pages

Edition 1st Edition

Languages

Scala

Tools

Apache Spark

Concepts

Data Analysis

Author (1):

Rajesh Gupta

Table of Contents (14) Chapters

Preface

1. Section 1: Scala and Data Analysis Life Cycle

2. Scala Overview

3. Data Analysis Life Cycle

4. Data Ingestion

5. Data Exploration and Visualization

6. Applying Statistics and Hypothesis Testing

7. Section 2: Advanced Data Analysis and Machine Learning

8. Introduction to Spark for Distributed Data Analysis

9. Traditional Machine Learning for Data Analysis

10. Section 3: Real-Time Data Analysis and Scalability

11. Near Real-Time Data Analysis Using Streaming

12. Working with Data at Scale

13. Another Book You May Enjoy

Finding a relationship between data elements

Once we have a decent understanding of the data and its main properties, the next step is to find concrete relationships between data elements. Well-established statistical techniques help us understand how the data is distributed and how its columns relate to one another.

Let's continue with our Spark example from the previous section by comparing Total Population to Total Households. We can expect the two numbers to be strongly correlated:

println("Covariance: " + df.stat.cov("Total Population", "Total Households"))
println("Correlation: " + df.stat.corr("Total Population", "Total Households"))

The output would look something like this:

Covariance: 1.2338126298368526E8
Correlation: 0.9090567549637986

As expected, the correlation coefficient is close to 1, indicating a...
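The covariance above is a very large number while the correlation stays between -1 and 1. The following minimal sketch, in plain Scala without Spark, may help show why: the Pearson correlation is the sample covariance rescaled by the standard deviations of both series, so it does not depend on the units of the data. The object name `CorrelationSketch`, the helper names, and the sample values are illustrative assumptions, not part of the book's example or its data set.

```scala
object CorrelationSketch {
  // Sample covariance: sum of products of deviations from the mean, divided by (n - 1).
  def covariance(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.size == ys.size && xs.size > 1,
      "need two equal-length series with at least two points")
    val meanX = xs.sum / xs.size
    val meanY = ys.sum / ys.size
    xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum / (xs.size - 1)
  }

  // Pearson correlation: covariance divided by the product of both standard deviations.
  // covariance(xs, xs) is the sample variance of xs.
  def correlation(xs: Seq[Double], ys: Seq[Double]): Double =
    covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

  def main(args: Array[String]): Unit = {
    // Made-up illustrative values, not taken from the data set used in the text.
    val population = Seq(1000.0, 2500.0, 4000.0, 8000.0)
    val households = Seq(400.0, 900.0, 1700.0, 3000.0)
    println("Covariance: " + covariance(population, households))
    println("Correlation: " + correlation(population, households))
  }
}
```

Multiplying either series by a constant changes the covariance by that factor but leaves the correlation unchanged, which is why the correlation is usually the easier of the two numbers to interpret.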

Authors (1)

Gupta

Rakesh Gupta is best known as an automation champion in the Salesforce ecosystem. He has written over 150 articles on Visual Workflow and Process Builder to show how someone can use Visual Workflow and Process Builder to minimize code usage. He is one of the Visual Workflow and Process Builder experts in the industry. He has trained more than 700 individual professionals around the globe and conducted corporate training. Currently, Rakesh is working as a Salesforce solution architect consultant.
