You're reading from Hands-On Data Analysis with Scala Perform data collection, processing, manipulation, and visualization with Scala

Product type Paperback

Published in May 2019

Publisher Packt

ISBN-13 9781789346114

Length 298 pages

Edition 1st Edition

Languages

Scala

Tools

Apache Spark

Concepts

Data Analysis

Author (1):

Rajesh Gupta

Table of Contents (14) Chapters

Preface

1. Section 1: Scala and Data Analysis Life Cycle

2. Scala Overview

3. Data Analysis Life Cycle

4. Data Ingestion

5. Data Exploration and Visualization

6. Applying Statistics and Hypothesis Testing

7. Section 2: Advanced Data Analysis and Machine Learning

8. Introduction to Spark for Distributed Data Analysis

9. Traditional Machine Learning for Data Analysis

10. Section 3: Real-Time Data Analysis and Scalability

11. Near Real-Time Data Analysis Using Streaming

12. Working with Data at Scale

13. Another Book You May Enjoy

Sampling data

When exploring a large dataset, it is generally useful to work with a smaller sample of the data first. For example, from a dataset of 100 million records, we could take a sample of 1,000 records and begin exploring the important properties of the data. Exploring the entire dataset would be ideal; however, the time required to do so would be many times greater.

Selecting the sample

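When working with samples, it is important to select the sample carefully so that no bias is introduced. Randomness plays a very important role here: when every record has an equal chance of being selected, the sample is more likely to reflect the properties of the full dataset.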

Let's look at how we can make use of the Scala collection API to select sample data from a dataset:

Create a list of 1000 numbers using Scala's Range...
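The remaining steps are not available in this excerpt, so what follows is only a minimal sketch of how random sample selection might look with the Scala collection API, not the book's own walkthrough. The object name `SampleSelection`, the helper names, the sample size of 10, and the seed value are all assumptions chosen for illustration; only `scala.util.Random` and the standard collection methods are real library calls.

```scala
import scala.util.Random

// Hypothetical helper object; the names here are illustrative, not from the book.
object SampleSelection {

  // Draw n distinct elements uniformly at random (sampling without replacement)
  // by shuffling the whole collection and keeping the first n elements.
  def sampleWithoutReplacement[A](data: Seq[A], n: Int, rng: Random): Seq[A] = {
    require(n >= 0 && n <= data.size, "sample size must be between 0 and data.size")
    rng.shuffle(data).take(n)
  }

  // Draw n elements uniformly at random, allowing the same element to be
  // picked more than once (sampling with replacement).
  def sampleWithReplacement[A](data: IndexedSeq[A], n: Int, rng: Random): Seq[A] = {
    require(data.nonEmpty, "cannot sample from an empty collection")
    Seq.fill(n)(data(rng.nextInt(data.length)))
  }

  def main(args: Array[String]): Unit = {
    // A list of 1000 numbers built from a Range.
    val data = (1 to 1000).toList

    // Taking the first 10 elements is biased: it only ever sees 1 through 10.
    val firstTen = data.take(10)

    // A seeded generator (assumed seed) makes the random sample repeatable.
    val rng = new Random(42L)
    val randomTen = sampleWithoutReplacement(data, 10, rng)

    println(s"First ten:  $firstTen")
    println(s"Random ten: $randomTen")
  }
}
```

Shuffling the full collection is simple and fine for a list of 1,000 elements, but it copies and reorders all of the data, so for very large datasets a different approach would likely be needed.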

Authors (1)

Gupta

Rakesh Gupta is best known as an automation champion in the Salesforce ecosystem. He has written over 150 articles on Visual Workflow and Process Builder to show how someone can use Visual Workflow and Process Builder to minimize code usage. He is one of the Visual Workflow and Process Builder experts in the industry. He has trained more than 700 individual professionals around the globe and conducted corporate training. Currently, Rakesh is working as a Salesforce solution architect consultant.