You're reading from Hands-On Data Analysis with Scala: Perform data collection, processing, manipulation, and visualization with Scala

Product type Paperback

Published in May 2019

Publisher Packt

ISBN-13 9781789346114

Length 298 pages

Edition 1st Edition

Languages

Scala

Tools

Apache Spark

Concepts

Data Analysis

Author (1):

Rajesh Gupta

Table of Contents (14) Chapters

Preface

1. Section 1: Scala and Data Analysis Life Cycle

2. Scala Overview

3. Data Analysis Life Cycle

4. Data Ingestion

5. Data Exploration and Visualization

6. Applying Statistics and Hypothesis Testing

7. Section 2: Advanced Data Analysis and Machine Learning

8. Introduction to Spark for Distributed Data Analysis

9. Traditional Machine Learning for Data Analysis

10. Section 3: Real-Time Data Analysis and Scalability

11. Near Real-Time Data Analysis Using Streaming

12. Working with Data at Scale

13. Another Book You May Enjoy

Sourcing data using Spark

Spark can read from and write to a variety of data sources and formats. It integrates well with the Hadoop Distributed File System (HDFS) and with several other popular storage systems, such as Amazon S3. This section focuses on the data sources and formats that Spark supports.
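As a minimal sketch of what working with different formats can look like, the following writes a tiny dataset as JSON and as CSV to a temporary local directory and reads both back through `spark.read`. It assumes the `spark-sql` library is on the classpath and runs Spark in local mode; the object name `DataSourcesSketch`, the sample rows, and the directory layout are hypothetical and chosen only for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not taken from the book.
object DataSourcesSketch {

  // Writes a small dataset as JSON and CSV, reads both back,
  // and returns the two row counts (JSON first, CSV second).
  def run(): (Long, Long) = {
    // Local mode is an assumption made so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .appName("DataSourcesSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Temporary local directory; with a suitably configured cluster the same
    // calls are typically pointed at hdfs:// or s3a:// locations instead.
    val dir = Files.createTempDirectory("spark-sources").toString

    // Made-up sample data.
    val people = Seq(("Ann", 34), ("Bob", 45)).toDF("name", "age")

    // Same data, two different on-disk formats.
    people.write.mode("overwrite").json(s"$dir/json")
    people.write.mode("overwrite").option("header", "true").csv(s"$dir/csv")

    // The reader API has the same shape for each format.
    val fromJson = spark.read.json(s"$dir/json")
    val fromCsv = spark.read
      .option("header", "true")      // first line holds column names
      .option("inferSchema", "true") // otherwise every CSV column is a string
      .csv(s"$dir/csv")

    val counts = (fromJson.count(), fromCsv.count())
    spark.stop()
    counts
  }

  def main(args: Array[String]): Unit = {
    val (jsonRows, csvRows) = run()
    println(s"JSON rows: $jsonRows, CSV rows: $csvRows")
  }
}
```

The point of the sketch is that only the format-specific call and its options change between sources; the surrounding code stays the same.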

Parquet file format

Apache Parquet (https://parquet.apache.org/) is an open source project that defines the specification for a columnar data storage format. This format is extremely popular in the big data world for the following reasons:

It supports nested data structures, which is good because most real-world data fits more naturally into a nested structure...
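To illustrate the point about nested data structures, here is a minimal sketch that writes records with a nested field to Parquet and reads them back with Spark. It assumes the `spark-sql` library is on the classpath and uses local mode; the case classes `Address` and `Customer`, the sample values, and the object name are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical record types with one nested struct and one repeated field.
case class Address(city: String, zip: String)
case class Customer(name: String, address: Address, phones: Seq[String])

object NestedParquetSketch {

  // Writes nested records to Parquet, reads them back, and returns
  // the values of the nested address.city field in sorted order.
  def run(): Seq[String] = {
    val spark = SparkSession.builder()
      .appName("NestedParquetSketch")
      .master("local[*]") // assumption: local mode, no cluster needed
      .getOrCreate()
    import spark.implicits._

    val path = Files.createTempDirectory("nested-parquet")
      .resolve("customers").toString

    // Made-up sample data.
    val customers = Seq(
      Customer("Ann", Address("Pune", "411001"), Seq("555-0100")),
      Customer("Bob", Address("Oslo", "0150"), Seq("555-0101", "555-0102"))
    ).toDS()

    // The nested schema is stored in the Parquet files themselves.
    customers.write.mode("overwrite").parquet(path)

    // No schema is supplied on read; it is recovered from the files.
    val restored = spark.read.parquet(path)
    restored.printSchema()

    // A nested field is addressed with dot notation.
    val cities = restored.select("address.city").as[String].collect().toSeq.sorted
    spark.stop()
    cities
  }

  def main(args: Array[String]): Unit =
    println(run().mkString(", "))
}
```

Because the nested structure survives the round trip, there is no need to flatten `address` into separate top-level columns before storing the data.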

Authors (1)

Gupta

Rakesh Gupta is best known as an automation champion in the Salesforce ecosystem. He has written over 150 articles on Visual Workflow and Process Builder to show how someone can use Visual Workflow and Process Builder to minimize code usage. He is one of the Visual Workflow and Process Builder experts in the industry. He has trained more than 700 individual professionals around the globe and conducted corporate training. Currently, Rakesh is working as a Salesforce solution architect consultant.
