You're reading from R Data Analysis Cookbook, Second Edition Customizable R Recipes for data mining, data visualization and time series analysis

Product type Paperback

Published in Sep 2017

Publisher Packt

ISBN-13 9781787124479

Length 560 pages

Edition 2nd Edition

Tools MongoDB

Concepts Data Analysis

Authors (3):

Kuntal Ganguly

Shanthi Viswanathan

Viswa Viswanathan

Table of Contents (14) Chapters

Preface

1. Acquire and Prepare the Ingredients - Your Data

2. What's in There - Exploratory Data Analysis

3. Where Does It Belong? Classification

4. Give Me a Number - Regression

5. Can you Simplify That? Data Reduction Techniques

6. Lessons from History - Time Series Analysis

7. How does it look? - Advanced data visualization

8. This may also interest you - Building Recommendations

9. It's All About Your Connections - Social Network Analysis

10. Put Your Best Foot Forward - Document and Present Your Analysis

11. Work Smarter, Not Harder - Efficient and Elegant R Code

12. Where in the World? Geospatial Analysis

13. Playing Nice - Connecting to Other Systems

Introduction

Data is everywhere, and the amount of digital data is growing rapidly; it is projected to reach 180 zettabytes by 2025. Data science is a field that tries to extract insights and meaningful information from structured and unstructured data through stages such as asking questions, getting the data, exploring the data, modeling the data, and communicating results, as shown in the following diagram:

Data scientists and analysts often need to load data into R from various sources in different input formats. Although R has its own native data format, data usually exists in text formats, such as Comma Separated Values (CSV), JavaScript Object Notation (JSON), and Extensible Markup Language (XML). This chapter provides recipes to load such data into your R system for processing.
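As a rough illustration of what loading text data looks like, the following minimal sketch reads a small CSV sample with base R. The sample data and the variable names (`csv_text`, `scores`) are made up for this example and are not taken from the book's recipes. JSON and XML are typically read with add-on packages such as jsonlite and XML, which are left out here so that the sketch runs with base R alone.

```r
# Hypothetical inline CSV standing in for a file on disk.
csv_text <- "id,name,score
1,Ann,90.5
2,Bob,
3,Cara,78"

# read.csv() accepts a `text` argument, so no file is needed for this sketch.
# With a real file, the first argument would be the file path instead.
scores <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect the column types that R inferred.
str(scores)

# The blank score for Bob is read as a missing value (NA).
is.na(scores$score)
```

Note that the blank field in the numeric column arrives as `NA`, which is why cleaning usually follows loading.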

Raw, real-world datasets are often messy, with missing values, unusable formats, and outliers. Very rarely can we start analyzing data immediately after loading it. Often, we will need to preprocess the data to clean, impute, wrangle, and transform it before embarking on analysis. This chapter provides recipes for some common cleaning, missing value imputation, outlier detection, and preprocessing steps.
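To make two of these steps concrete, here is a minimal base R sketch of missing value imputation and outlier detection. It assumes median imputation and the 1.5 * IQR rule, which are common choices but not necessarily the ones the chapter's recipes use; the `income` vector is invented for illustration.

```r
# Hypothetical numeric vector with one missing value and one extreme value.
income <- c(42, 38, NA, 45, 40, 41, 250, 39)

# Median imputation: replace each NA with the median of the observed values.
# The median is used here because it is less sensitive to extreme values
# than the mean.
income_imputed <- income
income_imputed[is.na(income_imputed)] <- median(income, na.rm = TRUE)

# Outlier flagging with the 1.5 * IQR rule: values beyond the quartiles by
# more than 1.5 times the interquartile range are flagged.
q <- unname(quantile(income, c(0.25, 0.75), na.rm = TRUE))
iqr <- q[2] - q[1]
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr
is_outlier <- !is.na(income) & (income < lower_fence | income > upper_fence)

# Show the flagged values.
income[is_outlier]
```

Whether a flagged value should be removed, capped, or kept depends on the analysis; the rule only identifies candidates for review.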

Authors (3)

Kuntal Ganguly

Kuntal Ganguly is a big data analytics engineer focused on building large-scale, data-driven systems using big data frameworks and machine learning. He has around 7 years of experience building big data and machine learning applications. Kuntal provides solutions to cloud customers in building real-time analytics systems using managed cloud services and open source Hadoop ecosystem technologies such as Spark, Kafka, Storm, and Solr, along with machine learning and deep learning frameworks. Kuntal enjoys hands-on software development and has single-handedly conceived, architected, developed, and deployed several large-scale distributed applications. He is a machine learning and deep learning practitioner and is very passionate about building intelligent applications.

Shanthi Viswanathan

Viswa Viswanathan

Viswa Viswanathan is an associate professor of Computing and Decision Sciences at the Stillman School of Business at Seton Hall University. After completing his PhD in Artificial Intelligence, Viswa spent a decade in academia and then switched to a leadership position in the software industry for another decade, during which he worked for Infosys, Igate, and Starbase. He embraced academia once again in 2001. Viswa has taught extensively in fields including operations research, computer science, software engineering, management information systems, and enterprise systems. In addition to university teaching, Viswa has conducted training programs for industry professionals and has written several peer-reviewed research publications in journals such as Operations Research, IEEE Software, Computers and Industrial Engineering, and International Journal of Artificial Intelligence in Education. He has authored a book titled Data Analytics with R: A hands-on approach. Viswa thoroughly enjoys hands-on software development and has single-handedly conceived, architected, developed, and deployed several web-based applications. Apart from his deep interest in technical fields such as data analytics, artificial intelligence, computer science, and software engineering, Viswa harbors a deep interest in education with special emphasis on the roots of learning and methods to foster deeper learning. He has done research in this area and hopes to pursue the subject further. Viswa would like to express deep gratitude to professors Amitava Bagchi and Anup Sen, who were inspirational forces during his early research career. He is also grateful to several extremely intelligent colleagues, notable among them being Rajesh Venkatesh, Dan Richner, and Sriram Bala, who significantly shaped his thinking. His aunt, Analdavalli; his sister, Sankari; and his wife, Shanthi, taught him much about hard work, and even the little he has absorbed has helped him immensely. His sons, Nitin and Siddarth, have helped with numerous insightful comments on various topics.