R Data Analysis Cookbook, Second Edition

You're reading from R Data Analysis Cookbook, Second Edition: Customizable R Recipes for data mining, data visualization and time series analysis

Product type Paperback

Published in Sep 2017

Publisher Packt

ISBN-13 9781787124479

Length 560 pages

Edition 2nd Edition

Languages

R

Tools

MongoDB

Concepts

Data Analysis

Authors (3):

Kuntal Ganguly

Shanthi Viswanathan

Viswa Viswanathan

Table of Contents (14) Chapters

Preface

1. Acquire and Prepare the Ingredients - Your Data

2. What's in There - Exploratory Data Analysis

3. Where Does It Belong? Classification

4. Give Me a Number - Regression

5. Can you Simplify That? Data Reduction Techniques

6. Lessons from History - Time Series Analysis

7. How does it look? - Advanced data visualization

8. This may also interest you - Building Recommendations

9. It's All About Your Connections - Social Network Analysis

10. Put Your Best Foot Forward - Document and Present Your Analysis

11. Work Smarter, Not Harder - Efficient and Elegant R Code

12. Where in the World? Geospatial Analysis

13. Playing Nice - Connecting to Other Systems

Removing duplicate cases

We sometimes end up with duplicate cases in our datasets and want to retain only one among them.

Getting ready

Create a sample data frame:

> salary <- c(20000, 30000, 25000, 40000, 30000, 34000, 30000)
> family.size <- c(4, 3, 2, 2, 3, 4, 3)
> car <- c("Luxury", "Compact", "Midsize", "Luxury", "Compact", "Compact", "Compact")
> prospect <- data.frame(salary, family.size, car)

How to do it...

The unique() function can do the job. It takes a vector or data frame as an argument and returns an object of the same type as its argument, but with duplicates removed.

Remove duplicates to get unique values:

> prospect.cleaned <- unique(prospect) 
> nrow(prospect) 
[1] 7 
> nrow(prospect.cleaned) 
[1] 5
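In practice, duplicates are often defined by a subset of columns rather than by the whole row. The following is a minimal sketch of that variation, not part of the original recipe: it combines duplicated() with negation, and the names key.cols and prospect.by.key are hypothetical. It assumes that two cases count as duplicates when they share the same car value, regardless of the other columns.

```r
# Rebuild the recipe's sample data so this block runs on its own
salary <- c(20000, 30000, 25000, 40000, 30000, 34000, 30000)
family.size <- c(4, 3, 2, 2, 3, 4, 3)
car <- c("Luxury", "Compact", "Midsize", "Luxury", "Compact", "Compact", "Compact")
prospect <- data.frame(salary, family.size, car)

# Hypothetical choice of key: only the car column decides what counts as a duplicate
key.cols <- c("car")

# Keep the first case for each distinct key; drop = FALSE keeps a data frame
# even when a single key column is selected
prospect.by.key <- prospect[!duplicated(prospect[, key.cols, drop = FALSE]), ]
print(prospect.by.key)
```

Here the first case for each car type survives, so only three cases remain. Which copy to keep is a design choice; this sketch simply keeps the first one encountered.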

How it works...

The unique() function takes a vector or data frame as an argument and returns a similar object with the duplicates eliminated. Cases that occur only once are returned as is; for each set of repeated cases, the result includes a single copy (the first occurrence).
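A short sketch may make this behavior concrete. The vector x below is hypothetical and used only for illustration; the data frame is the one from this recipe. It shows that unique() keeps values in their original order and that the retained cases of a data frame keep their original row names.

```r
# Hypothetical vector, used only for illustration
x <- c(3, 1, 3, 2, 1)
x.unique <- unique(x)   # the first occurrence of each value, in original order
print(x.unique)

# Rebuild the recipe's sample data so this block runs on its own
salary <- c(20000, 30000, 25000, 40000, 30000, 34000, 30000)
family.size <- c(4, 3, 2, 2, 3, 4, 3)
car <- c("Luxury", "Compact", "Midsize", "Luxury", "Compact", "Compact", "Compact")
prospect <- data.frame(salary, family.size, car)

prospect.cleaned <- unique(prospect)

# Row names of the retained cases: case 2 is kept, cases 5 and 7 are dropped
kept <- rownames(prospect.cleaned)
print(kept)
```

Because the original row names survive, the result can be traced back to the cases it came from in the source data.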

There's more...

Sometimes we just want to identify the duplicated values without necessarily removing them.

Identifying duplicates without deleting them

For this, use the duplicated() function:

> duplicated(prospect) 
[1] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE

From the data, we know that cases 2, 5, and 7 are identical. Note that only cases 5 and 7 are flagged as duplicates: duplicated() marks the second and later occurrences of a case, so case 2, being the first occurrence, is not flagged.

To list the duplicate cases, use the following code:

> prospect[duplicated(prospect), ]
  salary family.size     car
5  30000           3 Compact
7  30000           3 Compact
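Sometimes it helps to see every member of a duplicated group, including the first occurrence. The sketch below goes beyond the original recipe: it uses the fromLast argument of duplicated() to scan from the bottom, then combines both directions. The variable names dup.from.last, all.copies, and n.dups are hypothetical.

```r
# Rebuild the recipe's sample data so this block runs on its own
salary <- c(20000, 30000, 25000, 40000, 30000, 34000, 30000)
family.size <- c(4, 3, 2, 2, 3, 4, 3)
car <- c("Luxury", "Compact", "Midsize", "Luxury", "Compact", "Compact", "Compact")
prospect <- data.frame(salary, family.size, car)

# With fromLast = TRUE the scan starts at the bottom, so the last copy is
# treated as the original and the earlier copies are flagged instead
dup.from.last <- duplicated(prospect, fromLast = TRUE)

# A case belongs to a duplicated group if it is flagged in either direction
all.copies <- duplicated(prospect) | dup.from.last
print(prospect[all.copies, ])

# Number of redundant cases that unique() would drop
n.dups <- sum(duplicated(prospect))
print(n.dups)
```

With this data, the combined flag selects cases 2, 5, and 7, while the count of redundant cases is 2.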

© 2017 Packt Publishing Limited All Rights Reserved

Authors (3)

Kuntal Ganguly

Kuntal Ganguly is a big data analytics engineer focused on building large-scale, data-driven systems using big data frameworks and machine learning. He has around 7 years of experience building big data and machine learning applications. Kuntal helps cloud customers build real-time analytics systems using managed cloud services and open source Hadoop ecosystem technologies such as Spark, Kafka, Storm, and Solr, along with machine learning and deep learning frameworks. Kuntal enjoys hands-on software development and has single-handedly conceived, architected, developed, and deployed several large-scale distributed applications. He is a machine learning and deep learning practitioner and is very passionate about building intelligent applications.

Viswa Viswanathan

Viswa Viswanathan is an associate professor of Computing and Decision Sciences at the Stillman School of Business at Seton Hall University. After completing his PhD in Artificial Intelligence, Viswa spent a decade in academia and then switched to a leadership position in the software industry for another decade, during which he worked for Infosys, Igate, and Starbase. He embraced academia once again in 2001. Viswa has taught extensively in fields including operations research, computer science, software engineering, management information systems, and enterprise systems. In addition to university teaching, Viswa has conducted training programs for industry professionals and has written several peer-reviewed research publications in journals such as Operations Research, IEEE Software, Computers and Industrial Engineering, and International Journal of Artificial Intelligence in Education. He has authored a book titled Data Analytics with R: A Hands-on Approach. Viswa thoroughly enjoys hands-on software development and has single-handedly conceived, architected, developed, and deployed several web-based applications. Apart from his deep interest in technical fields such as data analytics, artificial intelligence, computer science, and software engineering, Viswa harbors a deep interest in education, with special emphasis on the roots of learning and methods to foster deeper learning. He has done research in this area and hopes to pursue the subject further. Viswa would like to express deep gratitude to professors Amitava Bagchi and Anup Sen, who were inspirational forces during his early research career. He is also grateful to several extremely intelligent colleagues, notable among them being Rajesh Venkatesh, Dan Richner, and Sriram Bala, who significantly shaped his thinking. His aunt, Analdavalli; his sister, Sankari; and his wife, Shanthi, taught him much about hard work, and even the little he has absorbed has helped him immensely. His sons, Nitin and Siddarth, have helped with numerous insightful comments on various topics.
