
Data Wrangling with R: Load, explore, transform and visualize data for modeling with tidyverse libraries

Fundamentals of Data Wrangling

The relationship between humans and data is age old. Knowing that our brains can capture and store only a limited amount of information, we had to create ways to keep and organize data.

The first known attempt at keeping and storing data goes back to 19,000 BC (as stated in https://www.thinkautomation.com/histories/the-history-of-data/), when a bone stick is believed to have been used to count things and keep information engraved on it, serving as a tally stick. Since then, words, writing, numbers, and many other forms of data collection have developed and evolved.

In 1663, John Graunt performed one of the first recognized data analyses, studying births and deaths by gender in the city of London, England.

In 1928, Fritz Pfleumer received the patent for magnetic tapes, a solution to store sound that enabled other researchers to create many of the storage technologies that are still used, such as hard disk drives.

Fast forward to the beginning of the computer age: in the 1970s, IBM researchers Raymond Boyce and Donald Chamberlin created the Structured Query Language (SQL) for accessing and modifying data held in databases. The language is still used, and, as a matter of fact, many data-wrangling concepts come from it. Concepts such as SELECT, WHERE, GROUP BY, and JOIN are heavily present in any work you want to perform with datasets. Therefore, a little knowledge of those basic commands might help you throughout this book, although it is not mandatory.
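As a rough illustration of how those SQL concepts carry over to R, the following sketch uses only base R functions. The two small data frames, their column names, and their values are invented for this example and do not come from the book.

```r
# Hypothetical data, invented for illustration only
sales <- data.frame(
  store_id = c(1, 1, 2, 2, 3),
  product  = c("apple", "pear", "apple", "pear", "apple"),
  amount   = c(10, 5, 7, 3, 12)
)
stores <- data.frame(
  store_id = c(1, 2, 3),
  city     = c("Austin", "Boston", "Chicago")
)

# SELECT product, amount FROM sales WHERE amount > 5
big_sales <- subset(sales, amount > 5, select = c(product, amount))

# SELECT store_id, SUM(amount) FROM sales GROUP BY store_id
totals <- aggregate(amount ~ store_id, data = sales, FUN = sum)

# JOIN the totals with the stores table on store_id
totals_by_city <- merge(totals, stores, by = "store_id")

print(big_sales)
print(totals_by_city)
```

Base R is used here only so that the sketch runs without installing anything; the same four ideas reappear under other names in most data-wrangling tools.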

In this chapter, we will cover the following main topics:

  • What is data wrangling?
  • Why data wrangling?
  • The key steps of data wrangling
  • Frameworks in Data Science

What is data wrangling?

Data wrangling is the process of modifying, cleaning, organizing, and transforming data from one given state to another, with the objective of making it more appropriate for use in analytics and data science.

This concept is also referred to as data munging, and both terms refer to the act of changing, manipulating, transforming, and augmenting your dataset.

I bet you’ve already performed data wrangling. It is a common task for all of us. Since our primary school years, we have been taught how to create a table and make counts to organize people’s opinions in a dataset. If you are familiar with MS Excel or similar tools, remember all the times you have sorted, filtered, or added columns to a table, not to mention all of those lookups that you may have performed. All of that is part of the data-wrangling process. Every task performed to somehow improve the data and make it more suitable for analysis can be considered data wrangling.

As a data scientist, you will constantly be provided with different kinds of data, with the mission of transforming the dataset into insights that will, consequently, form the basis for business decisions. Unlike a few years ago, when the majority of data was presented in a structured form such as text or tables, data nowadays can come in many other forms, including unstructured formats such as video, audio, or even a combination of those. Thus, most of the time, data will not arrive ready to use and will require some effort to get it into a ready state, sometimes more than others.

Figure 1.1 – Data before and after wrangling

Figure 1.1 is a visual representation of data wrangling. On the left-hand side, we see three kinds of data points mixed together; after sorting and tabulating, the data is easier to analyze.

A wrangled dataset is easier to understand and to work with, creating the path to better analysis and modeling, as we shall see in the next section, where we will learn why data wrangling is important to a data science project.

Why data wrangling?

Now you know what data wrangling means, and I am sure that you share the same view as me that this is a tremendously important subject – otherwise, I don’t think you would be reading this book.

In statistics and data science areas, there is this frequently repeated phrase: garbage in, garbage out. This popular saying represents the central idea of the importance of wrangling data because it teaches us that our analysis or even our model will only be as good as the data that we present to it. You could also use the weakest link in the chain analogy to describe that importance, meaning that if your data is weak, the rest of the analysis could be easily broken by questions and arguments.

Let me give you a naïve example, but one that is still very precise, to illustrate my point. If we receive a dataset like in Figure 1.2, everything looks right at first glance. There are city names and temperatures, and it is a common format used to present data. However, for data science, this data may not be ideal for use just yet.

Figure 1.2 – Temperatures for cities

Notice that all the columns are referring to the same variable, which is Temperature. We would have trouble plotting simple graphics in R with a dataset presented as in Figure 1.2, as well as using the dataset for modeling.

In this case, a simple transformation of the table from wide to long format would be enough to complete the data-wrangling task.

Figure 1.3 – Dataset ready for use

At first glance, Figure 1.2 might appear to be the better-looking option. And, in fact, it is for human eyes. The presentation of the dataset in Figure 1.2 makes it much easier for us to compare values and draw conclusions. However, we must not forget that we are dealing with computers, and machines don’t process data the same way humans do. To a computer, Figure 1.2 has seven variables: City, Jan, Feb, Mar, Apr, May, and Jun, while Figure 1.3 has only three: City, Month, and Temperature.
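To make the wide-to-long transformation concrete, here is a minimal sketch in base R. The cities and temperature values are made up for illustration (they are not the values shown in the figures), and only three months are included to keep the output short.

```r
# Hypothetical wide table: one column per month (made-up values)
wide <- data.frame(
  City = c("Chicago", "Miami"),
  Jan = c(-3, 20), Feb = c(-1, 21), Mar = c(4, 23),
  stringsAsFactors = FALSE
)

# Every column except City holds a temperature
month_cols <- setdiff(names(wide), "City")

# Stack the month columns so that each row is one City-Month pair
long <- data.frame(
  City = rep(wide$City, times = length(month_cols)),
  Month = rep(month_cols, each = nrow(wide)),
  Temperature = unlist(wide[month_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Sort by city for readability and reset the row names
long <- long[order(long$City), ]
rownames(long) <- NULL

print(long)
```

The result has three variables (City, Month, and Temperature) regardless of how many months the wide table contains. In the tidyverse, the same reshaping is usually done with the pivot_longer() function from the tidyr package.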

Now comes the fun part; let’s compare how a computer would receive both sets of data. A command to plot the temperature timeline by city for Figure 1.2 would be as follows: Computer, take a city and the temperatures during the months of Jan, Feb, Mar, Apr, May, and Jun in that city. Then consider each of the names of the months as a point on the x axis and the temperature associated as a point on the y axis. Plot a line for the temperature throughout the months for each of the cities.

Figure 1.3 is much clearer to the computer. It does not need to separate anything. The dataset is ready, so look how the command would be given: Computer, for each city, plot the month on the x axis and the temperature on the y axis.

Much simpler, agree? That is the importance of data wrangling for Data Science.

Benefits

Performing good data wrangling will improve the overall quality of the entire analysis process. Here are the benefits:

  • Structured data: Your data will be organized and easily understandable by other data scientists.
  • Faster results: If the data is already in a usable state, creating plots or using it as input to an algorithm will certainly be faster.
  • Better data flow: To be able to use the data for modeling or for a dashboard, it needs to be properly formatted and cleaned. Good data wrangling enables the data to flow on to the next steps of the process, making data pipelines and automation possible.
  • Aggregation: As we saw in the example in the previous section, the data must be in a suitable format for the computer to understand. Having well-wrangled datasets will help you to be able to aggregate them quickly for insight extraction.
  • Data quality: Data wrangling is about transforming the data to the ready state. During this process, you will clean, aggregate, filter, and sort it accordingly, visualize the data, assess its quality, deal with outliers, and identify faulty or incomplete data.
  • Data enriching: During wrangling, you might be able to enrich the data by creating new variables out of the original ones or joining other datasets to make your data more complete.
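Two of these benefits, data quality and data enriching, can be sketched in a few lines of base R. The readings below are invented, and the value 95 is planted deliberately as a suspicious entry; the Temperature_F column name is also just an illustrative choice.

```r
# Hypothetical readings in Celsius (made-up values; 95 is a planted bad entry)
temps <- data.frame(
  City = c("Chicago", "Chicago", "Miami", "Miami", "Miami", "Chicago"),
  Temperature = c(-3, NA, 20, 21, 95, 4)
)

# Data quality: count missing values and flag candidate outliers
# (boxplot.stats() uses the usual 1.5 * IQR rule by default)
n_missing <- sum(is.na(temps$Temperature))
outliers  <- boxplot.stats(temps$Temperature)$out

# Keep only complete rows that were not flagged
clean <- temps[!is.na(temps$Temperature) &
                 !(temps$Temperature %in% outliers), ]

# Data enriching: derive a new variable from an existing one
clean$Temperature_F <- clean$Temperature * 9 / 5 + 32

print(n_missing)
print(outliers)
print(clean)
```

Dropping flagged rows is only one option; in a real project, whether an outlier is an error or a genuine extreme value is a judgment that depends on understanding the data.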

Every project, whether related to Data Science or not, can benefit from data wrangling. As we just listed, it brings many benefits to the analysis, impacting the quality of the deliverables in the end. But to get the best from it, there are steps to follow.

The key steps of data wrangling

There are some basic steps to help data scientists and analysts to work through the data-wrangling part of the process. Naturally, when you first see a dataset, it is important to understand it, then organize, clean, enrich, and validate it before using it as input for a model.

Figure 1.4 – Steps of data wrangling

  1. Understand: The first step to take once we get our hands on new data is to understand it. Take some time to read the data dictionary, which is a document with the descriptions of the variables, if available, or talk to the owner(s) of the data to really understand what each data point represents and how they do or do not connect to your main purpose and to the business questions you are trying to answer. This will make the following steps clearer.
  2. Format: Step two is to format or organize the data. Raw data may come unstructured or unformatted in a way that is not usable. Therefore, it is important to be familiar with the tidy format. Tidy data is a concept developed by Hadley Wickham in 2014 in a paper of the same name (Tidy data. The Journal of Statistical Software, vol. 59, 2014), where he presents a standard method to organize and structure datasets, making the cleaning and exploration steps easier. Another benefit is that it facilitates the transfer of the dataset between different tools that use the same format. The tidy data concept is now widely accepted, which helps you to focus on the analysis instead of munging the dataset every time you need to move it down the pipeline.

Tidy data standardizes the way the structure of the data is linked to the semantics, in other words, how the layout is linked with the meaning of the values. More specifically, structure means the rows and columns that can be labeled. Most of the time, the columns are labeled, but the rows are not. Semantics, on the other hand, means that every value is related to a variable and an observation. In a tidy dataset, each variable is a column that holds all the values for an attribute, and each row is associated with one observation. Take the dataset extract from Figure 1.5 as an example. In the horsepower column, we would see values such as 110, 110, 93, and 110 for four different cars. At the observation level, each row is one observation, having one value for each attribute or variable, so a car could be associated with HP=110, 6 cylinders, 21 miles per gallon, and so on.

Figure 1.5 – Tidy data. Each row is one observation; each column is a variable

According to Wickham (https://tinyurl.com/2dh75y56), here are the three rules of tidy data:

  • Every column is a variable
  • Every row is an observation
  • Every cell is a single value
  3. Clean: This step is relevant to determine the overall quality of the data. There are many forms of data cleaning, such as splitting, parsing variables, handling missing values, dealing with outliers, and removing erroneous entries.
  4. Enrich: As you work through the data-wrangling steps and become more familiar with the data, questions will arise and, sometimes, more data will be needed. That can be solved by either joining another dataset to the original one to bring new variables or creating new ones using those you have.
  5. Validate: To validate is to make sure that the cleaning, formatting, and transformations are all in place and the data is ready for modeling or other analysis.
  6. Analysis/Model: Once everything is complete, your dataset is now ready for use in the next phases of the project, such as the creation of a dashboard or modeling.
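The steps above can be walked through on a small scale with R's built-in mtcars dataset. This is an assumption of the sketch: the horsepower values quoted for the tidy data example (110, 110, 93, and 110) match the first rows of mtcars, but the text does not name the dataset. The hp_per_wt variable is a hypothetical ratio created purely for illustration.

```r
# 1. Understand: inspect the structure of the dataset
str(mtcars)

# 2. Format: the car names are stored as row names, not as a variable,
#    so move them into their own column (every column is a variable)
cars <- data.frame(model = rownames(mtcars), mtcars,
                   stringsAsFactors = FALSE)
rownames(cars) <- NULL

# 3. Clean: count missing values per column (mtcars happens to have none)
missing_per_col <- colSums(is.na(cars))

# 4. Enrich: derive a new variable from existing ones
cars$hp_per_wt <- cars$hp / cars$wt  # hypothetical ratio, for illustration

# 5. Validate: stop early if the data does not meet expectations
stopifnot(
  sum(missing_per_col) == 0,
  !anyDuplicated(cars$model),
  all(cars$hp_per_wt > 0)
)

# 6. Analysis/Model: the data frame is now ready for the next phase
head(cars[, c("model", "mpg", "cyl", "hp", "hp_per_wt")], 4)
```

Putting the validation checks in code rather than checking by eye means they run again every time the data is refreshed.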

As with every process, we must follow steps to reach the best performance and be able to standardize our efforts and allow them to be reproduced and scaled if needed. Next, we will look at three frameworks for Data Science projects that help to make a process easy to follow and reproduce.

Frameworks in Data Science

Data Science is no different from other sciences, and it also follows some common steps. Ergo, frameworks can be designed to guide people through the process, as well as to help implement a standardized process in a company.

It is important that a Data Scientist has a holistic understanding of the flow of the data from the moment of the acquisition until the end point since the resultant business knowledge is what will support decisions.

In this section, we will take a closer look at three known frameworks that can be used for Data Science projects: KDD, SEMMA, and CRISP-DM. Let’s get to know more about them.

KDD

KDD stands for Knowledge Discovery in Databases. It is a framework to extract knowledge from data in the context of large databases.

Figure 1.6 – KDD process

The process is iterative and follows these steps:

  1. Data: Acquiring the data from a database
  2. Selection: Creating a representative target set that is a subset of the data with selected variables or samples of interest
  3. Preprocessing: Data cleaning and preprocessing to remove outliers and handle missing and noisy data
  4. Transformation: Transforming and using dimensionality reduction to format the data
  5. Data Mining: Using algorithms to analyze and search for patterns of interest (for example, classification and clustering)
  6. Interpretation/Evaluation: Interpreting and evaluating the mined patterns

After the evaluation, if the results are not satisfactory, the process can be repeated with enhancements such as more data, a different subset, or a tweaked algorithm.
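A toy pass through the KDD steps might look like the following sketch in base R. The built-in mtcars dataset stands in for a database table, and the choice of variables, principal component analysis, k-means, and three clusters are all arbitrary assumptions made for illustration, not recommendations from the text.

```r
set.seed(42)  # k-means starts from random centers; fix the seed

# 1. Data: a built-in dataset stands in for a database table
raw <- mtcars

# 2. Selection: keep a subset of variables of interest
selected <- raw[, c("mpg", "hp", "wt", "qsec")]

# 3. Preprocessing: drop incomplete rows (mtcars has none)
prepped <- na.omit(selected)

# 4. Transformation: standardize, then keep two principal components
pca <- prcomp(prepped, center = TRUE, scale. = TRUE)
reduced <- pca$x[, 1:2]

# 5. Data mining: cluster the cars into three groups
fit <- kmeans(reduced, centers = 3, nstart = 10)

# 6. Interpretation/Evaluation: cluster sizes and the share of
#    total variance that lies between the clusters
print(table(fit$cluster))
explained <- fit$betweenss / fit$totss
print(explained)
```

If the evaluation is unsatisfactory, the iteration described above amounts to going back and changing one of the earlier choices, for example the selected variables or the number of clusters, and running the later steps again.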

SEMMA

SEMMA stands for Sample, Explore, Modify, Model, and Assess. These are the steps of the process.

Figure 1.7 – SEMMA process

SEMMA is a cyclic process that flows more naturally with Data Science. It does not contain stages like KDD. The steps are as follows:

  1. Sample: Based on statistics, it requires a sample large enough to be representative but small enough to be quick to work with
  2. Explore: During this step, the goal is to understand the data and generate visualizations and descriptive statistics, looking for patterns and anomalies
  3. Modify: Here is where data wrangling plays a more intensive role, where the transformations occur to make the data ready for modeling
  4. Model: This step is where algorithms are used to generate estimates, predictions, or insights from the data
  5. Assess: Evaluate the results
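The five SEMMA steps can be sketched with base R and the built-in mtcars dataset. The sample size, the log transformation, and the linear model are arbitrary choices for this illustration; holding out the unsampled rows for assessment is also an addition of this sketch rather than part of the description above.

```r
set.seed(1)  # make the random sample repeatable

# Sample: draw a random subset of rows; keep the rest for assessment
train_idx <- sample(nrow(mtcars), size = 24)
train   <- mtcars[train_idx, ]
holdout <- mtcars[-train_idx, ]

# Explore: descriptive statistics and a simple correlation
print(summary(train[, c("mpg", "wt", "hp")]))
print(cor(train$mpg, train$wt))

# Modify: transform a variable (log of horsepower, an arbitrary choice)
train$log_hp   <- log(train$hp)
holdout$log_hp <- log(holdout$hp)

# Model: linear regression of miles per gallon on weight and log horsepower
model <- lm(mpg ~ wt + log_hp, data = train)

# Assess: root mean squared error on the held-out rows
pred <- predict(model, newdata = holdout)
rmse <- sqrt(mean((holdout$mpg - pred)^2))
print(rmse)
```

Because the process is cyclic, a poor assessment would send you back to an earlier step, most often Modify or Model.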

CRISP-DM

The acronym for this framework means Cross-Industry Standard Process for Data Mining. It provides the data scientist with the typical phases of the project and also an overview of the data mining life cycle.

Figure 1.8 – CRISP-DM life cycle

The CRISP-DM life cycle has six phases, with the arrows indicating the dependencies between each one of them, but the key point here is that there is not a strict order to follow. The project can move back and forth during the process, making it a flexible framework. Let’s go through the steps:

  • Business understanding: Like the other two frameworks presented, it all starts with understanding the problem, the business. Understanding the business rules and specificities is often even more important than getting to the solution fast. That is because a solution may not be ideal for that kind of business. The business rules must always drive the solution.
  • Data understanding: This involves collecting and exploring the data. Make sure the data collected is representative of the whole and get familiar with it to be able to find errors, faulty data, and missing values and to assess quality. All these tasks are part of data understanding.
  • Data preparation: Once you are familiar with the data collected, it is time to wrangle it and prepare it for modeling.
  • Modeling: This involves applying Data Science algorithms or performing the desired analysis on the processed data.
  • Evaluation: This step is used to assess whether the solution is aligned with the business requirement and whether it is performing well.
  • Deployment: In this step, the model reaches its purpose (for example, an application that predicts a group or a value, a dashboard, and so on).

These three frameworks have a lot in common if you look closer. They start with understanding the data, go over data wrangling with cleaning and transforming, then move on to the modeling phase, and end with the evaluation of the model, usually working with iterations to assess flaws and improve the results.

Summary

In this chapter, we learned a little about the history of data wrangling and became familiar with its definition. Every task performed in order to transform or enhance the data and to make it ready for analysis and modeling is what we call data wrangling or data munging.

We also discussed some topics stating the importance of wrangling data before modeling it. A model is a simplified representation of reality, and an algorithm is like a student that needs to understand that reality to give us the best answer about the subject matter. If we teach this student with bad data, we cannot expect to receive a good answer. A model is as good as its input data.

Later in the chapter, we reviewed the benefits of data wrangling, showing that it improves the quality of our data, which leads to faster results and better outcomes.

In the final sections, we reviewed the basic steps of data wrangling and learned more about three of the most commonly used frameworks for Data Science – KDD, SEMMA, and CRISP-DM. I recommend that you review more information about them to have a holistic view of the life cycle of a Data Science project.

Now, it is important to notice how these three frameworks preach the selection of a representative dataset or subset of data. A nice example is given by Aurélien Géron (Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow, 2nd edition, (2019): 32-33). Suppose you want to build an app to take pictures of flowers and recognize and classify them. You could go to the internet and download thousands of pictures; however, they will probably not be representative of the kind of pictures that your model will receive from the app users. Ergo, the model could underperform. This example is relevant to illustrate the garbage in, garbage out idea. That is, if you don’t explore and understand your data thoroughly, you won’t know whether it is good enough for modeling.

The frameworks can lead the way, like a map, to explore, understand, and wrangle the data and to make it ready for modeling, decreasing the risk of having a frustrating outcome.

In the next chapter, let’s get our hands on R and start coding.

Exercises

  1. What is data wrangling?
  2. Why is data wrangling important?
  3. What are the steps for data wrangling?
  4. List three Data Science frameworks.


Key benefits

  • Explore state-of-the-art libraries for data wrangling in R and learn to prepare your data for analysis
  • Find out how to work with different data types such as strings, numbers, date, and time
  • Build your first model and visualize data with ease through advanced plot types and with ggplot2

Description

In this information era, where large volumes of data are being generated every day, companies want to get a better grip on it to perform more efficiently than before. This is where skillful data analysts and data scientists come into play, wrangling and exploring data to generate valuable business insights. In order to do that, you’ll need plenty of tools that enable you to extract the most useful knowledge from data. Data Wrangling with R will help you to gain a deep understanding of ways to wrangle and prepare datasets for exploration, analysis, and modeling. This data book enables you to get your data ready for more optimized analyses, develop your first data model, and perform effective data visualization. The book begins by teaching you how to load and explore datasets. Then, you’ll get to grips with the modern concepts and tools of data wrangling. As data wrangling and visualization are intrinsically connected, you’ll go over best practices to plot data and extract insights from it. The chapters are designed in a way to help you learn all about modeling, as you will go through the construction of a data science project from end to end, and become familiar with the built-in RStudio, including an application built with Shiny dashboards. By the end of this book, you’ll have learned how to create your first data model and build an application with Shiny in R.

Who is this book for?

If you are a professional data analyst, data scientist, or beginner who wants to learn more about data wrangling, this book is for you. Familiarity with the basic concepts of R programming or any other object-oriented programming language will help you to grasp the concepts taught in this book. Data analysts looking to improve their data manipulation and visualization skills will also benefit immensely from this book.

What you will learn

  • Discover how to load datasets and explore data in R
  • Work with different types of variables in datasets
  • Create basic and advanced visualizations
  • Find out how to build your first data model
  • Create graphics using ggplot2 in a step-by-step way in Microsoft Power BI
  • Get familiarized with building an application in R with Shiny

Product Details

Publication date: Feb 23, 2023
Length: 384 pages
Edition: 1st
Language: English
ISBN-13: 9781803235400


Table of Contents

20 Chapters
Part 1: Load and Explore Data
Chapter 1: Fundamentals of Data Wrangling
Chapter 2: Loading and Exploring Datasets
Chapter 3: Basic Data Visualization
Part 2: Data Wrangling
Chapter 4: Working with Strings
Chapter 5: Working with Numbers
Chapter 6: Working with Date and Time Objects
Chapter 7: Transformations with Base R
Chapter 8: Transformations with Tidyverse Libraries
Chapter 9: Exploratory Data Analysis
Part 3: Data Visualization
Chapter 10: Introduction to ggplot2
Chapter 11: Enhanced Visualizations with ggplot2
Chapter 12: Other Data Visualization Options
Part 4: Modeling
Chapter 13: Building a Model with R
Chapter 14: Build an Application with Shiny in R
Conclusion
Other Books You May Enjoy

Customer reviews

Rating: 4.9 out of 5 (7 ratings)
5 star 85.7%
4 star 14.3%
3 star 0%
2 star 0%
1 star 0%

Daksh Mar 21, 2023
Rating: 5 out of 5
"Data Wrangling with R: Load, explore, transform and visualize data for modeling with tidyverse libraries" is an excellent guidebook for those interested in learning R programming language for data science purposes. The book provides a detailed overview of how to efficiently work with data in R by leveraging the tidyverse libraries.
Part one of the book provides an introduction to the R programming language and its features. The author covers data structures, functions, and programming concepts that are essential for data analysis in R. He also explains how to use RStudio, the most popular integrated development environment for R.
In part two, he delves into data loading and cleaning. He explains how to import data from different sources such as CSV, Excel, and databases. The author also provides numerous examples of how to clean and transform data using tidyverse libraries such as dplyr and tidyr.
Part three of the book covers data exploration and visualization. The author demonstrates how to use ggplot2 to create various data visualizations such as scatterplots, histograms, and boxplots. He also shows how to use other visualization libraries like plotly and leaflet.
Finally, in part four, the author covers modeling and machine learning using R. He explains how to use the caret package for model training and evaluation. He also covers topics like regression, classification, clustering, and time series analysis.
Overall, "Data Wrangling with R: Load, explore, transform and visualize data for modeling with tidyverse libraries" is a must-read for anyone interested in learning R programming language for data analysis. The author's writing style is clear and concise, making it easy for readers to understand the concepts and follow along with the examples. The book is well-organized, and the examples are practical and relevant to real-world data science problems. I highly recommend this book to anyone looking to upskill in data wrangling with R.
Amazon verified review
Sunil K. Gupta Mar 10, 2023
Rating: 5 out of 5
I have the honor to review Data Wrangling with R by Gustavo Santos! Being new to R, I was afraid this book was going to be too technical and only on statistical modeling. On the contrary, I was happy to find this R book easy to read, understand and apply whether you are new or an intermediate R programmer. The logical flow between chapters and sections with useful summaries and tips showcase Gustavo's in-depth knowledge and expertise in R. I really like how Gustavo compares similar R methods which help to reinforce R syntax understanding. Gustavo's approach will be liked by Data Scientists since all traditional methods of data cleaning, structure and operations before statistical modeling are followed. The graph gallery makes it easy to look up the correct R graph code without writing R code from scratch. I highly recommend Data Wrangling with R book to establish a strong foundation of R learning!
Amazon verified review
T. Zwingmann Apr 16, 2023
Rating: 5 out of 5
I recently had the opportunity to read a free copy of this book. "Data Wrangling with R" is a great resource for anyone working on data wrangling, especially when starting out with R. The book's strengths lie in its use of the tidyverse, which simplifies data-wrangling workflows compared with base R and makes them much more accessible for beginners. I myself would always start learning with the tidyverse.
The structure of the book is excellent and takes the reader from reading in data from different sources, through data cleaning and data preparation, to building interactive applications with Shiny. The book provides a clear roadmap and forms a solid foundation for further learning. I particularly liked the author's writing style. It is easy to read and understandable, with numerous examples.
In summary, I can recommend the book to anyone who wants to learn or improve their skills in working with data in R!
Amazon verified review
Jim Heidinger Mar 22, 2023
Rating: 5 out of 5
As I spend a lot of time working with R in my work, I was interested in the main premise about data wrangling, and especially in using visualizations and statistics to start the process of analysis. I was very much interested in the areas where R native packages could accomplish these tasks, and the author gave a pretty deep walkthrough of techniques that spanned creating datasets, parsing strings, working with numbers and classes, and data transformations. I found quite a few areas where I would come back to the book as a reference and study guide for problems I often face. The author had a very logical approach to coding and documenting that can serve as a model for a more coherent approach to problem solving. My favorite areas were also addressed, with date conversions, parsing, data tables and visualizations. He pretty much covered it all in a very straightforward and comprehensive manner, building on each prerequisite step in coming to a deeper understanding of data. This was quite useful for modeling and learning about data transformation concepts, as it all starts with the structure of the data in order to move forward. The final two chapters showed the strength of taking R further with model building and Shiny apps. While this was just a taste of what can be accomplished, the author gave plenty of references to material for further study. All in all, I enjoyed the book immensely. Not only was it a deep reference to the basics of data transformation and wrangling, but it gave plenty of useful examples of ways to accomplish my work by providing what I thought were innovative approaches to solving problems.
Amazon Verified review
bskkarthik Feb 25, 2023
Rating: 5 out of 5
The book is very good and covers different libraries, like tidyverse. The good part of the book is that it also covers Shiny, which is a popular library for creating web apps. The author did a good job in writing this book. I highly recommend this book for getting started with R.
Amazon Verified review

FAQs

What is the delivery time and cost of print books?

Shipping Details

USA:

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable.

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days.
Add one extra business day for deliveries to Northern Ireland and the Scottish Highlands and islands.

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K. time will start printing on the next business day, so the estimated delivery times start from that day as well. Orders received after 5 PM U.K. time (in our internal systems) on a business day, or at any time on the weekend, will begin printing on the business day after next. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is a customs duty/charge?

Customs duties are charges levied on goods when they cross international borders. They are a tax imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

Orders shipped to the countries listed under EU27 will not incur customs charges. These are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea

For shipments to countries outside the EU27, customs duty or localized taxes may apply. These are charged by the recipient country, must be paid by the customer, and are not included in the shipping charges on the order.

How do I know my customs duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin, and several other factors, such as the total invoice amount, dimensions like weight, and other criteria applicable in your country.

For example:

  • If you live in Mexico and the declared value of your ordered items is over $50, you will have to pay the courier service an additional import tax of 19% to receive your package (for example, $9.50 on a declared value of $50).
  • If you live in Turkey and the declared value of your ordered items is over €22, you will have to pay the courier service an additional import tax of 18% to receive your package (for example, €3.96 on a declared value of €22).
How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing it. Simply contact customercare@packt.com with your order details or payment transaction ID. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on its way to you, you can contact us at customercare@packt.com once you receive it and use the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except in the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or with a material defect). Otherwise, Packt Publishing will not accept returns.

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact the Customer Relations Team at customercare@packt.com with the order number and issue details, as explained below:

  1. If you ordered an item (eBook, Video or Print Book) incorrectly or accidentally, please contact the Customer Relations Team at customercare@packt.com within one hour of placing the order and we will replace the item or refund its cost.
  2. If your eBook or Video file is faulty, or a fault occurs while the eBook or Video is being made available to you (i.e. during download), you should contact the Customer Relations Team at customercare@packt.com within 14 days of purchase, and they will be able to resolve the issue for you.
  3. You will have a choice of replacement or refund for the problem items (damaged, defective or incorrect).
  4. Once the Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund for one book from a multiple-item order, then we will refund you for the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged or with a material defect, contact our Customer Relations Team at customercare@packt.com within 14 days of receipt of the book with appropriate evidence of the damage, and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made on a print-on-demand basis by Packt's professional book-printing partner.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use?

You can pay with the following payment methods:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal