Data Wrangling with R

Product type Book
Published in Feb 2023
Publisher Packt
ISBN-13 9781803235400
Pages 384 pages
Edition 1st Edition
Author: Gustavo R Santos
Table of Contents (21 chapters)

Preface
1. Part 1: Load and Explore Data
2. Chapter 1: Fundamentals of Data Wrangling
3. Chapter 2: Loading and Exploring Datasets
4. Chapter 3: Basic Data Visualization
5. Part 2: Data Wrangling
6. Chapter 4: Working with Strings
7. Chapter 5: Working with Numbers
8. Chapter 6: Working with Date and Time Objects
9. Chapter 7: Transformations with Base R
10. Chapter 8: Transformations with Tidyverse Libraries
11. Chapter 9: Exploratory Data Analysis
12. Part 3: Data Visualization
13. Chapter 10: Introduction to ggplot2
14. Chapter 11: Enhanced Visualizations with ggplot2
15. Chapter 12: Other Data Visualization Options
16. Part 4: Modeling
17. Chapter 13: Building a Model with R
18. Chapter 14: Build an Application with Shiny in R
19. Conclusion
20. Other Books You May Enjoy

Summary

In this chapter, we learned a little about the history of data wrangling and became familiar with its definition: every task performed to transform or enhance data and make it ready for analysis and modeling is what we call data wrangling, or data munging.

We also discussed why it is important to wrangle data before modeling it. A model is a simplified representation of reality, and an algorithm is like a student that needs to understand that reality to give us the best answer about the subject matter. If we teach this student with bad data, we cannot expect to receive a good answer. A model is only as good as its input data.

Further on in the chapter, we reviewed the benefits of data wrangling and saw how improving the quality of our data leads to faster results and better outcomes.

In the final sections, we reviewed the basic steps of data wrangling and learned more about three of the most commonly used frameworks for Data Science: KDD, SEMMA, and CRISP-DM. I recommend reading more about them to gain a holistic view of the life cycle of a Data Science project.

Now, it is important to note how all three frameworks emphasize the selection of a representative dataset or subset of data. A nice example is given by Aurélien Géron (Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd edition (2019), pp. 32-33). Suppose you want to build an app that takes pictures of flowers and then recognizes and classifies them. You could go to the internet and download thousands of pictures; however, they will probably not be representative of the kind of pictures that your model will receive from the app's users. As a result, the model could underperform. This example illustrates the garbage in, garbage out idea: if you don’t explore and understand your data thoroughly, you won’t know whether it is good enough for modeling.

Like a map, the frameworks can lead the way as we explore, understand, and wrangle the data to make it ready for modeling, decreasing the risk of a frustrating outcome.

In the next chapter, let’s get our hands on R and start coding.
