R Bioinformatics Cookbook

Utilize R packages for bioinformatics, genomics, data science, and machine learning

Product type Paperback
Published in Oct 2023
Publisher Packt
ISBN-13 9781837634279
Length 396 pages
Edition 2nd Edition
Author (1):
Dan MacLean

Table of Contents

  • Preface
  • Chapter 1: Setting Up Your R Bioinformatics Working Environment
  • Chapter 2: Loading, Tidying, and Cleaning Data in the tidyverse
  • Chapter 3: ggplot2 and Extensions for Publication Quality Plots
  • Chapter 4: Using Quarto to Make Data-Rich Reports, Presentations, and Websites
  • Chapter 5: Easily Performing Statistical Tests Using Linear Models
  • Chapter 6: Performing Quantitative RNA-seq
  • Chapter 7: Finding Genetic Variants with HTS Data
  • Chapter 8: Searching Gene and Protein Sequences for Domains and Motifs
  • Chapter 9: Phylogenetic Analysis and Visualization
  • Chapter 10: Analyzing Gene Annotations
  • Chapter 11: Machine Learning with mlr3
  • Chapter 12: Functional Programming with purrr and base R
  • Chapter 13: Turbo-Charging Development in R with ChatGPT
  • Index
  • Other Books You May Enjoy

Loading, Tidying, and Cleaning Data in the tidyverse

Cleaning data is a crucial step in the data science process. It involves identifying and correcting errors, inconsistencies, and missing values, and formatting and structuring the data so that it is easy to work with. Clean data can then be used effectively for analysis, modeling, and visualization.

The R tidyverse is a collection of packages designed for data science that includes tools for data manipulation, visualization, and modeling. Within it, dplyr and tidyr are two of the most widely used packages for data cleaning. dplyr provides functions for efficiently manipulating large datasets, such as filtering, grouping, and summarizing. tidyr is designed specifically for tidying (restructuring) data: it provides functions for reshaping tables, such as gathering columns into rows and spreading rows into columns (pivot_longer() and pivot_wider() in current tidyr, which supersede the older gather() and spread()), so that the data has a consistent structure that is easier to analyze and visualize. Together, these packages provide powerful tools for cleaning and manipulating data, which helps make R a popular choice among data scientists.

In this chapter, we will look at tools and techniques for preparing data with the tidyverse packages. You will learn how to deal with different formats and quickly interconvert them, merge different datasets, and summarize them. You will also learn how to bring data into your work from outside sources that are not in handy files.
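
To make the division of labor between tidyr and dplyr concrete, here is a minimal sketch rather than a recipe from the book. It assumes the dplyr, tidyr, and tibble packages are installed and that R is version 4.1 or later (for the native `|>` pipe). The `expr` table, its gene names, and its values are made up purely for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per gene, one column per sample
expr <- tibble::tibble(
  gene     = c("geneA", "geneB", "geneC"),
  sample_1 = c(10, 0, 5),
  sample_2 = c(12, NA, 7)
)

# tidyr: reshape to long format, one row per gene/sample observation
expr_long <- expr |>
  pivot_longer(
    cols      = starts_with("sample_"),
    names_to  = "sample",
    values_to = "count"
  )

# dplyr: drop missing values, then group and summarize per gene
gene_means <- expr_long |>
  filter(!is.na(count)) |>
  group_by(gene) |>
  summarise(mean_count = mean(count), .groups = "drop")

print(gene_means)
```

The long table has one observation per row, which is the shape that `group_by()` and `summarise()` expect; the missing value is removed explicitly rather than silently propagating into the mean.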

In this chapter, we will cover the following recipes:

  • Loading data from files with readr
  • Tidying a wide format table into a tidy table with tidyr
  • Tidying a long format table into a tidy table with tidyr
  • Combining tables using join functions
  • Reformatting and extracting existing data into new columns using stringr
  • Computing new data columns from existing ones and applying arbitrary functions using mutate()
  • Using dplyr to summarize data in large tables
  • Using datapasta to create R objects from cut-and-paste data
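
As a rough preview of how several of these steps can chain together, the following sketch loads a small table with readr, derives new columns with stringr and mutate(), and joins an annotation table with dplyr. It is an illustration under stated assumptions, not code from the recipes: it assumes readr 2.0 or later (for literal data wrapped in `I()`), R 4.1 or later, and the gene identifiers, counts, and descriptions are all hypothetical.

```r
library(readr)
library(dplyr)
library(stringr)

# Hypothetical inline CSV text standing in for a file on disk
csv_text <- "gene_id,count\nGENE0001.1,10\nGENE0002.1,25\n"
counts <- read_csv(I(csv_text), show_col_types = FALSE)

# Hypothetical annotation table keyed by locus (no transcript suffix)
annotations <- tibble::tribble(
  ~locus,     ~description,
  "GENE0001", "kinase",
  "GENE0002", "transporter"
)

combined <- counts |>
  mutate(
    # stringr: strip the trailing ".1" style transcript suffix
    locus     = str_remove(gene_id, "\\.\\d+$"),
    # compute a new column from an existing one
    log_count = log2(count + 1)
  ) |>
  # dplyr: keep every row of counts, add matching annotation columns
  left_join(annotations, by = "locus")

print(combined)
```

A `left_join()` is used here so that rows without a matching annotation would be kept with `NA` in the description column; a different join type may suit other data better.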