Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
R Bioinformatics Cookbook

You're reading from   R Bioinformatics Cookbook Utilize R packages for bioinformatics, genomics, data science, and machine learning

Arrow left icon
Product type Paperback
Published in Oct 2023
Publisher Packt
ISBN-13 9781837634279
Length 396 pages
Edition 2nd Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Dan MacLean Dan MacLean
Author Profile Icon Dan MacLean
Dan MacLean
Arrow right icon
View More author details
Toc

Table of Contents (16) Chapters Close

Preface 1. Chapter 1: Setting Up Your R Bioinformatics Working Environment 2. Chapter 2: Loading, Tidying, and Cleaning Data in the tidyverse FREE CHAPTER 3. Chapter 3: ggplot2 and Extensions for Publication Quality Plots 4. Chapter 4: Using Quarto to Make Data-Rich Reports, Presentations, and Websites 5. Chapter 5: Easily Performing Statistical Tests Using Linear Models 6. Chapter 6: Performing Quantitative RNA-seq 7. Chapter 7: Finding Genetic Variants with HTS Data 8. Chapter 8: Searching Gene and Protein Sequences for Domains and Motifs 9. Chapter 9: Phylogenetic Analysis and Visualization 10. Chapter 10: Analyzing Gene Annotations 11. Chapter 11: Machine Learning with mlr3 12. Chapter 12: Functional Programming with purrr and base R 13. Chapter 13: Turbo-Charging Development in R with ChatGPT 14. Index 15. Other Books You May Enjoy

Using datapasta to create R objects from cut-and-paste data

Being able to paste data into source code documents is useful for all sorts of reasons, not least because it allows for a reproducible example, also known as a reprex—a minimal, self-containedexample that demonstrates a problem or a concept. By including data in the source code, others can run the code and see the results for themselves, making it easier to understand and replicate the results.

The R datapasta package makes it easy to paste data into R source code documents. It provides a set of functions for converting data to and from R definitions and is extremely useful when creating static data objects in code examples, tests, or when sharing. In this recipe, you will learn how to use datapasta to bring external data into your source code documents by typing them in longhand.

Getting ready

We will use the datapasta package, though installing it is non-standard. Use this command:

renv::install("datapasta", repos = c(mm = "https://milesmcbain.r-universe.dev", getOption("repos")))

This should install the package using renv and make us ready to go. Remember to install renv the usual way if you don’t already have it.

How to do it…

The datapasta tool is implemented as an add-in for RStudio, so we begin by setting that up:

  1. Use the RStudio Tools | Addins | Browse Addins menu and then Keyboard Shortcuts. You get to choose which key combination you want to use for pasting.
  2. Click the middle column next to the operations, as shown in the following screenshot, and press the keys you want to use. The combination in Figure 2.1 is a good choice:
Figure 2.1 – Selecting key shortcuts

Figure 2.1 – Selecting key shortcuts

  1. Get a web table—for example, go to this page on Wikipedia: https://en.wikipedia.org/wiki/Tab-separated_values—and copy the whole text-based example table using the browser’s right-click Copy feature. This should put the text table in your copy/paste buffer. It should look like what’s shown in Figure 2.2:
Figure 2.2 – Web data on Wikipedia

Figure 2.2 – Web data on Wikipedia

  1. Paste the table now in your copy/paste buffer into an R source document. Place the typing cursor at a suitable place in the source R document you’re working in and use the key combo to paste in the table.

So, with the preceding setup, we should be able to quickly take data from varied sources and coerce them into R objects for analysis.

How it works…

The datapasta package is really useful. The first step sets up our preferred paste keys for later use; the second step is simple, and we just go somewhere and find some data in the world we would like in our R source; and by the third step, we’re selecting the place to put the data definition and pasting it in. Our example goes from a table in a web page to this definition for an R object:

data.frame(  stringsAsFactors = FALSE,
      Sepal.length = c(5.1, 4.9, 4.7, 4.6, 5),
       Sepal.width = c(3.5, 3, 3.2, 3.1, 3.6),
      Petal.length = c(1.4, 1.4, 1.3, 1.5, 1.4),
       Petal.width = c(0.2, 0.2, 0.2, 0.2, 0.2),
           Species = c("I. setosa","I. setosa",
                       "I. setosa","I. setosa","I. setosa")
)

And that powerful little operation is how we can convert web data to source code very easily.

There’s more…

It’s possible to go the other way around, from some R object to a definition of that object using the dpasta() function, which can coerce data frames, tibbles, and vectors into definitions. This is really useful for reproducible examples. This example shows how to actually build the object with code so that we don’t have to share the file and everything is in one source document:

library(datapasta)mtcars |> 
   dpasta()

This shows how the datapasta package is a super-useful tool for turning objects from the web and within R into code.

You have been reading a chapter from
R Bioinformatics Cookbook - Second Edition
Published in: Oct 2023
Publisher: Packt
ISBN-13: 9781837634279
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime