You're reading from R Bioinformatics Cookbook Use R and Bioconductor to perform RNAseq, genomics, data visualization, and bioinformatic analysis

Product type Paperback

Published in Oct 2019

Publisher Packt

ISBN-13 9781789950694

Length 316 pages

Edition 1st Edition

Author: Dr Dan MacLean

Differential peak analysis

When you've discovered unannotated transcripts, you may want to see whether they are differentially expressed between experiments. We've already looked at how we might do that with edgeR and DESeq, but one problem is getting from an object such as a RangedSummarizedExperiment, which comprises the count data and a GRanges object describing the peak regions, to the internal DESeq object. In this recipe, we'll look at how we can summarize the data in those objects and get them into the correct format.

Getting ready

For this recipe, you'll need the RangedSummarizedExperiment version of the Arabidopsis thaliana RNAseq data, stored in datasets/ch1/arabidopsis_rse.RDS in this book's repository. We'll also use the DESeq and SummarizedExperiment Bioconductor packages that we used earlier.
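If the structure of a RangedSummarizedExperiment is unfamiliar, the following minimal sketch builds a toy one by hand. All of the values and the names toy_ranges, toy_counts, and toy_rse are hypothetical and only illustrate how the count matrix and the GRanges object sit side by side; the real arabidopsis_rse.RDS object may carry additional metadata.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges, IRanges and S4Vectors

# Hypothetical toy regions: three ranges on Chr1, with no names set
toy_ranges <- GRanges(
  seqnames = "Chr1",
  ranges   = IRanges(start = c(100, 500, 900), width = 50)
)

# Hypothetical toy counts: one row per region, one column per sample
toy_counts <- matrix(
  c(5L, 0L, 12L, 7L, 3L, 9L),
  nrow = 3,
  dimnames = list(NULL, c("mock1", "hrcc1"))
)

# Supplying rowRanges makes the result a RangedSummarizedExperiment
toy_rse <- SummarizedExperiment(
  assays    = list(counts = toy_counts),
  rowRanges = toy_ranges,
  colData   = DataFrame(
    treatment = c("mock", "hrcc"),
    row.names = c("mock1", "hrcc1")
  )
)

assay(toy_rse)                      # the count matrix
rowRanges(toy_rse)                  # the GRanges describing each row
is.null(names(rowRanges(toy_rse)))  # no names were set on the ranges
```

The last line checks the same condition the recipe tests before deciding how to name the rows of the count matrix.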

How to do it...

Load data and set up a function that creates region tags:

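library(SummarizedExperiment)

# Load the pre-prepared RangedSummarizedExperiment from the book's repository
arab_rse <- readRDS(file.path(getwd(), "datasets", "ch1", "arabidopsis_rse.RDS"))

# Build a "seqname:start-end" tag for every range in a GRanges object.
# Note: start + width is one past the inclusive end reported by end().
make_tag <- function(grange_obj) {
  paste0(
    grange_obj@seqnames,
    ":",
    grange_obj@ranges@start,
    "-",
    (grange_obj@ranges@start + grange_obj@ranges@width)
  )
}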

Extract data and annotate rows:

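# Pull the count matrix out of the RangedSummarizedExperiment
counts <- assay(arab_rse)

# Use the range names as row names if they exist; otherwise build location tags
if (!is.null(names(rowRanges(arab_rse)))) {
  rownames(counts) <- names(rowRanges(arab_rse))
} else {
  rownames(counts) <- make_tag(rowRanges(arab_rse))
}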

How it works...

Step 1 starts by loading our pre-prepared RangedSummarizedExperiment; note that the names slot of the GRanges object it contains is not populated. We next create a custom function, make_tag(), which pastes together the seqnames, the start, and a computed end (start + width) for each range in the GRanges object passed to it. Because GRanges coordinates are inclusive, start + width is one greater than the coordinate returned by end(); subtract 1 if the tag must match the stored end exactly. Note the @ syntax: GRanges is an S4 object, so its slots are accessed with @ rather than the more familiar $.
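Direct slot access with @ works, but the GenomicRanges accessor functions seqnames(), start(), and end() are the more usual way to read the same information. The sketch below is an alternative, not the recipe's own function: make_tag_acc() is a hypothetical name and the toy ranges are made up. It also shows the relationship between start, width, and the inclusive end.

```r
library(GenomicRanges)

# Hypothetical toy ranges used only for illustration
toy_ranges <- GRanges(
  seqnames = c("Chr1", "Chr1"),
  ranges   = IRanges(start = c(100, 500), width = c(50, 20))
)

# Accessor-based variant of the tag function (hypothetical name);
# it uses end(), so the tag carries the inclusive end coordinate
make_tag_acc <- function(grange_obj) {
  paste0(
    as.character(seqnames(grange_obj)),
    ":",
    start(grange_obj),
    "-",
    end(grange_obj)
  )
}

tags <- make_tag_acc(toy_ranges)
tags

# The inclusive end is always start + width - 1
all(end(toy_ranges) == start(toy_ranges) + width(toy_ranges) - 1)
```

Tags built this way end one base earlier than those produced by make_tag(), so pick one convention and use it consistently when matching tags back to regions.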

In step 2, the code pulls the actual data out of the RangedSummarizedExperiment using the assay() function. The matrix returned has no row names, which is unhelpful, so we use the if clause to check the names slot: if names are available, we use them as row names; if not, we build a row name tag from the position information in the GRanges object using the make_tag() function we created. This gives the following output, a count matrix with the location tag as the row name, which can be used in DESeq and edgeR as described in Recipes 1 and 2 in this chapter:

head(counts)
##                  mock1 mock2 mock3 hrcc1 hrcc2 hrcc3
## Chr1:3631-5900      35    77    40    46    64    60
## Chr1:5928-8738      43    45    32    43    39    49
## Chr1:11649-13715    16    24    26    27    35    20
## Chr1:23146-31228    72    43    64    66    25    90
## Chr1:31170-33154    49    78    90    67    45    60
## Chr1:33379-37872     0    15     2     0    21     8
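
As a rough illustration of the hand-off, the sketch below wraps a tagged count matrix in a DESeq2 dataset object. It assumes the DESeq2 package rather than the original DESeq, retypes the first three rows of the output above so that it runs without the RDS file, and assumes that the mock and hrcc column prefixes denote two treatment groups; sample_info and the treatment column are hypothetical names.

```r
library(DESeq2)

# First three rows of the count matrix shown above, retyped for illustration
counts <- matrix(
  c(35, 77, 40, 46, 64, 60,
    43, 45, 32, 43, 39, 49,
    16, 24, 26, 27, 35, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("Chr1:3631-5900", "Chr1:5928-8738", "Chr1:11649-13715"),
    c("mock1", "mock2", "mock3", "hrcc1", "hrcc2", "hrcc3")
  )
)
storage.mode(counts) <- "integer"

# Assumed grouping: three mock samples and three hrcc samples
sample_info <- data.frame(
  treatment = factor(rep(c("mock", "hrcc"), each = 3), levels = c("mock", "hrcc")),
  row.names = colnames(counts)
)

# The location tags are carried over as the row names of the dataset
dds <- DESeqDataSetFromMatrix(
  countData = counts,
  colData   = sample_info,
  design    = ~ treatment
)
rownames(dds)
```

From here, the differential expression steps from the earlier recipes apply unchanged, and the location tags identify each region in the results.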
