When you've discovered unannotated transcripts you may want to see whether they are differentially expressed between experiments. We've already looked at how we might do that with edgeR and DESeq, but one problem is going from an object such as a RangedSummarizedExperiment, comprised of the data and a GRanges object that describes the peak regions, to the internal DESeq object. In this recipe, we'll look at how we can summarise the data in those objects and get them into the correct format.
Differential peak analysis
Getting ready
For this recipe, you'll need the RangedSummarizedExperiment version of the Arabidopsis thaliana RNAseq in datasets/ch1/arabidopsis_rse.RDS in this book's repository. We'll use the DESeq and SummarizedExperiment Bioconductor packages we used earlier too.
How to do it...
- Load data and set up a function that creates region tags:
library(SummarizedExperiment)
arab_rse <- readRDS(file.path(getwd(), "datasets", "ch1", "arabidopsis_rse.RDS") )
make_tag <- function(grange_obj){
paste0(
grange_obj@seqnames,
":",
grange_obj@ranges@start,
"-",
(grange_obj@ranges@start + grange_obj@ranges@width)
)
}
- Extract data and annotate rows:
counts <- assay(arab_rse) if ( ! is.null(names(rowRanges(arab_rse))) ){ rownames(counts) <- names(rowRanges(arab_rse)) } else { rownames(counts) <- make_tag(rowRanges(arab_rse)) }
How it works...
Step 1 starts by loading in our pre-prepared RangedSummarized experiment; note that the names slot of the GRanges object in there is not populated. We next create a custom function, make_tag(), which works by pasting together seqnames, starts and the computed end (start + width) from a passed GRanges object. Note the @ sign syntax: this is used because GRange is an S4 object and the slots are accessed with @ rather than the more familiar $.
In step 2, the code pulls out the actual data from RangedSummarizedExperiment using the assay() function. The matrix returned has no row names, which is unuseful, so we use the if clause to check the names slot—we use that as row names if it's available; if it, isn't we make a row name tag using the position information in the GRanges object in the make_tag() function we have created. This will give the following output—a count matrix that has the location tag as the row name that can be used in DESeq and edgeR as described in Recipes 1 and 2 in this chapter:
head(counts)
## mock1 mock2 mock3 hrcc1 hrcc2 hrcc3 ## Chr1:3631-5900 35 77 40 46 64 60 ## Chr1:5928-8738 43 45 32 43 39 49 ## Chr1:11649-13715 16 24 26 27 35 20 ## Chr1:23146-31228 72 43 64 66 25 90 ## Chr1:31170-33154 49 78 90 67 45 60 ## Chr1:33379-37872 0 15 2 0 21 8