A draft genome assembly of a previously unsequenced organism can be a rich source of biological knowledge, but when genomics resources such as gene annotations aren't available, it can be tricky to proceed. Here, we'll look at a first-stage pipeline for finding potential genes and genomic loci of interest entirely de novo, using no information beyond the sequence itself. We'll use a very simple set of rules to find open reading frames: sequences that begin with a start codon and end with a stop codon. The tools for doing this are encapsulated within a single function in the Bioconductor package systemPipeR. We'll end up with yet another GRanges object that we can integrate into downstream processes that allow us to cross-reference other data, such as RNAseq, as we saw in the Finding unannotated transcribed...
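
To make the "begins with a start codon, ends with a stop codon" rule concrete, here is a minimal sketch in plain Python. It is not how systemPipeR implements its search, and it is not a substitute for it. The function names (`find_orfs`, `find_orfs_one_strand`, `reverse_complement`), the `min_nt` parameter, and the choice of ATG as the only start codon are all assumptions made for illustration. The sketch scans the three reading frames on each strand and reports 1-based, inclusive coordinates on the forward strand, which is the same coordinate convention a GRanges object uses.

```python
# Illustrative sketch only: a naive ORF scan, not systemPipeR's implementation.
# Assumptions: ATG is the only start codon; TAA, TAG and TGA are the stop codons.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs_one_strand(seq, min_nt=6):
    """Return 0-based, half-open (start, end) spans of ORFs on one strand.

    An ORF runs from the first in-frame ATG to the next in-frame stop codon,
    stop codon included. `min_nt` is a hypothetical minimum length filter.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOPS:
                end = i + 3                    # close it at the stop codon
                if end - start >= min_nt:
                    orfs.append((start, end))
                start = None
    return orfs


def find_orfs(seq, min_nt=6):
    """Return (start, end, strand) tuples in 1-based, inclusive coordinates."""
    seq = seq.upper()
    n = len(seq)
    results = []
    for s, e in find_orfs_one_strand(seq, min_nt):
        results.append((s + 1, e, "+"))
    # Minus-strand hits are found on the reverse complement, then mapped
    # back to forward-strand coordinates.
    for s, e in find_orfs_one_strand(reverse_complement(seq), min_nt):
        results.append((n - e + 1, n - s, "-"))
    return results


print(find_orfs("CCATGAAATAGGG"))  # [(3, 11, '+')]
```

Each tuple carries the three pieces of information a genomic range needs (start, end, strand), so the output of a scan like this maps naturally onto the ranges we work with in the rest of the recipe.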