Finding Genetic Variants with HTS Data
High-throughput sequencing (HTS) has made it possible to discover genetic variants and carry out genome-wide genotyping and haplotyping in many samples in a short space of time. The deluge of data that this technology has released has created unique opportunities for bioinformaticians and computer scientists, and innovative new data storage and analysis pipelines have been created in response. The fundamental pipeline in variant calling starts with quality control of the HTS reads, followed by alignment of those reads to a reference genome. These steps usually take place before analysis in R and typically result in a BAM file of read alignments or a Variant Call Format (VCF) file of variant positions that we’ll want to process in our R code.
As variant calling and analysis are such fundamental techniques in bioinformatics, Bioconductor is well equipped with the tools we need to construct our software and perform our analysis. The key questions...