Reference architecture for genomics data analytics
Before going into the technical implementation details of genomics data analytics, let's understand what genomics means. It is a field of study of biology that focuses on the evolution, mapping, structure, and functions of genomes. A genome is a complete set of DNA of a living being, which includes all of its genes.
In recent times, there have been significant investments in genomics and clinical data to explore more about living beings' genes and their characteristics, which can help diagnose any disease beforehand or predict new features. Technology continues to play a vital role in genomics studies: as the data volume grows, you can use big data technologies for distributed processing.
Genomics datasets are available in complex data formats, such as VCF and gVCF, and to parse them, there are several popular frameworks available, such as Glow and Hail.
Use case overview
Let's assume your organization...