Understanding variation in genome sequences helps us identify people who are predisposed to common diseases, cure rare diseases, and determine which population group an individual belongs to within a larger population. Although classical machine learning techniques allow researchers to identify groups (that is, clusters) of related variables, the accuracy and effectiveness of these methods diminish for large, high-dimensional datasets such as the whole human genome.
Deep neural networks (DNNs), on the other hand, form the core of deep learning (DL) and provide algorithms for modeling complex, high-level abstractions in data. They can make better use of large-scale datasets to build complex models.
In this chapter, we apply the K-means algorithm to large-scale genomic data from the 1000 Genomes Project analysis...