The 1000 Genomes Project provides a very large catalog of human genetic variants. The project aims to identify genetic variants with frequencies higher than 1% in the populations studied. The data is openly available and freely accessible to scientists worldwide through public data repositories. It is also widely used to screen variants discovered in exome data from individuals with genetic disorders and in cancer genome projects.
The genotype dataset, in Variant Call Format (VCF), contains the human individuals (that is, samples) and their genetic variants, along with the allele frequencies, both globally and for each super population. The data also records the population region of each sample, which we use as the category to predict in our approach. Specific...