Analyzing population structure
In the previous recipe, we introduced data analysis with sgkit while ignoring population structure. Most datasets, including the one we are using, do have a population structure. Sgkit provides functionality to analyze genomic datasets with population structure, and that is what we are going to investigate here.
Getting ready
You will need to have run the first recipe, and you should have the hapmap10_auto_noofs_ld data that we produced, along with a downloaded copy of the original population metadata file, relationships_w_pops_041510.txt. The recipe is available as a Notebook file, 06_PopGen/Pop_Stats.py.
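To give a feel for what the population metadata contributes, here is a minimal sketch that groups sample IDs by population using only the standard library. The tab-separated layout, the column names (FID, IID, population), and the rows themselves are assumptions made up for illustration; check them against the header of your own copy of relationships_w_pops_041510.txt. The helper name group_samples_by_population is hypothetical and is not part of sgkit.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for the population metadata file.
# The tab-separated layout and the column names are assumptions.
SAMPLE_METADATA = (
    "FID\tIID\tdad\tmom\tsex\tpheno\tpopulation\n"
    "F1\tS1\t0\t0\t1\t0\tPOP_A\n"
    "F1\tS2\t0\t0\t2\t0\tPOP_A\n"
    "F2\tS3\t0\t0\t1\t0\tPOP_B\n"
)


def group_samples_by_population(handle):
    """Return {population: [(family_id, sample_id), ...]} from a tab-separated handle."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row["population"]].append((row["FID"], row["IID"]))
    return dict(groups)


# Parse the toy metadata and report how many samples each population has.
pop_to_samples = group_samples_by_population(io.StringIO(SAMPLE_METADATA))
for population, samples in sorted(pop_to_samples.items()):
    print(population, len(samples))
```

To try this on the real file, you would pass an open file handle instead of the io.StringIO object, provided the header matches the assumed column names.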
How to do it...
Take a look at the following steps:
- First, let’s load the PLINK data with sgkit:
from collections import defaultdict
from pprint import pprint

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import xarray as xr
import sgkit as sg
from sgkit.io import plink

data = plink.read_plink(path=...