Exploring a dataset with sgkit
In this recipe, we will perform an initial exploratory analysis of one of our generated datasets. Now that we have some basic knowledge of xarray, we can start doing some actual data analysis. For now, we will ignore population structure, an issue we will return to in the next recipe.
Getting ready
You will need to have run the first recipe and should have the hapmap10_auto_noofs_ld files available. There is a Notebook file with this recipe called Chapter06/Exploratory_Analysis.py. You will need the software that you installed for the previous recipe.
How to do it...
Take a look at the following steps:
- We start by loading the PLINK data with sgkit, exactly as in the previous recipe:
import numpy as np
import xarray as xr
import sgkit as sg
from sgkit.io import plink

data = plink.read_plink(path='hapmap10_auto_noofs_ld', fam_sep='\t')
- Let’s ask sgkit for variant_stats:
variant_stats = sg.variant_stats...
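To get a feel for the kind of per-variant summaries involved, here is a minimal sketch that computes a few of them by hand with NumPy on a tiny made-up genotype array. It is not sgkit's implementation: the toy data, the helper name toy_variant_stats, and the dictionary keys are all assumptions made for illustration. The only convention borrowed from sgkit is the layout of the genotype calls, with dimensions (variants, samples, ploidy) and -1 marking a missing allele.

```python
import numpy as np

# Toy genotype calls with shape (variants, samples, ploidy).
# -1 marks a missing allele, as in sgkit's call_genotype variable.
# The values are invented purely for illustration.
toy_genotypes = np.array([
    [[0, 0], [0, 1], [1, 1]],
    [[0, 1], [-1, -1], [0, 0]],
    [[1, 1], [1, 1], [-1, -1]],
])


def toy_variant_stats(genotypes):
    """Hypothetical helper: per-variant summaries for biallelic diploid calls."""
    n_samples, ploidy = genotypes.shape[1], genotypes.shape[2]
    # A sample counts as called only if none of its alleles is missing.
    called = (genotypes >= 0).all(axis=2)
    n_called = called.sum(axis=1)
    call_rate = n_called / n_samples
    # Heterozygous: called, and the two alleles differ.
    is_het = called & (genotypes[:, :, 0] != genotypes[:, :, 1])
    n_het = is_het.sum(axis=1)
    # Count alternate alleles (coded as 1) among called samples only.
    alt_count = ((genotypes == 1) & called[:, :, None]).sum(axis=(1, 2))
    allele_total = n_called * ploidy
    # Leave the frequency as NaN where no sample was called.
    alt_frequency = np.divide(
        alt_count,
        allele_total,
        out=np.full(alt_count.shape, np.nan),
        where=allele_total > 0,
    )
    return {
        "n_called": n_called,
        "call_rate": call_rate,
        "n_het": n_het,
        "alt_frequency": alt_frequency,
    }


stats = toy_variant_stats(toy_genotypes)
for name, values in stats.items():
    print(name, values)
```

Each returned array has one entry per variant, which is the same shape of result we expect from a per-variant summary of the real dataset. Working through a three-variant example like this makes it easier to sanity-check what the call rate and allele frequency mean before looking at them across thousands of markers.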