Exploring the data with standard statistics
Now that we have the insights from our Mendelian error analysis, let’s explore the data further to find patterns that might help us filter it better. You can find this content in Chapter04/Exploration.py.
How to do it…
- We start, as usual, with the necessary imports:
import gzip
import pickle
import random
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix
- Then we load the data. We will use pandas to navigate it:
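# Load the balanced data set and the ordered list of annotation names
fit = np.load(gzip.open('balanced_fit.npy.gz', 'rb'))
ordered_features = np.load(open('ordered_features', 'rb'))
num_features = len(ordered_features)
# One column per annotation, followed by the 'pos' and 'error' columns
fit_df = pd.DataFrame(fit, columns=ordered_features + ['pos', 'error'])
num_samples = 80
# The raw array is no longer needed once the DataFrame exists
del fit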
- Let’s ask pandas to show a histogram of all annotations:
fig, ax = plt.subplots(figsize=(16, 9))
fit_df.hist(column=ordered_features...
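The load-and-plot steps above can be tried without the recipe's data files. The following is only a minimal sketch on synthetic data: the feature names (feat_a, feat_b, feat_c), the row count, and the random values are all hypothetical and stand in for the real annotations. It also passes figsize straight to DataFrame.hist instead of a pre-created axes, which is a simplification and not necessarily what the recipe does.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical annotation names standing in for ordered_features
feature_names = ["feat_a", "feat_b", "feat_c"]
columns = feature_names + ["pos", "error"]

n_rows = 200
# Synthetic stand-in for the loaded array: features, then position, then a 0/1 error flag
data = np.column_stack([
    rng.normal(size=(n_rows, len(feature_names))),
    np.arange(n_rows),
    rng.integers(0, 2, size=n_rows),
])
demo_df = pd.DataFrame(data, columns=columns)

# Standard summary statistics for the annotation columns only
summary = demo_df[feature_names].describe()
print(summary)

# One histogram panel per annotation; pandas chooses the grid layout
axes = demo_df.hist(column=feature_names, figsize=(16, 9))
```

With real data, the same pattern would apply to fit_df and ordered_features; describe() is a quick numeric complement to the histograms when there are many annotations.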