Problem 6 – Using Python to analyze genetic data
Let's shift focus to a larger dataset. Suppose you're working with laboratory mice and receive protein expression data for trisomy mice. The full public domain file on Kaggle is very large, so we've truncated it for this problem: we're focusing on only six protein expressions and, again, only the trisomy (Down syndrome) mice in the study. The full file can be found on Kaggle at https://www.kaggle.com/ruslankl/mice-protein-expression. The truncated file can be found in our GitHub repository.
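Before deciding what to analyze in a file like this, it can help to take a quick first look at its shape: how many rows it has, which columns hold measurements, and where values are missing. The following is only a minimal sketch using Python's standard library, not the analysis we'll settle on. The column names (MouseID, protein_a, protein_b) and every value in SAMPLE_CSV are invented placeholders, not taken from the Kaggle file or from our truncated file, and summarize_numeric_columns is a hypothetical helper written for this illustration.

```python
import csv
import io
import statistics

# ASSUMPTION: placeholder data. These column names and values are invented
# for illustration and do not come from the real mice dataset.
SAMPLE_CSV = """MouseID,protein_a,protein_b
m1,0.50,1.20
m2,0.70,
m3,0.60,1.00
"""


def summarize_numeric_columns(handle, id_column="MouseID"):
    """Return (row_count, per-column summary) for a CSV file-like object.

    Every column except id_column is treated as numeric. Empty cells are
    counted as missing rather than converted to numbers.
    """
    reader = csv.DictReader(handle)
    columns = {name: [] for name in reader.fieldnames if name != id_column}
    missing = {name: 0 for name in columns}
    n_rows = 0

    for row in reader:
        n_rows += 1
        for name in columns:
            cell = (row[name] or "").strip()
            if cell == "":
                missing[name] += 1
            else:
                columns[name].append(float(cell))

    summary = {}
    for name, values in columns.items():
        summary[name] = {
            "count": len(values),
            "missing": missing[name],
            "mean": statistics.mean(values) if values else None,
            "min": min(values) if values else None,
            "max": max(values) if values else None,
        }
    return n_rows, summary


# Run the helper on the in-memory placeholder data.
n_rows, summary = summarize_numeric_columns(io.StringIO(SAMPLE_CSV))
print("rows:", n_rows)
for name, stats in summary.items():
    print(name, stats)
```

To try this on a real file, we could pass an open file handle in place of io.StringIO(SAMPLE_CSV) and set id_column to whatever identifier column that file actually uses. A library such as pandas would do the same job in fewer lines; the standard library is used here only to keep the sketch self-contained.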
Let's say you don't know where to start with this data. What should you even be looking at? Well, that question is often the first thing we encounter in data science. We don't always get to be part of the study design or data collection. Many times, we receive large data files and need to figure out what to look for, how to tackle the problem, whatever we decide the problem...