ML challenges in genomics
ML is the backbone of large-scale genomic data analysis. ML algorithms can mine biological insights from large genomic datasets and discover patterns that may be hard for experts to extract by hand. However, current ML algorithms face a few challenges when analyzing genomic data:
- Although the amount of data coming out of biological systems and genome sequencing is huge and ever-growing, integrating these diverse datasets from multiple sources, platforms, and technologies into ML algorithms is not trivial.
- Because of this huge variation in the training data, models tend to overfit and generalize poorly to new data that differs from the training data. Methods such as L1 and L2 regularization can address this poor generalization, as we will see in future chapters.
- The nature of ML models, which are mostly “black boxes”, may bring new challenges to biological applications in...
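
As a small preview of the L1 and L2 regularization mentioned in the list above, the sketch below shows how each penalty acts on a single model coefficient. It assumes an idealized one-dimensional least-squares problem, where the regularized solution has a simple closed form. The function names `l2_shrink` and `l1_shrink`, the coefficient values, and the penalty strength `lam` are all made up for illustration and do not come from any genomic dataset or library.

```python
def l2_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2.

    The L2 penalty scales the coefficient toward zero but never
    makes a non-zero coefficient exactly zero.
    """
    return z / (1.0 + lam)


def l1_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w) (soft-thresholding).

    The L1 penalty subtracts lam from the magnitude and sets any
    coefficient with magnitude <= lam exactly to zero.
    """
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unregularized coefficients for four features (made-up numbers).
unregularized = [2.0, -1.5, 0.3, -0.1]
lam = 0.5  # illustrative penalty strength, not a recommended value

l2_weights = [l2_shrink(z, lam) for z in unregularized]
l1_weights = [l1_shrink(z, lam) for z in unregularized]

print("unregularized:", unregularized)
print("L2-regularized:", l2_weights)
print("L1-regularized:", l1_weights)
```

With these made-up values, the L2 penalty scales all four coefficients down, while the L1 penalty sets the two smallest ones exactly to zero. That sparsity is one reason L1 penalties are often considered when there are many more features than samples, although real models are fitted with an iterative solver and not with this one-coefficient shortcut.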