Summary
This chapter started with what ML is and how ML algorithms can help genomic applications by uncovering hidden patterns in datasets, automating manual tasks, and making predictions on unseen data. We looked at the two main types of ML algorithms, namely supervised and unsupervised methods, and the main steps they involve. Then, we walked through the ML workflow for genomic applications.
In the second half of the chapter, we spent quite a bit of time on the individual steps of the ML workflow and what each one involves. We also introduced two of the most popular Python packages, Pandas and scikit-learn, for working through that workflow. Finally, we worked on a real-world application of ML on a genomic dataset for identifying the disease state of cancer patients.
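As a compact reminder of how these pieces fit together, the following is a minimal sketch of the workflow using Pandas and scikit-learn. It is not the chapter's actual example: the data is synthetic, and the column names (`gene_0`, `disease_state`), the label encoding, and the choice of logistic regression are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a gene-expression table (NOT the chapter's dataset).
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 20
labels = rng.integers(0, 2, size=n_samples)  # hypothetical encoding: 0 = normal, 1 = tumor
expression = rng.normal(size=(n_samples, n_genes))
expression[:, :5] += labels[:, None] * 2.0  # make the first five genes informative

# Data collection / preprocessing: hold the table in a Pandas DataFrame.
df = pd.DataFrame(expression, columns=[f"gene_{i}" for i in range(n_genes)])
df["disease_state"] = labels

# Split features from the label, then hold out 25% of samples for testing.
X = df.drop(columns="disease_state")
y = df["disease_state"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Model training: scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Model evaluation on the unseen test samples.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

Wrapping the scaler and classifier in one pipeline means the scaling parameters are learned from the training split only, so the held-out samples stay unseen until evaluation.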
This chapter and the preceding chapters are meant as a quick primer on ML for genomics, and with this knowledge and understanding of the fundamentals, in the...