You're reading from Deep Learning for Genomics Data-driven approaches for genomics applications in life sciences and biotechnology

Product type Paperback

Published in Nov 2022

Publisher Packt

ISBN-13 9781804615447

Length 270 pages

Edition 1st Edition

Concepts

Deep Learning

Author (1):

Upendra Kumar Devisetty

Machine learning for genomics in life sciences and biotechnology

Because ML has shown considerable promise for genomics applications such as drug discovery, diagnostics, precision medicine, agriculture, and biological research, more and more life science and biotech organizations are using it to analyze genomic data for population health and predictive analytics. According to a market research study that takes into account technology, functionality, application, and region, the global AI in genomics market is forecast to grow from $202 million in 2020 to $1.671 billion by 2025 (https://www.marketsandmarkets.com/Market-Reports/artificial-intelligence-in-genomics-market-36649899.html). This growth is attributed mainly to the need to control spiraling drug costs, increasing public and private investment, and, most importantly, the adoption of AI solutions in precision medicine. The COVID-19 pandemic has also accelerated the adoption of AI for genomics (https://www.jmir.org/2021/3/e22453/). Although the outlook for ML in genomics is exciting, there is a shortage of skilled people to develop, manage, and apply these ML methodologies. Additionally, integrating ML systems into existing systems is challenging and requires a proper understanding of the concepts and techniques. Researchers who want to stand out and contribute to data-driven decisions in their organizations must have the necessary skill set.

This book addresses that skill gap. It is a Swiss Army knife for any research professional, data scientist, or manager who is getting started with genomic data analysis using ML. It highlights the power of ML approaches for handling big genomic data by introducing key concepts and by using real-life business examples, use cases, and best practices to help fill the gaps in both technical skills and general mindset within the field.

Exploring machine learning software

Before we start the tutorials, we will need some tools. To accommodate readers on any operating system, we will use ML software that runs on Windows, macOS, and Linux. We will use the Python programming language along with Python libraries such as Biopython for genomic data analysis, scikit-learn for building ML models, and Keras for training our DL models. Let’s take a closer look at these pieces of ML software.

Python programming language

We will be using the Python programming language throughout this book. Python is widely used by researchers because it is user-friendly and has packages that support all types of data analysis. More importantly, the ML, DL, and genomics communities routinely use Python for their own analysis needs. Throughout this book, we will use Python version 3.7 and look at a few ways of setting up Python and its packages using pip, Conda, and Anaconda.
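Since the examples target one specific Python release, it can help to confirm which interpreter is active before installing anything. The following is a minimal sketch using only the standard library; the names `BOOK_PYTHON` and `matches_book_python` are hypothetical helpers introduced here for illustration, not part of the book's code.

```python
import sys

# Major and minor version named in the text (Python 3.7).
BOOK_PYTHON = (3, 7)


def matches_book_python(version_info=None, expected=BOOK_PYTHON):
    """Return True if the major.minor version equals the expected one.

    `version_info` defaults to the running interpreter's version.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    running = ".".join(str(part) for part in sys.version_info[:3])
    wanted = ".".join(str(part) for part in BOOK_PYTHON)
    if matches_book_python():
        print(f"Python {running} matches the book's version {wanted}.")
    else:
        print(f"Running Python {running}; the book uses {wanted}.")
```

A mismatch does not necessarily mean the examples will fail, but it is a useful first thing to check when something behaves differently from the text.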

Visualization

We will be using the Matplotlib and Seaborn Python packages, two of the most popular visualization libraries in Python. They are quick to install, easy to use, and easy to import into a Python script. Both provide a variety of functions and methods for plotting data. Throughout this book, we will use Matplotlib version 3.5.1 and Seaborn version 0.11.2. We will look at a few ways of installing these libraries in the subsequent chapters.
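To give a feel for how little code a basic plot needs, here is a minimal sketch that counts the bases in a DNA string and draws them as a bar chart with Matplotlib. The sequence and the helper names `base_counts` and `plot_base_counts` are made up for illustration; Seaborn's `barplot` could draw the same data with different default styling.

```python
from collections import Counter

import matplotlib

# Non-interactive backend so the script also runs without a display;
# remove this line when working interactively (for example, in a notebook).
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def base_counts(sequence):
    """Count A, C, G, and T in a DNA string (case-insensitive)."""
    counts = Counter(sequence.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def plot_base_counts(sequence):
    """Draw a bar chart of base counts and return the figure and axes."""
    counts = base_counts(sequence)
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(list(counts.keys()), list(counts.values()))
    ax.set_xlabel("Nucleotide")
    ax.set_ylabel("Count")
    ax.set_title("Base composition")
    fig.tight_layout()
    return fig, ax


if __name__ == "__main__":
    toy_sequence = "AACGTT"  # illustrative sequence, not real data
    print(base_counts(toy_sequence))
    fig, ax = plot_base_counts(toy_sequence)
```

Calling `fig.savefig(...)` with a file name of your choice would write the chart to disk.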

Biopython

We will also be using Biopython, a Python package that provides a collection of tools for processing genomic data. It provides high-quality, reusable modules and classes for analyzing complex genomic data, along with built-in modules for connecting to databases such as Swiss-Prot, NCBI, Ensembl, and so on. We will use Biopython version 1.78 and look at separate ways of installing Biopython using pip, Conda, and Anaconda.
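As a small illustration of the kind of reusable object Biopython offers, the sketch below wraps a DNA string in a `Seq` object and derives a few common quantities from it. The function name `summarize_dna` and the example sequence are hypothetical; GC content is computed by hand here because the helper functions for it differ between Biopython releases.

```python
from Bio.Seq import Seq


def summarize_dna(dna):
    """Return basic derived sequences and GC fraction for a DNA string."""
    if not dna:
        raise ValueError("sequence must not be empty")
    seq = Seq(dna.upper())
    # Fraction of bases that are G or C, computed directly from counts.
    gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)
    return {
        "reverse_complement": str(seq.reverse_complement()),
        "mrna": str(seq.transcribe()),      # DNA -> RNA (T becomes U)
        "protein": str(seq.translate()),    # standard genetic code
        "gc_fraction": gc_fraction,
    }


if __name__ == "__main__":
    # Toy coding sequence: start codon, one alanine codon, stop codon.
    for name, value in summarize_dna("ATGGCCTAA").items():
        print(f"{name}: {value}")
```

The input length is kept to a multiple of three so that translation covers only complete codons.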

Scikit-learn

Scikit-learn is a Python package built specifically for ML and is one of the most popular ML libraries among data scientists. It has a rich collection of ML algorithms, extensive tutorials, good documentation, and, most importantly, an excellent user community. In this introductory chapter, we will use scikit-learn to develop ML models in Python. Wherever applicable, we will use scikit-learn version 1.0.2 and look at separate ways of installing scikit-learn in the subsequent chapters.
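To show what a scikit-learn workflow looks like on sequence data, here is a minimal sketch that turns DNA strings into 3-mer counts and fits a logistic regression classifier. Everything in it is an assumption made for illustration: the sequences are invented toy data, the labels (1 for GC-rich, 0 for AT-rich) are arbitrary, and the choice of k-mer counting with `CountVectorizer` is just one simple way to featurize sequences, not the method used later in the book.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy sequences: four GC-rich (label 1) and four AT-rich (label 0).
TRAIN_SEQUENCES = [
    "GCGCGGCCGCGG", "CCGGCGCGGCCG", "GGCCGCGCCGGC", "CGCGGCGCCGCG",
    "ATATAATTATAA", "TTAATATAATTA", "AATTATATTAAT", "TATAATATTATA",
]
TRAIN_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def build_kmer_classifier(k=3):
    """Pipeline: overlapping k-mer counts followed by logistic regression."""
    return make_pipeline(
        # analyzer="char" with ngram_range=(k, k) counts every k-mer.
        CountVectorizer(analyzer="char", ngram_range=(k, k)),
        LogisticRegression(),
    )


def train_toy_model():
    """Fit the pipeline on the toy data above and return it."""
    model = build_kmer_classifier()
    model.fit(TRAIN_SEQUENCES, TRAIN_LABELS)
    return model


if __name__ == "__main__":
    model = train_toy_model()
    new_sequences = ["GCGCCGCG", "ATATTATA"]
    for sequence, label in zip(new_sequences, model.predict(new_sequences)):
        print(sequence, "->", "GC-rich" if label == 1 else "AT-rich")
```

The same fit and predict pattern applies to most scikit-learn estimators, so swapping in a different classifier only changes the last step of the pipeline.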