Deep Learning for Genomics: Data-driven approaches for genomics applications in life sciences and biotechnology

Upendra Kumar Devisetty
Rating: 4 out of 5 (8 Ratings)
eBook, Nov 2022, 270 pages, 1st Edition
eBook: €17.99 €26.99
Paperback: €33.99
Subscription: Free Trial, renews at €18.99 p/m

What do you get with eBook?

  • Instant access to your Digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want
  • AI Assistant (beta) to help accelerate your learning

Introducing Machine Learning for Genomics

Machine learning (ML) is the field of science that deals with developing computer algorithms and models that can perform certain tasks without being explicitly programmed for them. In other words, the machine “learns” from the input data provided to it rather than following hand-specified “rules”. The machine can then convert that learning into expertise or knowledge and use it for predictions. ML is an important tool for leveraging technologies around artificial intelligence (AI), a subfield of computer science that aims to automatically perform tasks that we, as humans, are naturally good at. ML is an important aspect of many modern businesses and research efforts. The adoption of ML for genomics applications has accelerated recently because of the availability of large genomic datasets, improvements in algorithms, and, most importantly, superior computational power. More and more scientific research organizations and industries are expanding the use of ML across vast volumes of genomic data for predictive diagnostics, as well as to get biological insights at the scale of population health.

Genomics, the study of the genetic constitution of organisms, holds promise for understanding and diagnosing human diseases and for improving agriculture and livestock. The field has seen exponential growth in the last 15 years, mainly due to technological advances in high-throughput sequencing, also known as next-generation sequencing (NGS), which generates enormous amounts of genomic data. It is estimated that between 100 million and as many as 2 billion human genomes could be sequenced by 2025 (https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195), representing an astounding growth of four to five orders of magnitude in 10 years and far exceeding the growth of many big data domains. This complexity and the sheer amount of data generated create roadblocks not only to acquisition, storage, and distribution but also to genomic data analysis. The current tools used in genomic analysis are built on deterministic approaches and rely on encoded rules to perform a particular task. To keep up with data growth, we need new and innovative approaches, such as ML, to enrich our understanding of basic biology and carry it over into applied research.

In this chapter, we’ll learn what ML is, why ML is essential for genomics, and what value ML brings to life sciences and biotechnology industries that leverage genome data for the development of genomic-based products. By the end of this chapter, you will understand the limitations of the current conventional algorithms for genomic data analysis, how solving problems with ML differs from conventional approaches, and how ML approaches can fill those gaps and make generating biological insights easier.

As such, in this chapter, we’re going to cover the following main topics:

  • What is machine learning?
  • Why machine learning for genomics?
  • Machine learning for genomics in life sciences and biotechnology

What is machine learning?

Before we talk about ML, let’s understand what AI is. In the simplest terms, AI is the ability of a machine to mimic human intelligence and iteratively improve itself based on the information it collects. The goal of AI is to build systems to perform actions that are routinely done by humans such as problem-solving, pattern matching, image recognition, knowledge acquisition, and so on. ML, a subset of AI, is the process of training a model to learn and improve from experience. Deep learning (DL), in turn, is a subfield of ML, in which we leverage artificial neural networks (ANNs) to mimic the human brain and find the nonlinear relationships between the input and output to generate predictions (Figure 1.1):

Figure 1.1 – AI versus ML versus DL – how they are related

In ML, a model is built from input data and an underlying algorithm to make useful predictions on real-world data. In a simplified ML workflow, “features”, each representing an individual measurable property of the data, are provided as input, and “labels” are returned as the predictions. Suppose we want to predict whether a particular DNA sequence has a binding site for a transcription factor (TF) of interest. Using the traditional approach, we would use a position weight matrix (PWM) to scan the sequence and identify potential motifs that are overrepresented. Even though this works, it is laborious, manual, and hard to scale. Using an ML-based approach, we would give an ML model plenty of DNA sequences that, based on experimental results, either have or don’t have binding sites (the labels) until the model learns the mathematical relationship between the features of those sequences and the labels. It then uses this knowledge to make informed predictions on new data. For example, we could give the ML model an unknown DNA sequence, and it would predict the binding site motif, if one is present. This is one example of why ML is a good fit for genomics problems. Some other ways in which ML can be used in genomics include identifying genetic disorders, predicting the type of cancer from genetic variants, improving disease prognosis, and so on.
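
To make the traditional, rule-based side of this comparison concrete, here is a minimal sketch of a position weight matrix scan in plain Python. The motif frequencies, the uniform background, and the example sequence are all hypothetical values chosen for illustration; they do not come from any real TF or database.

```python
import math

# Hypothetical per-position base frequencies for a 4-bp motif (illustrative
# numbers only, not taken from any motif database).
MOTIF_FREQS = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumption: every base is equally likely in the background


def build_pwm(freqs, background=BACKGROUND):
    """Convert per-position frequencies into log2-odds scores."""
    return [
        {base: math.log2(p / background) for base, p in position.items()}
        for position in freqs
    ]


def scan(sequence, pwm):
    """Score every window of the sequence; return (best_start, best_score)."""
    width = len(pwm)
    best_start, best_score = -1, float("-inf")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


pwm = build_pwm(MOTIF_FREQS)
start, score = scan("TTTTAGCTTTTT", pwm)
print(f"best window starts at {start} with log-odds score {score:.2f}")
```

Every number in the matrix has to be supplied by hand, which is the manual step the ML-based approach replaces by learning the relationship from labeled sequences.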

Why machine learning for genomics?

The completion of the human genome sequence in 2003 was one of the most important events in biology and a significant milestone in genomics. Since then, genomics has been evolving rapidly, from research to clinical practice at scale, especially in oncology and infectious diseases. Because genomics can identify the root causes of diseases that stem from tiny changes in the genome, it has fueled the discovery of many important disease genes, particularly rare disease genes, which has brought clinical decision-making one step closer to personalized medicine. As a result, sequencing efforts have exploded globally, and the amount of genomics data being generated has shot up. Along with sequencing efforts, biological techniques have increased in complexity and number, resulting in large-scale genomics data being generated. It is estimated that between 2 and 40 exabytes of genomics data will be generated in the next decade (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4494865/). This is more data than the current computational and bioinformatics tools can readily handle, interpret, and mine for biological insights. ML, with its inherent nature of learning from experience, holds incredible promise for analyzing this large and complex genomic data. Since ML algorithms can detect patterns in the data automatically, they are well suited to interpreting this large trove of genomic data.

ML has a strong place in genomics since it applies mathematical and data analysis techniques to complex multi-dimensional datasets, such as genomic datasets, to build predictive models and uncover insights from those models. ML can transform heterogeneous and large-scale genomic datasets into biological insights. ML approaches rely on sophisticated statistical and computational algorithms to make biological predictions. Supervised methods do this by mapping the complex association between the input features and the labels, while unsupervised methods find complex patterns in the input features and group samples based on their similarities. Both can learn useful and new patterns from data that are hard for experts to find. There is now a huge demand for applying ML to genomic datasets because of its huge success in other domains.
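
The difference between supervised and unsupervised methods can be sketched with scikit-learn on toy data. Everything about the dataset below is an assumption made for illustration: the sequences are randomly generated, the two groups (GC-rich and AT-rich) are hypothetical, and base composition is just one simple choice of features.

```python
import random

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = random.Random(42)  # fixed seed so the toy data is reproducible


def make_seq(gc_fraction, length=60):
    """Draw a random DNA string with roughly the requested GC fraction."""
    return "".join(
        rng.choice("GC") if rng.random() < gc_fraction else rng.choice("AT")
        for _ in range(length)
    )


def base_composition(seq):
    """Features: the fraction of A, C, G and T in the sequence."""
    return [seq.count(base) / len(seq) for base in "ACGT"]


# Hypothetical toy data: label 1 = GC-rich sequences, label 0 = AT-rich ones.
labels = np.array([1] * 60 + [0] * 60)
seqs = [make_seq(0.8) for _ in range(60)] + [make_seq(0.2) for _ in range(60)]
X = np.array([base_composition(s) for s in seqs])

# Supervised: fit on every second sample (features + labels), test on the rest.
train = np.arange(len(seqs)) % 2 == 0
clf = LogisticRegression().fit(X[train], labels[train])
accuracy = clf.score(X[~train], labels[~train])

# Unsupervised: group the same features by similarity; labels are never used.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Cluster ids are arbitrary, so compare against the labels in both orientations.
agreement = max(np.mean(clusters == labels), np.mean(clusters != labels))

print(f"held-out accuracy: {accuracy:.2f}, cluster/label agreement: {agreement:.2f}")
```

The classifier needs the labels to learn the mapping, whereas k-means only sees the features and recovers the two groups because they differ so clearly in composition. Real genomic datasets are rarely this well separated.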

Machine learning for genomics in life sciences and biotechnology

Because of the incredible promise that ML has shown for genomics applications such as drug discovery, diagnostics, precision medicine, agriculture, and biological research, more and more life science and biotech organizations are leveraging ML to analyze genomic data for population health and predictive analytics. According to a market research study that takes into account technology, functionality, application, and region, the global market for AI in genomics is forecast to reach $1.671 billion by 2025, up from $202 million in 2020 (https://www.marketsandmarkets.com/Market-Reports/artificial-intelligence-in-genomics-market-36649899.html). The main drivers of this growth are the need to control spiraling drug costs, increasing public and private investments, and, most importantly, the adoption of AI solutions in precision medicine. The COVID-19 pandemic has played its part in accelerating the adoption of AI for genomics as well (https://www.jmir.org/2021/3/e22453/). Even though the outlook for ML in genomics is exciting, there is a lack of a skilled workforce to develop, manage, and apply these ML methodologies in genomics. Additionally, integrating ML systems into existing systems is a challenging task that requires a proper understanding of the concepts and techniques. For researchers to stand out from the crowd and contribute to data-driven decisions in their company, they must have the necessary skill set.

This book addresses the skill gap that currently exists in the market. It is a Swiss Army knife for any research professional, data scientist, or manager who is getting started with genomic data analysis using ML. It highlights the power of ML approaches in handling genomics big data by introducing key concepts and employing real-life business examples, use cases, best practices, and so on to help fill the gaps in both the technical skill set and the general mindset within the field.

Exploring machine learning software

Before we start the tutorials, we will need some tools. To accommodate users regardless of their operating system, we will use ML software that is compatible with Windows, macOS, and Linux. We will be using the Python programming language and Python libraries such as Biopython for genomic data analysis, scikit-learn for building ML models, and Keras for training our DL models. Let’s take a closer look at these pieces of ML software.

Python programming language

We will be using the Python programming language throughout this book. Python is widely used by researchers because of its user-friendliness and the available packages that support all types of data analysis. More importantly, the ML, DL, and genomics communities routinely use Python for their own analysis needs. Throughout this book, we will use Python version 3.7 and look at a few ways of installing Python using Pip, Conda, and Anaconda.

Visualization

We will be using the Matplotlib and Seaborn Python packages, which are the two most popular visualization libraries in Python. They are quick to install, easy to use, and easy to import into a Python script. They both come with a variety of functions and methods to use on the data. Throughout this book, we will use Matplotlib version 3.5.1 and Seaborn version 0.11.2. We will look at a few ways of installing these libraries in the subsequent chapters.

Biopython

We will also be using Biopython, a Python package that provides a collection of tools for processing genomic data. It offers high-quality, reusable modules and classes for analyzing complex genomic data, and it includes built-in modules for connecting to databases such as Swiss-Prot, NCBI, Ensembl, and so on. We will use Biopython version 1.78 and look at separate ways of installing Biopython using Pip, Conda, and Anaconda.

Scikit-learn

Scikit-learn is a Python package written for the sole purpose of performing ML and is one of the most popular ML libraries used by data scientists. It has a rich collection of ML algorithms, extensive tutorials, good documentation, and, most importantly, an excellent user community. For this introductory chapter, we will use scikit-learn for developing ML models in Python. Wherever applicable, we will use scikit-learn version 1.0.2 and look at separate ways of installing scikit-learn in the subsequent chapters.
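
Since this section quotes specific library versions, a quick way to compare them with what is installed locally is sketched below. It is not part of the book's own setup instructions: it assumes Python 3.8 or later (for `importlib.metadata`, which is newer than the Python 3.7 mentioned above), and the `EXPECTED` table and `check_versions` helper are names introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions quoted in this chapter, keyed by PyPI distribution name.
EXPECTED = {
    "matplotlib": "3.5.1",
    "seaborn": "0.11.2",
    "biopython": "1.78",
    "scikit-learn": "1.0.2",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_or_None, expected)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted)
    return report


for name, (installed, wanted) in check_versions().items():
    status = "missing" if installed is None else installed
    print(f"{name}: installed={status}, chapter uses {wanted}")
```

A newer installed version is not necessarily a problem; the comparison only shows where the local environment differs from the one described here.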

Summary

In this first chapter, you were introduced to the concept of ML for genomics. We gained a brief understanding of ML in several genomic applications in the life science, pharma, clinical, and biotechnology industries. We also looked at the rapid strides that NGS has made in the last 15 years and how it contributed to the production of genomic big data. Then, we understood how ML can be used to analyze genomic data for the development of genomic-based products.

Finally, we looked at the programming language, the most popular genomic library, and the ML software that we will be using throughout this book. You will mainly use Python and scikit-learn for developing models, Biopython for genomic data analysis, and some open source tools for training models and deploying them to production.

In the next chapter, we will introduce the fundamentals of genomic data analysis.

Key benefits

  • Apply deep learning algorithms to solve real-world problems in the field of genomics
  • Extract biological insights from deep learning models built from genomic datasets
  • Train, tune, evaluate, deploy, and monitor deep learning models for enabling predictions in genomics

Description

Deep learning has shown remarkable promise in the field of genomics; however, there is a lack of a skilled deep learning workforce in this discipline. This book will help researchers and data scientists to stand out from the rest of the crowd and solve real-world problems in genomics by developing the necessary skill set. Starting with an introduction to the essential concepts, this book highlights the power of deep learning in handling big data in genomics. First, you’ll learn about conventional genomics analysis, then transition to state-of-the-art machine learning-based genomics applications, and finally dive into deep learning approaches for genomics. The book covers all of the important deep learning algorithms commonly used by the research community and goes into the details of what they are, how they work, and their practical applications in genomics. The book dedicates an entire section to operationalizing deep learning models, which will provide the necessary hands-on tutorials for researchers and any deep learning practitioners to build, tune, interpret, deploy, evaluate, and monitor deep learning models from genomics big data sets. By the end of this book, you’ll have learned about the challenges, best practices, and pitfalls of deep learning for genomics.

Who is this book for?

This deep learning book is for machine learning engineers, data scientists, and academicians practicing in the field of genomics. It assumes that readers have intermediate Python programming knowledge, basic knowledge of Python libraries such as NumPy and Pandas for manipulating and parsing data and Matplotlib and Seaborn for visualizing data, along with a grounding in genomics and genomic analysis concepts.

What you will learn

  • Discover the machine learning applications for genomics
  • Explore deep learning concepts and methodologies for genomics applications
  • Understand supervised deep learning algorithms for genomics applications
  • Get to grips with unsupervised deep learning with autoencoders
  • Improve deep learning models using generative models
  • Operationalize deep learning models from genomics datasets
  • Visualize and interpret deep learning models
  • Understand deep learning challenges, pitfalls, and best practices

Product Details

  • Publication date: Nov 11, 2022
  • Length: 270 pages
  • Edition: 1st
  • Language: English
  • ISBN-13: 9781804613016

Packt Subscriptions

See our plans and pricing

€18.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

€189.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

€264.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

Frequently bought together


  • Deep Learning for Genomics: €33.99
  • Machine Learning in Biotechnology and Life Sciences: €41.99
  • Bioinformatics with Python Cookbook: €43.99
Total: €119.97

Table of Contents

17 Chapters
Part 1 – Machine Learning in Genomics
Chapter 1: Introducing Machine Learning for Genomics
Chapter 2: Genomics Data Analysis
Chapter 3: Machine Learning Methods for Genomic Applications
Part 2 – Deep Learning for Genomic Applications
Chapter 4: Deep Learning for Genomics
Chapter 5: Introducing Convolutional Neural Networks for Genomics
Chapter 6: Recurrent Neural Networks in Genomics
Chapter 7: Unsupervised Deep Learning with Autoencoders
Chapter 8: GANs for Improving Models in Genomics
Part 3 – Operationalizing models
Chapter 9: Building and Tuning Deep Learning Models
Chapter 10: Model Interpretability in Genomics
Chapter 11: Model Deployment and Monitoring
Chapter 12: Challenges, Pitfalls, and Best Practices for Deep Learning in Genomics
Index
Other Books You May Enjoy

Customer reviews

Rating: 4 out of 5 (8 Ratings)
Rating distribution:
  • 5 star: 37.5%
  • 4 star: 25%
  • 3 star: 37.5%
  • 2 star: 0%
  • 1 star: 0%

Top Reviews

H2N, Jul 14, 2023
Rating: 5 out of 5
This textbook offers a practical and informative overview of the convergence of machine learning and biotechnology. It caters to individuals with experience in both fields and serves as a comprehensive concept refresher. Particularly valuable is its introduction to deep learning models in the context of biology and their interpretability. The inclusion of projects and code snippets empowers readers to embark on future endeavors confidently.
Amazon Verified review

Emerald, Mar 12, 2023
Rating: 5 out of 5
This is the only book I can find on this topic. It’s very useful for me as a person who is relatively familiar to classical machine learning but just step into the field of genomics. I notice some comments mentioning that the book is a bit shallow. However, the book is just right to me. I like the Genomics Data Analysis part, which gives concrete examples. By these examples, I get a taste and have an idea about what people in this field are doing. My favorite part of the book is that the author gives many use cases and code examples. I’m sure this is a big topic and agree that if you want to understand the topic deeper, having this book only is not enough. But I’ll strongly recommend this book as a start for people like me. Btw, giving the paper book buyers an additional digital copy is another thing I like about the book.
Amazon Verified review

Sheena, Apr 06, 2023
Rating: 5 out of 5
This book assumes technical knowledge so it is not for someone who is totally new to coding and deep learning. I think this book is a great primer for software developers who are new to genomics but have some solid fundamentals of classical machine learning and biology. It covers all the important topics of deep learning in genomics but you should not expect yourself to become an expert after reading this book. It saves you time, as a software developer who is new to genomics and deep learning, to sieve through the relevant topics and materials yourself by googling online. I wish I had this book handy when I did my onboarding for my previous job at a biotech startup. If you are looking for a primer to this topic, then this book is for you.
Amazon Verified review

Ekta, Jun 27, 2023
Rating: 4 out of 5
This book teaches you about deep learning approaches to solve problems in genomics, interpret biological insights from genomic datasets, and finally, operationalize deep learning models using open source tools to enable predictions for end users.
Part 1: It introduces the fundamentals of genomic data analysis and machine learning for genomic data and its applications.
Part 2: This covers the basic concepts of deep learning and how to transform raw genomics data into biological insights. It includes DNN, CNN, RNN, Auto-encoders and GAN models.
Part 3: Final part covers the deep learning models using open source tools to enable predictions for end users. It includes building and tuning deep learning methods, model interpretability, deployment and monitoring. This part also covers challenges, and best practices for deep learning in Genomics.
Amazon Verified review

YS, Jan 30, 2023
Rating: 4 out of 5
OVERVIEW
This textbook is a great high level overview of the concepts, tools, and techniques at the intersection of ML and biotechnology, but it could benefit from a few more rounds of content balancing and visual inclusion.
+ This textbook is massively practical for those who are looking to understand the intersection of ML and genetics, and already have some experience in both fields.
+ I studied data science and genetics, and I found the textbook to be both a **thorough concept refresher** and a **useful introduction for DL bio models and their interpretability**.
+ The **practical inclusion of projects and code snippets** made any future projects I pursue to be a lot more accessible now that I know what tools to begin with.
+ Talking about **typical practices and use-cases of different models/packages and how to manage and optimize them** was a refreshing take, compared to many textbooks that lack little real-world application
+ Chapters are generally well-organized and build upon knowledge from prior chapters. Summaries at end of chapters are concise but still provide a thorough overview of the chapter
- The **balance of content allocated** between introductory vs more complex topics **is uneven**. This makes the later DL chapters less accessible to those without a thorough ML background, as some topics seem to be glossed over
- The textbook could also **benefit from having more visuals** and **fewer blocks of text**, both of which would increase the digestibility of the work. There are some great figures (e.g. figure 5.3) but more to break up the text and visually explain tough concepts would be helpful.
A lot of work has gone into the textbook and it is very useful, but it requires some visual reshaping and rebalancing of depth to improve the reader experience.
Amazon Verified review

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it, we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook, or Bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Card, or PayPal)

Where can I access support around an eBook?

  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book, go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?

  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are priced lower than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.