Deep Learning for Genomics

You're reading from Deep Learning for Genomics: Data-driven approaches for genomics applications in life sciences and biotechnology

Product type: Paperback
Published in: Nov 2022
Publisher: Packt
ISBN-13: 9781804615447
Length: 270 pages
Edition: 1st Edition
Author: Upendra Kumar Devisetty
Table of Contents (18 chapters)

Preface
Part 1 – Machine Learning in Genomics
Chapter 1: Introducing Machine Learning for Genomics
Chapter 2: Genomics Data Analysis
Chapter 3: Machine Learning Methods for Genomic Applications
Part 2 – Deep Learning for Genomic Applications
Chapter 4: Deep Learning for Genomics
Chapter 5: Introducing Convolutional Neural Networks for Genomics
Chapter 6: Recurrent Neural Networks in Genomics
Chapter 7: Unsupervised Deep Learning with Autoencoders
Chapter 8: GANs for Improving Models in Genomics
Part 3 – Operationalizing models
Chapter 9: Building and Tuning Deep Learning Models
Chapter 10: Model Interpretability in Genomics
Chapter 11: Model Deployment and Monitoring
Chapter 12: Challenges, Pitfalls, and Best Practices for Deep Learning in Genomics
Index
Other Books You May Enjoy

CNNs for genomics

Although CNNs are primarily used for unstructured data such as images, text, and audio, they are also powerful tools for non-image data such as DNA. However, a raw DNA sequence is a string of letters, so it cannot be fed to a CNN directly for feature extraction; it must first be converted to a numerical representation. For a non-numeric sequence like this, a common first step is to convert the 1D DNA sequence into a one-hot encoded matrix (Figure 5.8):

Figure 5.8 – Example of one-hot encoding for a DNA sequence

As shown in the preceding diagram, each nucleotide in the DNA sequence is represented as a one-hot vector: A = [1, 0, 0, 0], C = [0, 1, 0, 0], G = [0, 0, 0, 1], and T = [0, 0, 1, 0]. Stacking these vectors row by row turns a sequence of length L into an L × 4 matrix, which can then be fed into the model for training. Note that one-hot encoding is not the only way to represent DNA sequences for a CNN. There is also label encoding, in which...
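The encoding described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the book's own code: the function names `one_hot_encode` and `scan_filter` are hypothetical, the column order follows the mapping given in the text, and encoding ambiguous bases such as N as all-zero rows is an assumption. The second function slides one (k, 4) filter along the matrix, which is roughly what a single filter of a 1D convolutional layer does (here without bias, activation, or learned weights).

```python
import numpy as np

# Column order follows the mapping in the text:
# A = [1,0,0,0], C = [0,1,0,0], T = [0,0,1,0], G = [0,0,0,1].
# Any fixed order (e.g. alphabetical A, C, G, T) works if used consistently.
BASE_TO_INDEX = {"A": 0, "C": 1, "T": 2, "G": 3}


def one_hot_encode(sequence: str) -> np.ndarray:
    """Return an (L, 4) float32 one-hot matrix for a DNA sequence of length L.

    Assumption: characters other than A/C/G/T (e.g. the ambiguous base N)
    become all-zero rows.
    """
    matrix = np.zeros((len(sequence), 4), dtype=np.float32)
    for position, base in enumerate(sequence.upper()):
        index = BASE_TO_INDEX.get(base)
        if index is not None:
            matrix[position, index] = 1.0
    return matrix


def scan_filter(encoded: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a (k, 4) filter along an (L, 4) one-hot matrix.

    This is a "valid" cross-correlation with stride 1 and no bias or
    activation, i.e. the core operation of one 1D convolutional filter.
    """
    k = kernel.shape[0]
    return np.array(
        [np.sum(encoded[i : i + k] * kernel) for i in range(encoded.shape[0] - k + 1)]
    )


if __name__ == "__main__":
    x = one_hot_encode("ACGTTAGC")
    print(x.shape)  # (8, 4)

    # Hypothetical filter that matches the motif "TAG" exactly.
    motif_filter = one_hot_encode("TAG")
    print(scan_filter(x, motif_filter))  # → [1. 0. 0. 1. 3. 0.]
```

The score peaks at 3.0 for the window starting at index 4, where "TAG" occurs, which is the intuition behind CNN filters acting as motif detectors. With a real framework, equal-length sequences are usually stacked into a 3D batch: Keras `Conv1D` (default `channels_last`) expects shape (batch, length, 4), whereas PyTorch `nn.Conv1d` expects (batch, 4, length), so the matrix may need to be transposed.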
