Use case – Predicting the binding site location of the JunD TF
In the last section of the chapter, we will see how to leverage a DNN to predict Transcription Factor Binding Sites (TFBSs) in the human genome. We will build a DL model using one of the NN architectures most commonly used in genomics: the CNN, which we learned about previously. But before that, let’s understand the problem and the data in detail.
TFs are proteins that regulate gene expression. They bind to regulatory regions of the DNA, such as promoters, and either promote or repress the expression of nearby genes. Each TF recognizes a specific binding motif, and the DNA site it binds to is referred to as a TFBS. Identifying TFBSs is very challenging for several reasons: the binding motifs are generally very short (<10 bp) and not completely specific, a TF may bind to many similar but not identical sequences, and in some cases, some bases in the motifs are generally more...