Tutorial – protein sequence classification via LSTMs using Keras and MLflow
Deep learning has surged in popularity in recent years, prompting many scientists to turn to the field as a new means of solving and optimizing scientific problems. One of the most popular applications of deep learning in the biotechnology space involves protein sequence data. So far in this book, we have focused our efforts on developing predictive models for structured data. We will now turn our attention to sequential data, in which each element of a sequence bears some relation to the element before it. In this tutorial, we will develop a protein sequence classification model that classifies protein sequences by their known family accession using the Pfam (https://pfam.xfam.org/) dataset.
Important note
Pfam dataset: Pfam: The protein families database in 2021. J. Mistry, S. Chuguransky, L. Williams, M. Qureshi...
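To make the idea of sequential data concrete, the following minimal sketch shows one common way a protein sequence can be turned into a fixed-length list of integers before it is passed to a sequence model such as an LSTM. The alphabet ordering, the reserved padding and unknown codes, the `encode_sequence` helper, and the example sequences are all assumptions made for illustration; they are not taken from Pfam and are not necessarily the preprocessing used later in this tutorial.

```python
# Illustrative sketch only: the index assignments below are an assumption,
# not a standard fixed by Pfam or Keras.

# The 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Code 0 is reserved for padding, code 1 for any symbol outside the
# 20 standard residues (for example X, B, Z, U, or O).
PAD, UNKNOWN = 0, 1
AA_TO_INDEX = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(sequence, max_length):
    """Map a protein sequence to a fixed-length list of integer codes.

    Sequences longer than max_length are truncated on the right;
    shorter ones are right-padded with PAD.
    """
    codes = [AA_TO_INDEX.get(residue, UNKNOWN) for residue in sequence.upper()]
    codes = codes[:max_length]
    return codes + [PAD] * (max_length - len(codes))


# Hypothetical example sequences (not real Pfam entries).
examples = ["MKVLA", "GAXW"]
encoded = [encode_sequence(s, max_length=6) for s in examples]
print(encoded)
```

The order of the integers mirrors the order of the residues, which is the property a recurrent model relies on when it reads a sequence one element at a time.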