PyTorch and Explainable AI
Throughout this book, we have built several deep learning models that perform different kinds of tasks for us, such as a handwritten digit classifier, an image-caption generator, and a sentiment classifier. Although we have mastered how to train and evaluate these models using PyTorch, we do not know precisely what is happening inside them while they make predictions. Model interpretability, or explainability, is a field of machine learning where we aim to answer the question, “Why did the model make that prediction?” Put differently, “What did the model see in the input data to make that particular prediction?” The answers to such questions become essential when models are deployed in sensitive applications such as cancer diagnosis and legal aid.
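To make this concrete, here is a minimal sketch of one common way to answer that question for an image model: a gradient-based saliency map, which scores each input pixel by how strongly the predicted class score responds to it. The `model` and `image` names are placeholders for any trained classifier and a single preprocessed input batch, not code from the book:

```python
import torch

def saliency_map(model, image):
    """Per-pixel importance scores via input gradients (a simple
    explainability technique; `model` and `image` are placeholders)."""
    model.eval()
    image = image.clone().requires_grad_(True)  # track gradients w.r.t. pixels
    scores = model(image)                       # forward pass: class scores
    predicted = scores.argmax(dim=1).item()     # the model's top prediction
    scores[0, predicted].backward()             # backprop that score to the input
    # Pixels with large absolute gradients are those the prediction
    # is most sensitive to -- roughly, what the model "saw".
    return image.grad.abs().squeeze()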
In this chapter, we will use the handwritten digit classification model from Chapter 1, Overview of Deep Learning Using PyTorch, examine its inner workings, and thereby...