Transduction and the inductive inheritance of transformers
Transformers can apply the knowledge they acquire during pretraining to downstream tasks they were never explicitly trained to perform. A BERT model, for example, acquires language through two pretraining tasks: masked language modeling and next-sentence prediction. It can then be fine-tuned for downstream tasks it did not learn from scratch.
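To make the pretrain-then-fine-tune idea concrete, here is a minimal sketch, assuming the Hugging Face transformers library and PyTorch (the chapter does not prescribe a toolkit). The two-example sentiment dataset and hyperparameters are purely illustrative:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# The pretrained BERT encoder weights are loaded unchanged; only the
# small classification head on top is randomly initialized.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Hypothetical downstream data, for illustration only.
texts = ["A gripping, well-acted film.", "Dull and far too long."]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few fine-tuning steps on the downstream task
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The point of the sketch is that none of the encoder's pretraining objectives appear here: the knowledge acquired through masked language modeling is simply reused for a task the model never saw.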
In this section, we will run a thought experiment. We will use the graph of a transformer to represent how humans and machines make sense of information using language. Machines make sense of information differently from humans, yet they reach remarkably effective results.
The following figure, a thought experiment laid out in transformer architecture layers and sublayers, shows the deceptive similarity between humans and machines. Let's study the learning process of transformer models to understand downstream tasks.
Figure 4.1: Human and machine learning methods
For our example, N=2. This conceptual representation...
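If N counts the stacked encoder layers, as in the original Transformer architecture, the figure's simplified N=2 model can be sketched in a few lines of PyTorch. This is an assumption for illustration; the chapter has not yet fixed d_model or the number of attention heads:

```python
import torch
import torch.nn as nn

# One encoder layer with the original paper's dimensions (assumed here;
# the figure does not specify d_model or the number of heads).
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)

# N=2: stack two identical layers, rather than the original paper's six.
encoder = nn.TransformerEncoder(layer, num_layers=2)

x = torch.rand(10, 1, 512)  # (sequence length, batch, d_model)
out = encoder(x)            # same shape: each layer refines the representation
print(out.shape)            # torch.Size([10, 1, 512])
```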