Summary
Transformers are versatile neural networks capable of capturing relationships within data of any modality without explicit data-specific inductive biases built into the architecture. Rather than relying on the architecture itself to ingest different data modalities directly, building a performant transformer requires careful consideration of how the input data is structured, along with well-crafted training objectives. The benefits of pre-training still hold true even for the current state-of-the-art (SOTA) architecture. Pre-training is part of a broader concept called transfer learning, which will be covered more extensively in the supervised and unsupervised learning chapters. Transformers can currently handle both data generation and supervised learning tasks, and a growing body of research is applying them to previously unexplored niche tasks and data modalities. Look forward to more deep learning innovations in the coming years, with transformers at the forefront of that advancement.
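To make the pre-training and transfer learning idea concrete, the following is a minimal sketch, not taken from the chapter, that loads a pre-trained transformer checkpoint and fine-tunes it on a tiny made-up classification task. It assumes the Hugging Face transformers library and PyTorch are installed; the bert-base-uncased checkpoint, toy data, and hyperparameters are purely illustrative choices.

```python
# A minimal sketch of transfer learning with a pre-trained transformer.
# Assumes the Hugging Face `transformers` library and PyTorch are available;
# the checkpoint name, toy data, and hyperparameters are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # a pre-trained transformer checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tiny illustrative dataset for a downstream supervised task (sentiment).
texts = ["A wonderful, well-paced film.", "Dull plot and wooden acting."]
labels = torch.tensor([1, 0])

# Structure the raw text into the input format the transformer expects.
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Fine-tune: the pre-trained weights are updated on the new task objective.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few gradient steps, just to show the loop
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

print("final fine-tuning loss:", outputs.loss.item())
```

The same pattern of starting from pre-trained weights and updating them on a small task-specific dataset is what allows transformers to perform well even when the downstream data is limited.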