Distinguishing generative AI from other AI models
Again, the critical distinction between discriminative and generative models lies in their objectives. Discriminative models aim to predict target outputs given input data. Classification algorithms, such as logistic regression or support vector machines, find decision boundaries that categorize inputs as belonging to one or more classes. Neural networks learn input-output mappings by optimizing weights through backpropagation (propagating prediction errors backward to adjust the weights) to make accurate predictions. Gradient boosting models, such as XGBoost or LightGBM, push discriminative accuracy further by ensembling decision trees, with each new tree trained to correct the errors of those before it.
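To make the discriminative objective concrete, the following is a minimal sketch of logistic regression trained with gradient descent on a tiny, made-up 1-D dataset. The data, learning rate, and step count are all assumptions for illustration; the point is only that the model learns p(y | x) directly, i.e., a decision boundary.

```python
import math

# Hypothetical toy 1-D dataset: inputs below 0 are class 0, above 0 are class 1.
data = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]

w, b = 0.0, 0.0  # parameters defining the decision boundary
lr = 0.5         # assumed learning rate

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Stochastic gradient descent on the negative log-likelihood: the model
# learns p(y | x) directly, which is what makes it discriminative.
for _ in range(1000):
    for x, y in data:
        p = sigmoid(w * x + b)
        w -= lr * (p - y) * x
        b -= lr * (p - y)

# The learned boundary sits where p(y=1 | x) = 0.5, i.e., where w*x + b = 0.
print(sigmoid(w * 2.0 + b))   # near 1 for a clearly positive input
print(sigmoid(w * -2.0 + b))  # near 0 for a clearly negative input
```

A generative counterpart would instead model how the inputs themselves are distributed within each class, so that new, plausible inputs could be sampled from it.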
Generative methods, by contrast, learn complex relationships in data through expansive training in order to generate new data, enabling many downstream applications. Effectively, these models create synthetic outputs by replicating the statistical patterns and properties discovered in training data, capturing nuances and idiosyncrasies that can closely reflect human behaviors.
In practice, a discriminative image classifier labels images as containing a cat or a dog. In contrast, a generative model can synthesize diverse, realistic cat or dog images by learning the distributions of pixels and implicit features from existing images. Moreover, generative models can be trained across modalities to unlock new possibilities in synthesis-focused applications, generating human-like photographs, videos, music, and text.
Several key methods have formed the foundation for many of the recent advancements in generative AI, each with a unique approach and strengths. In the next section, we survey generative advancements over time, including adversarial networks, variational autoencoders, diffusion models, and autoregressive transformers, to better understand their impact and influence.
Briefly surveying generative approaches
Modern generative modeling encompasses diverse architectures suited to different data types and distinct tasks. Here, we briefly introduce some of the key approaches that have emerged over the years, bringing us to the state-of-the-art models:
- Generative adversarial networks (GANs) involve two interconnected neural networks—a generator that creates realistic synthetic data and a discriminator that distinguishes between real and synthetic (fake) data points. The generator and discriminator are adversaries in a zero-sum game, each trying to outperform the other. This adversarial relationship gradually improves the generator’s capacity to produce highly realistic synthetic data, making GANs adept at modeling intricate image distributions and achieving photo-realistic image synthesis.
- Variational autoencoders (VAEs) compress data into a simpler form (or latent representation) and then reconstruct it. This process involves an encoder and a decoder that are trained jointly (Kingma & Welling, 2013). While VAEs may not be the top choice for image quality, they excel at learning compact latent representations that separate and expose the underlying structure of complex data.
- Diffusion models progressively corrupt data by adding Gaussian noise (random variation drawn from a normal distribution) over many steps, and are trained to reverse this corruption by removing the added noise step by step. This learned denoising process equips diffusion models to generate high-fidelity samples that closely replicate the original data distribution, producing diverse, high-quality images (Ho et al., 2020).
- Autoregressive transformers leverage parallelizable self-attention to model complex sequential dependencies, showing exceptional performance in language-related tasks (Vaswani et al., 2017). Pretrained models such as GPT-4 or Claude have demonstrated the capability for generalizations in natural language tasks and impressive human-like text generation. Despite ethical issues and misuse concerns, transformers have emerged as the frontrunners in language modeling and multimodal generation.
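The forward (noising) half of the diffusion idea is simple enough to sketch on a single scalar. The step count and variance schedule below are assumed values for illustration, not a real model configuration; real diffusion models apply the same arithmetic per pixel and train a network to run it in reverse.

```python
import math
import random

random.seed(0)

x0 = 1.0          # a "dataset" of one scalar sample, for illustration
T = 100           # assumed number of noising steps
betas = [0.02] * T  # assumed per-step noise variances (a flat schedule)

# Forward process: repeatedly mix in Gaussian noise until the signal is
# destroyed. Each step samples from N(sqrt(1 - beta) * x_prev, beta).
x = x0
for beta in betas:
    x = math.sqrt(1.0 - beta) * x + math.sqrt(beta) * random.gauss(0.0, 1.0)

# Closed form: after t steps the sample is still Gaussian around a shrunken
# copy of x0, with alpha_bar = prod(1 - beta_t) measuring remaining signal.
alpha_bar = 1.0
for beta in betas:
    alpha_bar *= 1.0 - beta
print(math.sqrt(alpha_bar))  # signal coefficient left after T steps
```

Training then amounts to teaching a network to predict (and subtract) the noise at each step, so that running the loop backward from pure noise yields a fresh sample.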
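The autoregressive idea behind transformers—predict the next token from what came before, then feed the output back in—can be shown at toy scale with a character-level bigram model. The corpus and generation length here are arbitrary assumptions; a transformer LLM runs the same sampling loop with a vastly richer conditional model.

```python
import random
from collections import defaultdict

random.seed(0)

corpus = "the cat sat on the mat "

# Count bigram transitions: an autoregressive model factorizes the
# probability of a sequence into p(next char | previous chars);
# here the context is just the single previous character.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(prev):
    chars = list(counts[prev])
    weights = [counts[prev][c] for c in chars]
    return random.choices(chars, weights=weights)[0]

# Generate one character at a time, feeding each output back in as context.
out = "t"
for _ in range(20):
    out += sample_next(out[-1])
print(out)
```

Scaling the context window from one character to thousands of tokens, and the transition table to a self-attention network with billions of parameters, is essentially the leap from this sketch to GPT-style models.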
Collectively, these methodologies paved the way for advanced generative modeling across a wide array of domains, including images, videos, audio, and text. While architectural and engineering innovations progress daily, generative methods showcase unparalleled synthesis capabilities across diverse modalities. Throughout the book, we will explore and apply generative methods to simulate real-world scenarios. However, before diving in, we further distinguish generative methods from traditional ML methods by addressing some common misconceptions.
Clarifying misconceptions between discriminative and generative paradigms
To better understand the distinctive capabilities and applications of traditional ML models (often referred to as discriminative) and generative methods, here, we clear up some common misconceptions and myths:
Myth 1: Generative models cannot recognize patterns as effectively as discriminative models.
Truth: State-of-the-art generative models are well known for their impressive ability to recognize patterns, rivaling some discriminative models. Despite primarily focusing on creative synthesis, generative models display classification capabilities. However, class predictions produced by a generative model can be difficult to explain, as generative models are not explicitly trained to learn decision boundaries or predetermined relationships. Instead, they may only learn to simulate classification based on labels learned implicitly (or organically) during training. In short, where the explanation of model outcomes is important, classification using a discriminative model may be the better choice.
Example: Consider GPT-4. In addition to synthesizing human-like text, it can understand context, capture long-range dependencies, and detect patterns in text. GPT-4 can use these intrinsic language processing capabilities to discriminate between classes, much like a traditional classifier. However, because GPT learns semantic relationships through extensive training, its decision-making remains difficult to explain with established attribution methods.
Myth 2: Generative AI will eventually replace discriminative AI.
Truth: This is a common misunderstanding. Discriminative models have consistently been the preferred option for high-stakes prediction tasks because they focus directly on learning the decision boundary between classes, supporting high precision and reliability. More importantly, discriminative models can be explained post hoc, making them the default choice for critical applications in sectors such as healthcare, finance, and security. However, generative models may become more popular for high-stakes modeling as explainability techniques mature.
Example: Consider a discriminative model trained specifically for disease prediction in healthcare. A specialized model can classify data points (e.g., images of skin) as healthy or unhealthy, giving healthcare professionals a tool for early intervention and treatment plans. Post-hoc explanation methods, such as SHAP, can be employed to identify and analyze the key features that influence classification outcomes. This approach offers clear insights into the specific results (i.e., feature attribution).
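SHAP itself requires the `shap` library; as a lightweight stand-in for the same idea—attributing a single prediction to its input features—here is an occlusion-style sketch. The feature names, weights, and the linear "risk model" are all hypothetical; any trained discriminative model could stand in for `score()`.

```python
# Hypothetical skin-lesion features; assumed for illustration only.
FEATURES = ["lesion_size", "asymmetry", "border_irregularity"]

def score(x):
    # A made-up linear risk model standing in for a trained classifier.
    w = [0.6, 0.3, 0.1]
    return sum(wi * xi for wi, xi in zip(w, x))

def occlusion_attribution(x, baseline=0.0):
    """Attribute a prediction to features by replacing each feature with a
    baseline value and measuring the drop in the model's score. SHAP
    generalizes this idea by averaging over all feature coalitions."""
    base = score(x)
    return {
        name: base - score(x[:i] + [baseline] + x[i + 1:])
        for i, name in enumerate(FEATURES)
    }

patient = [0.9, 0.5, 0.2]  # assumed feature values for one case
print(occlusion_attribution(patient))
# For this linear model, each attribution is simply weight * value,
# so lesion_size (0.6 * 0.9 = 0.54) contributes most.
```

For a linear model this occlusion score coincides with the exact attribution; for the nonlinear models used in practice, SHAP's coalition-averaging is what keeps the attributions consistent.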
Myth 3: Generative models continuously learn from user input.
Truth: Not exactly. Generative LLMs are trained using a static approach. This means they learn from a vast training corpus, and their knowledge is limited to the information available up to the training cutoff. While models can be augmented with additional data or in-context information to help them contextualize, giving the impression of real-time learning, the underlying model itself is essentially frozen and does not learn in real time.
Example: GPT-3 was trained in 2020, and its knowledge remained fixed at that training cutoff until its successor, GPT-3.5, arrived in 2022. Naturally, GPT-4, released in March 2023, was trained on more recent data, but due to training limitations (including diminishing performance returns), it is reasonable to expect that subsequent training checkpoints will be released periodically rather than continuously.
While generative and discriminative models have distinct strengths and limitations, knowing when to apply each paradigm requires evaluating several key factors. As we have clarified some common myths about their capabilities, let’s turn our attention to guidelines for selecting the right approach for a given task or problem.
Choosing the right paradigm
The choice between generative and discriminative models depends on various factors, such as the task or problem at hand, the quality and quantity of data available, the desired output, and the level of performance required. The following is a list of key considerations:
- Task specificity: Discriminative models are more suitable for high-stakes applications, such as disease diagnosis, fraud detection, or credit risk assessment, where precision is crucial. However, generative models are more adept at creative tasks such as synthesizing images, text, music, or video.
- Data availability: Discriminative models tend to overfit (or memorize examples) when trained on small datasets, which may lead to poor generalization. On the other hand, because generative models are often pretrained on vast amounts of data, they can produce diverse outputs even with minimal input, making them a viable choice when data are scarce.
- Model performance: Discriminative models outperform generative models in tasks where it is crucial to learn and explain a decision boundary between classes or where expected relationships in the data are well understood. Generative models usually excel in less constrained tasks that require a measure of perceived creativity and flexibility.
- Model explainability: While both paradigms can include models that are considered “black boxes,” or not intrinsically interpretable, generative models can be more difficult, or at times impossible, to explain, as they often involve complex data generation processes that rely on modeling the underlying data distribution. Discriminative models, by contrast, focus on learning the boundary between classes. In use cases where model explainability is a key requirement, discriminative models may be more suitable. However, generative explainability research is gaining traction.
- Model complexity: Generally, discriminative models require less computational power because they learn to directly predict some output given a well-defined set of inputs. Generative models, alternatively, may consume more computational resources, as their training objective is to capture the intricate joint relationships among inputs (and, where present, outputs). Accurately learning these intricacies requires vast amounts of data and large-scale computation. Computational efficiency in generative LLM training and inference (e.g., quantization) is a vibrant area of research.
Ultimately, the choice between generative and discriminative models should be made by considering the trade-offs involved. Moreover, the adoption of these paradigms requires different levels of infrastructure, data curation, and other prerequisites. Occasionally, a hybrid approach that combines the strengths of both models can serve as an ideal solution. For example, a pretrained generative model can be fine-tuned as a classifier. We will learn about task-specific fine-tuning in Chapter 5.
Now that we have explored the key distinctions between traditional ML (i.e., discriminative) and generative paradigms, including their distinct risks, we can look back at how we arrived at this paradigm shift. In the next section, we take a brief look at the evolution of generative AI.