Discriminative and generative modeling, and Bayes’ theorem
Now, let us consider how these rules of conditional and joint probability relate to the kinds of predictive models that we build for various machine learning applications. In most cases, such as predicting whether an email is fraudulent or the dollar amount of a customer's future lifetime value, we are interested in the conditional probability, P(Y|X=x), where Y is the outcome we are trying to model, X is the set of input "features," and x is a particular value of those features. For example, we might want to calculate the probability that an email is fraudulent given knowledge of the set of words (the x) in the message. This approach is known as discriminative modeling (Ref 15-17). Discriminative modeling attempts to learn a direct mapping between the data, X, and the outcomes, Y.
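As a minimal sketch of this idea, the following toy example fits a logistic regression by gradient descent, so that the model maps word-presence features x directly to the conditional probability P(Y=1|X=x) without ever modeling how the words themselves are distributed. The word list, data, and hyperparameters are invented for illustration only.

```python
import numpy as np

# Toy "fraudulent email" data (illustrative only): each row is a binary
# feature vector x indicating which words appear in a message.
# Feature columns: ["prize", "invoice", "meeting"]
X = np.array([
    [1, 0, 0],   # fraudulent
    [1, 1, 0],   # fraudulent
    [0, 1, 1],   # legitimate
    [0, 0, 1],   # legitimate
], dtype=float)
y = np.array([1, 1, 0, 0], dtype=float)  # Y = 1 means fraudulent

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Discriminative model: represent P(Y=1 | X=x) directly as sigmoid(w.x + b)
# and fit w, b by gradient descent on the mean log-loss.
w = np.zeros(X.shape[1])
b = 0.0
learning_rate = 0.5
for _ in range(2000):
    p = sigmoid(X @ w + b)            # predicted P(Y=1 | X=x) for each row
    grad_w = X.T @ (p - y) / len(y)   # gradient of mean log-loss w.r.t. w
    grad_b = np.mean(p - y)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# The fitted model maps a new feature vector straight to a conditional
# probability, with no model of P(X) or P(X|Y) involved.
x_new = np.array([1, 0, 0], dtype=float)  # a message containing only "prize"
p_fraud = sigmoid(x_new @ w + b)
print(f"P(fraudulent | x_new) = {p_fraud:.3f}")
```

Note that the model never learns anything about the distribution of the words themselves; it only learns the boundary between the two outcomes, which is the defining trait of a discriminative approach.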
Another way to understand discriminative modeling is in the context of Bayes’ theorem ...