8.1 Linear models and non-linear data
In Chapter 4 and Chapter 6 we learned how to build models of the general form:

θ = ψ(φ(X)β)

Here, θ is a parameter for some probability distribution, for example, the mean of a Gaussian, the p parameter of the binomial, the rate of a Poisson, and so on. We call ψ the inverse link function, and φ is some other function we use to potentially transform the data, like a square root, a polynomial function, or something else.
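To make the general form concrete, here is a minimal sketch in Python of one possible instance: φ is a square-root transform (plus an intercept column), ψ = exp is the inverse link, and θ is the rate of a Poisson. The specific choices of φ, ψ, and the fixed weights are illustrative assumptions, not part of any particular model from earlier chapters.

    import numpy as np

    rng = np.random.default_rng(42)
    X = rng.uniform(0, 10, size=(50, 1))

    def phi(X):
        # φ: square-root transform of the data, plus an intercept column
        return np.column_stack([np.ones(len(X)), np.sqrt(X).ravel()])

    beta = np.array([0.5, 0.3])     # weights β, fixed here for illustration
    theta = np.exp(phi(X) @ beta)   # ψ = exp, so θ is a valid (positive) Poisson rate

In a Bayesian model, β would of course get a prior and be inferred from data rather than fixed; the point here is only the composition ψ(φ(X)β).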
Fitting, or learning, a Bayesian model can be seen as finding the posterior distribution of the weights β, and thus this is known as the weight view of approximating functions. As we already saw with polynomial and spline regression, by letting φ be a non-linear function, we can map the inputs onto a feature space. We also saw that by using a polynomial of the proper degree, we can perfectly fit any function. But unless we apply some form of regularization, for example, using prior distributions, this will lead to models that memorize the data, in other words, models with very poor generalization.
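The following is a minimal sketch of this weight view, assuming PyMC; the synthetic data, the degree, and the variable names (x, y_obs, poly_model) are illustrative assumptions. A deliberately high-degree polynomial feature map φ would overfit under maximum likelihood, but narrow Gaussian priors on β shrink the weights and act as regularization.

    import numpy as np
    import pymc as pm

    rng = np.random.default_rng(123)
    x = rng.uniform(-3, 3, 40)
    y_obs = np.sin(x) + rng.normal(0, 0.2, 40)       # synthetic non-linear data

    degree = 9                                        # deliberately high degree
    Phi = np.vander(x, degree + 1, increasing=True)   # polynomial feature map φ(x)

    with pm.Model() as poly_model:
        # Narrow Gaussian priors on β regularize the fit,
        # shrinking high-degree coefficients toward zero.
        beta = pm.Normal("beta", mu=0, sigma=1, shape=degree + 1)
        sigma = pm.HalfNormal("sigma", sigma=1)
        mu = pm.math.dot(Phi, beta)                   # identity inverse link ψ
        pm.Normal("y", mu=mu, sigma=sigma, observed=y_obs)
        idata = pm.sample()

Here "fitting" is exactly finding the posterior over the weights β; widening the priors recovers the memorizing, poorly generalizing behavior described above.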