9.7 Exercises
Explain each of the following:
How is BART different from linear regression and splines?
When might you want to use linear regression over BART?
When might you want to use Gaussian processes over BART?
In your own words, explain why multiple small trees can fit patterns better than a single large tree. What is the difference between the two approaches? What are the trade-offs?
Below, we provide two simple synthetic datasets. Fit a BART model with m=50 to each of them. Plot the data and the mean fitted function. Describe the fit.
x = np.linspace(-1, 1., 200) and y = np.random.normal(2*x, 0.25)
x = np.linspace(-1, 1., 200) and y = np.random.normal(x**2, 0.25)
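The two datasets above can be generated as in the following minimal sketch. The seeded generator and the names `rng`, `y_linear`, `y_quadratic`, and `X` are assumptions added here for reproducibility and convenience; they are not part of the exercise statement. Fitting the BART model itself is left to the reader.

```python
import numpy as np

# Seeded generator so the draws are reproducible (the seed is an arbitrary choice).
rng = np.random.default_rng(0)

# Shared covariate: 200 evenly spaced points on [-1, 1].
x = np.linspace(-1, 1.0, 200)

# First dataset: linear mean 2*x with Gaussian noise of standard deviation 0.25.
y_linear = rng.normal(2 * x, 0.25)

# Second dataset: quadratic mean x**2 with the same noise level.
y_quadratic = rng.normal(x**2, 0.25)

# Many model-fitting libraries expect a 2D design matrix of shape
# (n_observations, n_covariates), so we also keep a column version of x.
X = x[:, None]
```

Using `rng.normal(mean, sd)` with an array-valued mean draws one observation per entry of `x`, which is equivalent in distribution to the `np.random.normal` calls in the exercise.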
Create your own synthetic dataset.
Create the following dataset: $Y = 10 \sin(\pi X_0 X_1) + 20 (X_2 - 0.5)^2 + 10 X_3 + 5 X_4 + \epsilon$, where $\epsilon \sim \mathcal{N}(0, 1)$ and $X_{0:9} \sim \mathcal{U}(0, 1)$. This is called Friedman’s five-dimensional function. Notice that we actually have 10 dimensions, but the last 5...
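One possible way to simulate this dataset is sketched below. The helper name `make_friedman`, the sample size `n=100`, and the seed are hypothetical choices made for illustration; the exercise does not specify them.

```python
import numpy as np


def make_friedman(n=100, seed=0):
    """Simulate Friedman's function with 10 uniform covariates.

    `n` and `seed` are illustrative assumptions, not values from the exercise.
    """
    rng = np.random.default_rng(seed)

    # Ten covariates X_0, ..., X_9, each drawn independently from U(0, 1).
    X = rng.uniform(0, 1, size=(n, 10))

    # Additive noise term drawn from N(0, 1).
    eps = rng.normal(0, 1, size=n)

    # Only columns 0 through 4 appear in the formula for Y.
    Y = (
        10 * np.sin(np.pi * X[:, 0] * X[:, 1])
        + 20 * (X[:, 2] - 0.5) ** 2
        + 10 * X[:, 3]
        + 5 * X[:, 4]
        + eps
    )
    return X, Y


X, Y = make_friedman(n=100)
```

Returning the full ten-column matrix keeps the covariates that do not enter the formula available to the model, which is what the exercise asks you to fit against.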