In this section, we will use random forests to predict a bird's species. We will use the Caltech-UCSD Birds-200-2011 dataset (http://www.vision.caltech.edu/visipedia/CUB-200-2011.html), which contains about 12,000 photos of birds from 200 different species. We are not going to work with the pictures themselves, because that calls for a convolutional neural network (CNN), which is covered in later chapters; CNNs handle images much better than a random forest can. Instead, we will use attributes of the birds, such as size, shape, and color.
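To make the idea concrete before touching the real data, here is a minimal sketch of a random forest trained on attribute vectors rather than pixels. Everything in it is an assumption made for illustration: the data is synthetic, the numbers of species, attributes, and photos are made up, and the use of scikit-learn's `RandomForestClassifier` is our choice for the sketch. It does not read the actual dataset files.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only (the real dataset has 200 species).
n_species = 5
n_attributes = 40          # made-up count of yes/no attributes (size, shape, color, ...)
photos_per_species = 60

# Give each synthetic species a "typical" pattern of yes/no attributes.
prototypes = rng.integers(0, 2, size=(n_species, n_attributes))

# One row per photo: start from the species prototype...
labels = np.repeat(np.arange(n_species), photos_per_species)
features = prototypes[labels].copy()

# ...then flip about 10% of the attributes to mimic noisy human labeling.
noise = rng.random(features.shape) < 0.10
features[noise] = 1 - features[noise]

# Hold out a quarter of the photos to measure accuracy on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

# Train a forest of 100 decision trees on the attribute vectors.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

accuracy = forest.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
```

The point of the sketch is the shape of the problem: each photo becomes a row of attribute values, each species is a class label, and the forest never sees an image. Accuracy on this toy data says nothing about accuracy on the real birds.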
Here are just some of the species in the dataset:
Some, such as the American Crow and the Fish Crow, are almost indistinguishable, at least visually. The attributes for each photo, such as color and size, were labeled by humans. Caltech and UCSD used human workers on Amazon's Mechanical...