Upgrading to Random Forest
We did a lot of great work building up the classifier. Naïve Bayes, while rarely a bad choice, is seldom the best one. Next, we'll compare its performance against an algorithm that usually is a great choice: Random Forest. Unlike with Naïve Bayes, we won't be implementing this one by hand.
The reason has little to do with complexity and more to do with the fact that it's better practice to avoid the legwork and time cost of rolling your own algorithms by hand. Many sections in this book have covered how to build your own algorithms in a test-driven manner, so you have the tools to do so if need be. The rest of this book will focus on taking a test-driven approach to using third-party libraries.
To get started, we can build a wrapper class around sklearn's functionality so that we keep the same interface we already have while leveraging the power of a third-party library. We should start with the same...
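A minimal sketch of such a wrapper might look like the following. The class and method names (`RandomForestWrapper`, `train`, `classify`) are assumptions standing in for whatever interface the existing classifier exposes; the point is that the wrapper delegates to sklearn's `RandomForestClassifier` internally while callers see no change.

```python
from sklearn.ensemble import RandomForestClassifier


class RandomForestWrapper:
    """Hypothetical wrapper that preserves our existing classifier
    interface while delegating to sklearn's RandomForestClassifier."""

    def __init__(self, n_estimators=100, random_state=None):
        # All the real work is done by the third-party model.
        self._model = RandomForestClassifier(
            n_estimators=n_estimators, random_state=random_state
        )

    def train(self, features, labels):
        # Delegate to sklearn's fit(); return self so calls can chain.
        self._model.fit(features, labels)
        return self

    def classify(self, features):
        # Delegate to sklearn's predict().
        return self._model.predict(features)
```

Because the wrapper keeps the old interface, existing tests written against the Naïve Bayes classifier can be pointed at this class with no changes beyond construction, which makes the two algorithms easy to compare.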