In this section, we're going to update the spam detector from before to use neural networks. Recall that the dataset came from YouTube. There were approximately 2,000 comments, with around half being spam and the other half not. These comments came from five different videos.
In the last version, we used a bag of words and a random forest. We carried out a parameter search to find the settings best suited to the bag of words, which was a CountVectorizer limited to 1,000 different words. These 1,000 words were the most frequently used ones. We used unigrams instead of bigrams or trigrams. It also helps to drop common English stop words, and the best way to weight the remaining words is TF-IDF. It was also found that using 100 different trees would be best for the random forest. Now, we are going to use a bag of words...
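As a reminder of the earlier setup recalled above, here is a minimal sketch of how those settings might be wired together in scikit-learn. The toy comments, the step names, the `random_state` value, and the use of `stop_words="english"` are assumptions made for illustration. They are not taken from the original code, and the real data is the YouTube comment set.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

# Hypothetical toy comments standing in for the YouTube dataset
# (1 = spam, 0 = not spam).
comments = [
    "check out my channel for free gift cards",
    "subscribe to my channel and win a phone",
    "click this link to earn money fast",
    "this song never gets old",
    "the drummer in this video is amazing",
    "i love the chorus so much",
]
labels = [1, 1, 1, 0, 0, 0]

pipeline = Pipeline([
    # Bag of words: keep at most the 1,000 most frequent unigrams
    # and drop common English stop words (assumed setting).
    ("bag_of_words", CountVectorizer(max_features=1000,
                                     ngram_range=(1, 1),
                                     stop_words="english")),
    # Reweight the raw counts with TF-IDF.
    ("tfidf", TfidfTransformer()),
    # Random forest with 100 trees; random_state is fixed only
    # to make this sketch repeatable.
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])

pipeline.fit(comments, labels)
predictions = pipeline.predict(["win free money, click my link"])
```

Keeping the vectorizer, the TF-IDF step, and the classifier in one `Pipeline` means the same preprocessing is applied at training and prediction time, which makes it easier to compare this baseline with the neural network version.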