Further reading
The quotes at the start of this chapter were sourced from the highly readable Kaggle blog, No Free Hunch. Refer to http://blog.kaggle.com/2014/08/01/learning-from-the-best/.
There are many good resources for understanding NLP tasks. One fairly thorough eight-part piece is available online at http://textminingonline.com/dive-into-nltk-part-i-getting-started-with-nltk.
If you're keen to get started, one great option is to try Kaggle's "for Knowledge" NLP task, which is well suited as a testbed for the techniques described in this chapter: https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-1-for-beginners-bag-of-words.
The Kaggle contest cited in this chapter is available at https://www.kaggle.com/c/detecting-insults-in-social-commentary.
For readers interested in a further description of the ROC curve and the AUC measure, consider Tom Fawcett's excellent introduction, available at https://ccrma.stanford.edu/workshops/mir2009/references/ROCintro.pdf.