The following are some text-cleaning tasks to perform:
- Clean your texts of stopwords, digits, and punctuation marks.
- Perform lemmatization.
- Create a dictionary of words and their frequencies.
- Remove the non-words from the dictionary.
- Extract the features from the data.
Check the Chapter2-Practice folder for the answers: https://github.com/PacktPublishing/Mastering-Machine-Learning-for-Penetration-Testing/tree/master/Chapter%202/Chaptre2-Practice.
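The cleaning tasks listed above can be sketched with the standard library alone. This is a minimal illustration, not the solution from the practice folder: `STOPWORDS`, `clean_tokens`, `build_dictionary`, and the sample emails are hypothetical names and data introduced here, and lemmatization is left out because it needs an external library such as NLTK.

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one is much longer.
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "your", "you"}


def clean_tokens(text):
    """Lowercase the text, strip punctuation, and drop stopwords, digits, and non-words."""
    table = str.maketrans("", "", string.punctuation)
    tokens = text.lower().translate(table).split()
    # Keep purely alphabetic tokens longer than one character.
    return [t for t in tokens if t.isalpha() and len(t) > 1 and t not in STOPWORDS]


def build_dictionary(texts, size=3000):
    """Count cleaned tokens across all texts and keep the most frequent ones."""
    counts = Counter()
    for text in texts:
        counts.update(clean_tokens(text))
    return counts.most_common(size)


# Made-up sample emails, used only to exercise the functions.
emails = [
    "Win a FREE prize!!! Call 555-0100 to claim your prize.",
    "The meeting is moved to 3 pm, see the agenda.",
]
dictionary = build_dictionary(emails)
print(dictionary)
```

The result is a list of `(word, frequency)` pairs sorted by frequency, which can then serve as the vocabulary for feature extraction.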
- Prepare the feature vectors and their labels.
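import numpy as np

# Labels for the 702 training emails: 0 for the first 351, 1 for the remaining 351.
train_labels = np.zeros(702)
train_labels[351:] = 1  # slice to the end; [351:701] would leave index 701 labelled 0
train_matrix = extract_features(train_dir)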
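The excerpt does not show how `extract_features` builds its matrix. One common approach, sketched below under that caveat, is a word-count matrix with one row per document and one column per vocabulary word. The helper `build_feature_matrix` and the toy data are hypothetical and work on token lists in memory rather than on a directory of files.

```python
import numpy as np


def build_feature_matrix(token_lists, vocabulary):
    """Return a (documents x vocabulary) matrix of word counts."""
    index = {word: i for i, word in enumerate(vocabulary)}
    matrix = np.zeros((len(token_lists), len(vocabulary)))
    for row, tokens in enumerate(token_lists):
        for token in tokens:
            col = index.get(token)
            if col is not None:  # ignore words outside the vocabulary
                matrix[row, col] += 1
    return matrix


# Made-up vocabulary and documents for illustration only.
vocabulary = ["prize", "free", "meeting"]
docs = [["free", "prize", "prize"], ["meeting", "agenda"]]

features = build_feature_matrix(docs, vocabulary)
labels = np.zeros(len(docs))
labels[0] = 1  # assume the first document belongs to class 1
print(features)
```

Each row of `features` lines up with the entry at the same position in `labels`, which is the pairing the classifier expects.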
- Train the model with a linear support vector machine classifier.
from sklearn.svm import LinearSVC

model = LinearSVC()
model.fit(train_matrix, train_labels)
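To see the training step in isolation, the following sketch fits `LinearSVC` on a tiny, made-up word-count matrix. The feature columns, the sample values, and the names `X_train`, `y_train`, and `X_new` are assumptions for illustration; the real exercise uses the matrix produced by `extract_features`.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy word counts (assumed): column 0 = "prize", column 1 = "meeting".
X_train = np.array([[3, 0], [2, 0], [4, 1], [0, 3], [0, 2], [1, 4]])
y_train = np.array([1, 1, 1, 0, 0, 0])  # 1 for the first three rows, 0 for the rest

# random_state fixes the solver's randomness so repeated runs agree.
model = LinearSVC(random_state=0)
model.fit(X_train, y_train)

# Classify two unseen count vectors.
X_new = np.array([[5, 0], [0, 5]])
predictions = model.predict(X_new)
print(predictions)
```

A linear SVM is a reasonable choice for sparse, high-dimensional word counts, although it is only one of several classifiers that could be tried here.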
- Print out the confusion matrix of your model.
from sklearn.metrics import confusion_matrix

result = model.predict(test_matrix)
print (confusion_matrix(test_labels,result...
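To make the printed matrix easier to read, the sketch below counts the same cells by hand. It follows the layout scikit-learn's `confusion_matrix` uses, where rows are true classes and columns are predicted classes. The helper `confusion_counts` and the sample labels are hypothetical.

```python
def confusion_counts(true_labels, predicted_labels, n_classes=2):
    """Build a confusion matrix: rows are true classes, columns are predictions."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[int(truth)][int(guess)] += 1
    return matrix


# Made-up labels: three samples of class 0 followed by three of class 1.
true_labels = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 0]

matrix = confusion_counts(true_labels, predicted)
for row in matrix:
    print(row)
```

The diagonal holds the correctly classified samples, and the off-diagonal cells hold the two kinds of error: class 0 predicted as class 1 (top right) and class 1 predicted as class 0 (bottom left).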