Constructing a gender identifier
Gender identification is an interesting problem. In this case, we will use a heuristic to construct a feature vector and use it to train a classifier. The heuristic used here is the last N letters of a given name. For example, if the name ends with ia, it is most likely a female name, such as Amelia or Genelia. On the other hand, if the name ends with rk, it is likely a male name, such as Mark or Clark. Since we are not sure of the exact number of letters to use, we will try several values of this parameter and see which one gives the best accuracy. Let's see how to do it.
Create a new Python file and import the following packages:
import random
from nltk import NaiveBayesClassifier
from nltk.classify import accuracy as nltk_accuracy
from nltk.corpus import names
Define a function to extract the last N letters from the input word:
# Extract last N letters from the input word
# and that will act as our "feature"
def extract_features...
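Since the listing above is cut off, here is a minimal sketch of what such a feature extractor might look like. It is not the original function: the name last_letters_feature, the feature key "last_letters", and the default n=2 are assumptions chosen for illustration. It returns a dictionary because NLTK classifiers take their features as dictionaries.

```python
# Hypothetical helper (not the original extract_features): builds a
# one-entry feature dict from the last n letters of a name.
def last_letters_feature(word, n=2):
    """Return a feature dict holding the lowercased last n letters of word.

    Assumes n >= 1. If the word is shorter than n, the whole word is used.
    """
    return {"last_letters": word[-n:].lower()}


if __name__ == "__main__":
    # Show how the feature changes as we vary the number of letters.
    for name in ["Amelia", "Genelia", "Mark", "Clark"]:
        for n in range(1, 4):
            print(name, n, last_letters_feature(name, n))
```

Lowercasing the suffix is a design choice here, so that capitalization does not split one ending into several features.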