Identifying the gender
Identifying the gender of a name is an interesting task in NLP. We will use the heuristic that the last few characters of a name are its defining characteristic. For example, a name ending in "la" is most likely a female name, such as "Angela" or "Layla". On the other hand, a name ending in "im" is most likely a male name, such as "Tim" or "Jim". As we are not sure of the exact number of characters to use, we will experiment with this. Let's see how to do it.
How to do it…
Create a new Python file, and import the following packages:
import random
from nltk.corpus import names
from nltk import NaiveBayesClassifier
from nltk.classify import accuracy as nltk_accuracy
We need to define a function to extract features from input words:
# Extract features from the input word
def gender_features(word, num_letters=2):
    return {'feature': word[-num_letters:].lower()}
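To see what this function produces, the following minimal sketch applies it to the example names from the introduction with a few different suffix lengths. The helper features_by_length and the list sample_names are hypothetical names introduced only for this illustration; they are not part of the recipe.

```python
# Same extractor as in the recipe: the last `num_letters` characters, lowercased
def gender_features(word, num_letters=2):
    return {'feature': word[-num_letters:].lower()}

# Example names taken from the introduction of this recipe
sample_names = ['Angela', 'Layla', 'Tim', 'Jim']

# Hypothetical helper (an assumption for this sketch, not part of the recipe):
# collect the suffix feature for several candidate lengths
def features_by_length(name, lengths=(1, 2, 3)):
    return {n: gender_features(name, n)['feature'] for n in lengths}

for name in sample_names:
    print(name, features_by_length(name))
```

With two letters, "Angela" and "Layla" both map to "la", while "Tim" and "Jim" both map to "im", which is the pattern the heuristic relies on. With three letters, a three-letter name such as "Tim" becomes its own feature, so longer suffixes may generalize less well on short names.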
Let's define the main function. We need some labeled training data:
if __name__=='__main__':
    # Extract...