Coding – BoW
I will use the classic song “Theme from New York, New York” by Frank Sinatra. The song repeats “New York” many times, which makes it a good example for a word-counting technique such as BoW. As I write this chapter, the jazz of the song accompanies a calm and snowy midnight in New York.
Let’s learn how to do BoW with Gensim.
Gensim for BoW
Let’s import several modules. The Gensim simple_preprocess function converts a document into a list of tokens. The Gensim Dictionary() class implements the concept of a dictionary in Gensim: it maps each tokenized word to a unique ID. I will also import pprint for pretty-printing, which prints output in a more readable form:
import gensim
from gensim.utils import simple_preprocess
from gensim.corpora...