Distributed word scoring with Redis and execnet
We can use Redis and execnet together to do distributed word scoring. In the Calculating high information words recipe in Chapter 7, Text Classification, we calculated the information gain of each word in the movie_reviews corpus using a FreqDist and a ConditionalFreqDist. Now that we have Redis, we can do the same thing using a RedisHashFreqDist and a RedisConditionalHashFreqDist, and then store the scores in a RedisOrderedDict. We can use execnet to distribute the counting in order to get better performance out of Redis.
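To make the counting-then-scoring idea concrete, here is a minimal single-process sketch that uses only the standard library. It is not the recipe's code: collections.Counter stands in for FreqDist and ConditionalFreqDist, the function names count_words, chi_sq, and score_words_local are hypothetical, and the Pearson chi-square measure is an assumption chosen for illustration. The distributed version would keep the same counts in Redis instead of in memory.

```python
from collections import Counter, defaultdict


def count_words(labelled_words):
    """Count each word overall and per label.

    labelled_words is an iterable of (label, words) tuples.
    Counter is an in-memory stand-in for FreqDist / ConditionalFreqDist.
    """
    word_fd = Counter()
    label_word_fd = defaultdict(Counter)
    for label, words in labelled_words:
        for word in words:
            word_fd[word] += 1
            label_word_fd[label][word] += 1
    return word_fd, label_word_fd


def chi_sq(n_ii, n_ix, n_xi, n_xx):
    """Pearson chi-square for a 2x2 word/label contingency table.

    n_ii: count of the word within the label
    n_ix: count of the word across all labels
    n_xi: count of all words within the label
    n_xx: count of all words across all labels
    """
    n_io = n_ix - n_ii                # the word, in other labels
    n_oi = n_xi - n_ii                # other words, in this label
    n_oo = n_xx - n_ix - n_xi + n_ii  # other words, in other labels
    denom = n_ix * n_xi * (n_xx - n_ix) * (n_xx - n_xi)
    if denom == 0:
        return 0.0
    return n_xx * (n_ii * n_oo - n_io * n_oi) ** 2 / denom


def score_words_local(labelled_words):
    """Sum each word's per-label chi-square score (assumed measure)."""
    word_fd, label_word_fd = count_words(labelled_words)
    n_xx = sum(word_fd.values())
    scores = defaultdict(float)
    for label, fd in label_word_fd.items():
        n_xi = sum(fd.values())
        for word, n_ii in fd.items():
            scores[word] += chi_sq(n_ii, word_fd[word], n_xi, n_xx)
    return dict(scores)


if __name__ == "__main__":
    toy = [("pos", ["good", "good", "film"]), ("neg", ["bad", "bad", "film"])]
    print(score_words_local(toy))
```

In this toy input, a word that appears equally under both labels scores zero, while words confined to one label score higher, which is the property that makes such scores useful for picking informative words.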
Getting ready
Redis, redis-py, and execnet must be installed, and an instance of redis-server must be running on localhost.
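Before running the recipe, it can help to confirm that something is listening where redis-server is expected. The following is a minimal sketch using only the standard library; it assumes the default Redis port of 6379, and the helper name redis_port_open is hypothetical. It only checks that the TCP port accepts connections, not that the listener is actually Redis.

```python
import socket


def redis_port_open(host="localhost", port=6379, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    Assumes redis-server uses its default port (6379) unless told otherwise.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("redis-server reachable:", redis_port_open())
```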
How to do it...
We start by getting a list of (label, words) tuples for each label in the movie_reviews corpus (which only has pos and neg labels). Then, we get the word_scores using score_words() from the dist_featx module. The word_scores object is an instance of RedisOrderedDict, and we can see that...