Storing a frequency distribution in Redis
The nltk.probability.FreqDist class is used in many classes throughout NLTK for storing and managing frequency distributions. It's quite useful, but it's all in-memory, and doesn't provide a way to persist the data. A single FreqDist is also not accessible to multiple processes. We can change all that by building a FreqDist on top of Redis.
Redis is a data structure server that is one of the more popular NoSQL databases. Among other things, it provides a network-accessible database for storing dictionaries (also known as hash maps). Building a FreqDist interface to a Redis hash map will allow us to create a persistent FreqDist that is accessible to multiple local and remote processes at the same time.
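To make the idea concrete, here is a minimal sketch of a counter whose samples are fields of a single hash. It is not the recipe's implementation: HashFreqSketch and FakeHashClient are hypothetical names invented for this illustration, and the fake client is an in-memory stand-in so the example runs without a server. It mimics only the hincrby, hget, and hgetall methods that redis-py provides, so a real redis.Redis() client could in principle be passed in its place.

```python
class FakeHashClient:
    """In-memory stand-in (hypothetical) for the hash commands used below."""

    def __init__(self):
        self._hashes = {}

    def hincrby(self, name, key, amount=1):
        # Add amount to one field of the named hash and return the new value.
        fields = self._hashes.setdefault(name, {})
        fields[key] = fields.get(key, 0) + amount
        return fields[key]

    def hget(self, name, key):
        # Return the field's value, or None if the field does not exist.
        return self._hashes.get(name, {}).get(key)

    def hgetall(self, name):
        # Return a copy of every field and value in the named hash.
        return dict(self._hashes.get(name, {}))


class HashFreqSketch:
    """Hypothetical FreqDist-like counter that keeps its counts in one hash."""

    def __init__(self, client, name):
        self._client = client
        self._name = name

    def inc(self, sample, count=1):
        # One hash field per sample; the field's value is the sample's count.
        return self._client.hincrby(self._name, sample, count)

    def __getitem__(self, sample):
        value = self._client.hget(self._name, sample)
        # A real server returns bytes or strings, so convert before use.
        return int(value) if value is not None else 0

    def N(self):
        # Total number of sample outcomes, as in FreqDist.N().
        return sum(int(v) for v in self._client.hgetall(self._name).values())


fd = HashFreqSketch(FakeHashClient(), "demo_freqdist")
for word in "the cat and the hat".split():
    fd.inc(word)
```

Because every count lives in the hash rather than in the Python object, any process that connects to the same server and uses the same hash name would see the same distribution.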
Note
Most Redis operations are atomic, so it's even possible to have multiple processes write to the FreqDist concurrently.
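The following sketch illustrates why atomicity matters for concurrent writers. It replays, in a fixed order, what can happen when two writers each read a count and then write it back, and compares that with a single increment command such as HINCRBY, which Redis applies as one atomic step. FakeHashClient is a hypothetical in-memory stand-in, not a real Redis client; it copies only the hget, hset, and hincrby method names from redis-py so the example runs without a server.

```python
class FakeHashClient:
    """In-memory stand-in (hypothetical) for three hash commands."""

    def __init__(self):
        self._hashes = {}

    def hget(self, name, key):
        return self._hashes.get(name, {}).get(key)

    def hset(self, name, key, value):
        self._hashes.setdefault(name, {})[key] = value

    def hincrby(self, name, key, amount=1):
        fields = self._hashes.setdefault(name, {})
        fields[key] = int(fields.get(key, 0)) + amount
        return fields[key]


client = FakeHashClient()

# Read-modify-write: both writers read the old count before either one
# writes, so the second write overwrites the first.
seen_by_a = int(client.hget("racy", "the") or 0)
seen_by_b = int(client.hget("racy", "the") or 0)
client.hset("racy", "the", seen_by_a + 1)
client.hset("racy", "the", seen_by_b + 1)
racy_total = int(client.hget("racy", "the"))

# Single increment command: the server does the read and the write as one
# step, so no other writer can slip in between them.
client.hincrby("safe", "the", 1)
client.hincrby("safe", "the", 1)
safe_total = int(client.hget("safe", "the"))
```

In this replay, racy_total ends up one short because an update is lost, while safe_total reflects both increments.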
Getting ready
For this and the subsequent recipes, we need to install both Redis and redis-py. The Redis website is at http://redis...