Searching a string using the Rabin-Karp algorithm
The Rabin-Karp algorithm finds a pattern in a body of text by matching a unique representation of the pattern against a moving window. The unique representation, or hash, is computed by considering a string as a number written in an arbitrary base of 26 or greater.
The advantage of Rabin-Karp is in searching for many needles in a haystack. It's not very efficient to search for just a single string. After the initial preprocessing of the corpus, the algorithm can quickly find matches.
Getting ready
Install the Data.ByteString.Search
library from Cabal as follows:
$ cabal install stringsearch
How to do it...
Use the
OverloadedStrings
language extension to facilitate theByteString
manipulations in our code as follows. It essentially allows polymorphic behavior for strings so that the GHC compiler may infer it as aByteString
type when necessary:{-# LANGUAGE OverloadedStrings #-}
Import the Rabin-Karp algorithms as follows:
import Data.ByteString...