Pipes and filters
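Unix and Linux shells solve problems with the pipes and filters pattern: small programs (filters) each transform a stream of text, and pipes connect the output of one to the input of the next. Here is how we could implement the inverted index in GNU awk (gawk), saved as iidx.awk: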
BEGIN { FS = "->" }
{
    split($2, a, " ")
    for (x in a) {
        w = a[x]
        # print w
        iidx[w] = iidx[w] $1
    }
}
END {
    for (w in iidx) {
        print w " -> " iidx[w]
    }
}
Here is an example of pipes and filters:
echo 'Carr -> And So To Murder
Carr -> The Arabian Nights Murder
Carr -> The Mad Hatter Mystery
Christie -> The Murder Of Roger Ackroyd
Christie -> The Sittaford Mystery
Carr -> The Plague Court Murders' |
sed -E 's/The|Of|And|To|So//g' |
gawk -f iidx.awk
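Note that we pipe the input text into sed. Sed filters out uninteresting words and passes the rest of the text on to awk (the pattern is unanchored, so it would also strip these strings from inside longer words, which does not happen with this input). Awk builds the inverted index using features such as field splitting and associative arrays (which are similar to hash maps).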
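One more filter can be added to the end of the same pipeline. Awk leaves the order in which `for (w in iidx)` visits keys unspecified, so the index can come out in a different order on different awk implementations; piping the result through sort is one way to make it stable. The following is only a minimal sketch under a few assumptions: the function name build_index is hypothetical, the awk program is inlined rather than read from iidx.awk, plain awk stands in for gawk on the assumption that any POSIX awk will do here, and the two input lines are a subset of the sample above.

```shell
#!/bin/sh
# build_index is a hypothetical helper name, not part of the original script.
# It reads "Author -> Title" lines on stdin and prints a sorted inverted index.
build_index() {
    # Filter 1: drop the uninteresting words (same sed command as above).
    sed -E 's/The|Of|And|To|So//g' |
    # Filter 2: build the index (same logic as iidx.awk, inlined here).
    awk '
        BEGIN { FS = "->" }
        {
            split($2, a, " ")
            for (x in a) {
                w = a[x]
                iidx[w] = iidx[w] $1
            }
        }
        END {
            for (w in iidx) {
                print w " -> " iidx[w]
            }
        }
    ' |
    # Filter 3: sort, because the order of "for (w in iidx)" is unspecified.
    sort
}

# Usage: feed two of the sample lines through the three-stage pipeline.
printf '%s\n' \
    'Carr -> The Mad Hatter Mystery' \
    'Christie -> The Sittaford Mystery' |
build_index
```

With these two lines, Mystery is the only word listed under both Carr and Christie. Each stage still knows nothing about the others; sort simply consumes whatever awk prints, which is the point of the pattern.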
Note that I can combine these tools in many ways. For example, I could implement...