In the previous sections, we saw how to SELECT data, inner JOIN data, and even perform GROUP BY and ORDER BY operations on flat files or streams of data. Rounding out the commonly used operations, we can also create sub-selected tables of data by wrapping a set of calls into a stream and then processing that stream further. This is what we have been doing all along with the piping model. To illustrate the point, say we want to sub-select, from the grouped reviews, only those reviewers who have between 100 and 200 reviews. We can take the command from the preceding example and pipe it through awk once more:
zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -d$'\t' -f2,8 | awk '{sum[$1]+=$2;count[$1]+=1} END {for (i in sum) {print i,sum[i],count[i],sum[i]/count[i]}}' | sort -k3 -r -n | awk '$3 >= 100 && $3 <=200' | head...
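To see the sub-select step in isolation, here is a minimal sketch that swaps the real grouped output for a few made-up rows. The function names grouped and subselect and all of the sample values are assumptions for illustration only; the columns follow the same order as the command above (reviewer, star total, review count, average).

```shell
# Hypothetical stand-in for the grouped output of the earlier awk step:
# reviewer, star total, review count, average rating (made-up values)
grouped() {
  printf '%s\n' \
    'A 900 250 3.6' \
    'B 600 150 4' \
    'C 450 100 4.5' \
    'D 40 10 4'
}

# The sub-select: keep only rows whose review count (field 3)
# falls between 100 and 200, inclusive
subselect() {
  awk '$3 >= 100 && $3 <= 200'
}

# Same shape as the full pipeline: order by count, then filter
result=$(grouped | sort -k3 -r -n | subselect)
printf '%s\n' "$result"
```

With these sample rows, only reviewers B and C survive the filter, because A has too many reviews and D has too few. The awk pattern has no action block, so awk falls back to its default action of printing each matching line unchanged.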