Let's break the command down before you run it. The cut command extracts selected sections from each line of a file. The -d parameter sets the field delimiter, which here is a tab because we are working with a TSV (tab-separated values) file, and the -f parameter tells cut which fields we are interested in. Since product_title is the sixth field in our file, we start with that:
cut -d$'\t' -f 6,8,13,14 reviews.tsv | more
Unlike many programming tools, which count from 0, cut numbers fields starting at 1.
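To make the field numbering concrete, here is a minimal sketch on a made-up, one-line sample rather than on reviews.tsv. The values one through four and the variable names are assumptions for illustration only. The tab is produced with printf so that the snippet also runs in shells that do not support the $'\t' syntax.

```shell
# A literal tab character, built portably with printf.
tab=$(printf '\t')

# Hypothetical one-line, tab-separated sample (not taken from reviews.tsv).
sample=$(printf 'one\ttwo\tthree\tfour')

# Field numbering starts at 1, so -f 1 selects the first column.
first=$(printf '%s\n' "$sample" | cut -d "$tab" -f 1)

# Tab is already cut's default delimiter, so -d can be omitted here.
picked=$(printf '%s\n' "$sample" | cut -f 2,4)

echo "$first"              # one
printf '%s\n' "$picked"    # two and four, separated by a tab
```

The same numbering applies to reviews.tsv, where -f 6,8,13,14 picks the sixth, eighth, thirteenth, and fourteenth columns.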
Let’s see the results:
[Screenshot: output of the preceding command]
Much better! Let's go ahead and save this as a new file:
cut -d$'\t' -f 6,8,13,14 reviews.tsv > stripped_reviews.tsv
The following is what you should see once you run the preceding command:
[Screenshot: output of the preceding command]
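Because the redirect writes to a file instead of the terminal, the command itself prints nothing, so it can be worth checking the result. The sketch below shows one way to confirm that a stripped file has exactly four columns. It uses a hypothetical 14-column sample row in a scratch directory in place of reviews.tsv, and the file names sample.tsv and stripped_sample.tsv are assumptions.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Hypothetical 14-column sample row standing in for reviews.tsv.
printf 'c1\tc2\tc3\tc4\tc5\tc6\tc7\tc8\tc9\tc10\tc11\tc12\tc13\tc14\n' > "$workdir/sample.tsv"

# Same field selection as in the text, redirected into a new file.
cut -f 6,8,13,14 "$workdir/sample.tsv" > "$workdir/stripped_sample.tsv"

# Count the tab-separated fields on the first line of the result.
ncols=$(head -n 1 "$workdir/stripped_sample.tsv" | awk -F '\t' '{ print NF }')
echo "$ncols"    # 4
```

Running the same head and awk pipeline against stripped_reviews.tsv should likewise report 4 if the cut command worked as intended.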
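Let's see how many times the word Packt shows up in this dataset. The -o option makes grep print each match on its own line, so wc -l counts occurrences rather than every word on the matching lines:
grep -oi Packt stripped_reviews.tsv | wc -l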
The following is what you should see once you run the preceding command:
[Screenshot: output of the preceding command]
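Counting "how many times a word shows up" can mean several things, and the choice of options changes the number. The sketch below compares three counts on a small, made-up sample (the rows and the file name sample.tsv are assumptions, not data from the reviews file). It relies on the -o option, which is available in GNU and BSD grep but is not part of POSIX.

```shell
# Scratch directory with a hypothetical three-line sample.
workdir=$(mktemp -d)
printf 'Packt book\tgreat read from packt\nOther title\tno match here\nPackt again\tfine\n' > "$workdir/sample.tsv"

# Number of lines that contain the word, ignoring case.
lines=$(grep -ic Packt "$workdir/sample.tsv")

# Number of individual occurrences: -o prints one match per line.
hits=$(grep -oi Packt "$workdir/sample.tsv" | wc -l)

# Number of words of any kind on the matching lines.
words=$(grep -i Packt "$workdir/sample.tsv" | wc -w)

echo "$lines"    # 2
echo "$hits"     # 3
echo "$words"    # 9
```

Only the second count answers the question of how often the word itself appears. The first undercounts lines with repeated mentions, and the third counts every word that happens to share a line with a match.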
Let&apos...