Finding the frequency of words used in a given file
Finding the frequency of words used in a file is an interesting exercise for applying text-processing skills. It can be done in many different ways; let's see how to do it.
Getting ready
We can use associative arrays, awk, sed, grep, and so on, to solve this problem in different ways. Words are sequences of alphabetic characters, delimited by spaces or periods. First, we should parse all the words in the given file, and then count the occurrences of each word. Words can be parsed by using a regex with any of these tools, such as sed, awk, or grep.
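As a quick illustration of the parsing step, the following sketch extracts words with grep before any counting is done (the sample file name and its contents are invented for the demonstration):

```shell
# Create a small sample file (name and contents are hypothetical)
printf 'Hello world. Hello again.\n' > sample.txt

# -o prints each match on its own line; -E enables extended regex.
# \b[[:alpha:]]+\b matches runs of alphabetic characters at word boundaries.
grep -oE '\b[[:alpha:]]+\b' sample.txt
```

This prints one word per line (Hello, world, Hello, again), which is exactly the stream we will feed into the counting stage.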
How to do it...
We just saw the logic behind the solution; now let's create the shell script as follows:
#!/bin/bash
#Name: word_freq.sh
#Desc: Find out the frequency of words in a file

if [ $# -ne 1 ];
then
  echo "Usage: $0 filename";
  exit 1
fi

filename=$1

egrep -o "\b[[:alpha:]]+\b" "$filename" | \
awk '{ count[$0]++ }
END{
  printf("%-14s%s\n","Word","Count");
  for(ind in count)
    printf("%-14s%d\n",ind,count[ind]);
}'
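To see the counting stage in action, the same egrep-and-awk pipeline the script uses can be run directly against a small file (the file name and its contents here are my own invention):

```shell
# Create a small test file (name and contents are hypothetical)
cat > words.txt <<'EOF'
the cat sat on the mat. the cat ran.
EOF

# Extract words, then tally them in an awk associative array.
# The END block prints a header followed by one word-count pair per line.
egrep -o "\b[[:alpha:]]+\b" words.txt |
awk '{ count[$0]++ }
END{
  printf("%-14s%s\n","Word","Count");
  for(ind in count)
    printf("%-14s%d\n",ind,count[ind]);
}'
```

Note that awk's for-in loop visits array keys in no particular order, so the output rows may appear in any order; for this input, "the" is counted 3 times and "cat" 2 times.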