In this tutorial, we look at networking in AWK: working with TCP/IP on both the client side and the server side, and using HTTP services. We also look at profiling GAWK programs.
This tutorial is an excerpt from a book written by Shiwang Kalkhanda, titled Learning AWK Programming.
The AWK programming language was developed as a pattern-matching language for text manipulation; however, GAWK has advanced features, such as file-like handling of network connections. We can perform simple TCP/IP connection handling in GAWK with the help of special filenames. GAWK extends the two-way I/O mechanism used with the |& operator to simple networking through these special filenames, which hide the complex details of socket programming from the programmer.
The special filename for network communication is made up of multiple fields, all of which are mandatory. Its syntax is as follows:
/net-type/protocol/local-port/remote-host/remote-port
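The fields are separated from one another by forward slashes. If a field is not valid for a protocol, or you want the system to pick a default value for it, set it to 0. The fields used in the special filename have the following meanings:
net-type: The network type, such as inet (inet4 and inet6 select IPv4 and IPv6 explicitly).
protocol: Either tcp or udp. TCP guarantees that data is received at the other end in the same order as it was transmitted, so use TCP for most applications.
local-port: The local port number. A server sets this to the port it listens on; a client can set it to 0 so that the system picks a port.
remote-host: The name or IP address of the remote host. A server that accepts any client sets it to 0.
remote-port: The port on the remote host. A server that accepts any client sets it to 0.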
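As a minimal sketch of how the five fields combine, the following helper joins them into a special filename. The function name net_filename is hypothetical (it is not part of GAWK); the two calls reproduce the server and client filenames used in the examples in this tutorial.

```awk
# net_filename: hypothetical helper (not a GAWK built-in) that joins the
# five mandatory fields into a special filename for network communication.
function net_filename(net_type, protocol, local_port, remote_host, remote_port) {
    return "/" net_type "/" protocol "/" local_port "/" remote_host "/" remote_port
}

BEGIN {
    # Server side: listen on local port 8080, accept any remote host and port.
    print net_filename("inet", "tcp", 8080, 0, 0)
    # Client side: system-chosen local port, connect to localhost:8080.
    print net_filename("inet", "tcp", 0, "localhost", 8080)
}
```

Building the name in one place keeps the string passed to the |& operator and the string passed to close() identical, which matters because GAWK identifies the connection by that exact string.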
In the following example, we create a TCP server (sender) that sends its current date and time to the client. The server passes the output of the strftime() function through the coprocess operator to the special filename, which listens on port 8080. Because any client may connect, the remote host and remote port are set to 0.
The server connection is closed by passing the special filename to the close() function, as follows:
$ vi tcpserver.awk
#TCP-Server
BEGIN {
print strftime() |& "/inet/tcp/8080/0/0"
close("/inet/tcp/8080/0/0")
}
Now, open one Terminal and run this program before running the client program as follows:
$ awk -f tcpserver.awk
Next, we create the TCP client (receiver) to receive the data sent by the server. We open the client connection and read from it with getline using the coprocess operator. The local-port is set to 0 so that the system chooses it automatically, the remote-host is set to localhost, and the remote-port is set to the server port, 8080. The received message is then printed with print $0, and finally the client connection is closed with close(), as follows:
$ vi tcpclient.awk
#TCP-client
BEGIN {
"/inet/tcp/0/localhost/8080" |& getline
print $0
close("/inet/tcp/0/localhost/8080")
}
Now, execute the tcpclient program in another Terminal as follows:
$ awk -f tcpclient.awk
The output of the previous code is as follows:
Fri Feb 9 09:42:22 IST 2018
The server and client programs that use the UDP protocol for communication are almost identical to their TCP counterparts; the main difference is that the protocol is changed from tcp to udp. In this version, the server also reads a message sent by the client. The UDP server and UDP client programs can be written as follows:
$ vi udpserver.awk
#UDP-Server
BEGIN {
print strftime() |& "/inet/udp/8080/0/0"
"/inet/udp/8080/0/0" |& getline
print $0
close("/inet/udp/8080/0/0")
}
$ awk -f udpserver.awk
Here, only one addition has been made to the client program: the client sends the message hello from client! to the server. So on the Terminal where the udpclient.awk program is run, we get the remote system's date and time, and on the Terminal where the udpserver.awk program is run, we get the hello message from the client:
$ vi udpclient.awk
#UDP-client
BEGIN {
print "hello from client!" |& "/inet/udp/0/localhost/8080"
"/inet/udp/0/localhost/8080" |& getline
print $0
close("/inet/udp/0/localhost/8080")
}
$ awk -f udpclient.awk
GAWK can be used to open direct sockets only. Currently, there is no way to access services available over an SSL connection such as https, smtps, pop3s, imaps, and so on.
To read a web page, we use the Hypertext Transfer Protocol (HTTP) service, which runs on port 80. First, we redefine the record separators RS and ORS because HTTP requires CR-LF to separate lines. The program connects to the IP address 35.164.82.168 (www.grymoire.com), a static website, and sends a GET request for the web page http://35.164.82.168/Unix/donate.html. GET is the HTTP method that tells the web server to transmit the web page donate.html. The response is read with getline using the coprocess operator and printed on the screen, line by line, in a while loop. Finally, we close the HTTP connection. The following is the program to retrieve the web page:
$ vi view_webpage.awk
BEGIN {
RS=ORS="\r\n"
http = "/inet/tcp/0/35.164.82.168/80"
print "GET http://35.164.82.168/Unix/donate.html" |& http
while ((http |& getline) > 0)
print $0
close(http)
}
$ awk -f view_webpage.awk
Upon execution, the program fills the screen with the source code of the page, as follows:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML lang="en-US">
<HEAD>
<TITLE> Welcome to The UNIX Grymoire!</TITLE>
<meta name="keywords" content="grymoire, donate, unix, tutorials, sed, awk">
<META NAME="Description" CONTENT="Please donate to the Unix Grymoire" >
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="myCSS.css" rel="stylesheet" type="text/css">
<!-- Place this tag in your head or just before your close body tag -->
<script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script>
<link rel="canonical" href="http://www.grymoire.com/Unix/donate.html">
<link href="myCSS.css" rel="stylesheet" type="text/css">
........
........
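The request in the program above names no HTTP version, and some servers may not accept that form. As a hedged sketch, the following builds an explicit HTTP/1.0 request with a Host header and CR-LF line endings. The function name http_request is hypothetical, and using the hostname www.grymoire.com in the Host header is an assumption; the block only prints the request text and opens no connection.

```awk
# http_request: hypothetical helper that returns the text of an HTTP/1.0 GET
# request. Each line ends in CR-LF, and an empty line ends the header section.
function http_request(host, path) {
    return "GET " path " HTTP/1.0\r\n" \
           "Host: " host "\r\n" \
           "\r\n"
}

BEGIN {
    # Host and path are taken from the example above; adjust as needed.
    req = http_request("www.grymoire.com", "/Unix/donate.html")
    printf "%s", req    # printf does not append ORS, so the text is sent as built
}
```

To send the request, the string could be written to the connection with printf "%s", req |& http in place of the print statement in the program above; the reply then starts with a status line and headers before the page source.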
Profiling is done to optimize code. In GAWK, we can profile a program by supplying the --profile option when running it. Executing the program with that option creates a file named awkprof.out. Because GAWK is profiling the code, the program can run up to 45% slower than it normally does.
Let's understand profiling by looking at an example. In the following example, we create a program that has four functions: two arithmetic functions, one function that prints an array, and one function that calls all of them. Our program also contains two BEGIN and two END rules. The first BEGIN and END rules come first, followed by a pattern-action rule, and then the second BEGIN and END rules, as follows:
$ vi codeprof.awk
func z_array(){
arr[30] = "volvo"
arr[10] = "bmw"
arr[20] = "audi"
arr[50] = "toyota"
arr["car"] = "ferrari"
n = asort(arr)
print "Array begins...!"
print "====================="
for ( v in arr )
print v, arr[v]
print "Array Ends...!"
print "====================="
}
function mul(num1, num2){
result = num1 * num2
printf ("Multiplication of %d * %d : %d\n", num1,num2,result)
}
function all(){
add(30,10)
mul(5,6)
z_array()
}
BEGIN { print "First BEGIN statement"
print "====================="
}
END { print "First END statement "
print "====================="
}
/maruti/{print $0 }
BEGIN {
print "Second BEGIN statement"
print "====================="
all()
}
END { print "Second END statement"
print "====================="
}
function add(num1, num2){
result = num1 + num2
printf ("Addition of %d + %d : %d\n", num1,num2,result)
}
$ awk --prof -f codeprof.awk cars.dat
The output of the previous code is as follows:
First BEGIN statement
=====================
Second BEGIN statement
=====================
Addition of 30 + 10 : 40
Multiplication of 5 * 6 : 30
Array begins...!
=====================
1 audi
2 bmw
3 ferrari
4 toyota
5 volvo
Array Ends...!
=====================
maruti swift 2007 50000 5
maruti dezire 2009 3100 6
maruti swift 2009 4100 5
maruti esteem 1997 98000 1
First END statement
=====================
Second END statement
=====================
Execution of the previous program also creates a file with the name awkprof.out. If we want to create this profile file with a custom name, then we can specify the filename as an argument to the --profile option as follows:
$ awk --prof=codeprof.prof -f codeprof.awk cars.dat
Now, upon execution of the preceding command, we get a new file named codeprof.prof. Let's try to understand the contents of codeprof.prof, created by the profiler, as follows:
# gawk profile, created Fri Feb 9 11:01:41 2018
# BEGIN rule(s)
BEGIN {
1 print "First BEGIN statement"
1 print "====================="
}
BEGIN {
1 print "Second BEGIN statement"
1 print "====================="
1 all()
}
# Rule(s)
12 /maruti/ { # 4
4 print $0
}
# END rule(s)
END {
1 print "First END statement "
1 print "====================="
}
END {
1 print "Second END statement"
1 print "====================="
}
# Functions, listed alphabetically
1 function add(num1, num2)
{
1 result = num1 + num2
1 printf "Addition of %d + %d : %d\n", num1, num2, result
}
1 function all()
{
1 add(30, 10)
1 mul(5, 6)
1 z_array()
}
1 function mul(num1, num2)
{
1 result = num1 * num2
1 printf "Multiplication of %d * %d : %d\n", num1, num2, result
}
1 function z_array()
{
1 arr[30] = "volvo"
1 arr[10] = "bmw"
1 arr[20] = "audi"
1 arr[50] = "toyota"
1 arr["car"] = "ferrari"
1 n = asort(arr)
1 print "Array begins...!"
1 print "====================="
5 for (v in arr) {
5 print v, arr[v]
}
1 print "Array Ends...!"
1 print "====================="
}
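The number at the left of each statement in this listing is the count of times that statement was executed. As a rough sketch, the following script reads a profile file and prints only the lines that ran more than once, which is often where optimization effort pays off. The helper name exec_count is hypothetical, and the script assumes the profile was written to codeprof.prof as in the command above.

```awk
# exec_count: hypothetical helper; returns the leading execution count of a
# profile line, or 0 when the line has none (comments, braces, rule headers).
function exec_count(line) {
    if (match(line, /^[ \t]*[0-9]+[ \t]/))
        return substr(line, RSTART, RLENGTH) + 0
    return 0
}

BEGIN {
    file = "codeprof.prof"    # profile name used earlier in this tutorial
    # Read the profile line by line; the loop is skipped if the file is missing.
    while ((getline line < file) > 0)
        if (exec_count(line) > 1)
            print line
    close(file)
}
```

For the profile shown above, the lines selected would be the /maruti/ rule, its print statement, and the for loop over the array with its body.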
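This profiling example shows the basic features of profiling in GAWK. The count to the left of each statement is the number of times that statement was executed. The program is printed with the BEGIN rules first, then the pattern-action rules, then the END rules, and finally the functions in alphabetical order, regardless of their order in the source file. A pattern-action rule carries two counts: the first (12 here) is the number of times the pattern was tested, and the second, shown after the # sign (4 here), is the number of times the action was executed.
GAWK produces a standard representation of the program in its profiled version. GAWK also accepts another option, --pretty-print. The following is an example of pretty-printing an AWK program: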
$ awk --pretty-print -f codeprof.awk cars.dat
When GAWK is called with --pretty-print, it generates awkprof.out, but this time without any execution counts in the output. Pretty-print output also preserves any original comments in the program, while the profile option omits them. The file created on execution of the program with the --pretty-print option is as follows:
# gawk profile, created Fri Feb 9 11:04:19 2018
# BEGIN rule(s)
BEGIN {
print "First BEGIN statement"
print "====================="
}
BEGIN {
print "Second BEGIN statement"
print "====================="
all()
}
# Rule(s)
/maruti/ {
print $0
}
# END rule(s)
END {
print "First END statement "
print "====================="
}
END {
print "Second END statement"
print "====================="
}
# Functions, listed alphabetically
function add(num1, num2)
{
result = num1 + num2
printf "Addition of %d + %d : %d\n", num1, num2, result
}
function all()
{
add(30, 10)
mul(5, 6)
z_array()
}
function mul(num1, num2)
{
result = num1 * num2
printf "Multiplication of %d * %d : %d\n", num1, num2, result
}
function z_array()
{
arr[30] = "volvo"
arr[10] = "bmw"
arr[20] = "audi"
arr[50] = "toyota"
arr["car"] = "ferrari"
n = asort(arr)
print "Array begins...!"
print "====================="
for (v in arr) {
print v, arr[v]
}
print "Array Ends...!"
print "====================="
}
To summarize, we looked at the basics of network programming and profiling in GAWK.
Do check out the book Learning AWK Programming to know more about the intricacies of AWK programming for text processing.