Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Learning AWK Programming

You're reading from   Learning AWK Programming A fast, and simple cutting-edge utility for text-processing on the Unix-like environment

Arrow left icon
Product type Paperback
Published in Mar 2018
Publisher Packt
ISBN-13 9781788391030
Length 416 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Shiwang Kalkhanda Shiwang Kalkhanda
Author Profile Icon Shiwang Kalkhanda
Shiwang Kalkhanda
Arrow right icon
View More author details
Toc

Table of Contents (11) Chapters Close

Preface 1. Getting Started with AWK Programming FREE CHAPTER 2. Working with Regular Expressions 3. AWK Variables and Constants 4. Working with Arrays in AWK 5. Printing Output in AWK 6. AWK Expressions 7. AWK Control Flow Statements 8. AWK Functions 9. GNU's Implementation of AWK – GAWK (GNU AWK) 10. Practical Implementation of AWK

AWK programming language overview

In this section, we will explore the AWK philosophy and different types of AWK that exist today, starting from its original implementation in 1977 at AT&T's Laboratories, Inc. We will also look at the various implementation areas of AWK in data science today.

What is AWK?

AWK is an interpreted programming language designed for text processing and report generation. It is typically used for data manipulation, such as searching for items within data, performing arithmetic operations, and restructuring raw data for generating reports in most Unix-like operating systems. Using AWK programs, one can handle repetitive text-editing problems with very simple and short programs. AWK is a pattern-action language; it searches for patterns in a given input and, when a match is found, it performs the corresponding action. The pattern can be made of strings, regular expressions, comparison operations on numbers, fields, variables, and so on. AWK reads the input files and splits each input line of the file into fields automatically.

AWK has most of the well-designed features that every programming language should contain. Its syntax particularly resembles that of the C programming language. It is named after its original three authors:

  • Alfred V. Aho
  • Peter J. Weinberger
  • Brian W. Kernighan

AWK is a very powerful, elegant, and simple tool that every person dealing with text processing should be familiar with.

Types of AWK

The AWK language was originally implemented as an AWK utility on Unix. Today, most Linux distributions provide GNU implementation of AWK (GAWK), and a symlink for AWK is created from the original GAWK binary. The AWK utility can be categorized into the following three types, depending upon the type of interpreter it uses for executing AWK programs:

  • AWK: This is the original AWK interpreter available from AT&T Laboratories. However, it is not used much nowadays and hence it might not be well-maintained. Its limitation is that it splits a line into a maximum 99 fields. It was updated and replaced in the mid-1980s with an enhanced version called New AWK (NAWK).
  • NAWK: This is AT&T's latest development on the AWK interpreter. It is well-maintained by one of the original authors of AWK - Dr. Brian W. Kernighan.
  • GAWK: This is the GNU project's implementation of the AWK programming language. All GNU/Linux distributions are shipped with GAWK by default and hence it is the most popular version of AWK. GAWK interpreter is fully compatible with AWK and NAWK.

Beyond these, we also have other, less popular, AWK interpreters and translators, mentioned as follows. These variants are useful in operations when you want to translate your AWK program to C, C++, or Perl:

  • MAWK: Michael Brennan interpreter for AWK.
  • TAWK: Thompson Automation interpreter/compiler/Microsoft Windows DLL for AWK.
  • MKSAWK: Mortice Kern Systems interpreter/compiler/for AWK.
  • AWKCC: An AWK translator to C (might not be well-maintained).
  • AWKC++: Brian Kernighan's AWK translator to C++ (experimental). It can be downloaded from: https://9p.io/cm/cs/who/bwk/awkc++.ps.
  • AWK2C: An AWK translator to C. It uses GNU AWK libraries extensively.
  • A2P: An AWK translator to Perl. It comes with Perl.
  • AWKA: Yet another AWK translator to C (comes with the library), based on MAWK. It can be downloaded from: http://awka.sourceforge.net/download.html.

When and where to use AWK

AWK is simpler than any other utility for text processing and is available as the default on Unix-like operating systems. However, some people might say Perl is a superior choice for text processing, as AWK is functionally a subset of Perl, but the learning curve for Perl is steeper than that of AWK; AWK is simpler than Perl. AWK programs are smaller and hence quicker to execute. Anybody who knows the Linux command line can start writing AWK programs in no time. Here are a few use cases of AWK:

  • Text processing
  • Producing formatted text reports/labels
  • Performing arithmetic operations on fields of a file
  • Performing string operations on different fields of a file

Programs written in AWK are smaller than they would be in other higher-level languages for similar text processing operations. AWK programs are interpreted on a GNU/Linux Terminal and thus avoid the compiling, debugging phase of software development in other languages.

You have been reading a chapter from
Learning AWK Programming
Published in: Mar 2018
Publisher: Packt
ISBN-13: 9781788391030
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime