Home page of Eric Pement


The awk programming language

awk is a programming language that gets its name from the three people who invented it (Aho, Weinberger, and Kernighan). Because it was developed on a Unix operating system, its name is usually printed in lower-case ("awk") instead of capitalized ("Awk"). awk is distributed as free software, meaning that you don't have to pay anything for it and you can get the source code to build awk yourself.

It's not an "I can do anything" programming language like C++ or Visual Basic, although it can do a lot. awk excels at handling plain-text and data files, such as the kind created in Notepad, or HTML files. You wouldn't use awk to modify a Microsoft Word document or an Excel spreadsheet. However, if you take the Word document and Save As "Text Only", or if you take the Excel spreadsheet and Save As tab-delimited (*.txt) or comma-delimited (*.csv) output files, then awk could do a good job of handling them.
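As a minimal sketch of that kind of job, assume a hypothetical tab-delimited file named sales.txt, saved from a spreadsheet with three columns (item, region, amount). The file name, the columns, and the helper name region_totals are invented for illustration. The awk call adds up the third column for each region:

```shell
# Hypothetical sample data; the file name and columns are assumptions.
printf 'widget\teast\t10\ngadget\twest\t5\ngizmo\teast\t7\n' > sales.txt

# -F'\t' sets the field separator to a tab; $2 is the region, $3 the amount.
# The END rule runs once after the last line; sort fixes the output order,
# because awk does not guarantee the order of "for (r in total)".
region_totals() {
    awk -F'\t' '{ total[$2] += $3 } END { for (r in total) print r, total[r] }' "$1" | sort
}

region_totals sales.txt
# east 17
# west 5
```

A comma-delimited file could be handled the same way with -F',' as long as no field contains a quoted comma.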

I like awk because it's concise. The shortest awk program that does anything useful is just 1 character:

awk 1 yourfile

On a DOS/Windows machine, this converts Unix line endings (LF) to standard DOS line endings (CR,LF).

awk programs are often called "scripts" because they don't require an intermediate stage of compiling the program into an executable form like an *.EXE file. In fact, awk programs are almost never compiled into *.EXE files (although I think it's possible to do this). Thus, many people refer to awk as a "scripting language" instead of a "programming language."
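The one-character program works because an awk program is a list of "pattern { action }" rules. The pattern 1 is always true, and a rule with no action prints the current line. The sketch below, using a hypothetical file named sample.txt, shows three equivalent spellings of the same program. On a Unix system these simply copy the lines through; any line-ending conversion comes from how a DOS/Windows build of awk writes its output.

```shell
# Hypothetical two-line sample file.
printf 'alpha\nbeta\n' > sample.txt

# All three commands print every input line unchanged.
awk 1 sample.txt                  # pattern only: the default action is print
awk '1 { print }' sample.txt      # the same rule with the action spelled out
awk '{ print $0 }' sample.txt     # no pattern: the action runs on every line
```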

This doesn't mean that you couldn't run an awk program from an icon on the Windows desktop. It means that instead of creating a shortcut to something like "mywidget.exe", you'd create a shortcut to "awk -f mywidget.awk somefile.txt" when Windows prompts you for the Command Line.
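To make the -f form concrete, here is a sketch of what such a script file might contain. Only the names mywidget.awk and somefile.txt come from the text above; the script body and the sample data are invented for illustration. It numbers each non-blank line and reports a count at the end.

```shell
# Write a hypothetical mywidget.awk; its contents are an assumption.
cat > mywidget.awk <<'EOF'
# Number every non-blank line, then report how many were seen.
NF > 0 { count++; printf "%d: %s\n", count, $0 }
END    { printf "%d non-blank lines\n", count }
EOF

# Hypothetical input file with one blank line in the middle.
printf 'first\n\nsecond\n' > somefile.txt

# -f tells awk to read the program from a file instead of the command line.
awk -f mywidget.awk somefile.txt
# 1: first
# 2: second
# 2 non-blank lines
```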

Some awk thingies I've written

  • awk1line.txt - one-line scripts for awk. Modeled after my "sed one-liners" file, but specific to awk.

  • awktail.txt - the proper way to assign the "rest of the line" to a variable. I did it wrong for a year or two, and now I don't want to forget.

  • awk_sed.txt - a table comparing similar commands between sed and awk. How to do substitutions, deletions, etc., in both sed and awk.

  • awkuse.txt - my personal cheat sheet of all the main awk commands. When I forget the main tricks, this reminds me.

  • Using system commands - the real way to embed system commands (say, calls to sed or perl or fmt) within an awk script, so they can be used just on a particular hunk of text.

  • endnote.txt - (newly updated!) This is a really helpful file for people who write documents in plain ASCII (like Emacs or vim users!), and who want to insert footnotes or endnotes in their documents, but who want to be able to move their footnotes all around without renumbering everything. Basically, you use references like this[##] in your text, putting the actual citation (Dante, Book 3, sect. 2) directly below the paragraph. Rearrange the document to your heart's delight. When you're all finished, use this script to sequentially number all your references, gather your notes together, and print them at the end of the file with numbers corresponding to the in-text references. Totally cool. Eric Meyer thought of it, and I wrote it for both awk and perl.

    The same script is also available in perl if you'd rather. And if a picture is worth 1000 words, then this page is the best and clearest way I can think of to show the difference.

  • italbold.awk - given a textfile marked up in _pseudo-italic_ or else in *pseudo-bold* (or _*both*_), convert those tags to bona-fide HTML or some other desired output.

  • longest.awk - print the longest line in a file, with its length.

  • outline_classic.awk - given a document created in Emacs "outline-mode", convert the outline markers to traditional Outline format (e.g., A, B, C, 1, 2, (a), (b), etc.)

  • outline_numbered.awk - given a document created in Emacs "outline-mode", convert the outline markers to numbered outline format (e.g., 1, 2, 3, 3.1, 3.2, 3.2.1, 3.2.2, etc.)

  • paragrep.awk - when grepping (searching) a textfile, print the entire paragraph that contains the search expression, not just the line that it's on.

  • pmailadd.awk - how to take a list of names and e-mail addresses, and use awk to convert them to a format for immediate import into the Pegasus Mail program.

  • printf.txt - a memory jog on how printf() works in awk.

  • titlecase.awk - This is a function for taking a string in "ALL CAPS", "lowercase", or "mIXeD cAsE" and converting it to "Title Case", such as would be used for book or chapter titles. It keeps Roman numerals and special abbreviations (like USA, LXX, NT, NY) in caps, but keeps articles, conjunctions, and prepositions between words in lowercase. Names like D'Arcy, O'Reilly, and McDonald are properly capitalized, as are abbreviations like Ph.D. or D.Min. Obeys most style manual rules. This is really the best "titlecase" function I've seen.

  • uniq-1.awk - sample script to show how to remove duplicate data.
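For a feel of the tasks a few of these files address, here are minimal sketches of three of them: the longest line, duplicate removal, and paragraph-level grep. These are not the contents of longest.awk, uniq-1.awk, or paragrep.awk; the function names and the sample file notes.txt are assumptions made for illustration.

```shell
# Hypothetical sample input: two paragraphs separated by a blank line.
printf 'apple pie\napple pie\nplum\n\ncherry tart is long\nplum\n' > notes.txt

# Print the longest line with its length (the first one wins on ties).
longest() {
    awk 'length($0) > max { max = length($0); line = $0 } END { print max ": " line }' "$1"
}

# Remove duplicate lines, keeping the first occurrence, without sorting.
# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++ is true.
dedupe() {
    awk '!seen[$0]++' "$1"
}

# Print each whole paragraph that matches a regular expression.
# RS = "" switches awk to paragraph mode: records are separated by blank lines.
paragrep() {
    awk -v pat="$1" 'BEGIN { RS = ""; ORS = "\n\n" } $0 ~ pat' "$2"
}

longest notes.txt            # 19: cherry tart is long
dedupe notes.txt             # drops the second "apple pie" and the second "plum"
paragrep cherry notes.txt    # prints the second paragraph, both lines
```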

Tutorials

Discussion forums, newsgroups

 

These pages created with First Page, GNU Emacs, htmlpp, and Servant Salamander. Icons courtesy of Qbullets.
Last modified: Thursday, Feb 12, 2004