GNU Text Utilities

I've been talking about the UNIX (and Linux) shell of late, and it's about time I talked about some of the standard GNU text utilities, since they pop up so often in scripts and daily life at the command prompt. They have familiar names, at least some of them: cat, wc, sort, join, cut, head, fmt, uniq, cksum, comm, csplit, expand, fold, nl, od, paste, pr, split, sum, tac, tail, tr, and unexpand. I'll talk mostly about the ones I use quite a bit.

Mind you, there are several GNU toolkits, such as file utilities (sync, cp, mv, df, chown, etc.), find utilities (find, xargs, updatedb, locate, etc.), shell utilities (date, echo, uname, nohup, pwd, etc.), and text utilities. I'll be talking about only the text utilities in this short article.

cat. You may well use cat every day, but did you know it can display non-printing characters in a text file, such as tabs, end-of-line markers, and control characters? Did you know it can number either all lines or just the non-empty ones? See the info page to learn how (info cat).
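Here is a minimal sketch of those cat options, assuming GNU cat (the -A switch is a GNU extension). The file name sample.txt and the variable names are made up for illustration.

```shell
# Build a small sample file in a scratch directory (hypothetical name).
tmpdir=$(mktemp -d)
printf 'one\ttwo\n\nthree\n' > "$tmpdir/sample.txt"

# -A (--show-all) shows tabs as ^I and marks each line end with $.
shown=$(cat -A "$tmpdir/sample.txt")
printf '%s\n' "$shown"

# -n numbers every line; -b numbers only the non-empty ones.
cat -n "$tmpdir/sample.txt"
cat -b "$tmpdir/sample.txt"

# Count how many lines actually received a number in each case.
numbered_all=$(cat -n "$tmpdir/sample.txt" | grep -c '^ *[0-9]')
numbered_nonempty=$(cat -b "$tmpdir/sample.txt" | grep -c '^ *[0-9]')

rm -rf "$tmpdir"
```

The sample has three lines, one of them empty, so -n numbers three lines and -b numbers two.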

tail. Usually you use tail to examine only the most recent changes to various logfiles or to watch a file as it's being updated dynamically. For example, if you wanted to output the contents of /var/log/messages to an XTerm, you could type as root tail -f /var/log/messages, and that XTerm would show any changes to this all-important logfile as they happened. It would be similar to running the xconsole program on your display.

GNU tail differs from BSD tail in that it doesn't support the -r switch (reverse output). This isn't as disappointing as it might sound: BSD tail can only reverse files that are smaller than its buffer, which is typically 32 kilobytes. In any case, another GNU text utility, tac, accomplishes the same thing.
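As a quick illustration, here is tail picking off the end of some input and tac reversing it. The three sample lines stand in for a logfile and are invented for the example.

```shell
# tail -n 2 keeps only the last two lines of its input.
last_two=$(printf 'first\nsecond\nthird\n' | tail -n 2)
printf '%s\n' "$last_two"

# tac prints its input last line first, which is what tail -r does on BSD.
reversed=$(printf 'first\nsecond\nthird\n' | tac)
printf '%s\n' "$reversed"
```

Unlike tail -f, both commands here exit as soon as they reach the end of their input, so they are safe to use in scripts.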

wc. Since I write quite a bit, wc is one tool I use a lot. As I write this article in vi, I frequently use the command :!wc -w %, meaning: count the words in the current file as last saved. It's very handy as I try not to aggravate John by exceeding my word count.

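wc counts not only words, but also characters, lines, and bytes. You can also ask for more than one count at once: wc --words --bytes FILENAME. wc also turns up in ways you may not have thought of, for instance at the end of a pipeline to count the lines another command produces.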
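A small sketch of the individual counts follows. The file name draft.txt is hypothetical, and the tr step is only there because some wc implementations pad their output with spaces.

```shell
# Create a two-line sample file in a scratch directory.
tmpdir=$(mktemp -d)
printf 'the quick brown fox\njumps over the lazy dog\n' > "$tmpdir/draft.txt"

# Reading from standard input keeps the file name out of the output.
words=$(wc -w < "$tmpdir/draft.txt" | tr -d ' ')
lines=$(wc -l < "$tmpdir/draft.txt" | tr -d ' ')
bytes=$(wc -c < "$tmpdir/draft.txt" | tr -d ' ')
echo "words=$words lines=$lines bytes=$bytes"

# wc at the end of a pipeline: count the entries ls prints for a directory.
files=$(ls "$tmpdir" | wc -l | tr -d ' ')
echo "files=$files"

rm -rf "$tmpdir"
```

The same trick works with any command that prints one item per line.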

fold, pr, nl, fmt. These are tools I use often enough to be familiar with, though not intimately. I use them most often from inside an editor, such as vi. For example, suppose I want to reformat someone's email that I'm responding to. Many people don't set the line wrap properly in their email client, so the message overfills my display and looks rather ugly. I'll simply use an ex command such as the following from within vi:

:45,53!fmt --width=65 --prefix='> '

Lines 45 to 53 will be reformatted and the "> " prefix will be reinserted for a pleasing display. In fact, of these four commands, I use fmt the most. It's also quite good at formatting programming code, which most of these tools were originally designed for. However, they work just as well on simple text.
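The same fmt invocation works outside the editor. This is a sketch assuming GNU fmt; the quoted sentence is invented, and the width is 30 rather than 65 only so that the short sample has to wrap.

```shell
# One overlong quoted line, as a badly configured mail client might send it.
quoted='> Many people do not set the line wrap properly in their mail client, so a quoted paragraph can arrive as one very long line.'

# Reflow it to at most 30 columns, keeping the quote prefix on every line.
reflowed=$(printf '%s\n' "$quoted" | fmt --width=30 --prefix='> ')
printf '%s\n' "$reflowed"

# Sanity checks: total lines, lines that still carry the prefix, longest line.
total=$(printf '%s\n' "$reflowed" | wc -l | tr -d ' ')
prefixed=$(printf '%s\n' "$reflowed" | grep -c '^> ')
longest=$(printf '%s\n' "$reflowed" | awk '{ if (length($0) > max) max = length($0) } END { print max }')
echo "total=$total prefixed=$prefixed longest=$longest"
```

fmt chooses its own break points, so the exact wrapping may differ from what you would pick by hand, but every output line keeps the prefix and stays within the requested width.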

fold is particularly good at wrapping each input line to a fixed width. I usually use fmt to do this, though. Still, fold will let you wrap lines by byte count (with -b), not just by columns, as fmt does.
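To see the difference between a hard wrap and a wrap at spaces, here is a minimal sketch using an invented sample sentence and a width of 16 columns.

```shell
line='The quick brown fox jumps over the lazy dog'

# Plain fold cuts at exactly 16 columns, even in the middle of a word.
hard=$(printf '%s\n' "$line" | fold -w 16)
printf '%s\n' "$hard"

# With -s, fold breaks at the last blank that fits, so words stay whole.
soft=$(printf '%s\n' "$line" | fold -w 16 -s)
printf '%s\n' "$soft"

# Pull out one line from each result for comparison.
hard_second=$(printf '%s\n' "$hard" | sed -n 2p)
soft_last=$(printf '%s\n' "$soft" | tail -n 1)
```

In the hard-wrapped output the second line ends partway through "the", while the -s version carries the whole word down to the next line.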

pr is useful either in a text editor or by itself on the command line. pr prepares a document for printing by setting the text in columns and paginating it with headers and footers. Take a text file, run pr -2 textfile, and see what I mean. You can see that sometimes the line breaks don't occur neatly, so you might first filter the text through fold, mentioned above. fold -w 30 -s filename.txt | pr -2 | more will wrap each line to no more than 30 characters without breaking any line in the middle of a word, and pr will print it with nice headers and footers in two-column format (or however many columns you prefer). pr has many options that give you quite a bit of flexibility over the appearance of its output. It also gives you a handy way to count the pages in your document: cat filename | pr -1 | grep Page | wc -l, for example. Check out info pr.
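Here is a sketch of the page-counting idea, assuming GNU pr with its default 66-line page (56 lines of text plus header and footer). The 120 numbered lines from seq stand in for a real document, and grep -c replaces the grep | wc -l pair.

```shell
# Single column: 56 body lines per page, so 120 lines need three pages.
pages_one_col=$(seq 1 120 | pr -1 | grep -c 'Page [0-9]')
echo "one column: $pages_one_col pages"

# Two columns: each page now holds 2 x 56 lines, so two pages are enough.
pages_two_col=$(seq 1 120 | pr -2 | grep -c 'Page [0-9]')
echo "two columns: $pages_two_col pages"

# Show the top of the first two-column page, header included.
seq 1 120 | pr -2 | head -n 8
```

Matching 'Page [0-9]' rather than just 'Page' makes it a little less likely that a line of body text is counted as a header, though text that happens to contain the same pattern would still throw the count off.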

nl is a line numbering tool. Since I use vi, I normally just use the built-in line numbering within the Ex editor. But I use nl in scripts or on the command line when I want line numbers to precede each line. Sometimes this is particularly useful when quoting some program code. cat filename.c | nl | pr -1 -t | more will display numbers in front of each line of filename.c and print it without pretty headers or spaces where headers and footers would normally be. The output is paged through more for easy viewing.
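A short sketch of nl on a scrap of C follows. The five-line program is invented, and the behaviour described is that of GNU nl with its default settings.

```shell
# A five-line sample with one empty line in the middle.
sample=$(printf 'int main(void)\n{\n\n    return 0;\n}\n')

# By default nl numbers only the non-empty lines.
printf '%s\n' "$sample" | nl

# -b a numbers every line, empty ones included.
printf '%s\n' "$sample" | nl -b a

# Count how many lines received a number in each case.
numbered_default=$(printf '%s\n' "$sample" | nl | grep -c '^ *[0-9]')
numbered_all=$(printf '%s\n' "$sample" | nl -b a | grep -c '^ *[0-9]')
```

If you want the numbers to match what your editor shows, use -b a, since editors count empty lines too.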

You can also use the -n option of cat to achieve much the same purpose. For example, if you wanted to direct a programmer's attention to lines 45 through 48 of a program, you could try: cat -n program.c | grep -A 3 '^ *45[[:space:]]' > forpaul.txt. This will put line numbers in front of each line of program.c. Because cat -n pads each number with leading spaces and follows it with a tab, the pattern has to allow for both. grep will isolate line 45, the -A 3 will append 3 additional lines of program context, and this group of lines will be output to a file called forpaul.txt (your programmer friend's name, hopefully), which you can send to him.
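Here is a sketch of pulling a numbered excerpt out of a file, assuming GNU cat, grep, and seq. The 60-line program.c is generated as a stand-in, and the sed variant is an alternative I am adding for comparison.

```shell
# Generate a 60-line stand-in for program.c in a scratch directory.
tmpdir=$(mktemp -d)
seq -f 'statement %g;' 1 60 > "$tmpdir/program.c"

# cat -n right-aligns the number and follows it with a tab, so the
# pattern allows leading spaces and requires whitespace after "45".
excerpt=$(cat -n "$tmpdir/program.c" | grep -A 3 '^ *45[[:space:]]')
printf '%s\n' "$excerpt"

# Alternative: let sed select lines 45 through 48 by position instead.
alt=$(cat -n "$tmpdir/program.c" | sed -n '45,48p')

excerpt_lines=$(printf '%s\n' "$excerpt" | wc -l | tr -d ' ')
rm -rf "$tmpdir"
```

Both approaches give the same four numbered lines. The sed form avoids having to think about how cat -n lays out its numbers.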

The rest. I have used each of the tools mentioned at least several times. They're all useful, some more than others. I use the rest of them primarily in shell scripts or in handling various administrative tasks from the command line. Once again, I'm out of space in this little column, so you'll have to try them out on your own. Remember, the GNU info pages are the authoritative source now since the man pages are no longer being updated. Try these GNU text tools out and see how they can make your life easier!



dsj@dsj.net