I've been talking about the UNIX (and Linux) shell of late, and it's about time I talked about some of the standard GNU text utilities, since they pop up so often in scripts and daily life at the command prompt. They have familiar names, at least some of them: cat, wc, sort, join, cut, head, fmt, uniq, cksum, comm, csplit, expand, fold, nl, od, paste, pr, split, sum, tac, tail, tr, and unexpand. I'll talk mostly about the ones I use quite a bit.
Mind you, there are several GNU toolkits, such as file utilities (sync, cp, mv, df, chown, etc.), find utilities (find, xargs, code, locate, etc.), shell utilities (date, echo, uname, nohup, pwd, etc.), and text utilities. I'll be talking about only the text utilities in this short article.
cat. You may well use cat every day, but did you know that it can display non-printing characters in a text file, such as tabs, end-of-line markers, and control characters? Did you know that it can number either all lines or just the non-empty ones? See the info page to learn how (info cat).
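As a quick illustration of those options, here is a minimal sketch assuming GNU cat. The helper names and the sample input are made up for the example; -A, -n, and -b are the cat options doing the work.

```shell
# Hypothetical helper names; each one wraps a single GNU cat option.
show_all()        { cat -A "$@"; }   # tabs shown as ^I, line ends as $
number_all()      { cat -n "$@"; }   # number every line
number_nonblank() { cat -b "$@"; }   # number only non-empty lines

# Sample input: a line with a tab, an empty line, then one more line.
printf 'one\ttwo\n\nthree\n' | show_all
printf 'one\ttwo\n\nthree\n' | number_all
printf 'one\ttwo\n\nthree\n' | number_nonblank
```

With no file argument each helper reads standard input, so the same functions work on a real file as well, for example show_all notes.txt.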
tail. Usually you use tail to examine only the most recent changes to various logfiles or to watch a file as it's being updated dynamically. For example, if you wanted to output the contents of /var/log/messages to an XTerm, you could type as root tail -f /var/log/messages, and that XTerm would show any changes to this all-important logfile as they happened. It would be similar to running the xconsole program on your display.
GNU tail differs from BSD tail in that it doesn't support the -r switch (reverse output). This isn't as disappointing as it might sound: BSD tail can only reverse files that are smaller than its buffer, which is typically 32 kilobytes. In any case, another GNU text utility, tac, accomplishes the same thing.
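To show how tac can stand in for the missing -r switch, here is a small sketch that prints the last few lines of its input, newest first. The function name and the sample data are assumptions for illustration only.

```shell
# Print the last N lines of the input in reverse order.
# newest_first is a hypothetical helper name; N defaults to 10.
newest_first() {
    tail -n "${1:-10}" | tac
}

# Generated sample data stands in for a real logfile here.
printf 'first\nsecond\nthird\nfourth\n' | newest_first 3
```

On a real log you would feed it the file on standard input, for example newest_first 20 < /var/log/messages.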
wc. Since I write quite a bit, wc is one tool
I use a lot. As I write this article in vi, I frequently use the command
:!wc -w %
meaning: count the words in the current file. It's very handy as I
try not to aggravate John by exceeding my word count.
wc counts not only words, but also characters, lines, and
bytes. You can also ask for more than one count at a time:
wc --words --bytes FILENAME
You can also use wc in ways you may not have thought of:
cat user.log | grep login | grep USER | wc -l
will tell you how many times USER logged in.
wc -c FILENAME
is like a "sizeof" function in various programming languages. It will tell you how many bytes the file takes up.
if [ "$(wc -w < ARTICLE)" -gt 1000 ]; then echo "edit this down"; fi
will help impose some discipline on wordy people like me.
a=$(wc -l < ARTICLE); b=$((a/30)); echo $b
does a good job of approximating the number of pages in ARTICLE. I have this function bound to a Vim macro, so I can tell how long the chapter I'm working on currently is.
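The word-limit check and the page estimate can be wrapped up as small shell functions. This is only a sketch: the function names are invented, the 30 lines per page comes from the text above, and generated input stands in for a real ARTICLE file.

```shell
# Count words on standard input. Reading from stdin keeps wc from
# printing a filename next to the number; tr strips any padding.
word_count() { wc -w | tr -d ' '; }

# Succeed when the input has more than LIMIT words (LIMIT is $1).
over_limit() {
    [ "$(word_count)" -gt "$1" ]
}

# Estimate pages at 30 lines per page. Shell integer division
# rounds down, so a partial last page is not counted.
page_estimate() {
    lines=$(wc -l | tr -d ' ')
    echo $((lines / 30))
}

# Generated sample input stands in for a real article.
seq 1 95 | page_estimate
if seq 1 1200 | over_limit 1000; then echo "edit this down"; fi
```

For a real file, redirect it in: page_estimate < ARTICLE.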
fold, pr, nl, fmt. These are tools I use often enough to be familiar but not intimate with them. I use them most often from inside an editor, such as vi. For example, suppose I want to reformat someone's email that I'm responding to. Many people don't set their linewrap properly on their email client, so it usually overfills my display and appears rather ugly. I'll simply use an ex command such as the following from within vi:
:45,53!fmt --width=65 --prefix='> '
Lines 45 to 53 will be reformatted and the "> " prefix will be reinserted for a pleasing display. In fact, of all these four commands, I use fmt the most. It's also quite good at formatting programming code, which most of these tools were originally designed for. However, they work just as well on simple text.
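The same fmt invocation works outside the editor. The sketch below assumes GNU fmt and uses a made-up helper name and sample text; it refills quoted lines while keeping the "> " prefix.

```shell
# Refill quoted text to WIDTH columns, reattaching the "> " prefix.
# requote is a hypothetical helper name; WIDTH defaults to 65.
requote() {
    fmt --width="${1:-65}" --prefix='> '
}

# Two short quoted lines are joined into one refilled line.
printf '> aaa bbb\n> ccc\n' | requote 65
```

Because fmt joins short lines as well as splitting long ones, ragged quoted text comes out as an evenly filled paragraph.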
fold is particularly good at wrapping each input line to a fixed width. I usually use fmt to do this, though. Still, fold will let you wrap lines by byte count (with -b) as well as by columns, whereas fmt counts only columns.
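For comparison with fmt, here is a small sketch of fold wrapping a long line at spaces. The helper name, default width, and sample sentence are assumptions for the example.

```shell
# Wrap long lines at WIDTH columns, breaking only at spaces (-s).
# wrap_at is a hypothetical helper name; WIDTH defaults to 65.
wrap_at() {
    fold -s -w "${1:-65}"
}

# fold splits the long line but, unlike fmt, never joins short ones.
printf 'the quick brown fox jumps over the lazy dog\n' | wrap_at 20
```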
pr is useful in either a text editor or by itself on the
command line. pr prepares a document for printing by
typesetting the text into columns and paginated copy with headers
and footers. Take a text file and go
pr -2 textfile and see
what I mean. You can see that sometimes the line breaks don't occur
neatly, so you might first filter the text through fold.
fold -w 30 -s filename.txt | pr -2 | more
will wrap each line to not more than 30 characters and not break any
line in the middle of a word, and pr will print it with nice
headers and footers in two-column format (or however many columns
you prefer). pr has many options that give you quite a bit of
flexibility in its appearance. A better way to count the pages in your
document is
cat filename | pr -1 | grep Page | wc -l
for example. Check out info pr.
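The page-counting idea can be sketched as a function. It assumes GNU pr with its default 66-line page, and that the input itself never contains the word "Page" followed by a number; the helper name is invented.

```shell
# Count the pages pr would produce by counting its page headers.
# LC_ALL=C keeps the header text in English.
# count_pages is a hypothetical helper name.
count_pages() {
    LC_ALL=C pr | grep -c 'Page [0-9]'
}

# 100 generated lines stand in for a real document.
seq 1 100 | count_pages
```

Counting headers respects pr's own page length and margins, which a fixed lines-per-page division cannot do.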
nl is a line numbering tool. Since I use vi, I normally just
use the built-in line numbering within the Ex editor. But I use
nl in scripts or on the command line when I want line numbers
to precede each line. Sometimes this is particularly useful when
quoting some program code.
cat filename.c | nl | pr -1 -t | more
will display numbers in front of each line of filename.c
and print it without pretty headers or spaces where headers and
footers would normally be. The output is paged through more.
You can also use the -n option of cat to achieve much
the same purpose. For example, if you wanted to direct a
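programmer's attention to lines 45 through 48 of a program, you
could type the following (the pattern allows for the spaces that
cat -n puts in front of each number):
cat -n program.c | grep -A 3 '^ *45[[:space:]]' > forpaul.txt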
This puts a line number in front of each line of program.c; grep
isolates line 45, -A 3 appends the three lines that follow it, and
the result is written to a file called forpaul.txt (your programmer
friend's name, hopefully), which you can send to him.
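An alternative sketch selects the lines by position instead of matching the number text, using nl and sed. The helper name is made up, and generated input stands in for program.c.

```shell
# Print lines FIRST through LAST of the input, each prefixed with
# its line number. numbered_range is a hypothetical helper name.
# nl -ba numbers every line, blank ones included, so the numbers
# match what an editor shows.
numbered_range() {
    nl -ba | sed -n "${1},${2}p"
}

# Generated input stands in for program.c.
seq 101 200 | numbered_range 45 48
```

Selecting by position means a stray "45" elsewhere in the file cannot produce a false match.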
The rest. I have used each of the tools mentioned at least several times. They're all useful, some more than others. I use the rest of them primarily in shell scripts or in handling various administrative tasks from the command line. Once again, I'm out of space in this little column, so you'll have to try them out on your own. Remember, the GNU info pages are the authoritative source now, since the man pages are no longer being updated. Try these GNU text tools out and see how they can make your life easier!