Hack 16 Format Text at the Command Line 
Combine basic Unix tools to become a formatting
expert.
Don't let the syntax of the sed
command scare you off. sed is a powerful utility
capable of handling most of your formatting needs. For example, have
you ever needed to add or remove comments from a source file? Perhaps
you need to shuffle some text from one section to another.
In this hack, I'll demonstrate how to do that.
I'll also show some handy formatting tricks using
two other built-in Unix commands, tr and
col.
2.5.1 Adding Comments to Source Code
sed
allows you to specify an address range
using a pattern, so let's put this to use. Suppose
we want to comment out a block of text in a source file by adding
// to the start of each line we wish to comment
out. We might use a text editor to mark the block with
bc-start and bc-end:
% cat source.c
if (tTd(27, 1))
sm_dprintf("%s (%s, %s) aliased to %s\n",
a->q_paddr, a->q_host, a->q_user, p);
bc-start
if (bitset(EF_VRFYONLY, e->e_flags))
{
a->q_state = QS_VERIFIED;
return;
}
bc-end
message("aliased to %s", shortenstring(p, MAXSHORTSTR));
and then apply a sed script such as:
% sed '/bc-start/,/bc-end/s/^/\/\//' source.c
to get:
if (tTd(27, 1))
sm_dprintf("%s (%s, %s) aliased to %s\n",
a->q_paddr, a->q_host, a->q_user, p);
//bc-start
// if (bitset(EF_VRFYONLY, e->e_flags))
// {
// a->q_state = QS_VERIFIED;
// return;
// }
//bc-end
message("aliased to %s", shortenstring(p, MAXSHORTSTR));
The script used search and replace to add // to
the start of all lines (s/^/\/\//) that lie
between the two markers (/bc-start/,/bc-end/).
This will apply to every block in the file between the marker pairs.
Note that in the sed script, the
/ character has to be escaped as
\/ so it is not mistaken for a delimiter.
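If the escaped slashes are hard to read, note that the s function accepts a delimiter other than /. Here is a minimal sketch, assuming a made-up sample instead of the real source.c; the variable names sample and commented are mine, chosen only for illustration:

```shell
# Hypothetical sample input standing in for source.c.
sample=$(printf '%s\n' 'int a;' 'bc-start' 'int b;' 'bc-end' 'int c;')

# With | as the s delimiter, the slashes in // need no escaping.
commented=$(printf '%s\n' "$sample" | sed '/bc-start/,/bc-end/s|^|//|')
printf '%s\n' "$commented"
```

Only the delimiter of s changes here; the /bc-start/,/bc-end/ addresses still use slashes.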
2.5.2 Removing Comments
When
we need to delete the comments and the
two bc- lines (let's assume that
the edited contents were copied back to
source.c), we can use a script such as:
% sed '/bc-start/d;/bc-end/d;/bc-start/,/bc-end/s/^\/\///' source.c
Oops! My first attempt won't work. The
bc- lines must be deleted
after they have been used as address ranges.
Trying again we get:
% sed '/bc-start/,/bc-end/s/^\/\///;/bc-start/d;/bc-end/d' source.c
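To see why the order matters, the two attempts can be run side by side. This is only a sketch on a made-up, already commented sample (not the real source.c), and it uses | as the s delimiter to keep the scripts short; the variable names are illustrative:

```shell
# Hypothetical sample: a block that has already been commented out.
commented=$(printf '%s\n' 'int a;' '//bc-start' '//int b;' '//bc-end' 'int c;')

# Deleting the markers first: d ends the cycle before the range address
# is ever tested, so the range never opens and // stays on the inner line.
wrong=$(printf '%s\n' "$commented" |
    sed '/bc-start/d;/bc-end/d;/bc-start/,/bc-end/s|^//||')

# Using the markers as the address range first, then deleting them.
right=$(printf '%s\n' "$commented" |
    sed '/bc-start/,/bc-end/s|^//||;/bc-start/d;/bc-end/d')

printf 'wrong order:\n%s\n\nright order:\n%s\n' "$wrong" "$right"
```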
If you want to keep the two bc- marker lines in the file
but leave them commented out, use this piece of trickery:
% sed '/bc-start/,/bc-end/{/^\/\/bc-/\!s/\/\///;}' source.c
to get:
if (tTd(27, 1))
sm_dprintf("%s (%s, %s) aliased to %s\n",
a->q_paddr, a->q_host, a->q_user, p);
//bc-start
if (bitset(EF_VRFYONLY, e->e_flags))
{
a->q_state = QS_VERIFIED;
return;
}
//bc-end
message("aliased to %s", shortenstring(p, MAXSHORTSTR));
Note that in the bash shell you must use:
% sed '/bc-start/,/bc-end/{/^\/\/bc-/!s/\/\///;}' source.c
because the bang character (!) does not need to be
escaped as it does in tcsh.
What's with the curly braces? They prevent a common
mistake. You may imagine that this example:
% sed -n '/$USER/p;p' *
prints each line containing $USER twice because of
the p;p commands. It doesn't,
though, because the second p is not restrained by
the /$USER/ line address and therefore applies to
every line. To print twice just those lines
containing $USER, use:
% sed -n '/$USER/p;/$USER/p' *
or:
% sed -n '/$USER/{p;p;}' *
The construct {...} introduces a function list
that applies to the preceding line address or range.
A line address followed by ! (or
\! in the tcsh shell) negates
the address, and so the function (or list) that follows is applied
to all lines not matching. In the marker-preserving script shown
earlier, the net effect is to
remove // from all lines that
don't start with //bc- but that
do lie within the bc- markers.
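The difference between an unrestrained function, a function list, and a negated address is easiest to see on a tiny input. The following sketch uses three made-up lines and the pattern beta in place of $USER; the variable names are only illustrative, and it is written for a Bourne-style shell, where ! needs no backslash:

```shell
input=$(printf '%s\n' 'alpha' 'beta' 'gamma')

# Without braces the second p has no address, so it prints every line;
# only the matching line comes out twice.
ungrouped=$(printf '%s\n' "$input" | sed -n '/beta/p;p')

# With braces both p functions are restricted to the matching line.
grouped=$(printf '%s\n' "$input" | sed -n '/beta/{p;p;}')

# ! negates the address: print only the lines that do not match.
negated=$(printf '%s\n' "$input" | sed -n '/beta/!p')

printf 'ungrouped:\n%s\ngrouped:\n%s\nnegated:\n%s\n' \
    "$ungrouped" "$grouped" "$negated"
```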
2.5.3 Using the Holding Space to Mark Text
sed
reads input into the
pattern space, but it also provides a buffer (called the
holding space) and functions to move text from
one space to the other. All other functions (such as
s and d) operate on the
pattern space, not the holding space.
Check out this sed script:
% cat case.script
# Sed script for case insensitive search
#
# copy pattern space to hold space to preserve it
h
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
# use a regular expression address to search for lines containing:
/test/ {
i\
vvvv
a\
^^^^
}
# restore the original pattern space from the hold space
x;p
First, I have written the script to a file instead of typing it in on
the command line. Lines starting with # are
comments and are ignored. Other lines specify a
sed command, and commands are separated by either
a newline or ; character. sed
reads one line of input at a time and applies the whole script file
to each line. The following functions are applied to each line as it
is read:
- h
  Copies the pattern space (the line just read) into the holding space.
- y/ABC/abc/
  Operates on the pattern space, translating A to
  a, B to b,
  and C to c and so on, ensuring
  the line is all lowercase.
- /test/ {...}
  Matches the line just read if it includes the text
  test (whatever the original case, because the line
  is now all lowercase) and then applies the list of functions that
  follow. This example inserts text before (i\) and
  appends text after (a\) the matched line to highlight it.
- x
  Exchanges the pattern and hold space, thus restoring the original
  contents of the pattern space.
- p
  Prints the pattern space.
Here is the test file:
% cat case
This contains text Hello
that we want to TeSt
search for, but in test
a case insensitive XXXX
manner using the sed TEST
editor. Bye bye.
%
Here are the results of running our sed script on
it:
% sed -n -f case.script case
This contains text Hello
vvvv
that we want to TeSt
^^^^
vvvv
search for, but in test
^^^^
a case insensitive XXXX
vvvv
manner using the sed TEST
^^^^
editor. Bye bye.
Notice the vvvv and ^^^^ markers around lines that
contain test, whatever its case.
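The same h, y, and x functions can be combined into a one-line, case-insensitive search that prints only the matching lines in their original case. This is a sketch of a variation on the script above, not a replacement for it; the sample lines are borrowed from the test file, and the variable names are mine:

```shell
input=$(printf '%s\n' 'that we want to TeSt' \
                      'a case insensitive XXXX' \
                      'manner using the sed TEST')

upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ
lower=abcdefghijklmnopqrstuvwxyz

# h saves the original line in the hold space, y lowercases the pattern
# space, and on a match x swaps the original back in so that p prints it.
matches=$(printf '%s\n' "$input" | sed -n "h;y/$upper/$lower/;/test/{x;p;}")
printf '%s\n' "$matches"
```

The script is in double quotes so that the shell expands $upper and $lower before sed sees it.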
2.5.4 Translating Case
The tr
command can translate one character
to another. To change the contents of case into
all lowercase and write the results to file
lower-case, we could use:
% tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' \
< case > lower-case
tr works with standard input and output only, so
to read and write files we must use redirection.
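Spelling out both alphabets is tedious. POSIX tr also understands character classes, so a shorter form is possible. A minimal sketch, reading from a pipe instead of the case file; the variable name lowered is only for illustration:

```shell
# [:upper:] and [:lower:] are POSIX character classes understood by tr.
lowered=$(printf '%s\n' 'manner using the sed TEST' |
    tr '[:upper:]' '[:lower:]')
printf '%s\n' "$lowered"
```

The quotes keep the shell from treating the square brackets as a filename pattern.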
2.5.5 Translating Characters
To translate carriage return characters into newline characters, we
could use:
% tr \\r \\n < cr > lf
where cr is the original file and
lf is a new file containing line feeds in
place of carriage returns. \n represents a line
feed character, but we must escape the backslash character in the
shell, so we use \\n instead. Similarly, a
carriage return is specified as \\r.
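Quoting is an alternative to doubling the backslash. The sketch below, written for a Bourne-style shell, feeds tr a made-up string with carriage returns and checks that both spellings give the same result; the variable names are illustrative:

```shell
# Single quotes protect the backslash from the shell, so tr sees \r and \n.
quoted=$(printf 'one\rtwo\rthree\r' | tr '\r' '\n')

# The doubled-backslash spelling is equivalent.
doubled=$(printf 'one\rtwo\rthree\r' | tr \\r \\n)

printf '%s\n' "$quoted"
```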
2.5.6 Removing Duplicate Line Feeds
tr
can also squeeze multiple
consecutive occurrences of a particular character into a single
occurrence. For example, to remove duplicate line feeds from the
lines file:
% tr -s \\n < lines > tmp ; mv tmp lines
Here we use the tmp file trick again because
redirecting the output of tr (or of grep or
sed) straight back to its input file trashes that file:
the shell truncates the output file before the command reads it.
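A slightly more defensive form of the tmp file trick joins the two commands with && so that the original is replaced only if tr succeeds. This is a sketch under assumptions: it works in a scratch directory made with mktemp -d (widely available, but not part of POSIX), and the sample contents are made up:

```shell
# Scratch directory and sample file, both hypothetical.
workdir=$(mktemp -d)
printf 'a\n\n\nb\n\nc\n' > "$workdir/lines"

# Replace the original only when tr exits successfully.
tr -s '\n' < "$workdir/lines" > "$workdir/tmp" &&
    mv "$workdir/tmp" "$workdir/lines"

squeezed=$(cat "$workdir/lines")
printf '%s\n' "$squeezed"
rm -rf "$workdir"
```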
2.5.7 Deleting Characters
tr can also delete selected characters. If,
for instance, you hate vowels, run your documents through this:
% tr -d aeiou < file
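Here is what that looks like on a short, made-up sentence, together with a more practical use of -d: stripping the carriage returns from DOS-style line endings. It is only a sketch, and the variable names are mine:

```shell
# Delete every lowercase vowel.
no_vowels=$(printf '%s\n' 'run your documents through this' | tr -d aeiou)
printf '%s\n' "$no_vowels"

# Delete carriage returns, turning CR LF line endings into plain LF.
unix_line=$(printf 'line one\r\n' | tr -d '\r')
printf '%s\n' "$unix_line"
```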
2.5.8 Translating Tabs to Spaces
To translate tabs into multiple
spaces, use col with its -x flag:
% cat tabs
col col col
% od -x tabs
0000000 636f 6c09 636f 6c09 636f 6c0a 0a00
0000015
% col -x < tabs > spaces
% cat spaces
col     col     col
% od -h spaces
0000000 636f 6c20 2020 2020 636f 6c20 2020 2020
0000020 636f 6c0a 0a00
0000025
In this example I have used od -x (and the equivalent od -h)
to dump the contents of the before and after files in hexadecimal,
which shows more clearly that
the translation has worked. (09 is the code for
Tab and 20 is the code for Space.)
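Because od -x groups the bytes into two-byte words, the way each pair is displayed depends on the byte order of the machine. A byte-at-a-time dump avoids that. The sketch below is an alternative check, not the method used above: it swaps in expand, the POSIX utility for converting tabs to spaces, in place of col -x, and the sample input is made up:

```shell
# expand (used here instead of col -x) converts tabs to spaces using
# tab stops every 8 columns by default.
# od -An -tx1 prints one byte per field with no address column, and
# tr removes the whitespace so the bytes read as one string.
hex=$(printf 'col\tcol\n' | expand | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$hex"
```

In the result, the tab (09) after the first col has become five spaces (20), which pads the text out to column 8.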
2.5.9 See Also
- man sed
- man tr
- man col
- man od