Hack 14 Get the Most Out of grep 
You may not know where its odd name originated,
but you can't argue with the usefulness of
grep.
Have you ever needed to find a particular file and thought, "I don't
recall the filename, but I remember some of its contents"? The oddly
named grep command does just that, searching inside files and
reporting on those that contain a given piece of text.
2.3.1 Finding Text
Suppose you wish to search your shell scripts for the text
$USER. Try this:
% grep -s '$USER' *
add-user:if [ "$USER" != "root" ]; then
bu-user: echo " [-u user] - override $USER as the user to backup"
bu-user:if [ "$user" = "" ]; then user="$USER"; fi
del-user:if [ "$USER" != "root" ]; then
mount-host:mounted=$(df | grep "$ALM_AFP_MOUNT/$USER")
.....
mount-user: echo " [-u user] - override $USER as the user to backup"
mount-user:if [ "$user" = "" ]; then user="$USER"; fi
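In this example, grep has searched through all
files in the current directory, displaying each line that contains
the text $USER. The single quotes around the text stop the shell from
interpreting special characters; without them, the shell would replace
$USER with your login name before grep ever saw it. The
-s option suppresses the error messages grep would otherwise print
about nonexistent or unreadable files, such as when * matches a directory.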
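To see why the single quotes matter, here's a small sketch for a POSIX
sh. It works in a throwaway directory, writes a made-up file called
sample, and sets USER to a hypothetical name, alice, so the result
doesn't depend on who runs it.

```shell
#!/bin/sh
# Throwaway directory so the demo leaves nothing behind.
demo=$(mktemp -d)
cd "$demo" || exit 1

USER=alice    # hypothetical user name, fixed for repeatability

# Two lines contain the literal text $USER; one mentions alice.
printf '%s\n' 'if [ "$USER" != "root" ]; then' \
              'user="$USER"' \
              'echo alice was here' > sample

# Single quotes: grep receives the literal text $USER.
single=$(grep -c '$USER' sample)
# Double quotes: the shell expands $USER to alice before grep runs.
double=$(grep -c "$USER" sample)

echo "single quotes: $single matching lines"
echo "double quotes: $double matching lines"

cd / && rm -rf "$demo"
```

The single-quoted search finds the two lines containing $USER, while
the double-quoted one quietly searches for alice instead.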
Perhaps you only want to know the name of each file containing the
text $USER. Use the -l option
to create that list for you:
% grep -ls '$USER' *
add-user
bu-user
del-user
mount-host
mount-user
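Here's a sketch of -l and -s together, assuming a POSIX sh and GNU or
BSD grep. The file names add-user, del-user, and greet are made up for
the demo, and subdir is there only so that * matches a directory.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1

mkdir subdir                              # a directory that * will also match
printf '%s\n' 'echo "$USER"' > add-user   # contains $USER
printf '%s\n' 'echo hello'   > greet      # does not
printf '%s\n' 'x=$USER'      > del-user   # contains $USER

# -l prints each matching file name once; -s silences the
# complaint about subdir being a directory.
found=$(grep -ls '$USER' *)
echo "$found"

cd / && rm -rf "$demo"
```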
2.3.2 Searching by Relevance
What if you're more concerned with how many times a
particular string occurs within each file? That's known
as a relevance search. Use a command similar to:
% grep -sc '$USER' * | grep -v ':0$' | sort -t : -k 2 -nr
mount-host:6
mount-user:2
bu-user:2
del-user:1
add-user:1
How does this magic work? The -c flag lists each
file with a count of its matching lines (a line that contains the
string twice still counts only once), but it unfortunately includes
files with zero matches. To counter this, I piped the output from
grep into a second grep, this time searching for ':0$' (a colon
and a zero at the very end of the line) and using a second option,
-v, to reverse the sense of the search by displaying lines that
don't match. The second grep reads from the pipe instead
of a file, searching the output of the first grep.
For a little extra flair, I sorted the subsequent output by the
second field of each line with sort -k 2, using a colon as the
field separator (-t :), -n to compare the counts as numbers rather
than as text (so that 10 ranks above 9), and -r to reverse the sort
into descending order.
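The difference between a numeric and a text sort only shows up once a
count reaches two digits. This sketch assumes a POSIX sh and uses three
made-up files (big, small, and none) with 10, 9, and 0 matching lines.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1

# Made-up files: big has 10 matching lines, small has 9, none has 0.
i=0; while [ "$i" -lt 10 ]; do echo 'a=$USER'; i=$((i + 1)); done > big
i=0; while [ "$i" -lt 9 ];  do echo 'b=$USER'; i=$((i + 1)); done > small
echo 'nothing to see' > none

# Drop the zero counts (":0" at the end of the line), then sort on field 2.
numeric=$(grep -sc '$USER' * | grep -v ':0$' | sort -t : -k 2 -nr)
textual=$(grep -sc '$USER' * | grep -v ':0$' | sort -t : -k 2 -r)

echo "numeric sort:"; echo "$numeric"    # big:10 comes first
echo "text sort:";    echo "$textual"    # small:9 comes first, as "9" > "1"

cd / && rm -rf "$demo"
```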
2.3.3 Document Extracts
Suppose you wish to search a set of documents and extract a few lines
of text centered on each occurrence of a keyword. This time we are
interested in the matching lines and their surrounding context, but
not in the filenames. Use a command something like this:
% grep -rhiw -A4 -B4 'preferences' *.txt > research.txt
% more research.txt
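This grep command searches all files with the
.txt extension for the word preferences. The -r option makes grep
search recursively through any directories named on the command
line, but bear in mind that the shell expands *.txt before grep runs,
so only names ending in .txt in the current directory are passed in.
To search every subdirectory instead, give grep a starting directory
such as . and restrict it to text files with --include='*.txt'.
The remaining options hide (-h) the filename in the output, match
case-insensitively (-i), and match preferences as a complete word
but not as part of another word (-w). The -A4 and -B4 options display
the four lines immediately after and before each matching line, to
give the desired context. Finally, I've redirected the output to the
file research.txt.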
You could also send the output straight to the vim
text editor with:
% grep -rhiw -A4 -B4 'preferences' *.txt | vim -
Vim: Reading from stdin...
vim can be installed from
/usr/ports/editors/vim.
Specifying vim - tells vim to
read stdin (in this case the piped output from
grep) instead of a file. Type
:q! to exit vim.
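Here's a sketch of the research.txt command in miniature, assuming a
POSIX sh and GNU or BSD grep (--include is an extension in both, not
POSIX). The documents are made up, one of them sits in a subdirectory,
and the context is trimmed to one line each side so the output stays
short.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir sub

# Made-up documents, one of them in a subdirectory.
printf '%s\n' 'line a' 'User Preferences are stored here' 'line c' \
              'line d' 'Preferencesfile is not a whole word' > notes.txt
printf '%s\n' 'set PREFERENCES now' > sub/more.txt

# The shell expands *.txt to notes.txt only, so sub/more.txt is missed.
globbed=$(grep -rhiw -A1 -B1 'preferences' *.txt)

# Starting from . and filtering with --include reaches every .txt file.
included=$(grep -rhiw -A1 -B1 --include='*.txt' 'preferences' .)

echo "$globbed"
echo ---
echo "$included"

cd / && rm -rf "$demo"
```

In both runs, -w skips the line containing Preferencesfile, and -i
catches PREFERENCES in capitals.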
To search files for several alternatives, use the
-e option to introduce extra search patterns:
% grep -e 'text1' -e 'text2' *
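A quick sketch, assuming a POSIX sh and a made-up file called fruit. A
line is printed if it matches any of the -e patterns. -e also lets a
pattern begin with a hyphen, which grep would otherwise take for an
option.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' apple banana cherry '-v marks an option' > fruit

# Each -e adds another pattern; a line matching any of them is printed.
either=$(grep -e 'apple' -e 'cherry' fruit)
# Without -e, grep would read '-v' as its invert-match option.
dash=$(grep -c -e '-v' fruit)

echo "$either"
echo "lines containing -v: $dash"

cd / && rm -rf "$demo"
```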
Q. How did grep get its odd name?
A. grep was written as a standalone program to
simulate a commonly performed command available in the ancient Unix
editor ed. The command in question searched an
entire file for lines containing a regular expression and displayed
those lines. The command was g/re/p:
globally search for a regular
expression and print the line.
2.3.4 Using Regular Expressions
To search for text that is more vaguely specified, use a regular
expression. grep understands both basic and extended regular
expressions, though it must be invoked as either
egrep or grep -E when given an
extended regular expression (grep -E is the form POSIX specifies).
The text or regular expression to be matched is usually called the
pattern.
Suppose you need to search for lines that end in a space or tab
character. Try this command (to insert a tab, press Ctrl-V and then
Ctrl-I, shown as <tab> in the example):
% grep -n '[ <tab>]$' test-file
2:ends in space
3:ends in tab
I used the [...] construct to form a regular
expression listing the characters to match: space and tab. The
expression matches exactly one space or one tab
character. $ anchors the match to the end of a
line. The -n flag tells grep to
include the line number in its output.
Alternatively, use:
% grep -n '[[:blank:]]$' test-file
2:ends in space
3:ends in tab
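If typing Ctrl-V Ctrl-I is awkward, a script can build the tab with
printf instead. This sketch assumes a POSIX sh. It recreates a test-file
like the one above with made-up contents and checks that the bracket
expression and [[:blank:]] pick out the same lines.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1

# Line 2 ends in a space and line 3 in a tab; printf writes the tab as \t.
printf 'no trailing blank\nends in space \nends in tab\t\n' > test-file

tab=$(printf '\t')    # a literal tab, instead of typing Ctrl-V Ctrl-I
# In double quotes, $tab expands to the tab and \$ becomes a plain $.
bracket=$(grep -n "[ $tab]\$" test-file | cut -d: -f1)
blank=$(grep -n '[[:blank:]]$' test-file | cut -d: -f1)

echo "bracket expression: lines" $bracket
echo "[[:blank:]] class: lines" $blank

cd / && rm -rf "$demo"
```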
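POSIX regular expressions provide many predefined character classes of
the form [[:name:]]. Besides [[:blank:]] (a space or a tab), examples
include [[:cntrl:]] for control characters, [[:digit:]] for digits,
and [[:alnum:]] for alphanumeric characters. See man re_format for
details (on Linux systems, see man 7 regex instead).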
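Here's a sketch of a couple of these classes, assuming a POSIX sh. The
file classes.txt is made up. Its third line ends in a BEL character
(\007), and its fourth ends in a tab, which also counts as a control
character.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1

# \007 is the BEL control character; \t is a tab, also a control character.
printf 'room 101\nno digits here\nring the bell\007\nends in tab\t\n' > classes.txt

digits=$(grep -n '[[:digit:]]' classes.txt | cut -d: -f1)
controls=$(grep -n '[[:cntrl:]]' classes.txt | cut -d: -f1)

echo "lines with a digit:" $digits
echo "lines with a control character:" $controls

cd / && rm -rf "$demo"
```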
We can modify a previous example to search for either
"preferences" or
"preference" as a complete word,
using an extended regular expression such as this:
% egrep -rhiw -A4 -B4 'preferences?' *.txt > research.txt
The ? symbol specifies zero or one of the
preceding character, making the s of
preferences optional. Note that I use egrep because ? is an operator
only in extended regular expressions; in a basic regular expression
it simply matches a literal question mark. If you wish to search for
the ? character itself in an extended regular expression, escape it
with a backslash, as in \?.
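The following sketch checks these claims against a made-up word list,
assuming a POSIX sh. It uses grep -E, which behaves like egrep.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' 'preference' 'preferences' 'preferencess' 'any questions?' > words

# Extended RE: s? makes the s optional; -w rules out preferencess.
ere=$(grep -Ewc 'preferences?' words)
# Basic RE: ? is an ordinary character, so nothing matches "preferences?".
bre=$(grep -wc 'preferences?' words)
# Extended RE: \? matches a real question mark.
lit=$(grep -Ec 'questions\?' words)

echo "extended: $ere  basic: $bre  escaped: $lit"

cd / && rm -rf "$demo"
```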
An alternative method uses an expression of the form
(string1|string2),
which matches either one string or the other:
% egrep -rhiw -A4 -B4 'preference(s|)' *.txt > research.txt
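Here's a sketch comparing the two forms on made-up words, assuming a
POSIX sh and GNU or BSD grep. POSIX leaves an empty branch such as
(s|) undefined, so s? is the more portable spelling, although GNU grep
accepts both. The last command shows alternation between two whole
strings.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' 'preference' 'preferences' 'preferred' > words

optional=$(grep -Ew 'preferences?' words)       # s? form
alternation=$(grep -Ew 'preference(s|)' words)  # (s|) form, same lines
either=$(grep -E 'preferred|preferences' words) # one string or the other

echo "$optional"
echo ---
echo "$either"

cd / && rm -rf "$demo"
```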
As a final example, use this to seek out all bash,
tcsh, or sh shell scripts:
% egrep '^#\!/bin/(ba|tc|)sh[[:blank:]]*$' *
The caret (^) character at the start of a regular
expression anchors it to the start of the line (much as
$ at the end anchors it to the end).
(ba|tc|) matches ba, tc, or nothing. The
* character specifies zero or more of
[[:blank:]], allowing trailing whitespace but
nothing else. In tcsh, the ! character must be escaped as \!, even
inside single quotes, to prevent history substitution; bash does not
perform history expansion inside single quotes, so there you can
write #! without the backslash.
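To try the pattern out, this sketch creates five made-up scripts,
assuming a POSIX sh and GNU or BSD grep. Because it runs under sh, which
does no history expansion, the ! needs no backslash here. -l lists only
the names of matching files.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '#!/bin/bash\necho a\n'         > a.sh
printf '#!/bin/tcsh\necho b\n'         > b.sh
printf '#!/bin/sh \necho c\n'          > c.sh   # trailing blank is allowed
printf '#!/usr/bin/env bash\necho d\n' > d.sh   # different path, no match
printf '#!/bin/zsh\necho e\n'          > e.sh   # zsh is not in the list

scripts=$(grep -El '^#!/bin/(ba|tc|)sh[[:blank:]]*$' *)
echo "$scripts"

cd / && rm -rf "$demo"
```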
Here's a handy tip for debugging regular expressions: if you don't
pass a filename to grep, it reads standard input, so you can type
lines of text to see which ones match. grep echoes back only the
matching lines; press Ctrl-D at the start of a line to finish.
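The same trick works in a script if you pipe test lines into grep
instead of typing them. This sketch assumes a POSIX sh. The made-up
candidate lines show that an unescaped dot matches any character,
whereas \. matches only a literal dot.

```shell
#!/bin/sh
# printf stands in for typed input; each argument becomes one line.
unescaped=$(printf '%s\n' 'foo.txt' 'foo_txt' 'footxt' | grep 'foo.txt')
escaped=$(printf '%s\n' 'foo.txt' 'foo_txt' 'footxt' | grep 'foo\.txt')

echo "unescaped dot:"; echo "$unescaped"   # foo.txt and foo_txt
echo "escaped dot:";   echo "$escaped"     # foo.txt only
```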
2.3.5 Combining grep with Other Commands
grep works well with other commands. For example, to display all
tcsh processes:
% ps axww | grep -w 'tcsh'
saruman 10329 0.0 0.2 6416 1196 p1 Ss Sat01PM 0:00.68 -tcsh (tcsh)
saruman 11351 0.0 0.2 6416 1300 std Ss Sat07PM 0:02.54 -tcsh (tcsh)
saruman 13360 0.0 0.0 1116 4 std R+ 10:57PM 0:00.00 grep -w tcsh
%
Notice that the grep command itself appears in the
output. To prevent this, use:
% ps axww | grep -w '[t]csh'
saruman 10329 0.0 0.2 6416 1196 p1 Ss Sat01PM 0:00.68 -tcsh (tcsh)
saruman 11351 0.0 0.2 6416 1300 std Ss Sat07PM 0:02.54 -tcsh (tcsh)
%
I'll let you figure out how this
works.
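If you'd like to experiment rather than read live ps output, you can
feed grep a simulated listing. This sketch assumes a POSIX sh. The
listings are hypothetical, loosely modeled on the ps output above, and
each one includes the line that the grep process itself would add.

```shell
#!/bin/sh
# Hypothetical listings; the second line of each is grep's own process.
plain_listing='saruman 10329 -tcsh (tcsh)
saruman 13360 grep -w tcsh'
bracket_listing='saruman 10329 -tcsh (tcsh)
saruman 13362 grep -w [t]csh'

plain=$(printf '%s\n' "$plain_listing" | grep -cw 'tcsh')
bracket=$(printf '%s\n' "$bracket_listing" | grep -cw '[t]csh')

echo "plain pattern:   $plain matching lines"
echo "bracket pattern: $bracket matching lines"
```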
2.3.6 See Also