Hack 14 Get the Most Out of grep

You may not know where its odd name originated, but you can't argue with the usefulness of grep.

Have you ever needed to find a particular file and thought, "I don't recall the filename, but I remember some of its contents"? The oddly named grep command does just that, searching inside files and reporting on those that contain a given piece of text.

2.3.1 Finding Text

Suppose you wish to search your shell scripts for the text $USER. Try this:

% grep -s '$USER' *

add-user:if [ "$USER" != "root" ]; then

bu-user:  echo "  [-u user] - override $USER as the user to backup"

bu-user:if [ "$user" = "" ]; then user="$USER"; fi

del-user:if [ "$USER" != "root" ]; then

mount-host:mounted=$(df | grep "$ALM_AFP_MOUNT/$USER")

.....

mount-user:  echo "  [-u user] - override $USER as the user to backup"

mount-user:if [ "$user" = "" ]; then user="$USER"; fi

In this example, grep has searched through all files in the current directory, displaying each line that contained the text $USER. Use single quotes around the text to prevent the shell from interpreting special characters. The -s option suppresses error messages when grep encounters a directory.
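The quoting rule is easy to try in a throwaway directory. The following is a minimal sketch, not one of the original examples: the file names uses-var and no-var are hypothetical, and the -F flag (treat the pattern as a fixed string) is an addition beyond what the text above uses.

```shell
# Build a scratch directory holding two tiny scripts (hypothetical names).
demo=$(mktemp -d)
printf 'echo "$USER"\n' > "$demo/uses-var"
printf 'echo hello\n'   > "$demo/no-var"

# Single quotes: grep receives the five literal characters $USER.
# (With double quotes the shell would substitute your login name first.)
literal=$(cd "$demo" && grep -s '$USER' *)
echo "$literal"

# -F treats the pattern as a fixed string, so nothing in it is special to grep.
fixed=$(cd "$demo" && grep -sF '$USER' *)
echo "$fixed"

rm -rf "$demo"
```

Both commands should report the same single line from uses-var, prefixed with the filename.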

Perhaps you only want to know the name of each file containing the text $USER. Use the -l option to create that list for you:

% grep -ls '$USER' *

add-user

bu-user

del-user

mount-host

mount-user

2.3.2 Searching by Relevance

What if you're more concerned about how many times a particular string occurs within a file? That's known as a relevance search. Use a command similar to:

% grep -sc '$USER' * | grep -v ':0$' | sort -t : -k 2 -n -r

mount-host:6

mount-user:2

bu-user:2

del-user:1

add-user:1

How does this magic work? The -c flag lists each file with a count of matching lines, but it unfortunately includes files with zero matches. To counter this, I piped the output from grep into a second grep, this time searching for ':0$' (a count of exactly zero at the end of the line) and using a second option, -v, to reverse the sense of the search by displaying lines that don't match. The second grep reads from the pipe instead of a file, searching the output of the first grep.

For a little extra flair, I sorted the subsequent output by the second field of each line with sort -k 2, assuming a field separator of colon (-t :), comparing the counts numerically (-n) so that a count of 10 ranks above a count of 9, and using -r to reverse the sort into descending order.
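To see why the sort should compare numbers rather than text, here is a small sandbox sketch of a relevance pipeline of this kind. The file names many, few, and none are hypothetical, and the pipeline anchors the zero filter as ':0$' and sorts with -n.

```shell
demo=$(mktemp -d)

# many: 12 matching lines, few: 2 matching lines, none: no matches.
i=0
while [ "$i" -lt 12 ]; do
    echo 'echo $USER'
    i=$((i + 1))
done > "$demo/many"
printf 'id $USER\nwho $USER\n' > "$demo/few"
printf 'echo hello\n' > "$demo/none"

# Count matches per file, drop the zero counts, rank by count (numeric, descending).
ranked=$(cd "$demo" && grep -sc '$USER' * | grep -v ':0$' | sort -t : -k 2 -n -r)
echo "$ranked"

rm -rf "$demo"
```

Without -n, sort would compare the counts as text and place 2 above 12.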

2.3.3 Document Extracts

Suppose you wish to search a set of documents and extract a few lines of text centered on each occurrence of a keyword. This time we are interested in the matching lines and their surrounding context, but not in the filenames. Use a command something like this:

% grep -rhiw -A4 -B4 'preferences' *.txt > research.txt

% more research.txt

This grep command searches all files with the .txt extension for the word preferences. It requests a recursive search (-r), hides (-h) the filename in the output, matches in a case-insensitive (-i) manner, and matches preferences as a complete word but not as part of another word (-w). Note that the shell expands *.txt before grep runs, so -r descends only into directories whose own names end in .txt; to cover every subdirectory, let find build the file list instead, as in find . -name '*.txt' -exec grep -hiw -A4 -B4 'preferences' {} +. The -A4 and -B4 options display the four lines immediately after and before the matched line, to give the desired context. Finally, I've redirected the output to the file research.txt.
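The effect of each flag is easier to see on a tiny document set. This is a sketch under stated assumptions: the files notes.txt and sub/deep.txt are hypothetical, the context is reduced to one line each way (-A1 -B1) to keep the output short, and the find variant is an addition that is not part of the original command.

```shell
demo=$(mktemp -d)
mkdir "$demo/sub"
printf '%s\n' 'line one' 'Set your Preferences here' 'line three' \
    'line four' 'unpreferences ignored' > "$demo/notes.txt"
printf '%s\n' 'preferences in a subdirectory' > "$demo/sub/deep.txt"

# *.txt expands to notes.txt only, so this search never reaches sub/deep.txt.
# -i matches "Preferences"; -w rejects "unpreferences"; -h hides the filename.
extract=$(cd "$demo" && grep -rhiw -A1 -B1 'preferences' *.txt)
echo "$extract"

# Letting find build the file list reaches every subdirectory.
everywhere=$(cd "$demo" && find . -name '*.txt' -exec grep -hiw 'preferences' {} +)
echo "$everywhere"

rm -rf "$demo"
```

The first command should print the matching line from notes.txt with one line of context on either side; the second should print one matching line from each of the two files.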

You could also send the output straight to the vim text editor with:

% grep -rhiw -A4 -B4 'preferences' *.txt | vim -

Vim: Reading from stdin...

vim can be installed from /usr/ports/editors/vim.

Specifying vim - tells vim to read stdin (in this case the piped output from grep) instead of a file. Type :q! to exit vim.

To search files for several alternatives, use the -e option to introduce extra search patterns:

% grep -e 'text1' -e 'text2' *
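As a quick check of how multiple -e patterns combine, the sketch below runs the same kind of command against a hypothetical three-line file named sample. A line is printed if it matches any one of the patterns.

```shell
demo=$(mktemp -d)
printf '%s\n' 'alpha text1' 'beta' 'gamma text2' > "$demo/sample"

# Each -e adds an alternative; lines matching either pattern are printed.
either=$(grep -e 'text1' -e 'text2' "$demo/sample")
echo "$either"

rm -rf "$demo"
```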

Q. How did grep get its odd name?

A. grep was written as a standalone program to simulate a commonly performed command available in the ancient Unix editor ed (the same command survives in ex). The command in question searched an entire file for lines containing a regular expression and displayed those lines. The command was g/re/p: globally search for a regular expression and print the line.

2.3.4 Using Regular Expressions

To search for text that is more vaguely specified, use a regular expression. grep understands both basic and extended regular expressions, though it must be invoked as either egrep or grep -E when given an extended regular expression. The text or regular expression to be matched is usually called the pattern.

Suppose you need to search for lines that end in a space or tab character. Try this command (to insert a tab, press Ctrl-V and then Ctrl-I, shown as <tab> in the example):

% grep -n '[ <tab>]$' test-file

2:ends in space 

3:ends in tab

I used the [...] construct to form a regular expression listing the characters to match: space and tab. The expression matches exactly one space or one tab character. $ anchors the match to the end of a line. The -n flag tells grep to include the line number in its output.

Alternatively, use:

% grep -n '[[:blank:]]$' test-file

2:ends in space 

3:ends in tab

Regular expressions provide many predefined character classes of the form [[:description:]]. Examples include all control characters, all digits, or all alphanumeric characters. See man re_format for details.
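If typing a literal tab is awkward, a script can generate one with printf. The following sketch rebuilds a test-file like the one above (its first line, clean line, is an assumption) and confirms that the explicit [ <tab>] list and [[:blank:]] select the same lines.

```shell
demo=$(mktemp -d)
printf 'clean line\nends in space \nends in tab\t\n' > "$demo/test-file"

# Character class form.
trailing=$(grep -n '[[:blank:]]$' "$demo/test-file")

# Explicit list form: a space plus a tab produced by printf.
tab=$(printf '\t')
same=$(grep -n "[ $tab]\$" "$demo/test-file")

echo "$trailing"
rm -rf "$demo"
```

Both variables should hold lines 2 and 3 of the file, each prefixed with its line number.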

We can modify a previous example to search for either "preferences" or "preference" as a complete word, using an extended regular expression such as this:

% egrep -rhiw -A4 -B4 'preferences?' *.txt > research.txt

The ? symbol specifies zero or one of the preceding character, making the s of preferences optional. Note that I use egrep because ? is available only in extended regular expressions. If you wish to search for the ? character itself, escape it with a backslash, as in \?.

An alternative method uses an expression of the form (string1|string2), which matches either one string or the other:

% egrep -rhiw -A4 -B4 'preference(s|)' *.txt > research.txt
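A short sandbox run shows the optional character and the escaped question mark in action. This is a sketch only: the file sample and its contents are hypothetical, and it uses grep -E, which the text above describes as equivalent to egrep.

```shell
demo=$(mktemp -d)
printf '%s\n' 'one preference' 'two preferences' 'preferenced' 'what? really' \
    > "$demo/sample"

# s? makes the trailing s optional; -w rejects "preferenced".
optional=$(grep -Ew 'preferences?' "$demo/sample")
echo "$optional"

# \? matches a literal question mark.
question=$(grep -E 'what\?' "$demo/sample")
echo "$question"

rm -rf "$demo"
```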

As a final example, use this to seek out all bash, tcsh, or sh shell scripts:

% egrep '^#\!/bin/(ba|tc|)sh[[:blank:]]*$' *

The caret (^) character at the start of a regular expression anchors it to the start of the line (much as $ at the end anchors it to the end). (ba|tc|) matches ba, tc, or nothing. The * character specifies zero or more of [[:blank:]], allowing trailing whitespace but nothing else. Note that the ! character must be escaped as \! to avoid shell interpretation in tcsh (but not in bash).
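The anchoring can be verified against a handful of made-up scripts. The sketch below is a variation rather than the exact command above: it writes the optional prefix as (ba|tc)? instead of (ba|tc|), adds -l to list filenames only, and leaves ! unescaped because it runs under sh rather than tcsh. All file names are hypothetical.

```shell
demo=$(mktemp -d)
printf '#!/bin/sh\necho hi\n'        > "$demo/a-sh"
printf '#!/bin/bash  \necho hi\n'    > "$demo/b-bash"
printf '#!/bin/tcsh\necho hi\n'      > "$demo/c-tcsh"
printf '#!/usr/bin/perl\nprint 1;\n' > "$demo/d-perl"
printf '#!/bin/sh -e\necho hi\n'     > "$demo/e-sh-flag"

# Trailing blanks are allowed (b-bash); anything else after "sh" is not (e-sh-flag).
scripts=$(cd "$demo" && grep -lE '^#!/bin/(ba|tc)?sh[[:blank:]]*$' *)
echo "$scripts"

rm -rf "$demo"
```

Only the first three files should be listed; the perl script and the script with an interpreter flag fail the end-of-line anchor.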

Here's a handy tip for debugging regular expressions: if you don't pass a filename to grep, it will read standard input, allowing you to enter lines of text to see which match. grep will echo back only matching lines.
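The same idea works non-interactively: feed candidate lines through a pipe and see which survive. A minimal sketch, with an illustrative pattern and sample lines that are not from the text above:

```shell
# Three candidate lines go in on standard input; grep echoes only the matches.
matched=$(printf '%s\n' 'abc123' 'no digits' '42' | grep -E '^[[:digit:]]+$')
echo "$matched"
```

Here only the line consisting entirely of digits is echoed back.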

2.3.5 Combining grep with Other Commands

grep works well with other commands. For example, to display all tcsh processes:

% ps axww | grep -w 'tcsh'

saruman 10329  0.0  0.2    6416  1196  p1  Ss  Sat01PM  0:00.68 -tcsh (tcsh)

saruman 11351  0.0  0.2    6416  1300 std  Ss  Sat07PM  0:02.54 -tcsh (tcsh)

saruman 13360  0.0  0.0    1116     4 std  R+  10:57PM  0:00.00 grep -w tcsh

%

Notice that the grep command itself appears in the output. To prevent this, use:

% ps axww | grep -w '[t]csh'

saruman 10329  0.0  0.2    6416  1196  p1  Ss  Sat01PM  0:00.68 -tcsh (tcsh)

saruman 11351  0.0  0.2    6416  1300 std  Ss  Sat07PM  0:02.54 -tcsh (tcsh)

%

I'll let you figure out how this works.

2.3.6 See Also

man grep
man re_format (regular expressions)
