sed, awk, etc.)
\d{1,3}: The {1,3} references back to the \d
In sed, \d{1,3} would fail, since the {1,3} references the \d before it
+, ?, |)
{ } and ( ) metacharacters must be written as \{ \} and \( \)
[:digit:] “bracket expressions” versus \d character classes
grep:
Match IP addresses in access-log file with PCRE (-P):
$ grep -P '...' access-log
Match IP addresses in access-log file with ERE (-E); the dots are escaped so that they match a literal "." rather than any character:
$ grep -E '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' access-log
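To see why the dots between the digit groups matter, here is a minimal Python sketch. It uses Python's re module as a stand-in for grep, writes \d{1,3} where the ERE uses [[:digit:]]{1,3}, and runs against log lines invented for this example; none of these come from the original notes.

```python
import re

# Hypothetical access-log lines, invented for this example.
log_lines = [
    '192.168.0.12 - - "GET /index.html HTTP/1.1" 200',
    "version 1x2y3z4 deployed",
]

# Escaped dots match only a literal "." between the digit groups.
escaped = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
# Unescaped dots are wildcards, so any character may separate the digit groups.
unescaped = re.compile(r"\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}")


def first_hits(pattern):
    """Return the first match of pattern in each log line that has one."""
    return [m.group(0) for m in map(pattern.search, log_lines) if m]


escaped_hits = first_hits(escaped)
unescaped_hits = first_hits(unescaped)
print(escaped_hits)    # only the real address
print(unescaped_hits)  # also picks up "1x2y3z4"
```

Note that neither pattern checks that each group is at most 255; both only check the shape of the address.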
\w: Match “word” characters (A-Z, a-z, 0-9, and the underscore)
\W: Match non-word characters (anything that is not A-Z, a-z, 0-9, or the underscore)
[a-z]: Match any letter between a and z, lowercase only
[C-K]: Match any letter between C and K, uppercase only
[ACK]: Match either A, C, or K
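The letter classes above can be tried out with a short Python sketch. The sample string is invented for illustration, and Python's re module is assumed as the engine (on ASCII input its \w behaves as described above).

```python
import re

# Hypothetical sample string, invented for this example.
sample = "Ack_42 CAKE!"

word_runs = re.findall(r"\w+", sample)      # runs of letters, digits, underscore
non_word = re.findall(r"\W", sample)        # everything \w does not match
lowercase = re.findall(r"[a-z]", sample)    # lowercase letters only
c_through_k = re.findall(r"[C-K]", sample)  # uppercase C through K only
a_c_or_k = re.findall(r"[ACK]", sample)     # exactly A, C, or K

print(word_runs, non_word, lowercase, c_through_k, a_c_or_k)
```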
\d: Match any digit, 0-9
\D: Match anything that is not a digit (0-9)
[0-9]: Match any digit between 0 and 9
[4-7]: Match any digit between 4 and 7 (4, 5, 6, 7)
[347]: Match either 3, 4, or 7
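A similar sketch for the digit classes, again assuming Python's re module and a made-up sample string:

```python
import re

# Hypothetical sample string, invented for this example.
sample = "Call 347-0915 by 6pm"

digit_runs = re.findall(r"\d+", sample)      # runs of digits
non_digit_runs = re.findall(r"\D+", sample)  # runs of anything else
four_to_seven = re.findall(r"[4-7]", sample) # only 4, 5, 6, or 7
three_four_seven = re.findall(r"[347]", sample)  # only 3, 4, or 7

print(digit_runs, non_digit_runs, four_to_seven, three_four_seven)
```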
\s: Match any whitespace character (tab, space, newline, carriage return)
\S: Match any non-whitespace character
\t: Match a tab
\n: Match a newline
\r: Match a carriage return
^: Match start of line
$: Match end of line
^\s+$: Match a blank line that contains only whitespace (use ^\s*$ to also match a completely empty line)
\b: Matches no characters, but marks a word boundary; at the start of an expression it ensures the previous character is not a word character (\w), while at the end of an expression it ensures the next character is not a word character
\B: Matches no characters, but marks a non-boundary (the opposite of \b); at the start of an expression it ensures the previous character is not a non-word character (\W), while at the end of an expression it ensures the next character is not a non-word character
\< ... \>: Some older programs (such as Vim) use \< and \> to mark word boundaries; \< matches at the start of a word and \> matches at the end of a word
|: “Or”; match one of the provided subexpressions; there can be more than two subexpressions
(NJ|PA): Match NJ or PA
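The anchors, boundaries, and alternation above are easier to follow with concrete input. The following is a minimal sketch in Python's re module; the sample strings are assumptions made up for the example, and the \< \> forms are not shown because Python does not use them.

```python
import re

# Hypothetical sample text, invented for this example.
text = "cat concat catalog"

# \b before "cat": the previous character must not be a word character.
starts_with_cat = re.findall(r"\bcat\w*", text)
# \B before "cat" and \b after it: "cat" must be the tail of a longer word.
ends_with_cat = re.findall(r"\w*\Bcat\b", text)

# ^\s+$ needs at least one whitespace character; ^\s*$ also accepts "".
lines = ["", "   ", "\t", "text"]
whitespace_only = [bool(re.search(r"^\s+$", line)) for line in lines]
blank_or_empty = [bool(re.search(r"^\s*$", line)) for line in lines]

# Alternation inside a group, fenced by word boundaries.
states = re.findall(r"\b(NJ|PA)\b", "NJ PA NY NJX")

print(starts_with_cat, ends_with_cat, whitespace_only, blank_or_empty, states)
```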
?: Make preceding token optional; can be an individual character or subexpression contained in a group
+: Repeat preceding token or subexpression one or more times
*: Repeat preceding token or subexpression zero or more times
.: Wildcard; match any single character (in most engines, any character except a newline by default)
When a quantifier (+, *) is used alongside the . metacharacter, the regex engine will try to match as much as possible
Add ? after the quantifier to force the engine to match as few characters as possible
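The difference between greedy and lazy matching is easiest to see side by side. This is a small sketch using Python's re module, with sample strings that are invented for illustration:

```python
import re

# Hypothetical sample strings, invented for this example.
markup = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as it can, so one match spans the whole string.
greedy = re.findall(r"<.+>", markup)
# Lazy: .+? stops at the first ">" that lets the match succeed.
lazy = re.findall(r"<.+?>", markup)

# ? makes the preceding token optional.
optional_u = re.findall(r"colou?r", "color colour colr")
# + needs at least one repetition; * also accepts zero.
one_or_more = re.findall(r"ab+", "a ab abbb")
zero_or_more = re.findall(r"ab*", "a ab abbb")

print(greedy, lazy, optional_u, one_or_more, zero_or_more)
```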
[...]: Anything encased in square brackets is a character class
[a-zA-Z0-9]: Match any single uppercase letter, lowercase letter, or digit
[^...]: Negated character class; anything encased in square brackets after a leading caret (^) cannot be matched
[A-Z-[N]]: Character class subtraction, supported by only some engines; matches A-M and O-Z, does not match N
( ... ): Create a capturing group
\#: Reference a capturing group, where # is the number of the group
In (H2) ... \1, (H2) matches H2, while \1 also matches H2
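A numbered backreference matches the same text its group captured, not the group's pattern again. Here is a minimal Python sketch of that behavior; the input strings are assumptions chosen for the example.

```python
import re

# \1 must repeat exactly what group 1 captured.
same_heading = re.search(r"(H2) to \1", "H2 to H2")
different_heading = re.search(r"(H2) to \1", "H2 to H3")

# A common use: find a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")

# A negated class matches any character not listed in the brackets.
not_vowels = re.findall(r"[^aeiou ]", "regex rules")

print(same_heading.group(0), different_heading, doubled.group(1), not_vowels)
```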
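(?P<ID> ... ), (?<tag> ... ), (?'tag' ... ): Create a named capturing group; which of these forms is accepted depends on the regex engine
(?P=ID), \k<tag>, \k'tag', \k{tag}, \g{tag}: Reference a named capturing group; which of these forms is accepted depends on the regex engine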
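As one concrete case, Python's re module uses the (?P<ID> ... ) form to create a named group and (?P=ID) to refer back to it. The sketch below assumes that engine; the group names tag and body and the sample strings are made up for the example.

```python
import re

# (?P<tag>...) names the group; (?P=tag) must repeat what it captured.
element = re.compile(r"<(?P<tag>\w+)>(?P<body>.*?)</(?P=tag)>")

matched = element.search("<h2>Title</h2>")
mismatched = element.search("<h2>Title</h3>")  # closing tag differs

# Named groups can be read back by name.
captured = matched.groupdict()
print(captured, mismatched)
```

Other engines may need one of the other spellings listed above for the same pattern.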
(?: ... ): Create a non-capturing group
(?= ... ): A positive lookahead; the regex engine ensures this match exists as a boundary following the expression but does not capture it
(?! ... ): A negative lookahead; the regex engine ensures this match does not exist as a boundary following the expression
(?<= ... ): A positive lookbehind; the regex engine ensures this match exists as a boundary before the expression but does not capture it
(?<! ... ): A negative lookbehind; the regex engine ensures this match does not exist as a boundary before the expression
(?(if)then|else): If the group referenced in if has matched, match the expression in then; if it has not, match the expression in else
In the if state, only use the name of the reference; do not call the reference as normal (i.e., do not use \g{named_ref})
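Lookarounds and conditionals are easier to follow with concrete input. The following is a minimal sketch in Python's re module, which supports all five constructs; the sample text and the group name open are assumptions invented for the example.

```python
import re

# Hypothetical sample text, invented for this example.
text = "cost: $15, weight: 40kg, id 7"

followed_by_kg = re.findall(r"\d+(?=kg)", text)          # positive lookahead
not_followed_by_kg = re.findall(r"\b\d+\b(?!kg)", text)  # negative lookahead
after_dollar = re.findall(r"(?<=\$)\d+", text)           # positive lookbehind
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)     # negative lookbehind

# Conditional: require ")" only if the optional group "open" matched "(".
# Inside (?(open)...) the group is referred to by its bare name.
balanced = re.compile(r"(?P<open>\()?\d+(?(open)\))")
results = [bool(balanced.fullmatch(s)) for s in ["(42)", "42", "(42", "42)"]]

print(followed_by_kg, not_followed_by_kg, after_dollar, not_after_dollar, results)
```

The \b anchors in the negative lookahead example keep the engine from backtracking to a shorter run of digits such as the 4 in 40kg, which would otherwise satisfy (?!kg).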