sed, awk, etc.)
\d{1,3}: The {1,3} refers back to the \d that precedes it
sed: \d{1,3} would fail, since the {1,3} references the \d before it
+, ?, |)
{ } and ( ) metacharacters must be written as \{ \} and \( \)
[:digit:] “bracket expressions” versus \d character classes
grep:
Match IP addresses in access-log file with PCRE (-P):
$ grep -P '...' access-log
Match IP addresses in access-log file with ERE (-E):
$ grep -E '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' access-log
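For comparison, the same dotted-quad pattern can be sketched in Python's re module. This is only an illustration: the sample log lines and the names IP_PATTERN, SAMPLE_LOG, and find_ips are invented here and do not come from the original notes.

```python
import re

# Same idea as the grep commands above: four groups of 1-3 digits separated by
# literal dots. The dots are escaped; an unescaped . would match any character.
IP_PATTERN = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")

# Hypothetical access-log lines, invented for this illustration.
SAMPLE_LOG = [
    '192.168.0.12 - - "GET /index.html HTTP/1.1" 200',
    '10.0.0.7 - - "GET /about.html HTTP/1.1" 404',
    'malformed line with no address',
]


def find_ips(lines):
    """Return every IP-shaped substring found in the given lines."""
    found = []
    for line in lines:
        found.extend(IP_PATTERN.findall(line))
    return found


if __name__ == "__main__":
    print(find_ips(SAMPLE_LOG))
```

The pattern checks only the shape of the address, not the 0-255 range of each part, so a string such as 999.999.999.999 is also accepted.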
\w: Match “word” characters (A-Z, a-z, 0-9, and underscore)
\W: Match non-word characters (anything that is not A-Z, a-z, 0-9, or underscore)
[a-z]: Match any letter between a and z, lowercase only
[C-K]: Match any letter between C and K, uppercase only
[ACK]: Match either A, C, or K
\d: Match any digit, 0-9
\D: Match anything that is not a digit (0-9)
[0-9]: Match any digit between 0 and 9
[4-7]: Match any digit between 4 and 7 (4, 5, 6, 7)
[347]: Match either 3, 4, or 7
\s: Match any whitespace character (tab, space, newline, carriage return)
\S: Match any non-whitespace character
\t: Match a tab
\n: Match a newline
\r: Match a carriage return
^: Match start of line
$: Match end of line
^\s+$: Match a line that contains only whitespace (use ^\s*$ to also match an empty line)
\b: Does not match any characters, but marks a word boundary; at the start of an expression it ensures the previous character is not a word character (\w), while at the end of an expression it ensures the next character is not a word character
\B: Does not match any characters, but marks a non-boundary; at the start of an expression it ensures the previous character is not a non-word character (\W), while at the end of an expression it ensures the next character is not a non-word character
\< ... \>: Some older programs (such as Vim) use \< and \> to mark word boundaries; \< matches at the start of a word and \> matches at the end of a word
|: “Or”; match one of the provided subexpressions; there can be more than two subexpressions
(NJ|PA): Match NJ or PA
?: Make preceding token optional; can be an individual character or a subexpression contained in a group
+: Repeat preceding token or subexpression one or more times
*: Repeat preceding token or subexpression zero or more times
.: Wildcard; match any single character (in most engines, any character except a newline by default)
When quantifiers (+, *) are used alongside the . metacharacter, the regex engine will try to match as much as possible (greedy matching)
Add ? after the quantifier to force the engine to match as few characters as possible (lazy matching)
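The difference between greedy and lazy quantifiers, and the effect of \b around an alternation, is easiest to see side by side. The following is a minimal sketch in Python's re module; the sample strings and variable names are invented for illustration.

```python
import re

# Hypothetical sample text, invented for this illustration.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, so the match runs from the first <
# to the last > in the string.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the earliest position that lets the match succeed,
# so each tag is matched separately.
lazy = re.findall(r"<.+?>", html)

# \b keeps the alternation from matching inside longer words:
# "PA" is found on its own but not inside "PAGE".
# (?: ... ) groups the alternatives without capturing them.
states = re.findall(r"\b(?:NJ|PA)\b", "NJ, PA, PAGE, NJX")

if __name__ == "__main__":
    print(greedy)  # ['<b>bold</b> and <i>italic</i>']
    print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
    print(states)  # ['NJ', 'PA']
```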
[...]: Anything encased in square brackets is a character class
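[a-zA-Z0-9] (matches any single letter or digit)
[^...]: Negated character class; matches any single character that is not listed inside the square brackets after the caret (^)
[A-Z-[N]]: Character class subtraction, supported only by some engines (matches A-M, O-Z; does not match N)
( ... ): Create a capturing group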
\#: Reference a capturing group, where # is the number of the group
In (H2) ... \1, (H2) matches H2, while \1 also matches H2
Named capturing groups and their backreferences (the accepted syntax depends on the regex flavor):
(?P<ID> ... ) referenced by (?P=ID)
(?<tag> ... ) or (?'tag' ... ) referenced by \k<tag> or \k'tag'
(?P<ID> ... ), (?<tag> ... ), or (?'tag' ... ) referenced by (?P=ID), \k<tag>, \k'tag', \k{tag}, or \g{tag}
(?: ... ): Create a non-capturing group
(?= ... ): A positive lookahead; the regex engine ensures this match exists as a boundary following the expression but does not capture it
(?! ... ): A negative lookahead; the regex engine ensures this match does not exist as a boundary following the expression
(?<= ... ): A positive lookbehind; the regex engine ensures this match exists as a boundary before the expression but does not capture it
(?<! ... ): A negative lookbehind; the regex engine ensures this match does not exist as a boundary before the expression
(?(if)then|else): If the group referenced in if has matched, match the expression in then; if it has not, match the expression in else
In the if condition, use only the name of the reference; do not call the reference as normal (i.e., do not use \g{named_ref})
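The group, lookaround, and conditional constructs above can be tried out with a short sketch in Python's re module. Of the named-group syntaxes listed, Python's re accepts only the (?P<ID> ... ) and (?P=ID) forms, so those are the ones used here; all sample strings and variable names are invented for illustration.

```python
import re

# Numbered backreference: \1 must repeat the exact text captured by group 1.
heading = re.search(r"<(H2)>.*?</\1>", "<H2>Title</H2>")
mismatch = re.search(r"<(H2)>.*?</\1>", "<H2>Title</H3>")

# Named group and named backreference, Python syntax.
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "it was the the best")

# Lookarounds assert what surrounds the match without including it.
amounts = "cost: $40, qty: 12, tax: $3"
sizes = "10px 20 30px 40"
after_dollar = re.findall(r"(?<=\$)\d+", amounts)        # positive lookbehind
not_after_dollar = re.findall(r"(?<!\$)\b\d+", amounts)  # negative lookbehind
before_px = re.findall(r"\d+(?=px)", sizes)              # positive lookahead
not_before_px = re.findall(r"\d+(?!\d|px)", sizes)       # negative lookahead

# Conditional: if the group named "open" matched "<", require ">";
# otherwise require the end of the string. Only the bare group name
# appears inside the condition.
EMAIL = re.compile(r"(?P<open><)?\w+@\w+(?:\.\w+)+(?(open)>|$)")

if __name__ == "__main__":
    print(heading.group(0), mismatch)
    print(doubled.group("word"))
    print(after_dollar, not_after_dollar, before_px, not_before_px)
    for candidate in ["<user@host.com>", "user@host.com", "<user@host.com"]:
        print(candidate, bool(EMAIL.match(candidate)))
```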