In awk, regular expressions (regex) allow for dynamic and complex pattern definitions. You're not limited to searching for simple strings; you can also search for patterns within patterns. The syntax for using a regular expression to match lines in awk is:
word ~ /match/
The inverse of that, not matching a pattern, is:
word !~ /match/
If you haven't already, create the sample file from our previous article.
GREP cheat sheet: characters and what they match
- ring: matches ring, springboard, ringtone, etc.
- .: matches almost any character; for example, h.o matches hoo, h2o, h/o, etc.
- \: escapes a special character so that it is searched for literally; for example, h\.o matches only h.o.
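The `~` and `!~` operators, and the `.` versus `\.` distinction from the cheat sheet, can be tried without the earlier article's sample file. The following is a minimal sketch: the four-line fruit/color input is made up for illustration and only stands in for that file; any POSIX awk and grep should behave the same way.

```shell
#!/bin/sh
# Hypothetical input (fruit, color, amount); NOT the earlier article's file.
sample='apple red 4
banana yellow 6
raspberry red 99
grape purple 10'

# word ~ /match/ : print the fruit when field 2 starts with "r"
printf '%s\n' "$sample" | awk '$2 ~ /^r/ { print $1 }'

# word !~ /match/ : print the fruit when field 2 does NOT start with "r"
printf '%s\n' "$sample" | awk '$2 !~ /^r/ { print $1 }'

# grep: "." matches almost any character, "\." matches only a literal dot
printf '%s\n' hoo h2o h/o h.o ho | grep 'h.o'
printf '%s\n' hoo h2o h/o h.o ho | grep 'h\.o'
```

The first awk command prints apple and raspberry; the second prints banana and grape. The unescaped `h.o` accepts all four three-character words, while `h\.o` keeps only `h.o`.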
3.3 Overview of Regular Expression Syntax
To use sed well, you need to understand regular expressions (regexp for short). A regular expression is a pattern that is matched against a subject string from left to right. Most characters are ordinary: they stand for themselves in a pattern and match the corresponding characters in the subject. As a trivial example, a pattern made up only of ordinary characters matches a portion of a subject string that is identical to itself. The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are encoded in the pattern by the use of special characters, which do not stand for themselves but instead are interpreted in some special way. Here is a brief description of regular expression syntax as used in sed.
*
Matches a sequence of zero or more instances of matches for the preceding regular expression, which must be an ordinary character, a special character preceded by \, a ., a grouped regexp (see below), or a bracket expression. As a GNU extension, a postfixed regular expression can also be followed by *; for example, a** is equivalent to a*. POSIX 1003.1-2001 says that * stands for itself when it appears at the start of a regular expression or subexpression, but many non-GNU implementations do not support this, and portable scripts should instead use \* in these contexts.
\+
As *, but matches one or more. It is a GNU extension.
\?
As *, but only matches zero or one. It is a GNU extension.
\{i\}
As *, but matches exactly i sequences (i is a decimal integer; for portability, keep it between 0 and 255 inclusive).
\{i,j\}
Matches between i and j sequences, inclusive.
\{i,\}
Matches i or more sequences.
\(regexp\)
Groups the inner regexp as a whole. This is used to:
- Apply postfix operators, like \(abcd\)*: this will search for zero or more whole sequences of ‘abcd’, while abcd* would search for ‘abc’ followed by zero or more occurrences of ‘d’. Note that support for \(abcd\)* is required by POSIX 1003.1-2001, but many non-GNU implementations do not support it, and hence it is not universally portable.
- Use back references (see below).
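The repetition and grouping operators can be checked against small inputs with `sed -n '/regexp/p'`, which prints only the lines that match. This is a sketch that assumes GNU sed, because `\+` and `\?` are GNU extensions; the sample strings are arbitrary.

```shell
#!/bin/sh
# Assumes GNU sed: \+ and \? are GNU extensions to POSIX BREs.
# Each input is one line; sed -n '/RE/p' prints only the lines RE matches.

printf '%s\n' b ab aaab c   | sed -n '/^a*b$/p'      # zero or more of a
printf '%s\n' b ab aaab     | sed -n '/^a\+b$/p'     # one or more of a
printf '%s\n' b ab aab      | sed -n '/^a\?b$/p'     # zero or one a
printf '%s\n' aa aaa aaaa   | sed -n '/^a\{3\}$/p'   # exactly 3
printf '%s\n' a aa aaa aaaa | sed -n '/^a\{2,3\}$/p' # 2 to 3, inclusive
printf '%s\n' a aa aaa      | sed -n '/^a\{2,\}$/p'  # 2 or more

# \(abcd\)* repeats the whole group; abcd* repeats only the final d.
printf '%s\n' abcdabcd abcdddd | sed -n '/^\(abcd\)*$/p'
printf '%s\n' abcdabcd abcdddd | sed -n '/^abcd*$/p'
```

The last two commands show the difference described above: the grouped form accepts `abcdabcd`, while `abcd*` accepts `abcdddd`.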
.
Matches any character, including newline.
^
Matches the null string at the beginning of the pattern space; that is, what appears after the circumflex must appear at the beginning of the pattern space.
In most scripts, pattern space is initialized to the content of each line (see How sed works). So it is a useful simplification to think of ^#include as matching only lines where ‘#include’ is the first thing on the line; if there are spaces before it, for example, the match fails. This simplification is valid as long as the original content of pattern space is not modified, for example with an s command.
^ acts as a special character only at the beginning of the regular expression or subexpression (that is, after \( or \|). Portable scripts should avoid ^ at the beginning of a subexpression, though, as POSIX allows implementations that treat ^ as an ordinary character in that context.
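The `^#include` simplification, and how an `s` command changes what `^` refers to, can be seen in a short sketch. The C-like input lines are invented for illustration; `[[:blank:]]` is the POSIX character class for a space or a tab.

```shell
#!/bin/sh
# Invented input: the second line is indented.
src='#include <stdio.h>
  #include <stdlib.h>
int x;'

# ^ anchors at the start of pattern space, so the indented line is skipped.
printf '%s\n' "$src" | sed -n '/^#include/p'

# After s/// strips leading blanks, pattern space no longer holds the
# original line, and ^#include now matches the indented directive as well.
printf '%s\n' "$src" | sed -n 's/^[[:blank:]]*//; /^#include/p'
```

The first command prints only `#include <stdio.h>`; the second prints both directives, because the `s` command has modified pattern space before the address is tested.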
$
It is the same as ^, but refers to the end of pattern space. $ also acts as a special character only at the end of the regular expression or subexpression (that is, before \) or \|), and its use at the end of a subexpression is not portable.
[list]
[^list]
Matches any single character in list: for example, [aeiou] matches all vowels. A list may include sequences like char1-char2, which match any character between char1 and char2, inclusive.
A leading ^ reverses the meaning of list, so that it matches any single character not in list. To include ] in the list, make it the first character (after the ^ if needed); to include - in the list, make it the first or last; to include ^, put it after the first character.
The characters $, *, ., [, and \ are normally not special within list. For example, [\*] matches either ‘\’ or ‘*’, because the \ is not special here. However, strings like [.ch.], [=a=], and [:space:] are special within list and represent collating symbols, equivalence classes, and character classes, respectively, and [ is therefore special within list when it is followed by ., =, or :. Also, when not in POSIXLY_CORRECT mode, special escapes like \n and \t are recognized within list. See Escapes.
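A few bracket expressions and the `$` anchor, exercised with sed's `s` command, illustrate the rules above. The inputs are arbitrary; the `g` flag replaces every match on the line rather than only the first.

```shell
#!/bin/sh
# $ anchors at the end of pattern space: only the line ending in ";" prints.
printf '%s\n' 'x = 1;' 'x = 1' | sed -n '/;$/p'

# [aeiou] matches any single vowel.
echo 'regular expression' | sed 's/[aeiou]/_/g'

# [^0-9] matches any single character that is NOT a digit.
echo 'tel: 555-0100' | sed 's/[^0-9]//g'

# ']' first and '-' last are taken literally inside the list.
echo 'a]b-c' | sed 's/[]-]/X/g'

# '.' and '*' are not special inside a list.
echo 'a.b*c' | sed 's/[.*]/X/g'

# [:space:] is a character class; it must itself sit inside a bracket list.
printf 'a b\tc\n' | sed 's/[[:space:]]/_/g'
```

These print `x = 1;`, `r_g_l_r _xpr_ss__n`, `5550100`, `aXbXc`, `aXbXc`, and `a_b_c`, respectively.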
regexp1\|regexp2
Matches either regexp1 or regexp2. Use \(...\) to group complex alternatives. It is a GNU extension.
regexp1regexp2
Matches the concatenation of regexp1 and regexp2. Concatenation binds more tightly than \|, ^, and $, but less tightly than the other regular expression operators.
\digit
Matches the digit-th \(...\) parenthesized subexpression in the regular expression. This is called a back reference. Subexpressions are implicitly numbered by counting occurrences of \( from left to right.
\n
Matches the newline character.
\char
Matches char, where char is one of $, *, ., [, \, or ^. Note that the only C-like backslash sequences that you can portably assume to be interpreted are \n and \\; in particular, \t is not portable, and matches a ‘t’ under most implementations of sed, rather than a tab character.
Note that the regular expression matcher is greedy, i.e., matches are attempted from left to right and, if two or more matches are possible starting at the same character, it selects the longest.
Examples:
- ‘abcdef’
- Matches ‘abcdef’.
- ‘a*b’
- Matches zero or more ‘a’s followed by a single ‘b’. For example, ‘b’ or ‘aaaaab’.
- ‘a\?b’
- Matches ‘b’ or ‘ab’.
- ‘a\+b\+’
- Matches one or more ‘a’s followed by one or more ‘b’s: ‘ab’ is the shortest possible match, but other examples are ‘aaaab’, ‘abbbbb’, or ‘aaaaaabbbbbbb’.
- ‘.*’
- ‘.\+’
- These two both match all the characters in a string; however, the first matches every string (including the empty string), while the second matches only strings containing at least one character.
- ‘^main.*(.*)’
- This matches a string starting with ‘main’, followed by an opening and closing parenthesis. The ‘n’, ‘(’ and ‘)’ need not be adjacent.
- ‘^#’
- This matches a string beginning with ‘#’.
- ‘\\$’
- This matches a string ending with a single backslash. The regexp contains two backslashes for escaping.
- ‘\$’
- Instead, this matches a literal dollar sign anywhere in the string, because the $ is escaped.
- ‘[a-zA-Z0-9]’
- In the C locale, this matches any ASCII letters or digits.
- ‘[^ tab]\+’
- (Here tab stands for a single tab character.) This matches a string of one or more characters, none of which is a space or a tab. Usually this means a word.
- ‘^\(.*\)\n\1$’
- This matches a string consisting of two equal substrings separated by a newline.
- ‘.\{9\}A$’
- This matches nine characters followed by an ‘A’ at the end of the string.
- ‘^.\{15\}A’
- This matches the start of a string that contains 16 characters, the last of which is an ‘A’.
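Most of the examples above can be run directly. This sketch assumes GNU sed (for `\?` and `\+`) and defines `show`, a hypothetical helper that is not part of sed: it prints each test string that the given basic regexp matches. The test strings are invented; regexps containing `/` would need a different delimiter.

```shell
#!/bin/sh
# Hypothetical helper (not part of sed):
# show REGEX TEXT... prints each TEXT that the basic regexp REGEX matches.
show() {
  re=$1
  shift
  printf '%s\n' "$@" | sed -n "/$re/p"
}

show 'a*b' b aaaaab ccc                       # b, aaaaab
show 'a\?b' b ab c                            # b, ab
show 'a\+b\+' ab aaaab abbbbb b               # ab, aaaab, abbbbb
show '^main.*(.*)' 'main(void)' 'int main()'  # main(void)
show '^#' '#define X' 'x # y'                 # #define X
show '\\$' 'ends\' 'none'                     # ends\
show '\$' 'costs $5' 'free'                   # costs $5
show '.\{9\}A$' 123456789A 12345678A          # 123456789A
show '^.\{15\}A' 123456789012345A A           # 123456789012345A

# [^ tab]\+ : build the list with a real tab character.
tab=$(printf '\t')
printf 'one two\tthree\n' | sed "s/[^ $tab]\+/W/g"

# ^\(.*\)\n\1$ needs two lines in pattern space; N appends the next line.
printf '%s\n' abc abc abc xyz | sed -n 'N; /^\(.*\)\n\1$/p'
```

In the last command, `N` reads the input in pairs, so only the pair `abc`/`abc` consists of two equal substrings separated by a newline and is printed.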