Unix Regular Expression Cheat Sheet



In awk, regular expressions (regex) allow for dynamic and complex pattern definitions. You're not limited to searching for simple strings; you can also search for patterns within patterns. The syntax for matching a regular expression in awk is: word ~ /match/ The inverse, not matching a pattern, is: word !~ /match/ If you haven't already, create the sample file from our previous article.

GREP cheat sheet (characters and what they seek): ring matches ring, springboard, ringtone, etc. A dot (.) matches almost any character: h.o matches hoo, h2o, h/o, etc. Use a backslash (\) to search for these special characters literally.


3.3 Overview of Regular Expression Syntax

To know how to use sed, people should understand regular expressions (regexp for short). A regular expression is a pattern that is matched against a subject string from left to right. Most characters are ordinary: they stand for themselves in a pattern, and match the corresponding characters in the subject. As a trivial example, the pattern

The quick brown fox

matches a portion of a subject string that is identical to itself. The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are encoded in the pattern by the use of special characters, which do not stand for themselves but instead are interpreted in some special way. Here is a brief description of regular expression syntax as used in sed.

char
A single ordinary character matches itself.
*
Matches a sequence of zero or more instances of matches for the preceding regular expression, which must be an ordinary character, a special character preceded by \, a ., a grouped regexp (see below), or a bracket expression. As a GNU extension, a postfixed regular expression can also be followed by *; for example, a** is equivalent to a*. POSIX 1003.1-2001 says that * stands for itself when it appears at the start of a regular expression or subexpression, but many non-GNU implementations do not support this and portable scripts should instead use \* in these contexts.
\+
As *, but matches one or more. It is a GNU extension.
\?
As *, but only matches zero or one. It is a GNU extension.
\{i\}
As *, but matches exactly i sequences (i is a decimal integer; for portability, keep it between 0 and 255 inclusive).
\{i,j\}
Matches between i and j, inclusive, sequences.
\{i,\}
Matches more than or equal to i sequences.
\(regexp\)
Groups the inner regexp as a whole. This is used to:
  • Apply postfix operators, like \(abcd\)*: this will search for zero or more whole sequences of ‘abcd’, while abcd* would search for ‘abc’ followed by zero or more occurrences of ‘d’. Note that support for \(abcd\)* is required by POSIX 1003.1-2001, but many non-GNU implementations do not support it and hence it is not universally portable.
  • Use back references (see below).
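The difference between a grouped and an ungrouped postfix operator can be checked from the shell. The following is a minimal sketch, assuming GNU sed is installed as sed; the sample string and variable names are invented for illustration.

```shell
# Assumes GNU sed; the input string 'ababab!' is made up for this sketch.

# \(ab\)* applies the star to the whole group "ab", so all three pairs match.
grouped=$(printf 'ababab!\n' | sed 's/^\(ab\)*/X/')

# ab* applies the star only to "b", so only the first "ab" matches.
ungrouped=$(printf 'ababab!\n' | sed 's/^ab*/X/')

# \{2\} repeats the preceding group exactly twice.
twice=$(printf 'ababab!\n' | sed 's/^\(ab\)\{2\}/X/')

printf '%s\n' "$grouped" "$ungrouped" "$twice"
```

With GNU sed this should print X!, Xabab! and Xab! on separate lines; other implementations may differ, as noted above.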

.
Matches any character, including newline.
^
Matches the null string at beginning of the pattern space, i.e. what appears after the circumflex must appear at the beginning of the pattern space.

In most scripts, pattern space is initialized to the content of each line (see How sed works). So, it is a useful simplification to think of ^#include as matching only lines where ‘#include’ is the first thing on the line; if there are spaces before, for example, the match fails. This simplification is valid as long as the original content of pattern space is not modified, for example with an s command.

^ acts as a special character only at the beginning of the regular expression or subexpression (that is, after \( or \|). Portable scripts should avoid ^ at the beginning of a subexpression, though, as POSIX allows implementations that treat ^ as an ordinary character in that context.
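To make the pattern-space point concrete, here is a small sketch, assuming GNU sed (the commands used are also plain POSIX); the four input lines are invented. It shows ^ and $ anchoring to the pattern space, and how an earlier s command changes what ^ sees.

```shell
# Hypothetical sample input: four lines, one of them indented.
input='#include <a.h>
  #include <b.h>
x = 1;
x = 1; // done'

# ^#include matches only where the pattern space starts with "#include".
at_start=$(printf '%s\n' "$input" | sed -n '/^#include/p')

# ;$ matches only where the pattern space ends with a semicolon.
at_end=$(printf '%s\n' "$input" | sed -n '/;$/p')

# After s strips leading spaces, the indented line matches ^#include too,
# because ^ refers to the modified pattern space, not the original line.
after_strip=$(printf '%s\n' "$input" | sed -n 's/^ *//; /^#include/p')

printf '%s\n' "$at_start" "$at_end" "$after_strip"
```

The first command selects only the unindented include line, while the third selects both, since the s command has already rewritten the pattern space.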

$
It is the same as ^, but refers to end of pattern space. $ also acts as a special character only at the end of the regular expression or subexpression (that is, before \) or \|), and its use at the end of a subexpression is not portable.
[list]
[^list]
Matches any single character in list: for example, [aeiou] matches all vowels. A list may include sequences like char1-char2, which matches any character between (inclusive) char1 and char2.

A leading ^ reverses the meaning of list, so that it matches any single character not in list. To include ] in the list, make it the first character (after the ^ if needed); to include - in the list, make it the first or last; to include ^, put it after the first character.


The characters $, *, ., [, and \ are normally not special within list. For example, [\*] matches either ‘\’ or ‘*’, because the \ is not special here. However, strings like [.ch.], [=a=], and [:space:] are special within list and represent collating symbols, equivalence classes, and character classes, respectively, and [ is therefore special within list when it is followed by ., =, or :. Also, when not in POSIXLY_CORRECT mode, special escapes like \n and \t are recognized within list. See Escapes.
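The bracket-expression rules can be tried out with a few substitutions. This is only an illustrative sketch, assuming GNU sed and the C or a UTF-8 locale; the sample words and variable names are invented.

```shell
# Assumes GNU sed; inputs are made up for illustration.

# [aeiou] matches any single vowel.
vowels=$(printf 'regular\n' | sed 's/[aeiou]/_/g')

# A leading ^ negates the list: any single character that is not a vowel.
nonvowels=$(printf 'regular\n' | sed 's/[^aeiou]/_/g')

# To include ] in the list, put it first: [][] matches either ] or [.
brackets=$(printf 'a]b[c\n' | sed 's/[][]/_/g')

# [:space:] is a character class and must itself sit inside a bracket list.
spaces=$(printf 'a b\tc\n' | sed 's/[[:space:]]/_/g')

printf '%s\n' "$vowels" "$nonvowels" "$brackets" "$spaces"
```

Note the doubled brackets in [[:space:]]: the outer pair is the list, the inner [:space:] is the class name.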

regexp1\|regexp2
Matches either regexp1 or regexp2. Use parentheses to use complex alternative regular expressions. The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used. It is a GNU extension.
regexp1regexp2
Matches the concatenation of regexp1 and regexp2. Concatenation binds more tightly than \|, ^, and $, but less tightly than the other regular expression operators.
\digit
Matches the digit-th \(...\) parenthesized subexpression in the regular expression. This is called a back reference. Subexpressions are implicitly numbered by counting occurrences of \( left-to-right.
\n
Matches the newline character.
\char
Matches char, where char is one of $, *, ., [, \, or ^. Note that the only C-like backslash sequences that you can portably assume to be interpreted are \n and \\; in particular \t is not portable, and matches a ‘t’ under most implementations of sed, rather than a tab character.

Note that the regular expression matcher is greedy, i.e., matches are attempted from left to right and, if two or more matches are possible starting at the same character, it selects the longest.
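Greedy, leftmost-first matching is easiest to see with s. The sketch below assumes GNU sed (the expressions are also plain POSIX); the inputs and variable names are invented.

```shell
# Assumes GNU sed; sample inputs are invented.

# .*X starts at the first character and takes the longest match, up to the LAST X.
greedy=$(printf 'aXbXc\n' | sed 's/.*X/_/')

# A negated bracket list cannot run past an X, so this stops at the FIRST X.
bounded=$(printf 'aXbXc\n' | sed 's/[^X]*X/_/')

# The leftmost match wins even though a longer match exists further right.
leftmost=$(printf 'ab,abbb\n' | sed 's/ab*/_/')

printf '%s\n' "$greedy" "$bounded" "$leftmost"
```

Since sed has no non-greedy operators, a negated bracket list as in the second command is a common way to limit how far a match extends.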

Examples:

abcdef
Matches ‘abcdef’.
a*b
Matches zero or more ‘a’s followed by a single ‘b’. For example, ‘b’ or ‘aaaaab’.
a\?b
Matches ‘b’ or ‘ab’.
a\+b\+
Matches one or more ‘a’s followed by one or more ‘b’s: ‘ab’ is the shortest possible match, but other examples are ‘aaaab’ or ‘abbbbb’ or ‘aaaaaabbbbbbb’.
.*
.\+
These two both match all the characters in a string; however, the first matches every string (including the empty string), while the second matches only strings containing at least one character.
^main.*(.*)
This matches a string starting with ‘main’, followed by an opening and closing parenthesis. The ‘n’, ‘(’ and ‘)’ need not be adjacent.
^#
This matches a string beginning with ‘#’.
\\$
This matches a string ending with a single backslash. The regexp contains two backslashes for escaping.
\$
Instead, this matches a single literal dollar sign, because it is escaped.
[a-zA-Z0-9]
In the C locale, this matches any ASCII letters or digits.
[^ tab]\+
(Here tab stands for a single tab character.) This matches a string of one or more characters, none of which is a space or a tab. Usually this means a word.
^\(.*\)\n\1$
This matches a string consisting of two equal substrings separated by a newline.
.\{9\}A$
This matches nine characters followed by an ‘A’ at the end of the string.
^.\{15\}A
This matches the start of a string that contains 16 characters, the last of which is an ‘A’.
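Several of the examples above can be run directly. The following sketch assumes GNU sed, because \+ is a GNU extension; the inputs and variable names are invented for illustration.

```shell
# Assumes GNU sed (\+ is a GNU extension); all sample inputs are invented.

# a\+b\+ anchored to the whole line: one or more a's, then one or more b's.
ab_lines=$(printf 'ab\naaaab\nb\nabc\n' | sed -n '/^a\+b\+$/p')

# ^.\{15\}A: the 16th character of the line must be an A.
sixteenth=$(printf '123456789012345A rest\n12345A\n' | sed -n '/^.\{15\}A/p')

# \$ is a literal dollar sign, not the end-of-pattern-space anchor.
dollar=$(printf 'cost $5\n' | sed 's/\$/USD /')

# ^\(.*\)\n\1$ needs a newline in pattern space, so N joins two input lines first.
pair=$(printf 'same\nsame\n' | sed -n 'N; /^\(.*\)\n\1$/p')

printf '%s\n' "$ab_lines" "$sixteenth" "$dollar" "$pair"
```

The last command uses N because sed normally holds one line at a time, so a pattern containing \n can only match after a second line has been appended to the pattern space.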
