Regular Expression Syntax Explained

Source: http://www.onejava.com  Updated: 2008-06-22 07:20

Regular expressions allow more complex search and replace functions to be performed in a single operation.

Regular Expressions Syntax:

Symbol	Function
%	Matches the start of line - Indicates the search string must be at the beginning of a line but does not include any line terminator characters in the resulting string selected.
$	Matches the end of line - Indicates the search string must be at the end of line but does not include any line terminator characters in the resulting string selected.
?	Matches any single character except newline.
*	Matches any number of occurrences of any character except newline.
+	Matches one or more of the preceding character/expression. At least one occurrence of the character must be found. Does not match repeated newlines.
++	Matches the preceding character/expression zero or more times. Does not match repeated newlines.
^b	Matches a page break.
^p	Matches a newline (CR/LF) (paragraph) (DOS Files)
^r	Matches a newline (CR Only) (paragraph) (MAC Files)
^n	Matches a newline (LF Only) (paragraph) (UNIX Files)
^t	Matches a tab character
[ ]	Matches any single character or range in the brackets
^{A^}^{B^}	Matches expression A OR B
^	Overrides the following regular expression character
^(...^)	Brackets or tags an expression to use in the replace command. A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression. The corresponding replacement expression is ^x, for x in the range 1-9. Example: If ^(h*o^) ^(f*s^) matches "hello folks", ^2 ^1 would replace it with "folks hello".

Note - ^ refers to the character '^' NOT Control Key + value.

Examples:

m?n matches "man", "men", "min" but not "moon".

t*t matches "test", "tonight" and "tea time" (the "tea t" portion) but not "tea
time" (newline between "tea " and "time").

Te+st matches "test", "teest", "teeeest" etc. (when the search ignores case) but does not match "tst".

[aeiou] matches every lowercase vowel

[,.?] matches a literal ",", "." or "?".

[0-9a-z] matches any digit, or lowercase letter

[~0-9] matches any character except a digit (~ means NOT the following)
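The native syntax above is specific to the editor, so the examples cannot be run elsewhere as written. As a rough illustration only, the sketch below translates a small subset of it into a Python `re` pattern and checks it against the examples. The helper name `native_to_python` and the mapping table are hypothetical, the `^b` page break is assumed to be a form feed, and the `^{A^}^{B^}` form is not handled here.

```python
import re

# Two-character "^x" codes from the table, mapped to Python equivalents.
# Assumption: the page break (^b) is a form-feed character.
CARET_CODES = {"t": r"\t", "n": r"\n", "r": r"\r", "p": r"\r\n", "b": r"\f",
               "(": "(", ")": ")"}

# Single-character symbols. "%" needs re.MULTILINE to mean "start of line".
SIMPLE = {"%": "^", "$": "$", "?": ".", "*": ".*", "+": "+"}

def native_to_python(pattern: str) -> str:
    """Translate a subset of the native syntax into a Python `re` pattern."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
        if ch == "^" and nxt:
            # A known code, or "^" overriding the next special character.
            out.append(CARET_CODES.get(nxt, re.escape(nxt)))
            i += 2
        elif ch == "+" and nxt == "+":
            out.append("*")            # ++ : zero or more of the preceding item
            i += 2
        elif ch == "[" and "]" in pattern[i + 1:]:
            end = pattern.index("]", i + 1)
            body = pattern[i + 1:end]
            if body.startswith("~"):   # [~...] negates the set
                body = "^" + body[1:]
            out.append("[" + body + "]")
            i = end + 1
        else:
            out.append(SIMPLE.get(ch, re.escape(ch)))
            i += 1
    return "".join(out)

print(native_to_python("m?n"))                                    # m.n
print(re.search(native_to_python("t*t"), "tea time").group())     # tea t
print(re.search(native_to_python("t*t"), "tea\ntime"))            # None
```

Because Python's `.` also stops at a newline, translating `*` to `.*` keeps the "except newline" behaviour described in the table.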

You may search for an expression A or B as follows:

"^{John^}^{Tom^}"

This will search for an occurrence of John or Tom. There should be nothing between the two expressions.

You may combine A or B and C or D in the same search as follows:

"^{John^}^{Tom^} ^{Smith^}^{Jones^}"

This will search for John or Tom followed by Smith or Jones.
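To make the A-or-B form concrete, here is a minimal sketch that rewrites each `^{A^}^{B^}` pair as a Python alternation group and then runs it. The function name `alternation_to_python` is hypothetical, and the sketch assumes A and B are plain text with no other special characters.

```python
import re

def alternation_to_python(pattern: str) -> str:
    """Rewrite each ^{A^}^{B^} pair as a Python non-capturing group (?:A|B)."""
    return re.sub(r"\^\{(.*?)\^\}\^\{(.*?)\^\}", r"(?:\1|\2)", pattern)

converted = alternation_to_python("^{John^}^{Tom^} ^{Smith^}^{Jones^}")
print(converted)                                   # (?:John|Tom) (?:Smith|Jones)
print(bool(re.search(converted, "Tom Jones")))     # True
print(bool(re.search(converted, "Tom Brown")))     # False
```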

The table below shows the syntax for the "Unix" style regular expressions.

Regular Expressions (Unix Syntax):

Symbol	Function
\	Indicates the next character has a special meaning. "n" on its own matches the character "n". "\n" matches a linefeed or newline character. See examples below (\d, \f, \n etc).
^	Matches/anchors the beginning of line.
$	Matches/anchors the end of line.
*	Matches the preceding character zero or more times.
+	Matches the preceding character one or more times. Does not match repeated newlines.
.	Matches any single character except a newline character. Does not match repeated newlines.
(expression)	Brackets or tags an expression to use in the replace command. A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression. The corresponding replacement expression is \x, for x in the range 1-9. Example: If (h.*o) (f.*s) matches "hello folks", \2 \1 would replace it with "folks hello".
[xyz]	A character set. Matches any one of the characters between the brackets.
[^xyz]	A negative character set. Matches any one character NOT between the brackets.
\d	Matches a digit character. Equivalent to [0-9].
\D	Matches a nondigit character. Equivalent to [^0-9].
\f	Matches a form-feed character.
\n	Matches a linefeed character.
\r	Matches a carriage return character.
\s	Matches any whitespace including space, tab, form-feed, etc but not newline.
\S	Matches any non-whitespace character but not newline.
\t	Matches a tab character.
\v	Matches a vertical tab character.
\w	Matches any word character including underscore.
\W	Matches any nonword character.
\p	Matches CR/LF (same as \r\n) to match a DOS line terminator

Note - ^ refers to the character '^' NOT Control Key + value.
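Most of the Unix-style symbols behave the same way in other regex engines, so they can be tried outside the editor. The sketch below uses Python's `re` module as a stand-in, which is an assumption rather than the editor's own engine. Two differences are worth keeping in mind: Python's `\s` also matches a newline, and Python has no `\p`, so the DOS line terminator is written `\r\n`.

```python
import re

# Tagged expressions: \1 and \2 refer to the first and second bracketed groups.
swapped = re.sub(r"(h.*o) (f.*s)", r"\2 \1", "hello folks")
print(swapped)                                   # folks hello

# Character classes from the table.
print(re.findall(r"\d", "a1b22"))                # ['1', '2', '2']
print(re.findall(r"\w+", "foo_bar baz"))         # ['foo_bar', 'baz']

# Stand-in for \p: split on the DOS line terminator.
print(re.split(r"\r\n", "one\r\ntwo"))           # ['one', 'two']
```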

Examples:

m.n matches "man", "men", "min" but not "moon".

Te+st matches "test", "teest", "teeeest" etc. (when the search ignores case) BUT NOT "tst".

Te*st matches "test", "teest", "teeeest" etc. (when the search ignores case) AND "tst".

[aeiou] matches every lowercase vowel

[,.?] matches a literal ",", "." or "?".

[0-9a-z] matches any digit, or lowercase letter

[^0-9] matches any character except a digit (^ means NOT the following)
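The examples above can be checked with a short script. This is a sketch using Python's `re` module as a stand-in for the editor; the helper `full` is a hypothetical convenience, and case is ignored for the "Te..." patterns because the sample text is lowercase.

```python
import re

def full(pattern, text, flags=0):
    """True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text, flags) is not None

# "." stands for exactly one character, so "moon" fails.
print(full(r"m.n", "man"), full(r"m.n", "moon"))      # True False

# "+" needs at least one "e"; "*" also accepts none.
print(full(r"Te+st", "teeeest", re.IGNORECASE))       # True
print(full(r"Te+st", "tst", re.IGNORECASE))           # False
print(full(r"Te*st", "tst", re.IGNORECASE))           # True

# Sets and negated sets match one character at a time.
print(re.findall(r"[aeiou]", "regular"))              # ['e', 'u', 'a']
print(re.findall(r"[^0-9]", "a1b2"))                  # ['a', 'b']
```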

You may search for an expression A or B as follows:

"(John|Tom)"

This will search for an occurrence of John or Tom. There should be nothing between the two expressions.

You may combine A or B and C or D in the same search as follows:

"(John|Tom) (Smith|Jones)"

This will search for John or Tom followed by Smith or Jones.
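As a quick check of the combined search, the following sketch runs the same pattern through Python's `re` module (a stand-in, not the editor itself). The sample names in the list are made up for illustration.

```python
import re

full_name = re.compile(r"(John|Tom) (Smith|Jones)")

for text in ["Tom Jones", "John Smith", "Tom Brown"]:
    m = full_name.search(text)
    # Each bracketed alternative is also a tagged expression.
    print(text, "->", m.groups() if m else None)
```

The first two lines report the matched first and last name; "Tom Brown" gives no match because "Brown" is in neither alternative.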

The following special characters are also valid when Regular Expressions is not selected for the find/replace, and in the Replace field:

Symbol	Function
^^	Matches a "^" character
^s	Is substituted with the selected (highlighted) text of the active file window.
^c	Is substituted with the contents of the clipboard.
^b	Matches a page break
^p	Matches a newline (CR/LF) (paragraph) (DOS Files)
^r	Matches a newline (CR Only) (paragraph) (MAC Files)
^n	Matches a newline (LF Only) (paragraph) (UNIX Files)
^t	Matches a tab character

Note - ^ refers to the character '^' NOT Control Key + value.
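One way to picture these codes is as a simple substitution pass over the search or replace string. The sketch below is an illustration only, not the editor's implementation: the function name `expand_specials` and its `selection` and `clipboard` parameters are hypothetical stand-ins for editor state, and the page break is assumed to be a form-feed character.

```python
def expand_specials(text: str, selection: str = "", clipboard: str = "") -> str:
    """Expand the ^-codes from the table into the characters they stand for."""
    codes = {
        "^": "^",          # ^^ is a literal caret
        "s": selection,    # ^s: currently selected text (passed in here)
        "c": clipboard,    # ^c: clipboard contents (passed in here)
        "b": "\f",         # ^b: page break, assumed to be a form feed
        "p": "\r\n",       # ^p: DOS newline
        "r": "\r",         # ^r: MAC newline
        "n": "\n",         # ^n: UNIX newline
        "t": "\t",         # ^t: tab
    }
    out, i = [], 0
    while i < len(text):
        if text[i] == "^" and i + 1 < len(text) and text[i + 1] in codes:
            out.append(codes[text[i + 1]])
            i += 2
        else:
            out.append(text[i])   # ordinary character, copied unchanged
            i += 1
    return "".join(out)

print(repr(expand_specials("name^tvalue^p")))        # 'name\tvalue\r\n'
print(expand_specials("[^s]", selection="picked"))   # [picked]
```

Scanning left to right means `^^s` yields a literal caret followed by "s" rather than the selection.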
