Regular Expression Syntax Explained
Source: http://www.onejava.com Updated: 2008-06-22 07:20
Regular expressions allow more complex search and replace functions to be performed in a single operation.
Regular Expressions Syntax:
Symbol | Function
% | Matches the start of line - Indicates the search string must be at the beginning of a line but does not include any line terminator characters in the resulting string selected.
$ | Matches the end of line - Indicates the search string must be at the end of line but does not include any line terminator characters in the resulting string selected.
? | Matches any single character except newline.
* | Matches any number of occurrences of any character except newline.
+ | Matches one or more of the preceding character/expression. At least one occurrence of the character must be found. Does not match repeated newlines.
++ | Matches the preceding character/expression zero or more times. Does not match repeated newlines.
^b | Matches a page break.
^p | Matches a newline (CR/LF) (paragraph) (DOS Files)
^r | Matches a newline (CR Only) (paragraph) (MAC Files)
^n | Matches a newline (LF Only) (paragraph) (UNIX Files)
^t | Matches a tab character
[ ] | Matches any single character or range in the brackets
^{A^}^{B^} | Matches expression A OR B
^ | Overrides the following regular expression character
^(...^) | Brackets or tags an expression to use in the replace command. A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression.
Note - ^ refers to the character '^' NOT Control Key + value.
Examples:
m?n matches "man", "men", "min" but not "moon".
t*t matches "test", "tonight" and "tea time" (the "tea t" portion) but not "tea
time" (newline between "tea " and "time").
Te+st matches "test", "teest", "teeeest" etc. (when the search is not case sensitive) but does not match "tst".
[aeiou] matches every lowercase vowel
[,.?] matches a literal ",", "." or "?".
[0-9a-z] matches any digit, or lowercase letter
[~0-9] matches any character except a digit (~ means NOT the following)
You may search for an expression A or B as follows:
"^{John^}^{Tom^}"
This will search for an occurrence of John or Tom. There should be nothing between the two expressions.
You may combine A or B and C or D in the same search as follows:
"^{John^}^{Tom^} ^{Smith^}^{Jones^}"
This will search for John or Tom followed by Smith or Jones.
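To make the first syntax concrete, the sketch below translates the symbols from the table above into Python's `re` syntax. It is only an illustration: the function name `to_python_regex` and the mapping table are hypothetical, the page break is assumed to be a form feed, and only the symbols listed above are handled.

```python
import re

# Hypothetical mapping of the two-character caret codes to Python regex text.
CARET_CODES = {
    "^b": "\\f",     # page break (assumed to be a form feed)
    "^p": "\\r\\n",  # DOS newline
    "^r": "\\r",     # Mac newline
    "^n": "\\n",     # Unix newline
    "^t": "\\t",     # tab
    "^(": "(",       # start of a tagged expression
    "^)": ")",       # end of a tagged expression
    "^{": "(?:",     # start of an "A or B" group
    "^}": ")",       # end of the last alternative
}

# Single-character symbols; % needs re.MULTILINE to anchor at every line start.
SINGLE = {"%": "^", "$": "$", "?": ".", "*": ".*", "+": "+"}


def to_python_regex(pattern: str) -> str:
    """Translate a pattern in the first syntax into a Python regex string."""
    out = []
    i = 0
    while i < len(pattern):
        pair = pattern[i:i + 2]
        if pattern.startswith("^}^{", i):       # boundary between A and B
            out.append("|")
            i += 4
        elif pair in CARET_CODES:
            out.append(CARET_CODES[pair])
            i += 2
        elif pattern[i] == "^" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # ^ overrides the next character
            i += 2
        elif pair == "++":                      # zero or more of the preceding item
            out.append("*")
            i += 2
        elif pattern[i] == "[" and "]" in pattern[i + 1:]:
            end = pattern.index("]", i + 1)
            body = pattern[i + 1:end]
            if body.startswith("~"):            # ~ negates the set
                body = "^" + body[1:]
            out.append("[" + body.replace("\\", "\\\\") + "]")
            i = end + 1
        else:
            ch = pattern[i]
            out.append(SINGLE.get(ch, re.escape(ch)))
            i += 1
    return "".join(out)


print(to_python_regex("m?n"))      # m.n
print(to_python_regex("[~0-9]"))   # [^0-9]

names = re.compile(to_python_regex("^{John^}^{Tom^} ^{Smith^}^{Jones^}"))
print(bool(names.search("Tom Jones")))    # True
print(bool(names.search("John Brown")))   # False
```

The translation scans left to right so that two-character codes such as `^}` are recognised before the single `^` override rule is tried.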
The table below shows the syntax for the "Unix" style regular expressions.
Regular Expressions (Unix Syntax):
Symbol | Function
\ | Indicates the next character has a special meaning. "n" on its own matches the character "n". "\n" matches a linefeed or newline character. See examples below (\d, \f, \n etc).
^ | Matches/anchors the beginning of line.
$ | Matches/anchors the end of line.
* | Matches the preceding character zero or more times.
+ | Matches the preceding character one or more times. Does not match repeated newlines.
. | Matches any single character except a newline character. Does not match repeated newlines.
(expression) | Brackets or tags an expression to use in the replace command. A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression.
[xyz] | A character set. Matches any one of the characters between the brackets.
[^xyz] | A negative character set. Matches any one character NOT between the brackets.
\d | Matches a digit character. Equivalent to [0-9].
\D | Matches a nondigit character. Equivalent to [^0-9].
\f | Matches a form-feed character.
\n | Matches a linefeed character.
\r | Matches a carriage return character.
\s | Matches any whitespace including space, tab, form-feed, etc but not newline.
\S | Matches any non-whitespace character but not newline.
\t | Matches a tab character.
\v | Matches a vertical tab character.
\w | Matches any word character including underscore.
\W | Matches any nonword character.
\p | Matches CR/LF (same as \r\n) to match a DOS line terminator
Note - ^ refers to the character '^' NOT Control Key + value.
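Tagged expressions are easiest to see in a replace operation. The snippet below is a minimal sketch using Python's `re` module, whose syntax is close to the Unix syntax in the table; the `\1` and `\2` references in the replacement string are Python's notation and are only assumed to resemble what the editor's Replace field accepts.

```python
import re

# The first (\w+) is tagged expression 1, the second (\w+) is tagged expression 2.
pattern = re.compile(r"(\w+) (\w+)")

# In Python, the replacement string refers to the tags as \1 and \2.
swapped = pattern.sub(r"\2, \1", "John Smith")
print(swapped)  # Smith, John
```

Tags are numbered by the position of their opening bracket, which is why the first name above is tag 1.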
Examples:
m.n matches "man", "men", "min" but not "moon".
Te+st matches "test", "teest", "teeeest" etc. (when the search is not case sensitive) BUT NOT "tst".
Te*st matches "test", "teest", "teeeest" etc. (when the search is not case sensitive) AND "tst".
[aeiou] matches every lowercase vowel
[,.?] matches a literal ",", "." or "?".
[0-9a-z] matches any digit, or lowercase letter
[^0-9] matches any character except a digit (^ means NOT the following)
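The examples above can be checked with Python's `re` module, which follows much the same syntax. This is a sketch rather than a statement about the editor's engine: the helper name `matches` is made up for illustration, and `re.IGNORECASE` is passed because the examples treat "Te" and "te" alike. Note also that Python's `\s` matches newline characters, unlike the `\s` described in the table.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in text, ignoring case."""
    return re.search(pattern, text, re.IGNORECASE) is not None


print(matches(r"m.n", "man"), matches(r"m.n", "moon"))           # True False
print(matches(r"Te+st", "teeeest"), matches(r"Te+st", "tst"))    # True False
print(matches(r"Te*st", "tst"))                                  # True

vowels = re.findall(r"[aeiou]", "regular")
non_digits = re.findall(r"[^0-9]", "a1b2")
numbers = re.findall(r"\d+", "2008-06-22")
print(vowels)      # ['e', 'u', 'a']
print(non_digits)  # ['a', 'b']
print(numbers)     # ['2008', '06', '22']
```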
You may search for an expression A or B as follows:
"(John|Tom)"
This will search for an occurrence of John or Tom. There should be nothing between the two expressions.
You may combine A or B and C or D in the same search as follows:
"(John|Tom) (Smith|Jones)"
This will search for John or Tom followed by Smith or Jones.
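As a rough illustration, the same combined search can be run in Python, where the `|` alternation and bracketed groups behave as described. The sample sentence is invented for the example.

```python
import re

name = re.compile(r"(John|Tom) (Smith|Jones)")
text = "Tom Jones met John Smith, not John Brown."

# Whole matches: only first names followed by one of the two surnames.
found = [m.group(0) for m in name.finditer(text)]
print(found)  # ['Tom Jones', 'John Smith']

# Each bracketed group is also tagged, so the chosen alternatives can be read back.
pairs = name.findall(text)
print(pairs)  # [('Tom', 'Jones'), ('John', 'Smith')]
```

"John Brown" is skipped because "Brown" is not one of the alternatives in the second group.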
When Regular Expressions is not selected for the find/replace, and in the Replace field in either case, the following special characters are also valid:
Symbol | Function
^^ | Matches a "^" character
^s | Is substituted with the selected (highlighted) text of the active file window.
^c | Is substituted with the contents of the clipboard.
^b | Matches a page break
^p | Matches a newline (CR/LF) (paragraph) (DOS Files)
^r | Matches a newline (CR Only) (paragraph) (MAC Files)
^n | Matches a newline (LF Only) (paragraph) (UNIX Files)
^t | Matches a tab character
Note - ^ refers to the character '^' NOT Control Key + value.
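How these codes expand can be sketched in a few lines of Python. The function `expand_caret_codes` is hypothetical, not part of any editor API: the selected text and clipboard contents are passed in as plain arguments, and the page break is assumed to be a form feed.

```python
def expand_caret_codes(text: str, selection: str = "", clipboard: str = "") -> str:
    """Expand the two-character caret codes from the table in a plain string."""
    codes = {
        "^^": "^",         # literal caret
        "^s": selection,   # selected text (supplied by the caller here)
        "^c": clipboard,   # clipboard contents (supplied by the caller here)
        "^b": "\f",        # page break (assumed to be a form feed)
        "^p": "\r\n",      # DOS newline
        "^r": "\r",        # Mac newline
        "^n": "\n",        # Unix newline
        "^t": "\t",        # tab
    }
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in codes:
            out.append(codes[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(repr(expand_caret_codes("a^tb^p")))               # 'a\tb\r\n'
print(expand_caret_codes("2^^3"))                       # 2^3
print(expand_caret_codes("[^s]", selection="word"))     # [word]
```

Scanning left to right means "^^t" expands to a literal "^" followed by "t" rather than to a caret and a tab.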