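Although Apache regards another of its packages as the more complete regular expression library, regexp is also very widely used, probably because it is so simple. Below are my study notes on regexp.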
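2) RE is one of the most important classes in the regexp package. It is an efficient, lightweight regular expression evaluator/matcher; RE is short for "regular expression". A regular expression is a pattern that supports complex string matching, and when a string matches a pattern you can extract the parts that matched, which is very useful when parsing text. The syntax of regular expressions is discussed below.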
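To compile a regular expression, simply construct an RE matcher object with the pattern as its argument. You can then call any of the RE.match methods to test a string against it; they return true if the match succeeds and false if it fails. After a successful match, RE.getParen returns the matched character sequence, or the part of it captured by a parenthesized subexpression (if the pattern contains the corresponding parentheses), and the related methods return the position and length of each match. For example: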
RE r = new RE("(a*)b"); // setup lines reconstructed from the results below
boolean matched = r.match("xaaaab"); // any one character before 'aaaab' yields the same indices
String wholeExpr = r.getParen(0); // wholeExpr will be 'aaaab'
String insideParens = r.getParen(1); // insideParens will be 'aaaa'
int startWholeExpr = r.getParenStart(0); // startWholeExpr will be index 1
int endWholeExpr = r.getParenEnd(0); // endWholeExpr will be index 6
int lenWholeExpr = r.getParenLength(0); // lenWholeExpr will be 5
int startInside = r.getParenStart(1); // startInside will be index 1
int endInside = r.getParenEnd(1); // endInside will be index 5
int lenInside = r.getParenLength(1); // lenInside will be 4
RE also supports backreferences in regular expressions, written \1 through \9.
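To make the backreference idea concrete, here is a minimal sketch. Because org.apache.regexp is not bundled with the JDK, it uses java.util.regex as a stand-in rather than the RE class; the pattern `([0-9]+)=\1` and the names `BackrefSketch` and `sameOnBothSides` are illustrative assumptions, not taken from the original notes. The group captures a run of digits, and `\1` then requires exactly the same text to appear again.

```java
import java.util.regex.Pattern;

class BackrefSketch {
    // Group 1 captures a run of digits; \1 must repeat exactly the captured text.
    static final Pattern SAME_ON_BOTH_SIDES = Pattern.compile("([0-9]+)=\\1");

    // True only when the whole input has the form n=n, for example "12=12".
    static boolean sameOnBothSides(String input) {
        return SAME_ON_BOTH_SIDES.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(sameOnBothSides("12=12")); // true
        System.out.println(sameOnBothSides("12=13")); // false
    }
}
```

Note that `matches()` requires the whole input to match, whereas the RE.match call in the example above found its match starting at index 1, which is closer to `Matcher.find()` in behavior.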
3) RE supports the following regular expression syntax:
Characters
unicodeChar | Matches any identical unicode character |
\ | Used to quote a meta-character (like '*') |
\\ | Matches a single '\' character |
\0nnn | Matches a given octal character |
\xhh | Matches a given 8-bit hexadecimal character |
\uhhhh | Matches a given 16-bit hexadecimal character |
\t | Matches an ASCII tab character |
\n | Matches an ASCII newline character |
\r | Matches an ASCII return character |
\f | Matches an ASCII form feed character |
[abc] | Simple character class |
[a-zA-Z] | Character class with ranges |
[^abc] | Negated character class |
[:alnum:] | Alphanumeric characters. |
[:alpha:] | Alphabetic characters. |
[:blank:] | Space and tab characters. |
[:cntrl:] | Control characters. |
[:digit:] | Numeric characters. |
[:graph:] | Characters that are printable and are also visible.(A space is printable, but not visible, while an `a' is both.) |
[:lower:] | Lower-case alphabetic characters. |
[:print:] | Printable characters (characters that are not control characters.) |
[:punct:] | Punctuation characters (characters that are not letter,digits, control characters, or space characters). |
[:space:] | Space characters (such as space, tab, and formfeed, to name a few). |
[:upper:] | Upper-case alphabetic characters. |
[:xdigit:] | Characters that are hexadecimal digits. |
[:javastart:] | Start of a Java identifier |
[:javapart:] | Part of a Java identifier |
. | Matches any character other than newline |
\w | Matches a "word" character (alphanumeric plus "_") |
\W | Matches a non-word character |
\s | Matches a whitespace character |
\S | Matches a non-whitespace character |
\d | Matches a digit character |
\D | Matches a non-digit character |
^ | Matches only at the beginning of a line |
$ | Matches only at the end of a line |
\b | Matches only at a word boundary |
\B | Matches only at a non-word boundary |
A* | Matches A 0 or more times (greedy) |
A+ | Matches A 1 or more times (greedy) |
A? | Matches A 1 or 0 times (greedy) |
A{n} | Matches A exactly n times (greedy) |
A{n,} | Matches A at least n times (greedy) |
A*? | Matches A 0 or more times (reluctant) |
A+? | Matches A 1 or more times (reluctant) |
A?? | Matches A 0 or 1 times (reluctant) |
AB | Matches A followed by B |
A|B | Matches either A or B |
(A) | Used for subexpression grouping |
(?:A) | Used for subexpression clustering (just like grouping but no backrefs) |
\1 | Backreference to 1st parenthesized subexpression |
\2 | Backreference to 2nd parenthesized subexpression |
\3 | Backreference to 3rd parenthesized subexpression |
\4 | Backreference to 4th parenthesized subexpression |
\5 | Backreference to 5th parenthesized subexpression |
\6 | Backreference to 6th parenthesized subexpression |
\7 | Backreference to 7th parenthesized subexpression |
\8 | Backreference to 8th parenthesized subexpression |
\9 | Backreference to 9th parenthesized subexpression |
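The difference between the greedy and reluctant closures in the table is easiest to see on a small input. The sketch below is only an illustration: it uses java.util.regex from the JDK as a stand-in (the RE class is not available without the Jakarta Regexp jar), and the helper name `firstMatch` and the sample strings are assumptions made for this example. The closure operators shown here are written the same way in both libraries; the POSIX-style classes such as [:alpha:] are not, so they are left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClosureSketch {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>x</b>";
        // Greedy: .* takes as much as possible, so the match runs to the last '>'.
        System.out.println(firstMatch("<.*>", input));  // <b>x</b>
        // Reluctant: .*? takes as little as possible, so the match stops at the first '>'.
        System.out.println(firstMatch("<.*?>", input)); // <b>
        // The same contrast with A+ and A+?.
        System.out.println(firstMatch("a+", "aaa"));    // aaa
        System.out.println(firstMatch("a+?", "aaa"));   // a
    }
}
```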
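RE runs programs that have first been compiled by the RECompiler class. For efficiency, the RE matcher class does not itself include the regular expression compiler. In fact, if you want to precompile one or more regular expressions, you can run the 'recompile' class from the command line.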
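Constructing the RE matcher object from the precompiled regular expression avoids the cost of compiling at run time. If you need to build regular expressions dynamically, you can create a single RECompiler object and use it to compile each expression. Note that neither RE nor RECompiler is thread-safe (for efficiency reasons), so in multithreaded code you need to create a separate compiler and matcher for each thread.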
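One common way to follow the "one matcher per thread" advice is a ThreadLocal. The following is a rough sketch under stated assumptions, not the library's own recommendation: it uses java.util.regex.Matcher as a stand-in, since it is likewise not safe to share between threads and needs no extra jar. The names `PerThreadMatcherSketch` and `leadingAs` are hypothetical. With Jakarta Regexp, the supplier would presumably construct an RE object instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PerThreadMatcherSketch {
    // The compiled Pattern is immutable and safe to share between threads.
    private static final Pattern PATTERN = Pattern.compile("(a*)b");

    // A Matcher holds match state, so each thread lazily gets its own instance.
    private static final ThreadLocal<Matcher> MATCHER =
            ThreadLocal.withInitial(() -> PATTERN.matcher(""));

    // Returns the text captured by group 1, or null when input contains no match.
    static String leadingAs(String input) {
        Matcher m = MATCHER.get().reset(input); // reuse this thread's matcher
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> System.out.println(leadingAs("xaaaab")));
        Thread t2 = new Thread(() -> System.out.println(leadingAs("ab")));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Each thread pays the construction cost once and then reuses its own matcher, so no locking is needed around the match calls.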
2、The Jakarta Site – CVS Repository
http://jakarta.apache.org/site/cvsindex.html