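Quantifiers in a regular expression specify how many times something must occur for a match. This post describes the three kinds of quantifier: greedy, reluctant, and possessive. At first glance the quantifiers X? (greedy), X?? (reluctant), and X?+ (possessive) appear to do the same thing, since each of them matches "X" once or not at all. There are subtle differences between them, which are explained in the last part of this post.
Let's start by building three different regular expressions with greedy quantifiers: a?, a*, and a+. See what happens when each is tested against an empty input string.
First, here is the test harness (compile and run it directly from a terminal):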
import java.io.Console;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestHarness {
    public static void main(String[] args) {
        // System.console() returns null when no interactive terminal is attached.
        Console console = System.console();
        if (console == null) {
            System.err.println("No console.");
            System.exit(1);
        }
        while (true) {
            // Read the regex first, then the input string to search.
            Pattern pattern =
                Pattern.compile(console.readLine("%nEnter your regex: "));
            Matcher matcher =
                pattern.matcher(console.readLine("Enter input string to search: "));
            boolean found = false;
            // Report every match, including zero-length ones.
            while (matcher.find()) {
                console.format("I found the text" +
                    " \"%s\" starting at " +
                    "index %d and ending at index %d.%n",
                    matcher.group(),
                    matcher.start(),
                    matcher.end());
                found = true;
            }
            if (!found) {
                console.format("No match found.%n");
            }
        }
    }
}
Enter your regex: a?
Enter input string to search:
I found the text "" starting at index 0 and ending at index 0.
Enter your regex: a*
Enter input string to search:
I found the text "" starting at index 0 and ending at index 0.
Enter your regex: a+
Enter input string to search:
No match found.
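Zero-length matches
In the example above, the first two cases match because the expressions a? and a* both allow the letter "a" to occur zero times. Notice that the start and end indices are both 0. The empty input string "" has no length, so the regex simply matches nothing at the starting position (index 0). Matches of this kind are called "zero-length matches". A zero-length match can occur in the following four cases:
1. In an empty input string.
2. At the beginning of an input string, i.e., at index 0. (The beginning is itself an empty string.)
3. After the last character of an input string. (The end is itself an empty string.)
4. Between any two characters, e.g., in "bc" there is an empty string "" between b and c.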
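Take the string "foo" as an example: its three characters occupy cells 0, 1, and 2, while the index positions run from 0 (before "f") to 3 (after the last "o").
So a zero-length match can occur at index 0 and at index 3, as well as at indices 1 and 2 between the characters.
Zero-length matches are easy to spot because they always start and end at the same index.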
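The positions described above can be checked without the interactive harness. The following is a minimal sketch; the class name `ZeroLengthDemo` and the helper `findAll` are made up for this illustration and are not part of the tutorial. It records each match as "start-end:text", so a zero-length match shows up as an entry whose two indices are equal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthDemo {
    // Hypothetical helper: returns every match as "start-end:text".
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Empty input: a single zero-length match at index 0.
        System.out.println(findAll("a?", ""));    // [0-0:]
        // "foo" contains no 'a', so a? matches the empty string at 0, 1, 2 and 3.
        System.out.println(findAll("a?", "foo")); // [0-0:, 1-1:, 2-2:, 3-3:]
        // a+ needs at least one 'a', so it finds nothing.
        System.out.println(findAll("a+", "foo")); // []
    }
}
```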
Let's look at a few more examples, this time with the single letter "a" as input.
Enter your regex: a?
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
Enter your regex: a*
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
Enter your regex: a+
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
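All three quantifiers find the letter "a", but the first two also find a zero-length match at index 1, that is, after the last character of the input. Remember that the matcher sees "a" as sitting in the cell between index 0 and index 1, and the test harness keeps looping until it can find no more matches.
Next, enter "ababaaaab" and see what comes out. The output is as follows: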
Enter your regex: a?
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "" starting at index 3 and ending at index 3.
I found the text "a" starting at index 4 and ending at index 5.
I found the text "a" starting at index 5 and ending at index 6.
I found the text "a" starting at index 6 and ending at index 7.
I found the text "a" starting at index 7 and ending at index 8.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.
Enter your regex: a*
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "" starting at index 3 and ending at index 3.
I found the text "aaaa" starting at index 4 and ending at index 8.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.
Enter your regex: a+
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "aaaa" starting at index 4 and ending at index 8.
Readers can work out for themselves why these results come out the way they do.
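As a starting point for that exercise, here is a small sketch that reruns the three expressions on "ababaaaab" in one go. The class name `QuantifierCompare` and the helper `findAll` are invented for this illustration. The thing to watch is where the empty entries appear: a? and a* produce one at every "b" and at the end of the input, while a+ never produces one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierCompare {
    // Hypothetical helper: returns every match as "start-end:text".
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "ababaaaab";
        for (String regex : new String[] {"a?", "a*", "a+"}) {
            List<String> matches = findAll(regex, input);
            // Print the regex, the number of matches, and the matches themselves.
            System.out.println(regex + " -> " + matches.size() + " matches: " + matches);
        }
    }
}
```

With this input, a? takes the run "aaaa" one letter at a time, whereas a* and a+ consume it as a single match.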
To require an exact number of occurrences, use braces "{}". For example:
a{3} matches "aaa", exactly three a's in a row:
Enter your regex: a{3}
Enter input string to search: aa
No match found.
Enter your regex: a{3}
Enter input string to search: aaa
I found the text "aaa" starting at index 0 and ending at index 3.
Enter your regex: a{3}
Enter input string to search: aaaa
I found the text "aaa" starting at index 0 and ending at index 3.
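In the third example, note that once the first three a's have matched, whatever follows is independent of that match. The matcher goes on trying to match the input after "aaa", but here only a single "a" remains, which is not enough for another match.
A quantifier can also be applied to a subexpression, for example: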
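To see the matcher carry on after the first three a's, a longer input helps. The sketch below (class name `ExactCountDemo` and helper `findAll` are made up for this illustration) runs a{3} against four and then nine a's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactCountDemo {
    // Hypothetical helper: returns every match as "start-end:text".
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Four a's: one match, and the leftover 'a' is too short for a second one.
        System.out.println(findAll("a{3}", "aaaa"));      // [0-3:aaa]
        // Nine a's: each search resumes where the previous match ended.
        System.out.println(findAll("a{3}", "aaaaaaaaa")); // [0-3:aaa, 3-6:aaa, 6-9:aaa]
    }
}
```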
Enter your regex: (dog){3}
Enter input string to search: dogdogdogdogdogdog
I found the text "dogdogdog" starting at index 0 and ending at index 9.
I found the text "dogdogdog" starting at index 9 and ending at index 18.
Enter your regex: dog{3}
Enter input string to search: dogdogdogdogdogdog
No match found.
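In the second example the quantifier {3} applies only to the letter "g", so the regex looks for "do" followed by three g's ("doggg"). The input contains no such text, so there is no match.
Here is one more example: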
Enter your regex: [abc]{3}
Enter input string to search: abccabaaaccbbbc
I found the text "abc" starting at index 0 and ending at index 3.
I found the text "cab" starting at index 3 and ending at index 6.
I found the text "aaa" starting at index 6 and ending at index 9.
I found the text "ccb" starting at index 9 and ending at index 12.
I found the text "bbc" starting at index 12 and ending at index 15.
Enter your regex: abc{3}
Enter input string to search: abccabaaaccbbbc
No match found.
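The failed matches above become easier to understand once you see an input that the ungrouped expressions do accept. This is a minimal sketch with invented names (`QuantifierScopeDemo`, `findAll`); it shows that without parentheses or brackets the quantifier binds only to the single character in front of it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierScopeDemo {
    // Hypothetical helper: returns every match as "start-end:text".
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // {3} applies to 'g' only: "do" followed by three g's.
        System.out.println(findAll("dog{3}", "doggg"));       // [0-5:doggg]
        // {3} applies to the whole group "dog".
        System.out.println(findAll("(dog){3}", "dogdogdog")); // [0-9:dogdogdog]
        // {3} applies to 'c' only: "ab" followed by three c's.
        System.out.println(findAll("abc{3}", "abccc"));       // [0-5:abccc]
        // {3} applies to the character class: any three of a, b, c.
        System.out.println(findAll("[abc]{3}", "cab"));       // [0-3:cab]
    }
}
```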
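Differences between greedy, reluctant, and possessive quantifiers
Greedy quantifiers are called greedy because they try to match as much of the input as possible. If the overall match then fails, the matcher backtracks one character at a time until the match succeeds or there is nothing left to give back.
Consider the following examples: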
Enter your regex: .*foo // greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.
Enter your regex: .*?foo // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.
Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.
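The first example uses a greedy quantifier. The .* part first matches the entire string "xfooxxxxxxfoo"; the foo part of the regex is then tried against what is left, namely the empty string "", and fails. The matcher backtracks: .* now matches "xfooxxxxxxfo" and foo is tried against the remainder "o", which fails again, so it backtracks further. This repeats until .* matches "xfooxxxxxx" and foo matches the final "foo". Because the quantifier is greedy, the successful match already covers the whole input, so nothing is left to match and the search ends.
The second example uses a reluctant (non-greedy) quantifier, which works the other way around. It starts by matching as little as possible, here the empty string "" at the start of the input. The foo part of the regex is then tried against the first three characters, "xfo", and fails. So .*? is forced to take the first character, "x", after which foo matches the following "foo". That completes the first match, "xfoo". The search then continues on the remaining input "xxxxxxfoo" by the same rules, and the second match is "xxxxxxfoo". Matching stops only when the whole input has been consumed.
The third example uses a possessive quantifier. It makes a single attempt and never backtracks. Here .*+ matches all of "xfooxxxxxxfoo", the foo part of the regex is tried against the empty string "" that remains, and fails. Since no backtracking takes place, the match fails and the search ends.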
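The three walkthroughs can be reproduced side by side with a short program. This is a sketch under the same assumptions as the transcripts above; the class name `QuantifierModesDemo` and the helper `findAll` are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierModesDemo {
    // Hypothetical helper: returns every match as "start-end:text".
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: takes everything, then backs off until "foo" fits at the end.
        System.out.println(findAll(".*foo", input));  // [0-13:xfooxxxxxxfoo]
        // Reluctant: takes as little as possible, so it stops at the first "foo".
        System.out.println(findAll(".*?foo", input)); // [0-4:xfoo, 4-13:xxxxxxfoo]
        // Possessive: takes everything and never gives any of it back.
        System.out.println(findAll(".*+foo", input)); // []
    }
}
```

In practice a possessive quantifier is mainly worth considering when you want to grab all of something without ever backing off, since it skips the backtracking work that a greedy quantifier would do.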
Most of the content above is translated from the regular expressions lesson in The Java™ Tutorials.