2017-11-08 09:57:27: I am starting to study regular expressions systematically once again, and I hope that this time I really learn them!
Syntax study
1. Character classes
Character classes match a character from a specific set. There are a number of predefined character classes, and you can also define your own sets.
Usage | Matches |
---|---|
. | Matches any character except newline. |
\w, \d, \s | Word character, digit, whitespace. |
\W, \D, \S | Not a word character, not a digit, not whitespace. |
[abc] | Any of a, b, or c (the characters are listed with no separator between them). |
[^abc] | Any character other than a, b, or c. |
[a-g] | A character between a and g. |
\b | Matches a word boundary: a position where a word character sits next to a non-word character (such as whitespace or punctuation) or next to the start/end of the string. This matches a position, not a character. |
\B | Matches any position that is not a word boundary. This matches a position, not a character. |
\w | Matches any word character (alphanumeric and underscore). Only matches low-ascii characters (no accented or non-roman characters). Equivalent to [A-Za-z0-9_]. |
\W | Matches any character that is not a word character (alphanumeric and underscore). Equivalent to [^A-Za-z0-9_]. |
\d | Matches any digit character (0-9). Equivalent to [0-9]. |
\D | Matches any character that is not a digit character (0-9). Equivalent to [^0-9]. |
\s | Matches any whitespace character (spaces, tabs, line breaks). |
\S | Matches any character that is not a whitespace character (spaces, tabs, line breaks). |
character set: [aeiou] | Matches any character in the set. |
negated set: [^aeiou] | Matches any character that is not in the set. |
range: [a-z] | Matches a character whose character code lies between the two specified characters, inclusive. |
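The classes in the table can be tried out with `String.prototype.match`. The following is a minimal sketch; the sample string and the variable names are made up for illustration and are not part of the original notes.

```javascript
// Sample text invented for this illustration.
const text = "Room 42, floor_3: open 9-5!";

// \d matches a single digit; with the g flag every digit is returned.
const digits = text.match(/\d/g);
// \w+ matches runs of word characters ([A-Za-z0-9_]).
const words = text.match(/\w+/g);
// A character set matches any one of the listed characters.
const vowels = text.match(/[aeiou]/g);
// A negated set: anything that is neither a word character nor whitespace.
const punctuation = text.match(/[^\w\s]/g);

console.log(digits);      // [ '4', '2', '3', '9', '5' ]
console.log(words);       // [ 'Room', '42', 'floor_3', 'open', '9', '5' ]
console.log(vowels);      // [ 'o', 'o', 'o', 'o', 'o', 'e' ]
console.log(punctuation); // [ ',', ':', '-', '!' ]
```

Note that the underscore in floor_3 counts as a word character, which is why it stays inside one match.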
2. Anchors
Anchors are unique in that they match a position within a string, not a character.
Usage | Matches |
---|---|
beginning: ^ | Matches the beginning of the string, or the beginning of a line if the multiline flag (m) is enabled. This matches a position, not a character. e.g.: /^\w+/gm -- He is a boy. (matches He) |
end: $ | Matches the end of the string, or the end of a line if the multiline flag (m) is enabled. This matches a position, not a character. e.g.: /\w+$/g -- I am a girl_ (matches girl_) |
word boundary: \b | Matches a word boundary: a position where a word character sits next to a non-word character (such as whitespace or punctuation) or next to the start/end of the string. This matches a position, not a character. e.g.: /m\b/gm -- me em (matches the m of em); /\bm/gm -- me em (matches the m of me) |
not word boundary: \B | Matches any position that is not a word boundary. This matches a position, not a character. e.g.: /\w\B/gm -- I am a girl_ me em (matches every word character that is not the last character of its word) |
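Because anchors match positions rather than characters, their effect is easiest to see by comparing results with and without the m flag and by using `replace` to mark which character sits next to the matched position. This is a small sketch that reuses the example strings from the table; the variable names are illustrative only.

```javascript
// Two lines built from the example sentences in the table.
const lines = "He is a boy.\nI am a girl_";

// With the m flag, ^ matches at the start of every line.
const firstWordPerLine = lines.match(/^\w+/gm);
// Without m, ^ matches only at the start of the whole string.
const firstWordOnly = lines.match(/^\w+/g);
// Without m, $ matches only at the end of the whole string.
const lastWord = lines.match(/\w+$/g);

// \b and \B match positions; replace() reveals which "m" was hit.
const endOfWord = "me em".replace(/m\b/g, "M");
const startOfWord = "me em".replace(/\bm/g, "M");
// \w\B: a word character that is followed by another word character.
const insideWord = "me em".match(/\w\B/g);

console.log(firstWordPerLine); // [ 'He', 'I' ]
console.log(firstWordOnly);    // [ 'He' ]
console.log(lastWord);         // [ 'girl_' ]
console.log(endOfWord);        // me eM
console.log(startOfWord);      // Me em
console.log(insideWord);       // [ 'm', 'e' ]
```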
3. Escaped characters
Some characters have special meaning in regular expressions and must be escaped to be matched literally. All escaped characters begin with the \ character.
Within a character set, only \, -, and ] need to be escaped.
Usage | Matches |
---|---|
octal escape: \251 | Octal escaped character in the form \000. The value must not exceed 255 (\377). e.g.: /\251/g -- RegExr is ©2017. (the © is what gets matched) |
hexadecimal escape: \xA9 | Hexadecimal escaped character in the form \xFF. e.g.: /\xA9/g -- RegExr is ©2017. (the © is what gets matched) |
unicode escape: \u00A9 | Unicode escaped character in the form \uFFFF. e.g.: /\u00A9/g -- RegExr is ©2017. (the © is what gets matched) |
control character escape: \cI | Escaped control character in the form \cZ. This can range from \cA (SOH, char code 1) to \cZ (SUB, char code 26). e.g.: \cI matches TAB (char code 9). |
tab: \t | Matches a TAB character (char code 9). |
line feed: \n | Matches a LINE FEED character (char code 10). |
vertical tab: \v | Matches a VERTICAL TAB character (char code 11). |
form feed: \f | Matches a FORM FEED character (char code 12). |
carriage return: \r | Matches a CARRIAGE RETURN character (char code 13). |
null: \0 | Matches a NULL character (char code 0). |
dot: \. | Matches a "." character (char code 46). |
backslash: \\ | Matches a "\" character (char code 92). |
+: \+ | Matches a "+" character (char code 43). |
*: \* | Matches a "*" character (char code 42). |
?: \? | Matches a "?" character (char code 63). |
^: \^ | Matches a "^" character (char code 94). |
$: \$ | Matches a "$" character (char code 36). |
[: \[ | Matches a "[" character (char code 91). |
]: \] | Matches a "]" character (char code 93). |
{: \{ | Matches a "{" character (char code 123). |
}: \} | Matches a "}" character (char code 125). |
(: \( | Matches a "(" character (char code 40). |
): \) | Matches a ")" character (char code 41). |
vertical bar: \| | Matches a "|" character (char code 124). |
/: \/ | Matches a "/" character (char code 47). |
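To make the escapes above concrete, the sketch below checks that the octal, hexadecimal, and unicode forms all match the same © character, and shows what escaping a metacharacter changes. The `escapeRegExp` helper is a hypothetical convenience function written for this example (a commonly used pattern, not a built-in), and the sample strings are assumptions.

```javascript
// "\u00A9" is the copyright sign used in the table's examples.
const text = "RegExr is \u00A92017.";

// Three ways to write the same character code (169) in a pattern.
// The octal form is a legacy escape and is not accepted with the u flag.
const byOctal = /\251/.test(text);
const byHex = /\xA9/.test(text);
const byUnicode = /\u00A9/.test(text);

// An unescaped dot matches any character; an escaped dot only a literal ".".
const unescapedDot = /a.c/.test("abc");
const escapedDot = /a\.c/.test("abc");
const escapedDotLiteral = /a\.c/.test("a.c");

// Hypothetical helper: escape every metacharacter so that arbitrary text
// can be used as a literal pattern with new RegExp().
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Unescaped, "+" is a quantifier, so "1+1" looks for "11", "111", ...
const plusAsQuantifier = new RegExp("1+1").test("1+1=2");
// Escaped, it matches the literal text "1+1".
const plusAsLiteral = new RegExp(escapeRegExp("1+1")).test("1+1=2");

console.log(byOctal, byHex, byUnicode);                    // true true true
console.log(unescapedDot, escapedDot, escapedDotLiteral);  // true false true
console.log(plusAsQuantifier, plusAsLiteral);              // false true
```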
4. Groups & Lookaround
Groups allow you to combine a sequence of tokens to operate on them together. Capture groups can be referenced by a backreference and accessed separately in the results.
Lookaround lets you match a group without including it in the result.
Usage | Matches |
---|---|
capturing group: (ABC) | Groups multiple tokens together and creates a capture group for extracting a substring or using a backreference. (See Note 1.) |
backreference: \1, \3, \num | Matches the results of a previous capture group. For example, \1 matches the results of the first capture group and \3 matches the third. e.g.: var str = 'abccab abc aba cbc dbd'; str.match(/(\w)b\1/g) returns (3) ["aba", "cbc", "dbd"]. (\num refers to the text captured by group number num.) |
non-capturing group: (?:ABC) | Groups multiple tokens together without creating a capture group. The group can still take a quantifier or contain alternation, but it gets no number and its text is not stored in the results. |
The following are assertions | -- |
positive lookahead: (?=ABC) | Matches a group after the main expression without including it in the result. To require that xxx is followed by ABC, use a positive lookahead: xxx(?=ABC). e.g.: \d(?=px) -- 1pt 2px 3em 4px (matches 2 and 4) |
negative lookahead: (?!ABC) | Specifies a group that cannot match after the main expression (if it matches, the result is discarded). To require that xxx is not followed by ABC, use a negative lookahead: xxx(?!ABC). e.g.: \d(?!px) -- 1pt 2px 3em 4px (matches 1 and 3) |
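Note 1 - capture groups:
A capture group stores the text matched by a subexpression of the regular expression in memory, in a group identified by a number or an explicit name, so that it can be referenced later. The reference can be made inside the regular expression or outside it.
Note 2 - assertions:
An assertion states that a string matching some pattern must appear before or after a given position.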
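The difference between capturing groups, non-capturing groups, backreferences, and lookaheads can be checked side by side. The sketch below reuses the backreference and px strings from the table; the "hahaha!" string and the variable names are made up for illustration.

```javascript
// Backreference: \1 must repeat whatever group 1 captured.
const str = "abccab abc aba cbc dbd";
const mirrored = str.match(/(\w)b\1/g);

// A capturing group adds an entry to the result array...
const capturing = "hahaha!".match(/(ha)+/);
// ...while a non-capturing group only groups tokens for the quantifier.
const nonCapturing = "hahaha!".match(/(?:ha)+/);

// Lookaheads test what follows without consuming it.
const sizes = "1pt 2px 3em 4px";
const followedByPx = sizes.match(/\d(?=px)/g);
const notFollowedByPx = sizes.match(/\d(?!px)/g);

console.log(mirrored);                      // [ 'aba', 'cbc', 'dbd' ]
console.log(capturing[0], capturing[1]);    // hahaha ha
console.log(capturing.length);              // 2
console.log(nonCapturing.length);           // 1
console.log(followedByPx);                  // [ '2', '4' ]
console.log(notFollowedByPx);               // [ '1', '3' ]
```

The lookahead results contain only the digit, never the px that was tested, which is what "without including it in the result" means.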
5. Quantifiers & Alternation
Quantifiers indicate that the preceding token must be matched a certain number of times. By default, quantifiers are greedy and match as many characters as possible.
Alternation acts like a boolean OR, matching one sequence or another.
Usage | Matches |
---|---|
plus: + | Matches 1 or more of the preceding token. |
star: * | Matches 0 or more of the preceding token. |
quantifier: {1,3} | Matches the specified quantity of the previous token. {1,3} will match 1 to 3. {3} will match exactly 3. {3,} will match 3 or more. |
optional: ? | Matches 0 or 1 of the preceding token, effectively making it optional. e.g.: /colou?r/g -- color colour |
lazy: ? | Makes the preceding quantifier lazy, causing it to match as few characters as possible. By default, quantifiers are greedy and will match as many characters as possible. e.g.: b\w+? -- b be bee beer beers |
alternation: | | Acts like a boolean OR. Matches the expression before or after the | character. It can operate within a group or on a whole expression. The patterns are tested in order. e.g.: b(a|e|i)d -- bad bud bod bed bid |
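The examples from the table can be run directly to see how greedy and lazy quantifiers differ. This is a minimal sketch; the "a aa aaaa" string is an extra sample added for the {1,3} case, and the variable names are illustrative.

```javascript
const sample = "b be bee beer beers";

// Greedy: \w+ takes as many word characters as it can.
const greedy = sample.match(/b\w+/g);
// Lazy: \w+? stops as soon as one word character has been matched.
const lazy = sample.match(/b\w+?/g);

// ? makes the preceding token optional.
const optional = "color colour".match(/colou?r/g);
// {1,3} matches between one and three repetitions.
const bounded = "a aa aaaa".match(/a{1,3}/g);
// Alternation inside a group: a, e, or i between b and d.
const alternation = "bad bud bod bed bid".match(/b(a|e|i)d/g);

console.log(greedy);      // [ 'be', 'bee', 'beer', 'beers' ]
console.log(lazy);        // [ 'be', 'be', 'be', 'be' ]
console.log(optional);    // [ 'color', 'colour' ]
console.log(bounded);     // [ 'a', 'aa', 'aaa', 'a' ]
console.log(alternation); // [ 'bad', 'bed', 'bid' ]
```

The lone "b" is skipped in both cases because + requires at least one word character after it.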
6. Substitution
These tokens are used in a substitution string to insert different parts of the match.
Usage | Matches |
---|---|
match: $& | Inserts the matched text. e.g.: '意見-rightClick-modal'.replace(/([^-]*)-/, '===$&====') returns "===意見-====rightClick-modal" |
capture group: $1 | Inserts the results of the specified capture group. For example, $3 would insert the third capture group. A detailed explanation follows in 1.2 Capture group numbering rules. |
before match: $` | Inserts the portion of the source string that precedes the match. |
after match: $' | Inserts the portion of the source string that follows the match. |
escaped $: $$ | Inserts a dollar sign character ($). e.g.: > test = 'abcdefg'; > test.replace(/ab(\w+?)[\D\d]*e(\w+?).*/g, '$2$$') < "f$" |
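Each substitution token can be tried with `String.prototype.replace`. The sketch below uses short sample strings chosen for this illustration so that the inserted part is easy to spot.

```javascript
const date = "2008-12-31";

// $& inserts the whole match (here the first "-").
const wholeMatch = date.replace(/-/, "[$&]");
// $1, $2, $3 insert the corresponding capture groups.
const reordered = date.replace(/(\d+)-(\d+)-(\d+)/, "$3/$2/$1");
// $` inserts the text before the match, $' the text after it.
const before = "abc".replace(/b/, "[$`]");
const after = "abc".replace(/b/, "[$']");
// $$ inserts a literal dollar sign; here it is followed by $& (the match).
const dollar = "price: 5".replace(/\d+/, "$$$&");

console.log(wholeMatch); // 2008[-]12-31
console.log(reordered);  // 31/12/2008
console.log(before);     // a[a]c
console.log(after);      // a[c]c
console.log(dollar);     // price: $5
```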
7. Flags
Usage | Matches |
---|---|
ignore case: i | Makes the whole expression case-insensitive. For example, /aBc/i would match AbC. |
global search: g | Retains the index of the last match, allowing subsequent searches to start from the end of the previous match. |
multiline: m | When the multiline flag is enabled, the beginning and end anchors (^ and $) match the start and end of a line instead of the start and end of the whole string. |
unicode: u | When the unicode flag is enabled, you can use extended unicode escapes in the form \u{FFFFF}. It also makes other escapes stricter, causing unrecognized escapes (e.g. \j) to throw an error. |
sticky: y | The expression will only match from its lastIndex position and ignores the global (g) flag if set. Because each search in RegExr is discrete, this flag has no further impact on the displayed results. |
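The flags are easiest to understand by watching `lastIndex` and by comparing the same pattern with and without a flag. The following is a rough sketch in plain JavaScript; all sample strings and variable names are assumptions made for the example.

```javascript
// i: case-insensitive comparison.
const ignoreCase = /aBc/i.test("AbC");

// g: the regex remembers lastIndex between exec() calls.
const globalRe = /\d/g;
const first = globalRe.exec("a1b2")[0];
const indexAfterFirst = globalRe.lastIndex;
const second = globalRe.exec("a1b2")[0];

// m: ^ matches at the start of each line.
const lineStarts = "one\ntwo".match(/^\w/gm);

// u: enables \u{...} code point escapes and stricter escape checking.
const smiley = "\u{1F600}";
const withU = /\u{1F600}/u.test(smiley);
// Without u, the braces are not read as a code point escape.
const withoutU = /\u{1F600}/.test(smiley);
let strictEscapeThrows = false;
try {
  new RegExp("\\j", "u"); // unrecognized escape in unicode mode
} catch (err) {
  strictEscapeThrows = err instanceof SyntaxError;
}

// y: the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const stickyAt3 = sticky.test("barfoo");
sticky.lastIndex = 1;
const stickyAt1 = sticky.test("barfoo");

console.log(ignoreCase);                     // true
console.log(first, indexAfterFirst, second); // 1 2 2
console.log(lineStarts);                     // [ 'o', 't' ]
console.log(withU, withoutU);                // true false
console.log(strictEscapeThrows);             // true
console.log(stickyAt3, stickyAt1);           // true false
```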
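Key points in detail
1. Capture groups
1.1 what
A capture group stores the text matched by a subexpression of the regular expression in memory, in a group identified by a number or an explicit name, so that it can be referenced later. The reference can be made inside the regular expression or outside it.
Capture groups come in two forms: ordinary capture groups and named capture groups. The term "capture group" usually refers to the ordinary kind. The syntax is:
Ordinary capture group: (Expression)
Named capture group: (?<name>Expression)
Ordinary capture groups are supported by most languages and tools that support regular expressions, whereas named capture groups are supported by only some of them, such as .NET, PHP, and Python; Java added them in version 7 and JavaScript in ES2018. The named capture group syntax given above is the .NET syntax; in .NET, (?'name'Expression) is equivalent to (?<name>Expression). In PHP and Python, the named capture group syntax is (?P<name>Expression).
Also note that, apart from (Expression) and (?<name>Expression), no other (?...) syntax creates a capture group.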
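As an illustration of the two forms, the sketch below uses a named capture group in JavaScript. It assumes an engine with ES2018 support (for example a current browser or Node.js 10 or later); the group names year, month, and day are chosen for this example only.

```javascript
// Named groups: (?<name>Expression). Requires ES2018 support.
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const m = "2008-12-31".match(dateRe);

// A named group can be read by name or by its number.
const yearByName = m.groups.year;
const yearByNumber = m[1];

// In a replacement string, named groups are written as $<name>.
const reordered = "2008-12-31".replace(dateRe, "$<day>/$<month>/$<year>");

// (?:...) is not a capture group, so the month becomes group 1 here.
const skipYear = "2008-12-31".match(/(?:\d{4})-(\d{2})/);

console.log(yearByName, yearByNumber); // 2008 2008
console.log(reordered);                // 31/12/2008
console.log(skipYear[1]);              // 12
```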
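1.2 Capture group numbering rules: $1, $2, $3 ... $n
The numbering rules determine how capture groups are assigned numbers. In a regular expression that contains only ordinary capture groups or only named capture groups, the rules are fairly clear; when both kinds are mixed, the rules are somewhat more complicated.
Before going further, note that capture group number 0 refers to the match of the regular expression as a whole; this holds in essentially every language that supports capture groups. The other numbering rules are discussed one by one below.
In plain terms:
$1, $2 ... stand for the contents of the parentheses: $1 is the text matched by the first pair of parentheses, $2 the text matched by the second pair.
e.g. var test = 'abcde acerfade sdfjawide hello dfwaf';
test.replace(/a(\w+?)[\D\d]*he(\w+?)[\D\d]*/g, '$1-$2')
Result: "b-l"
The first group matches "b" and the second group matches "l", so replacing the match with $1-$2 gives b-l.
Here [\D\d] matches any character, including line breaks.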
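The claim that [\D\d] matches any character can be checked against the dot, which skips line breaks. The sketch below also runs the replace example in its quantified form [\D\d]*, which is an assumption about the intended pattern: without the asterisks the whole string would not be consumed.

```javascript
// "." does not match a line break, but [\D\d] (digit or non-digit) does.
const twoLines = "a\nb";
const dotMatches = /a.b/.test(twoLines);
const classMatches = /a[\D\d]b/.test(twoLines);

// Greedy [\D\d]* swallows everything up to the last "he" and, at the end,
// the rest of the string, so only "$1-$2" remains after the replacement.
const test = "abcde acerfade sdfjawide hello dfwaf";
const collapsed = test.replace(/a(\w+?)[\D\d]*he(\w+?)[\D\d]*/g, "$1-$2");

console.log(dotMatches);   // false
console.log(classMatches); // true
console.log(collapsed);    // b-l
```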
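1.2.1 Ordinary capture groups
If capture groups are not named explicitly, that is, no named capture groups are used, all capture groups are accessed by number. When there are only ordinary capture groups, they are numbered from 1, left to right, in the order in which their opening parenthesis "(" appears.
e.g.: regular expression: (\d{4})-(\d{2}-(\d\d))
The regular expression above matches dates in the format yyyy-MM-dd. So that the groups can be told apart in the table below, the month and the day are written differently, as \d{2} and \d\d.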
Number | Name | Capture group |
---|---|---|
0 | | (\d{4})-(\d{2}-(\d\d)) |
1 | | (\d{4}) |
2 | | (\d{2}-(\d\d)) |
3 | | (\d\d) |
Matching the string 2008-12-31 with this regular expression gives:
Number | Name | Capture group | Matched content |
---|---|---|---|
0 | | (\d{4})-(\d{2}-(\d\d)) | 2008-12-31 |
1 | | (\d{4}) | 2008 |
2 | | (\d{2}-(\d\d)) | 12-31 |
3 | | (\d\d) | 31 |
e.g.: result when run in the browser:
> var dateStr = '2008-12-31';
> dateStr.match(/(\d{4})-(\d{2}-(\d\d))/g);
< ["2008-12-31"]
> dateStr.match(/(\d{4})-(\d{2}-(\d\d))/)
< (4) ["2008-12-31", "2008", "12-31", "31", index: 0, input: "2008-12-31"]
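Why are there two different results?
This is how the match() method works in JavaScript. match() returns an array that holds the match results, and the contents of that array depend on whether regexp has the global flag g.
- If regexp does not have the g flag, match() performs only one match in stringObject. If no matching text is found, match() returns null. Otherwise it returns an array holding information about the matched text: element 0 holds the matched text, and the remaining elements hold the text matched by the subexpressions of the regular expression. Besides these ordinary array elements, the returned array also has two object properties: index is the position in stringObject of the first character of the matched text, and input is a reference to stringObject.
- If regexp has the g flag, match() performs a global search and finds all matching substrings in stringObject. If no matching substring is found, it returns null. If one or more are found, it returns an array. The contents of this array are very different from the previous case: its elements are all the matching substrings in stringObject, and it has no index or input property.
Note: in global search mode, match() provides neither the text matched by the subexpressions nor the position of each matching substring. If you need this information for a global search, you can use RegExp.exec().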
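The note about RegExp.exec() can be sketched as a loop: with the g flag, each call to exec() continues from lastIndex and still returns the capture groups and the index of the match. The sample text and variable names below are assumptions made for this illustration.

```javascript
const text = "2008-12-31 and 2017-11-08";
const re = /(\d{4})-(\d{2})-(\d{2})/g;

// exec() returns one match per call, including groups and position,
// and returns null once no further match is found.
const found = [];
let m;
while ((m = re.exec(text)) !== null) {
  found.push({ match: m[0], year: m[1], index: m.index });
}

// match() with g returns only the matched substrings.
const onlySubstrings = text.match(re);

console.log(found);
// [ { match: '2008-12-31', year: '2008', index: 0 },
//   { match: '2017-11-08', year: '2017', index: 15 } ]
console.log(onlySubstrings); // [ '2008-12-31', '2017-11-08' ]
```

In newer engines, String.prototype.matchAll offers the same information as an iterator, but the exec() loop works everywhere the g flag does.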
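Because ordinary capture groups are numbered in order starting from 1 (number 0 being the whole match), they can be referenced in a replacement string as $1, $2, and so on. e.g.:
> dateStr.replace(/(\d{4})-(\d{2}-(\d\d))/, "$1年$2號");
< "2008年12-31號"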
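Afterword
On GitHub, a group of us have pooled our efforts into a project of front-end interview questions. It collects questions people have run into in interviews along with some study materials, so feel free to follow it if you are interested. If you would like to join us, please leave a message in the project. The project can also be viewed on gitbook.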