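Introduction and demo
A regular expression (RegEx) is a tool for matching characters: it finds the content you need inside a long string. It is used in many areas, such as web crawling, document cleanup, and data filtering. As a simple example, suppose I need to grab the title of every page on a website. Titles in web pages usually look like this:
<title>I am a title</title>
Every page has a different title, yet with one simple pattern a regular expression can pull the titles out of thousands of pages in one go. Regular expressions are definitely not something you learn and memorize in a day, because there is a lot to them. My strong suggestion at this stage: just get a sense of what regex offers, without memorizing it. When you really need it, come back and study it carefully; that is the time to train yourself to remember these expressions.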
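To make the title example concrete, here is a minimal sketch. The `html` string is a made-up stand-in for a downloaded page, and the pattern relies on grouping and `?`, both of which are covered further down, so treat it as a preview and not as a robust HTML parser.

```python
import re

# Hypothetical page source; a real crawler would download this text.
html = "<html><head><title>I am a title</title></head><body>...</body></html>"

# (.*?) captures as few characters as possible between the two tags.
match = re.search(r"<title>(.*?)</title>", html)
title = match.group(1) if match else None
print(title)  # I am a title
```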
Simple matching
At its core, a regular expression does one thing: find specific content in text. For example, here we check whether "cat" or "bird" appears in the sentence "dog runs to cat".
# matching string
pattern1 = "cat"
pattern2 = "bird"
string = "dog runs to cat"
print(pattern1 in string) # True
print(pattern2 in string) # False
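But regular expressions go far beyond this kind of simple matching. To use them, first import Python's built-in module re. Then we repeat the steps above, this time with regex. As you can see, if re.search() finds a result it returns a match object; if nothing matches, it returns None. (The output comments in this tutorial show the older <_sre.SRE_Match object ...> form; Python 3.7 and later print <re.Match object ...> instead.) re.search() is only one of the functions in re; others are introduced later.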
import re
# regular expression
pattern1 = "cat"
pattern2 = "bird"
string = "dog runs to cat"
print(re.search(pattern1, string)) # <_sre.SRE_Match object; span=(12, 15), match='cat'>
print(re.search(pattern2, string)) # None
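Since re.search() returns either a match object or None, a common way to use the result is to test it in an if statement. The sketch below assumes nothing beyond the standard re module; the variable names `found` and `position` are only illustrative.

```python
import re

match = re.search(r"cat", "dog runs to cat")
if match:                     # a match object is truthy, None is falsy
    found = match.group()     # the matched text
    position = match.span()   # (start, end) indices in the string
    print(found, position)    # cat (12, 15)
else:
    print("no match")
```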
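Flexible matching
Beyond the simple matching above, what follows is the real core of regex: using special patterns to flexibly match the text you are looking for.
If several characters are possible at one position, put the candidates inside []. For example, [ab] means the character can be either a or b. Note also that when writing a pattern we put an r in front of the opening quote. This makes it a raw string, so Python leaves backslashes untouched and hands them to the regex engine as written; the r does not by itself turn the string into a regular expression. With the pattern below, both "run" and "ran" in the string will be found.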
# multiple patterns ("run" or "ran")
ptn = r"r[au]n" # start with "r" means raw string
print(re.search(ptn, "dog runs to cat")) # <_sre.SRE_Match object; span=(4, 7), match='run'>
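To see why the r prefix matters, compare a plain string with a raw string. This is a small sketch; it uses \b (a word boundary, explained in the next section) because in a plain Python string "\b" means a backspace character, which makes the difference easy to observe.

```python
import re

print(len("\n"), len(r"\n"))  # 1 2

plain_b = "\bruns\b"   # Python turns each "\b" into a backspace character
raw_b = r"\bruns\b"    # backslashes are kept, so the regex engine sees \b

no_match = re.search(plain_b, "dog runs to cat")
word_match = re.search(raw_b, "dog runs to cat")
print(no_match)            # None
print(word_match.group())  # runs
```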
Likewise, the brackets [] can hold ranges or combinations of ranges. For example, [A-Z] means any uppercase English letter, and [0-9a-z] means either a digit or any lowercase letter.
print(re.search(r"r[A-Z]n", "dog runs to cat")) # None
print(re.search(r"r[a-z]n", "dog runs to cat")) # <_sre.SRE_Match object; span=(4, 7), match='run'>
print(re.search(r"r[0-9]n", "dog r2ns to cat")) # <_sre.SRE_Match object; span=(4, 7), match='r2n'>
print(re.search(r"r[0-9a-z]n", "dog runs to cat")) # <_sre.SRE_Match object; span=(4, 7), match='run'>
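Matching by type
Besides rules you define yourself, many matching rules are predefined for you. Here is a summary of some special types first, followed by examples.
- \d : any digit
- \D : any non-digit
- \s : any whitespace character, i.e. [ \t\n\r\f\v]
- \S : any non-whitespace character
- \w : any letter, digit, or "_", i.e. [a-zA-Z0-9_] for ASCII text
- \W : the opposite of \w
- \b : a word boundary (an empty match at the start or end of a word)
- \B : not a word boundary (an empty match anywhere except the start or end of a word)
- \\ : matches a literal \
- . : matches any character (except \n)
- ^ : matches the start of the string
- $ : matches the end of the string
- ? : the preceding character or group is optional
Here are the concrete examples.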
# \d : decimal digit
print(re.search(r"r\dn", "run r4n")) # <_sre.SRE_Match object; span=(4, 7), match='r4n'>
# \D : any non-decimal digit
print(re.search(r"r\Dn", "run r4n")) # <_sre.SRE_Match object; span=(0, 3), match='run'>
# \s : any white space [\t\n\r\f\v]
print(re.search(r"r\sn", "r\nn r4n")) # <_sre.SRE_Match object; span=(0, 3), match='r\nn'>
# \S : opposite to \s, any non-white space
print(re.search(r"r\Sn", "r\nn r4n")) # <_sre.SRE_Match object; span=(4, 7), match='r4n'>
# \w : [a-zA-Z0-9_]
print(re.search(r"r\wn", "r\nn r4n")) # <_sre.SRE_Match object; span=(4, 7), match='r4n'>
# \W : opposite to \w
print(re.search(r"r\Wn", "r\nn r4n")) # <_sre.SRE_Match object; span=(0, 3), match='r\nn'>
# \b : empty string (only at the start or end of the word)
print(re.search(r"\bruns\b", "dog runs to cat")) # <_sre.SRE_Match object; span=(4, 8), match='runs'>
# \B : empty string (but not at the start or end of a word)
# note the runs of spaces in the string: each \B here sits between two spaces
print(re.search(r"\B runs \B", "dog      runs  to cat")) # <_sre.SRE_Match object; span=(8, 14), match=' runs '>
# \\ : match \
print(re.search(r"runs\\", r"runs\ to me")) # <_sre.SRE_Match object; span=(0, 5), match='runs\\'>
# . : match anything (except \n)
print(re.search(r"r.n", "r[ns to me")) # <_sre.SRE_Match object; span=(0, 3), match='r[n'>
# ^ : match line beginning
print(re.search(r"^dog", "dog runs to cat")) # <_sre.SRE_Match object; span=(0, 3), match='dog'>
# $ : match line ending
print(re.search(r"cat$", "dog runs to cat")) # <_sre.SRE_Match object; span=(12, 15), match='cat'>
# ? : may or may not occur
print(re.search(r"Mon(day)?", "Monday")) # <_sre.SRE_Match object; span=(0, 6), match='Monday'>
print(re.search(r"Mon(day)?", "Mon")) # <_sre.SRE_Match object; span=(0, 3), match='Mon'>
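Two of these types are easy to misread, so here is a short sketch with made-up sample strings. It shows that \b keeps a pattern from matching inside a longer word, and that ? applies to the single preceding character unless a group is used.

```python
import re

text = "concatenate the cat files"
inside = re.search(r"cat", text).span()
whole_word = re.search(r"\bcat\b", text).span()
print(inside)      # (3, 6), found inside "concatenate"
print(whole_word)  # (16, 19), the standalone word "cat"

# "u?" makes only the letter u optional
american = re.search(r"colou?r", "color").group()
british = re.search(r"colou?r", "colour").group()
print(american, british)  # color colour
```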
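If a string has many lines and we want ^ to match characters at the start of a line, the usual form does not work. For example, the "I" below appears at the start of the second line of text, but r"^I" cannot match it. In that case we pass an extra argument so that ^ and $ are applied to every line separately: flags=re.M, which can also be written flags=re.MULTILINE.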
string = """
dog runs to cat.
I run to dog.
"""
print(re.search(r"^I", string)) # None
print(re.search(r"^I", string, flags=re.M)) # <_sre.SRE_Match object; span=(18, 19), match='I'>
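The effect of re.M is easier to see when every match is collected. The sketch below uses re.findall(), which is introduced later in this tutorial, on a made-up three-line string.

```python
import re

text = "dog runs to cat.\nI run to dog.\nI like cat."

# Without the flag, ^ only matches at the very start of the string.
only_first = re.findall(r"^\w+", text)
# With re.M, ^ also matches right after every newline.
every_line = re.findall(r"^\w+", text, flags=re.M)

print(only_first)  # ['dog']
print(every_line)  # ['dog', 'I', 'I']
```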
Repetition
A pattern can also be repeated, and regex offers several ways to do it:
- * : repeat zero or more times
- + : repeat one or more times
- {n,m} : repeat n to m times (written without a space after the comma)
- {n} : repeat exactly n times
Examples:
# * : occur 0 or more times
print(re.search(r"ab*", "a")) # <_sre.SRE_Match object; span=(0, 1), match='a'>
print(re.search(r"ab*", "abbbbb")) # <_sre.SRE_Match object; span=(0, 6), match='abbbbb'>
# + : occur 1 or more times
print(re.search(r"ab+", "a")) # None
print(re.search(r"ab+", "abbbbb")) # <_sre.SRE_Match object; span=(0, 6), match='abbbbb'>
# {n,m} : occur n to m times
print(re.search(r"ab{2,10}", "a")) # None
print(re.search(r"ab{2,10}", "abbbbb")) # <_sre.SRE_Match object; span=(0, 6), match='abbbbb'>
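The {n} form is listed above but has no example, so here is a small sketch. The second half uses a group in parentheses (covered in the next section) to repeat more than one character; the sample strings are made up.

```python
import re

# {n} : occur exactly n times
exact = re.search(r"ab{3}", "abbbbb")
print(exact.group(), exact.span())  # abbb (0, 4)

# A quantifier applies to the single item before it; wrap several
# characters in () to repeat them together.
repeated_pair = re.search(r"(ab){2}", "ababab")
print(repeated_pair.group())  # abab
```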
Grouping
We can even split what we find into groups, which is easy with (). Groups make it easy to pinpoint parts of the match. For example, the (\d+) group captures some digits, and the (.+) group captures everything after "Date: ". Calling match.group() with no argument returns the entire match, while passing a number, as in .group(2), returns only the content of that group.
match = re.search(r"(\d+), Date: (.+)", "ID: 021523, Date: Feb/12/2017")
print(match.group()) # 021523, Date: Feb/12/2017
print(match.group(1)) # 021523
print(match.group(2)) # Feb/12/2017
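To get every group at once, the match object also has a groups() method, which returns a tuple. A minimal sketch on the same sample string (the names `id_number` and `date` are just illustrative):

```python
import re

match = re.search(r"(\d+), Date: (.+)", "ID: 021523, Date: Feb/12/2017")

whole = match.group(0)   # same as match.group(): the entire match
parts = match.groups()   # all captured groups as a tuple
id_number, date = parts  # unpack the tuple into separate names

print(whole)  # 021523, Date: Feb/12/2017
print(parts)  # ('021523', 'Feb/12/2017')
```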
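Sometimes there are many groups, and finding the one you want by number alone gets hard. Having a name to index by makes this much easier. We only need to write ?P<name> at the start of the parentheses to give the group a name, and then we can use that name to get the group's content.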
match = re.search(r"(?P<id>\d+), Date: (?P<date>.+)", "ID: 021523, Date: Feb/12/2017")
print(match.group('id')) # 021523
print(match.group('date')) # Feb/12/2017
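When a pattern has named groups, groupdict() collects them into a dictionary keyed by name, which can be handy for passing the result around. A short sketch on the same sample string:

```python
import re

match = re.search(
    r"(?P<id>\d+), Date: (?P<date>.+)",
    "ID: 021523, Date: Feb/12/2017",
)

info = match.groupdict()  # maps each group name to its captured text
print(info)          # {'id': '021523', 'date': 'Feb/12/2017'}
print(info["date"])  # Feb/12/2017
```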
findall
So far we have only found the first match. To find all matches, use re.findall(), which returns a list. Note one more new thing below: | means "or", so the pattern matches either the alternative before it or the one after it.
# findall
print(re.findall(r"r[ua]n", "run ran ren")) # ['run', 'ran']
# | : or
print(re.findall(r"(run|ran)", "run ran ren")) # ['run', 'ran']
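Two details of findall are worth a quick sketch. With more than one group in the pattern it returns a list of tuples, and if you also need positions, re.finditer() yields match objects instead of strings. The "key=value" sample text is made up for illustration.

```python
import re

# Several groups: each result is a tuple with one item per group.
pairs = re.findall(r"(\w+)=(\d+)", "a=1 b=22 c=333")
print(pairs)  # [('a', '1'), ('b', '22'), ('c', '333')]

# finditer gives match objects, so span() is available for every hit.
spans = [m.span() for m in re.finditer(r"r[ua]n", "run ran ren")]
print(spans)  # [(0, 3), (4, 7)]
```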
replace
We can also match certain forms of string with a regular expression and then replace them. Matching with re.sub() is much more flexible than Python's built-in str.replace().
print(re.sub(r"r[au]ns", "catches", "dog runs to cat")) # dog catches to cat
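As a sketch of that extra flexibility: the replacement string may refer back to captured groups with \1, \2, and so on, and the count argument limits how many matches are replaced. The sample sentences are illustrative only.

```python
import re

# \1 and \2 in the replacement stand for the captured groups.
swapped = re.sub(r"(\w+) runs to (\w+)", r"\2 runs to \1", "dog runs to cat")
print(swapped)  # cat runs to dog

# count=1 replaces only the first match.
once = re.sub(r"r[au]n", "walk", "run ran run", count=1)
print(once)  # walk ran run
```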
split
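Python strings also have a split function, for example to get all the words in a sentence: "a is b".split(" ") produces a list holding every word. Regex can take this ordinary splitting much further: re.split() cuts the string wherever the pattern matches.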
print(re.split(r"[,;\.]", "a;b,c.d;e")) # ['a', 'b', 'c', 'd', 'e']
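A small sketch of where re.split() helps compared with the plain string method: splitting on runs of mixed whitespace, and swallowing optional spaces after a separator. The input strings are made up.

```python
import re

messy = "a  is\tb"  # two spaces, then a tab

plain = messy.split(" ")        # splits on every single space only
words = re.split(r"\s+", messy) # splits on any run of whitespace
print(plain)  # ['a', '', 'is\tb']
print(words)  # ['a', 'is', 'b']

# Separator followed by optional spaces; "." needs no escape inside [].
parts = re.split(r"[,;.]\s*", "a; b, c.d")
print(parts)  # ['a', 'b', 'c', 'd']
```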
compile
Finally, we can compile a regex so that it can be reused. First compile the regex into a variable, say compiled_re, then search directly with compiled_re.
compiled_re = re.compile(r"r[ua]n")
print(compiled_re.search("dog ran to cat")) # <_sre.SRE_Match object; span=(4, 7), match='ran'>
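A compiled pattern offers the same operations as the module-level functions (search, findall, sub, split), so one object can be reused across many strings. A minimal sketch, with a made-up list of sentences:

```python
import re

compiled_re = re.compile(r"r[ua]n")

sentences = ["dog ran to cat", "cat runs away", "bird flies"]
# Reuse the same compiled pattern for every sentence.
hits = [s for s in sentences if compiled_re.search(s)]
print(hits)  # ['dog ran to cat', 'cat runs away']

all_matches = compiled_re.findall("run ran ren")
replaced = compiled_re.sub("walk", "dog ran to cat")
print(all_matches)  # ['run', 'ran']
print(replaced)     # dog walk to cat
```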
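Cheat sheet
To make things easier to remember, here is a cheat sheet I found online a long time ago; its original source should be here. It is handy to look back at whenever you forget something.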