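String methods
- translate characters
- str.maketrans() to get the translation table
- translate() to perform the string mapping based on the translation table
- the first argument to maketrans() is the string of characters to be replaced, the second is the string of replacement characters (same length as the first), and the optional third is the string of characters to be mapped to None, which deletes them
- character translation examples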
>>> greeting = '===== Have a great day ====='
>>> greeting.translate(str.maketrans('=', '-'))
'----- Have a great day -----'
>>> greeting = '===== Have a great day!! ====='
>>> greeting.translate(str.maketrans('=', '-', '!'))
'----- Have a great day -----'
>>> import string
>>> quote = 'SIMPLICITY IS THE ULTIMATE SOPHISTICATION'
>>> tr_table = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
>>> quote.translate(tr_table)
'simplicity is the ultimate sophistication'
>>> sentence = "Thi1s is34 a senten6ce"
>>> sentence.translate(str.maketrans('', '', string.digits))
'This is a sentence'
>>> greeting.translate(str.maketrans('', '', string.punctuation))
' Have a great day '
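Besides the two- and three-argument forms shown above, str.maketrans() also accepts a single dictionary, which lets one character map to a longer string. The following is a minimal sketch; the sample text and variable names are made up for illustration.

```python
# One-argument form: a dict whose keys are single characters and whose
# values are replacement strings (any length) or None (delete the character).
table = str.maketrans({'&': 'and', '!': None, '=': '-'})

greeting = '== Tom & Jerry!! =='
result = greeting.translate(table)
print(result)  # -- Tom and Jerry --
```

The dictionary form is handy when replacements differ in length, which the two-string form cannot express.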
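- removing leading/trailing/both characters
- only consecutive characters at the start/end of the string are removed
- by default, whitespace characters are stripped
- if more than one character is specified, the argument is treated as a set of characters: any run of those characters, in any order, is removed from the ends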
>>> greeting = ' Have a nice day :) '
>>> greeting.strip()
'Have a nice day :)'
>>> greeting.rstrip()
' Have a nice day :)'
>>> greeting.lstrip()
'Have a nice day :) '
>>> greeting.strip(') :')
'Have a nice day'
>>> greeting = '===== Have a great day!! ====='
>>> greeting.strip('=')
' Have a great day!! '
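Because the argument is a set of characters and not a prefix or suffix, stripping can remove more than intended. The sketch below illustrates the pitfall; `drop_suffix` is a hypothetical helper written for this example, not a built-in.

```python
filename = 'test.txt'

# rstrip('.txt') removes any trailing '.', 't' or 'x' characters,
# so the final 't' of 'test' is stripped as well.
stripped = filename.rstrip('.txt')
print(stripped)  # tes


def drop_suffix(text, suffix):
    """Hypothetical helper: remove suffix only if text ends with it exactly."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(drop_suffix(filename, '.txt'))  # test
```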
- styling
- the width argument specifies the total length of the output string
>>> ' Hello World '.center(40, '*')
'************* Hello World **************'
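The same width argument is used by ljust() and rjust(). As a small sketch with a made-up sample word, note that a width shorter than the string leaves the string unchanged instead of truncating it.

```python
word = 'Hello'

left = word.ljust(10, '.')     # pad on the right
right = word.rjust(10, '.')    # pad on the left
too_small = word.center(3, '*')  # width < len(word): returned as-is

print(repr(left))       # 'Hello.....'
print(repr(right))      # '.....Hello'
print(repr(too_small))  # 'Hello'
```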
>>> sentence = 'thIs iS a saMple StrIng'
>>> sentence.capitalize()
'This is a sample string'
>>> sentence.title()
'This Is A Sample String'
>>> sentence.lower()
'this is a sample string'
>>> sentence.upper()
'THIS IS A SAMPLE STRING'
>>> sentence.swapcase()
'THiS Is A SAmPLE sTRiNG'
>>> 'good'.islower()
True
>>> 'good'.isupper()
False
>>> '1'.isnumeric()
True
>>> 'abc1'.isnumeric()
False
>>> '1.2'.isnumeric()
False
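isnumeric() checks characters, not whether the text is a valid number, so signs and decimal points fail while some Unicode symbols pass. The sketch below compares it with isdecimal() and isdigit(); `is_float` is a hypothetical helper that relies on float() accepting the text.

```python
# '\u00b2' is SUPERSCRIPT TWO, '\u00bd' is VULGAR FRACTION ONE HALF
samples = ['42', '-1', '1.2', '\u00b2', '\u00bd']
checks = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}
for s, result in checks.items():
    print(repr(s), result)


def is_float(text):
    """Hypothetical helper: True if float() can parse the text."""
    try:
        float(text)
    except ValueError:
        return False
    return True


print(is_float('1.2'), is_float('-1'), is_float('abc1'))  # True True False
```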
>>> sentence = 'This is a sample string'
>>> 'is' in sentence
True
>>> 'this' in sentence
False
>>> 'This' in sentence
True
>>> 'this' in sentence.lower()
True
>>> 'is a' in sentence
True
>>> 'test' not in sentence
True
>>> sentence = 'This is a sample string'
>>> sentence.count('is')
2
>>> sentence.count('w')
0
>>> word = 'phototonic'
>>> word.count('oto')
1
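count() returns 1 for 'oto' in 'phototonic' because it counts non-overlapping occurrences only. If overlapping matches are wanted, one possible approach is a find() loop; `count_overlapping` is a hypothetical helper name used for this sketch.

```python
def count_overlapping(text, sub):
    """Hypothetical helper: count occurrences of sub, allowing overlaps."""
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # resume one position after the previous match start
        start = text.find(sub, start + 1)
    return count


word = 'phototonic'
print(word.count('oto'))               # 1
print(count_overlapping(word, 'oto'))  # 2
```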
>>> sentence
'This is a sample string'
>>> sentence.startswith('This')
True
>>> sentence.startswith('The')
False
>>> sentence.endswith('ing')
True
>>> sentence.endswith('ly')
False
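startswith() and endswith() also accept a tuple of candidates and an optional start position. A short sketch, reusing the sample sentence from above:

```python
sentence = 'This is a sample string'

# a tuple matches if any one of its items matches
starts = sentence.startswith(('The', 'This'))
ends = sentence.endswith(('ly', 'ed'))

# optional second argument: position at which to begin the test
from_index = sentence.startswith('is', 2)

print(starts, ends, from_index)  # True False True
```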
- split a string based on a sequence of characters
- returns a list
- to split using regular expressions, use re.split()
>>> sentence = 'This is a sample string'
>>> sentence.split()
['This', 'is', 'a', 'sample', 'string']
>>> "oranges:5".split(':')
['oranges', '5']
>>> "oranges :: 5".split(' :: ')
['oranges', '5']
>>> "a e i o u".split(' ', maxsplit=1)
['a', 'e i o u']
>>> "a e i o u".split(' ', maxsplit=2)
['a', 'e', 'i o u']
>>> line = '{1.0 2.0 3.0}'
>>> nums = [float(s) for s in line.strip('{}').split()]
>>> nums
[1.0, 2.0, 3.0]
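One detail worth seeing in code: split() with no argument collapses runs of whitespace, while split(' ') treats every single space as a separator and can return empty strings. The sketch below builds the sample text explicitly so the repeated spaces are visible; rsplit() is shown as the right-to-left counterpart of maxsplit.

```python
# 'a', two spaces, 'b', three spaces, 'c'
text = 'a' + ' ' * 2 + 'b' + ' ' * 3 + 'c'

by_whitespace = text.split()    # runs of whitespace count as one separator
by_space = text.split(' ')      # every space separates, giving empty items

# rsplit() applies maxsplit starting from the right
last_split = 'a e i o u'.rsplit(' ', maxsplit=1)

print(by_whitespace)  # ['a', 'b', 'c']
print(by_space)       # ['a', '', 'b', '', '', 'c']
print(last_split)     # ['a e i o', 'u']
```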
>>> str_list = sentence.split()
>>> str_list
['This', 'is', 'a', 'sample', 'string']
>>> ' '.join(str_list)
'This is a sample string'
>>> '-'.join(str_list)
'This-is-a-sample-string'
>>> c = ' :: '
>>> c.join(str_list)
'This :: is :: a :: sample :: string'
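join() works only on strings, so other values have to be converted first. A minimal sketch, reversing the float-parsing example shown earlier (the variable names are illustrative):

```python
nums = [1.0, 2.0, 3.0]

# ' '.join(nums) would raise TypeError, so convert each item to str first
joined = ' '.join(str(n) for n in nums)
line = '{' + joined + '}'

print(joined)  # 1.0 2.0 3.0
print(line)    # {1.0 2.0 3.0}
```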
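- replace characters
- the third argument specifies how many times the replacement is performed
- the variable has to be explicitly re-assigned to change its value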
>>> phrase = '2 be or not 2 be'
>>> phrase.replace('2', 'to')
'to be or not to be'
>>> phrase
'2 be or not 2 be'
>>> phrase.replace('2', 'to', 1)
'to be or not 2 be'
>>> phrase = phrase.replace('2', 'to')
>>> phrase
'to be or not to be'
Further Reading
Regular Expressions
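Metacharacter | Description |
^ | anchor, matches the start of the string (or of each line with re.M) |
$ | anchor, matches the end of the string (or of each line with re.M) |
. | matches any character except the newline character \n |
| | OR operator, matches any one of several alternative patterns |
() | groups patterns and captures the matched text for extraction |
[] | character class, matches one character out of several |
\^ | prefix a metacharacter with \ to match it literally |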
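Quantifier | Description |
* | match the previous character or group zero or more times |
+ | match the previous character or group one or more times |
? | match the previous character or group zero or one time |
{n} | match exactly n times |
{n,} | match at least n times |
{n,m} | match at least n times but not more than m times |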
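The quantifiers can be compared side by side with a small experiment. This sketch uses re.fullmatch() so that each pattern has to match the whole word; the word list is made up for illustration.

```python
import re

words = ['ac', 'abc', 'abbc', 'abbbc']
patterns = ['ab*c', 'ab+c', 'ab?c', 'ab{2}c', 'ab{2,}c', 'ab{1,2}c']

# For each pattern, keep the words that it matches in full
matches = {pat: [w for w in words if re.fullmatch(pat, w)] for pat in patterns}

for pat, found in matches.items():
    print(f'{pat:10}{found}')
```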
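Character class | Description |
[aeiou] | matches any one lowercase vowel |
[^aeiou] | ^ inverts the selection, so this matches any character that is not a lowercase vowel (consonants, but also digits, spaces and punctuation) |
[a-f] | matches any one of the characters abcdef |
\d | matches a digit, same as [0-9] for ASCII text |
\D | matches a non-digit, same as [^0-9] or [^\d] |
\w | matches an alphanumeric character or underscore, same as [a-zA-Z0-9_] for ASCII text |
\W | matches a character that is neither alphanumeric nor underscore, same as [^a-zA-Z0-9_] or [^\w] |
\s | matches a whitespace character, same as [\ \t\n\r\f\v] |
\S | matches a non-whitespace character, same as [^\s] |
\b | word boundary, where a word is a sequence of \w characters |
\B | not a word boundary |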
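A few of the character classes in action, as a rough sketch with made-up sample strings. Note that \w includes digits as well as letters and underscore, and that a negated class such as [^aeiou] matches every non-vowel character, not only consonants.

```python
import re

digits = re.findall(r'\d+', 'Order 42: 3 apples, 10 kiwis')
non_vowels = re.findall(r'[^aeiou]', 'a-b c1')
words = re.findall(r'\w+', 'foo_bar baz42!')

# \b requires a word boundary, \B requires that there is none
whole_is = re.findall(r'\bis\b', 'This is it')   # only the standalone 'is'
inner_is = re.findall(r'\Bis', 'This is it')     # only the 'is' inside 'This'

fields = re.split(r'\s+', 'a \t b\nc')

print(digits)      # ['42', '3', '10']
print(non_vowels)  # ['-', 'b', ' ', 'c', '1']
print(words)       # ['foo_bar', 'baz42']
print(fields)      # ['a', 'b', 'c']
```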
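Compilation flag | Description |
re.I | ignore case |
re.M | multiline mode, the ^ and $ anchors also match at the start and end of each line |
re.S | singleline mode, . also matches \n |
re.X | verbose mode, for better readability and for adding comments |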
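The effect of each flag is easiest to see on a two-line string. The following sketch uses made-up sample text; in the re module, verbose mode is spelled re.X (or re.VERBOSE).

```python
import re

text = 'first line\nSecond line'

no_flag = re.findall(r'^\w+', text)          # ^ matches at string start only
multiline = re.findall(r'^\w+', text, re.M)  # ^ also matches after each \n

dot_plain = re.search(r'line.Second', text)        # . does not match \n
dot_all = re.search(r'line.Second', text, re.S)    # . matches \n as well

ignore_case = re.findall(r'second', text, re.I)

# Verbose mode: whitespace in the pattern is ignored and comments are allowed
date = re.compile(r'''
    (\d{2})   # day
    /
    (\d{2})   # month
    /
    (\d{4})   # year
''', re.X)
parts = date.search('25/06/2016').groups()

print(no_flag, multiline)  # ['first'] ['first', 'Second']
print(parts)               # ('25', '06', '2016')
```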
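Variable | Description |
\1, \2, \3 etc | backreference to the text matched by the corresponding capture group |
\g<1>, \g<2>, \g<3> etc | backreference to the matched group, used in replacement strings to keep the group number separate from any digits that follow |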
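The \g<1> form matters when a backreference is followed directly by a digit in the replacement string. A minimal sketch with a made-up input string:

```python
import re

text = 'a1 b2'

# r'\g<1>5' means "group 1, then the literal digit 5"
appended = re.sub(r'(\d)', r'\g<1>5', text)
print(appended)  # a15 b25

# r'\15' is read as a reference to group 15, which does not exist here
try:
    re.sub(r'(\d)', r'\15', text)
    failed = False
except re.error:
    failed = True
print(failed)  # True
```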
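Pattern matching and extraction
- matching/extracting a sequence of characters
- use re.search() to see if a string contains a pattern
- use re.findall() to get a list of all matching portions
- use re.split() to get a list by splitting a string on a pattern
- their syntax is given below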
re.search(pattern, string, flags=0)
re.findall(pattern, string, flags=0)
re.split(pattern, string, maxsplit=0, flags=0)
>>> import re
>>> string = "This is a sample string"
>>> bool(re.search('is', string))
True
>>> bool(re.search('this', string))
False
>>> bool(re.search('this', string, re.I))
True
>>> bool(re.search('T', string))
True
>>> bool(re.search('is a', string))
True
>>> re.findall('i', string)
['i', 'i', 'i']
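The examples above wrap re.search() in bool(). The call itself returns a match object, or None when nothing matches, and the match object carries the matched text and its position. A short sketch using the same sample string:

```python
import re

string = 'This is a sample string'

m = re.search(r'sample', string)
if m:
    # group() is the matched text, span() the (start, end) indices
    print(m.group(), m.span())  # sample (10, 16)

missing = re.search(r'xyz', string)
print(missing)  # None
```

Testing the result with `if m:` before calling group() avoids an AttributeError on None.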
- using regular expressions
- use the r'' raw string format when the pattern contains regular expression elements
>>> string
'This is a sample string'
>>> re.findall('is', string)
['is', 'is']
>>> re.findall('\bis', string)
[]
>>> re.findall(r'\bis', string)
['is']
>>> re.findall(r'\w+', string)
['This', 'is', 'a', 'sample', 'string']
>>> re.split(r'\s+', string)
['This', 'is', 'a', 'sample', 'string']
>>> re.split(r'\d+', 'Sample123string54with908numbers')
['Sample', 'string', 'with', 'numbers']
>>> re.split(r'(\d+)', 'Sample123string54with908numbers')
['Sample', '123', 'string', '54', 'with', '908', 'numbers']
>>> quote = "So many books, so little time"
>>> re.search(r'([a-z]{2,}).*\1', quote, re.I)
<re.Match object; span=(0, 17), match='So many books, so'>
>>> re.search(r'([a-z])\1', quote, re.I)
<re.Match object; span=(9, 11), match='oo'>
>>> re.findall(r'([a-z])\1', quote, re.I)
['o', 't']
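The last result is ['o', 't'] and not ['oo', 'tt'] because re.findall() returns the captured group when the pattern contains one. One way to get the full matches, sketched here, is re.finditer(), which yields match objects.

```python
import re

quote = 'So many books, so little time'

# group() with no argument is the entire match, not just the capture group
pairs = [m.group() for m in re.finditer(r'([a-z])\1', quote, re.I)]
print(pairs)  # ['oo', 'tt']

# Alternative: an outer group around the whole pattern gives tuples
# (the backreference becomes \2 because the inner group is now group 2)
tuples = re.findall(r'(([a-z])\2)', quote, re.I)
print(tuples)  # [('oo', 'o'), ('tt', 't')]
```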
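Search and replace
Syntax
re.sub(pattern, repl, string, count=0, flags=0)
- simple substitutions
- re.sub does not change the value of the variable passed to it, the result has to be explicitly assigned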
>>> sentence = 'This is a sample string'
>>> re.sub('sample', 'test', sentence)
'This is a test string'
>>> sentence
'This is a sample string'
>>> sentence = re.sub('sample', 'test', sentence)
>>> sentence
'This is a test string'
>>> re.sub('/', '-', '25/06/2016')
'25-06-2016'
>>> re.sub('/', '-', '25/06/2016', count=1)
'25-06/2016'
>>> greeting = '***** Have a great day *****'
>>> re.sub(r'\*', '=', greeting)
'===== Have a great day ====='
>>> words = 'night and day'
>>> re.sub(r'(\w+)( \w+ )(\w+)', r'\3\2\1', words)
'day and night'
>>> line = 'Can you spot the the mistakes? I i seem to not'
>>> re.sub(r'\b(\w+) \1\b', r'\1', line, flags=re.I)
'Can you spot the mistakes? I seem to not'
>>> import math
>>> numbers = '1 2 3 4 5'
>>> def fact_num(n):
...     return str(math.factorial(int(n.group(1))))
...
>>> re.sub(r'(\d+)', fact_num, numbers)
'1 2 6 24 120'
>>> re.sub(r'(\d+)', lambda m: str(math.factorial(int(m.group(1)))), numbers)
'1 2 6 24 120'
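Two related helpers from the re module may be useful alongside re.sub(): re.escape(), which escapes metacharacters in literal text so it can be used as a pattern, and re.subn(), which also reports how many replacements were made. A minimal sketch reusing the strings from the examples above:

```python
import re

greeting = '***** Have a great day *****'

# re.escape('*') produces the pattern \* so no manual escaping is needed
escaped = re.escape('*')
replaced = re.sub(escaped, '=', greeting)
print(replaced)  # ===== Have a great day =====

# re.subn() returns a (new_string, number_of_substitutions) tuple
new_date, n = re.subn(r'/', '-', '25/06/2016')
print(new_date, n)  # 25-06-2016 2
```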
Compiling Regular Expressions
>>> swap_words = re.compile(r'(\w+)( \w+ )(\w+)')
>>> swap_words
re.compile('(\\w+)( \\w+ )(\\w+)')
>>> words = 'night and day'
>>> swap_words.search(words).group()
'night and day'
>>> swap_words.search(words).group(1)
'night'
>>> swap_words.search(words).group(2)
' and '
>>> swap_words.search(words).group(3)
'day'
>>> swap_words.search(words).group(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group
>>> bool(swap_words.search(words))
True
>>> swap_words.findall(words)
[('night', ' and ', 'day')]
>>> swap_words.sub(r'\3\2\1', words)
'day and night'
>>> swap_words.sub(r'\3\2\1', 'yin and yang')
'yang and yin'
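A compiled pattern can carry its flags and be reused across many strings. The sketch below compiles the duplicate-word pattern from the search and replace section and applies it to a made-up list of lines.

```python
import re

# flags are given once at compile time
dup_words = re.compile(r'\b(\w+) \1\b', flags=re.I)

lines = ['Can you spot the the mistakes?', 'I i seem to not', 'all good here']
fixed = [dup_words.sub(r'\1', line) for line in lines]

for line in fixed:
    print(line)
```

Compiling is mostly a readability and reuse convenience here: the pattern gets a descriptive name and its flags do not have to be repeated at every call.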
Further Reading on Regular Expressions