暴力匹配(brute-force substring search)就是用模式的字符串去和目標(biāo)文本的字符串盆繁,一個一個字符的匹配過去掀淘。時間復(fù)雜性為 O(m×n)。
這里兩種 Python 實現(xiàn)方法油昂,思路大同小異革娄,都是沿著目標(biāo)文本做遍歷,同時有一個小循環(huán)對模式字符串做遍歷冕碟。如果模式字符串能和目標(biāo)文本的子字符串一一對應(yīng)拦惋,就返回目標(biāo)文本的這個位置的下標(biāo)。兩種方法都使用了兩個針對目標(biāo)文本字符串的指針i安寺,以及模式字符串的指針j厕妖。區(qū)別是外層指針是否隨著內(nèi)層指針一起前進(jìn)。
方法1
兩個 while 循環(huán)嵌套挑庶,參考Algorithms 4th, Robert Sedgewick, page 760言秸。在內(nèi)層指針前進(jìn)時,外層指針不動迎捺,直到內(nèi)從指針循環(huán)完以后才前進(jìn)一位举畸。
def brute_force_search(txt, pattern):
N = len(txt)
M = len(pattern)
i = 0 # pointer into the text
while i <= (N - M):
j = 0 # pointer into the patter
while j < M:
if txt[i+j] != pattern[j]:
break
j += 1
if j == M:
return i
i += 1
return -1
def main():
txt = "abcde"
pat = "cde"
result = brute_force_search(txt, pat)
print "result:", result
txt_another = "abcde"
pat_another = "123"
result_another = brute_force_search(txt_another, pat_another)
print "result_another:", result_another
if __name__ == '__main__':
main()
結(jié)果是
result: 2
result_another: -1
方法2
只有一個 while,但也是控制兩個指針凳枝。如果不匹配抄沮,目標(biāo)文本字符串的指針向前走一位,而模式字符串的指針會到下標(biāo) 0 的位置岖瑰。這種方法相當(dāng)于上一種方法的改進(jìn)型叛买,特點是兩個指針一起前進(jìn)。
參考數(shù)據(jù)結(jié)構(gòu)與算法:Python語言描述蹋订,第114頁聪全。但是這里做了修改,查找到以后立即就做了返回辅辩,而不是書上遍歷完以后才返回难礼。
def naive_search(txt, pattern):
N = len(txt)
M = len(pattern)
i = 0 # pointer into text
j = 0 # pointer into pattern
while i < N and j < M:
if txt[i] == pattern[j]:
i += 1
j += 1
if j == M:
return (i-j)
else:
i = i - j + 1
j = 0
return -1
def main():
txt = "abcde"
pat = "cde"
naive_result = naive_search(txt, pat)
print "naive_result:", naive_result
txt_another = "abcde"
pat_another = "123"
naive_result_another = naive_search(txt_another, pat_another)
print "naive_result_another:", naive_result_another
if __name__ == '__main__':
main()
結(jié)果是
naive_result: 2
naive_result_another: -1
書上的示例代碼也是一樣的效果娃圆,區(qū)別是把 if i == m:
放在循環(huán)外面。實際上蛾茉,當(dāng)匹配到字符串的時候讼呢,程序會跳出 while 循環(huán),所以結(jié)果一致谦炬。
def naive_matching(t, p):
j, i = 0, 0
n, m = len(t), len(p)
while j < n and i < m: # i==m means a matching
if t[j] == p[i]: # ok! consider next char in p
j, i = j+1, i+1
else: # no! consider next position in t
j, i = j-i+1, 0
if i == m: # find a matching, return its index
return j-i
return -1 # no matching, return special value
def main():
txt = "abcde"
pat = "cde"
naive_matching_result = naive_matching(txt, pat)
print "naive_matching_result:", naive_matching_result
txt_another = "abcde"
pat_another = "123"
naive_matching_result_another = naive_matching(txt_another, pat_another)
print "naive_matching_result_another:", naive_matching_result_another
if __name__ == '__main__':
main()