1. Print the 10 most frequent words in a text.
For example, suppose the text file is named nlp_2018_5_8.txt:
You are watching a film in which two men are having a fight.They hit one another hard.At the start they only fight with their fists.But soon they begin hitting one another over the heads with chairs.And so it goes on until one of the men crashes through a window and falls thirty feet to the ground below.He is dead!Of course he isn't really dead. With any luck he isn't even hurt.Why? Because the men who fall out of high windows or jump from fast moving trains, who crash cars of even catch fire, are professionals.They do this for a living.These men are called stuntmen.That is to say, they perform tricks.There are two sides to their work.They actually do most of the things you see on the ......
Code:
If you would rather use Counter from the collections module, see this guide to Counter: http://www.reibang.com/p/24dd4e194a97
import operator
counter = {}
with open('nlp_2018_5_8.txt', 'r') as f:
    # Read one line at a time to keep memory usage low;
    # the loop ends by itself at end of file
    for data in f:
        # Lowercase every token so that "They" and "they" count as the same word
        # (split() only splits on whitespace, so punctuation stays attached)
        for token in data.lower().split():
            counter[token] = counter.get(token, 0) + 1
# Call the built-in sorted() instead of implementing a sort by hand
sorted_x = sorted(counter.items(), key=operator.itemgetter(1), reverse=True)
print(sorted_x[:10])
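The Counter approach mentioned above could look roughly like the sketch below. The helper name top_words and the rule that a word is a run of letters or apostrophes are assumptions made for illustration; they are not part of the original solution. The regular expression also splits tokens such as "fight.They", which a plain split() would keep glued together.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most common words in text as (word, count) pairs."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes,
    # so "fight.They" yields the two words "fight" and "they".
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


if __name__ == "__main__":
    sample = "They hit one another hard.At the start they only fight."
    print(top_words(sample, 3))
```

To run it on the file from the question, read the file with open('nlp_2018_5_8.txt').read() and pass the result to top_words. This loads the whole file at once, so the line-by-line version remains the better fit for very large inputs.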
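2. In a release queue, once one version fails to release, every version after it fails as well. Given a release queue, find the first version that failed. (A function isFailVersion() that decides whether a release failed is given.)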
import random
# Simulate a release queue: a run of good versions (0) followed by a run of failed versions (1)
def random_sample():
    sample_validate = [0 for i in range(random.randint(1, 100))]
    sample_unvalidate = [1 for i in range(random.randint(1, 100))]
    sample = sample_validate + sample_unvalidate
    return sample
# Simulate the version check: True for a good version, False for a failed one
def isValidatingVersion(ver):
    if ver == 0:
        return True
    elif ver == 1:
        return False
    else:
        return None
# Binary search lowers the complexity to O(log N)
def binary_search(sample):
    low = 0
    high = len(sample) - 1
    while low < high:
        # Since low < high, (low + high) // 2 < (high + high) // 2, so mid is always < high
        mid = (low + high) // 2
        if isValidatingVersion(sample[mid]):
            # mid is a good version, so the first failed version is at mid + 1 or later
            low = mid + 1
        else:
            # mid is a failed version, so the first failed version is at mid or earlier
            high = mid
    # The loop exits when low == high, so either one is the index of the first failed version
    print("version index:%d" % high)
    print("version status", sample[high])
    print("version index:%d" % low)
    print("version status", sample[low])
# Linear scan used to check the result, O(N)
def validate(sample):
    for i in sample:
        if i == 1:
            print("version index:%d" % sample.index(i))
            print("version status", i)
            break
if __name__ == '__main__':
    sample = random_sample()
    binary_search(sample)
    validate(sample)
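For reuse or testing, it can be more convenient to return the index than to print it. The sketch below is one possible variant, not the original solution. The names first_failed and is_fail are hypothetical, with is_fail standing in for the isFailVersion() function from the question. It searches the half-open range [low, high), so it also covers an empty queue and a queue with no failed version, returning -1 in both cases.

```python
def first_failed(versions, is_fail):
    """Return the index of the first failed version, or -1 if none failed."""
    low, high = 0, len(versions)
    # Invariant: every version before low passed,
    # and every version at index high or later failed.
    while low < high:
        mid = (low + high) // 2
        if is_fail(versions[mid]):
            # mid failed, so the first failure is at mid or earlier
            high = mid
        else:
            # mid passed, so the first failure is after mid
            low = mid + 1
    return low if low < len(versions) else -1


if __name__ == "__main__":
    # Hypothetical stand-in for isFailVersion(): 1 marks a failed version
    is_fail = lambda ver: ver == 1
    # Compare against the known answer for every small queue shape
    for good in range(5):
        for bad in range(5):
            queue = [0] * good + [1] * bad
            expected = good if bad else -1
            assert first_failed(queue, is_fail) == expected
    print(first_failed([0, 0, 0, 1, 1], is_fail))
```

Checking the function against the known answer for every small queue plays the same role as the O(N) validate function above.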