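Task overview
In the vocabulary list, some words appear more than once, and each occurrence has its own example sentences. Find the indices of all repeated words and merge their example sentences into one row. Finally, split the whole list into a part containing repeated words and a part containing non-repeated words.
Core code
1. Reading and writing Excel with the xlwt and xlrd modules
To read an Excel file, get the list of all sheet names, open one sheet by name, then use sheet.row_values() and sheet.col_values() to fetch the contents of a given row or column.
import xlrd
initialData = '...'  # path of the Excel file to read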
workbook = xlrd.open_workbook(initialData)
sheet_names = workbook.sheet_names()
sheet = workbook.sheet_by_name(sheet_names[0])
data = sheet.col_values(4)
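The values returned by col_values() are not guaranteed to be strings: xlrd returns numeric cells as floats, so calling .strip() on them directly would fail. A minimal sketch of a clean-up step, assuming a hypothetical helper named normalize_cells (it is not part of xlrd) and made-up sample values:

```python
def normalize_cells(values):
    """Return a list of stripped strings, one per cell value."""
    cleaned = []
    for value in values:
        if isinstance(value, str):
            cleaned.append(value.strip())
        elif isinstance(value, float) and value.is_integer():
            # Numeric cells typically arrive as floats such as 3.0.
            cleaned.append(str(int(value)))
        else:
            cleaned.append(str(value).strip())
    return cleaned


# Made-up sample standing in for sheet.col_values(4)
data = normalize_cells(["  apple ", "pear", 3.0, ""])
print(data)  # ['apple', 'pear', '3', '']
```

Running this once right after reading the column means later comparisons treat 'word' and 'word ' as the same entry.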
To write an Excel file, use xlwt.Workbook() to create a new in-memory workbook, then use .add_sheet() to create a sheet with a given name.
book = xlwt.Workbook(encoding='utf-8', style_compression=0)
wSheet1 = book.add_sheet("noRepetition")
wSheet2 = book.add_sheet("repetition")
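Cells are then filled one at a time with sheet.write(row, col, value). The sketch below wraps that in a hypothetical helper, write_rows, which is not part of xlwt. It accepts any object with a write(row, col, value) method, so the RecordingSheet stand-in (also hypothetical) lets it run without creating a file; a worksheet returned by add_sheet() should work the same way.

```python
def write_rows(sheet, rows, start_row=0):
    """Write each row of `rows` to `sheet`, one cell per value.

    `sheet` is any object with a write(row, col, value) method.
    Returns the index of the next free row.
    """
    r = start_row
    for row in rows:
        for c, value in enumerate(row):
            sheet.write(r, c, value)
        r += 1
    return r


class RecordingSheet:
    """Stand-in for a worksheet that only remembers what was written."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


demo = RecordingSheet()
next_row = write_rows(demo, [["cat", "noun"], ["run", "verb"]])
print(next_row)             # 2
print(demo.cells[(1, 0)])   # run
```

Returning the next free row plays the same role as the c1 and c2 counters used in the full script.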
2. Removing all duplicates with set(data)
Build the matrix allData, which stores each word's index, its number of occurrences, and the word itself.
data_unique = set(data)
allData = []
for item in data_unique:
    id = data.index(item)    # index of the first occurrence
    num = data.count(item)   # number of occurrences
    allData.append([id, num, data[id].strip()])
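Note that a set has no defined order, and that index() and count() each rescan the whole list for every word. As an alternative sketch (not the method used in this post), the same triples can be collected in one pass with a dictionary; group_indices and the sample words are made up for illustration.

```python
def group_indices(words):
    """Map each stripped word to the list of positions where it occurs."""
    positions = {}
    for i, word in enumerate(words):
        positions.setdefault(word.strip(), []).append(i)
    return positions


words = ["cat", "dog", "cat ", "bird", "dog", "cat"]
positions = group_indices(words)

# [first index, count, word] triples, in first-seen order
all_data = [[idx[0], len(idx), w] for w, idx in positions.items()]
print(all_data)  # [[0, 3, 'cat'], [1, 2, 'dog'], [3, 1, 'bird']]
```

Because the dictionary keeps every position, the later search for the remaining occurrences of a repeated word would no longer be needed with this approach.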
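3. Finding all example sentences
The core idea is to use .index() to locate every occurrence of a repeated word, and with it every example sentence. Since .index() only returns the index of the first match, each occurrence that has already been found is overwritten with a placeholder, and the search is repeated as many times as the word's repetition count requires. (Adapted from https://blog.csdn.net/qq_33094993/article/details/53584379; the trick is also known as "偷梁換柱", roughly "swapping the beams for the pillars".)
# id, num, word, dlen and c2 come from the surrounding loop in the full code
nid = id
for n in range(num-1):
    data[nid] = 'quchu'          # overwrite the occurrence already handled
    print(id, num, data[nid])
    nid = data.index(word)       # next occurrence of the same word
    nwordData = sheet.row_values(nid)
    # append the four example-sentence columns to the merged row
    wSheet2.write(c2, 1+dlen+4*n, nwordData[6])
    wSheet2.write(c2, 1+dlen+4*n+1, nwordData[7])
    wSheet2.write(c2, 1+dlen+4*n+2, nwordData[8])
    wSheet2.write(c2, 1+dlen+4*n+3, nwordData[9])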
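The replace-and-search trick can be isolated as a small function to make it easier to check. This is a sketch under assumptions: find_all_by_replacement is a hypothetical name, the sample list is made up, and the function works on a copy so the caller's list stays intact (the code above modifies data in place).

```python
def find_all_by_replacement(items, target, placeholder="quchu"):
    """Return every index of `target`, using repeated list.index() calls."""
    work = list(items)  # copy, so the caller's list is not modified
    found = []
    for _ in range(work.count(target)):
        i = work.index(target)   # first occurrence not yet replaced
        found.append(i)
        work[i] = placeholder    # hide it so the next search moves on
    return found


words = ["cat", "dog", "cat", "bird", "cat"]
print(find_all_by_replacement(words, "cat"))            # [0, 2, 4]

# For comparison, a single pass with enumerate gives the same indices.
print([i for i, w in enumerate(words) if w == "cat"])   # [0, 2, 4]
```

The placeholder has to be a value that never appears as a real word in the list, otherwise it could itself be counted as a duplicate.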
Full code
import xlrd
import xlwt
# Note: xlrd 2.0 and later reads only .xls files; an .xlsx input needs an older xlrd
initialData = 'book.xlsx'
workbook = xlrd.open_workbook(initialData)
sheet_names = workbook.sheet_names()
sheet = workbook.sheet_by_name(sheet_names[0])
data = sheet.col_values(4)
print(len(data))
# Strip surrounding whitespace so that 'word' and 'word ' compare equal
for i in range(len(data)):
    data[i] = data[i].strip()
data_unique = set(data)
allData = []
for item in data_unique:
    id = data.index(item)    # index of the first occurrence
    num = data.count(item)   # number of occurrences
    allData.append([id, num, data[id].strip()])
book = xlwt.Workbook(encoding='utf-8', style_compression=0)
wSheet1 = book.add_sheet("noRepetition")
wSheet2 = book.add_sheet("repetition")
c1 = 0
c2 = 0
for d in allData:
    id = d[0]
    num = d[1]
    word = d[2]
    wordData = sheet.row_values(int(id))
    if num > 1:
        # Repeated word: column 0 holds the count, followed by the first row
        wSheet2.write(c2, 0, num)
        dlen = len(wordData)
        for i in range(dlen):
            wSheet2.write(c2, i+1, wordData[i])
        nid = id
        for n in range(num-1):
            data[nid] = 'quchu'          # overwrite the occurrence already handled
            print(id, num, data[nid])
            nid = data.index(word)       # next occurrence of the same word
            nwordData = sheet.row_values(nid)
            # append the four example-sentence columns of this occurrence
            wSheet2.write(c2, 1+dlen+4*n, nwordData[6])
            wSheet2.write(c2, 1+dlen+4*n+1, nwordData[7])
            wSheet2.write(c2, 1+dlen+4*n+2, nwordData[8])
            wSheet2.write(c2, 1+dlen+4*n+3, nwordData[9])
        c2 = c2 + 1
    else:
        # Word occurs once: copy the row unchanged
        for i in range(len(wordData)):
            wSheet1.write(c1, i, wordData[i])
        c1 = c1 + 1
savePath = 'book_分離.xls'
book.save(savePath)
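To check the split-and-merge logic without an Excel file, the same row layout can be reproduced on plain lists. This is only a sketch: split_and_merge is a hypothetical function, its defaults (word in column 4, example sentences in columns 6 to 9) are read off the script above, and the sample rows are made up with a narrower layout.

```python
def split_and_merge(rows, key_col=4, example_cols=(6, 7, 8, 9)):
    """Split rows into (unique_rows, merged_rows) by the word in key_col."""
    groups = {}
    for row in rows:
        groups.setdefault(str(row[key_col]).strip(), []).append(row)

    unique_rows, merged_rows = [], []
    for group in groups.values():
        first = group[0]
        if len(group) == 1:
            unique_rows.append(list(first))
            continue
        # Count first, then the first row, then the example columns
        # of every further occurrence (same layout as wSheet2 above).
        merged = [len(group)] + list(first)
        for extra in group[1:]:
            merged.extend(extra[c] for c in example_cols)
        merged_rows.append(merged)
    return unique_rows, merged_rows


# Made-up rows: word in column 0, example sentences in columns 1 and 2
rows = [
    ["cat", "ex1", "ex2"],
    ["dog", "ex3", "ex4"],
    ["cat", "ex5", "ex6"],
]
unique, merged = split_and_merge(rows, key_col=0, example_cols=(1, 2))
print(unique)  # [['dog', 'ex3', 'ex4']]
print(merged)  # [[2, 'cat', 'ex1', 'ex2', 'ex5', 'ex6']]
```

The two returned lists correspond to the "noRepetition" and "repetition" sheets, and could be written out row by row with sheet.write().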