First of all, the idea for this script comes from this expert's post:
https://blog.csdn.net/qq_45037797/article/details/94882848
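First, a concept: Python has many mature "packages", i.e. modules for all kinds of operations. For the code in a given project, we can pull in all sorts of packages to help with specific tasks. I can't help remarking that the people who developed these packages are really impressive.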
Before we can use these packages, we have to install them. In PyCharm this is done via Preferences -> Project: Projectname -> Project Interpreter -> the "+" button.
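After reading a few experts' blog posts, it seems the packages most commonly used to work with Excel from Python include openpyxl, xlwt, xlrd and so on. Going by their recommendations, openpyxl is the most full-featured, and this one package is enough on its own for both reading and writing.
Its drawback seems to be that it is relatively slow on large amounts of data. My data volume isn't that large, though, so I went with openpyxl.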
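First, the purpose of this script: it reads the .dat files downloaded from the server. Each file is plain text. It starts with a title line whose first field is "#", followed by rows of seven whitespace-separated numeric columns: #, X, Y, Z, CHARGE, MIN DIST and ATOMIC VOL.
Converting such a file to Excel makes it much easier to inspect and edit. The result we want is a worksheet with that header row, followed by one row per line of data.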
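We have a whole pile of these files. Each one sits in a directory with a regularly numbered name; these are simply the folders the jobs were submitted from. We therefore need a script that opens each .dat file in turn and writes all of them into one workbook. Each .dat file gets its own worksheet, and each worksheet is named after the directory its .dat file was read from. That is what this script is for.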
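If you'd rather not hard-code the folder numbers, the folders can also be discovered automatically. Below is a minimal sketch. It assumes every job folder under one parent directory has a purely numeric name and contains a file called ACF.dat. The helper name find_job_dirs is my own and hypothetical.

```python
from pathlib import Path


def find_job_dirs(parent, data_name="ACF.dat"):
    """Return (sheet_name, data_path) pairs for numerically named job folders.

    Hypothetical helper: assumes every job folder under `parent` is named with
    an integer (e.g. 12, 13, ...) and contains a file called `data_name`.
    """
    pairs = []
    for d in Path(parent).iterdir():
        # Keep only directories with purely numeric names that hold the data file
        if d.is_dir() and d.name.isdigit() and (d / data_name).is_file():
            pairs.append((d.name, d / data_name))
    # Sort numerically so that "9" comes before "12"
    pairs.sort(key=lambda p: int(p[0]))
    return pairs
```

Each returned pair can then be used as `for sheet_name, dat_path in find_job_dirs('/path'):` in place of the fixed `range()` loop.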
The steps are as follows:
1. Read the .dat file
Use Python's built-in open() to open the file and read its data line by line:
```python
def ReadTxt(file):
    ls = list()
    with open(file, 'r', encoding='utf-8-sig') as f:
```
Next, split each line into columns. The data has 7 columns, so each line is split into 7 parts. This part is written rather clumsily, so please go easy on me. I'll try a more concise approach later.
```python
for line in f:
    # Check whether the given line can be split into 7 numeric columns
    try:
        # Split the line on whitespace (at most 7 splits)
        lists = line.split(None, 7)
        co1 = lists[0]
        co2 = lists[1]
        co3 = lists[2]
        co4 = lists[3]
        co5 = lists[4]
        co6 = lists[5]
        co7 = lists[6]
        # Put the 7 columns, converted to floats, into the list
        ls.append((float(co1), float(co2), float(co3), float(co4),
                   float(co5), float(co6), float(co7)))
    except (IndexError, ValueError):
        # Too few columns, or a field that is not a number
        print('Wrong format!')
```
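Here is one possible sketch of the more concise method: split on whitespace once, then convert the first seven fields in a single expression. The names parse_line and read_rows are hypothetical. Like the original, the sketch assumes seven numeric columns and a title line whose first field is "#". It reports bad lines instead of silently dropping them.

```python
def parse_line(line, ncols=7):
    """Parse one whitespace-separated line into a tuple of `ncols` floats.

    Returns None for the "#" title line; raises ValueError if the line has
    too few columns or a non-numeric field.
    """
    fields = line.split()
    if fields and fields[0] == "#":
        return None  # title line, skipped
    if len(fields) < ncols:
        raise ValueError("expected %d columns, got %d" % (ncols, len(fields)))
    # Convert the first ncols fields in one go instead of co1 ... co7
    return tuple(float(x) for x in fields[:ncols])


def read_rows(path, ncols=7):
    """Read all valid data rows of a .dat file as a list of tuples."""
    rows = []
    with open(path, "r", encoding="utf-8-sig") as f:
        for num, line in enumerate(f, start=1):
            try:
                row = parse_line(line, ncols)
            except ValueError as err:
                print("Wrong format in line %d: %s" % (num, err))
                continue
            if row is not None:
                rows.append(row)
    return rows
```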
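This gives us a list of tuples holding the split data, with one tuple per line. It plays much the same role as a 2D array in C: rows[i][j] picks row i, column j. The differences are that a Python list can grow, and its elements don't all have to be the same type. Next, we just write this list into the .xlsx file.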
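Here is a tiny example of the 2D-array analogy, using made-up numbers in the same seven-column shape:

```python
# Each tuple is one row; rows[i][j] works like arr[i][j] on a 2D array in C.
rows = [
    (1.0, 0.0, 0.0, 0.0, 1.5, 0.9, 10.0),
    (2.0, 1.0, 1.0, 1.0, 2.5, 0.8, 11.0),
]

charge_of_second_atom = rows[1][4]   # row index 1, column index 4 (CHARGE) -> 2.5

# zip(*rows) transposes the table, which gives whole columns at once
columns = list(zip(*rows))
charges = columns[4]                  # (1.5, 2.5)

# Unlike a fixed-size C array, the outer list can grow
rows.append((3.0, 2.0, 2.0, 2.0, 0.5, 1.2, 9.0))
```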
2. Write the .xlsx file
a. First, install openpyxl with pip3
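In my setup, openpyxl would only install after I had first installed the jdcal module. Older openpyxl releases do list jdcal as a dependency, but pip normally installs declared dependencies automatically, so this extra step may not be needed in general. If anyone understands why it was needed here, I'd be grateful for an explanation.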
pip3 install jdcal
Once that reports success, install openpyxl the same way:
pip3 install openpyxl
Alternatively, add it directly in PyCharm. I won't go into the details here; see this expert's post: https://blog.csdn.net/hpwzjz/article/details/82859711
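You can check from Python that the install worked, and see which of the suspected dependencies pip actually pulled in. The following should work on Python 3.8+, which ships importlib.metadata. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print what is actually installed in the current interpreter
for pkg in ("openpyxl", "et_xmlfile", "jdcal"):
    print(pkg, installed_version(pkg) or "not installed")
```

Run it with the same interpreter PyCharm uses for the project. Otherwise it may report a different environment.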
b. Once the install is done, we can start writing the code that uses the package
The openpyxl pieces we need here are mainly Workbook and load_workbook:
from openpyxl import Workbook, load_workbook
之后可以寫(xiě)出
wb = load_workbook(path)
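This opens the .xlsx file at the given path as a workbook. Note that load_workbook() only opens an existing file and does not create one, which is why the main script first creates and saves an empty Workbook(). A new Workbook() automatically contains one worksheet named Sheet, just like a newly created blank .xlsx file.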
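load_workbook() raises FileNotFoundError for a path that doesn't exist yet, so a small open-or-create helper can make that step explicit. This is a sketch; open_or_create is a hypothetical name, not part of openpyxl.

```python
import os

from openpyxl import Workbook, load_workbook


def open_or_create(path):
    """Open the workbook at `path`, creating and saving an empty one if needed.

    Hypothetical helper, not part of openpyxl.
    """
    if os.path.exists(path):
        return load_workbook(path)
    # A new Workbook() starts with one worksheet named 'Sheet'
    wb = Workbook()
    wb.save(path)
    return wb
```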
Once the file exists, we can write to it and read from it. First, create a worksheet named after the directory the .dat file was read from:
sheet = wb.create_sheet(sheet_name)
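Then write the list from before into the sheet and save the file. Remember to save after writing. Otherwise it's like writing a document and then clicking "Don't Save" when you quit: all that work for nothing.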
```python
# Write the list into the worksheet row by row
for row in value:
    sheet.append(row)
# Save the workbook back to the same path
wb.save(path)
```
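Here is a small in-memory check of what append() does, using made-up rows. Each append() call writes one tuple into the next empty row. In recent openpyxl versions, iter_rows(values_only=True) reads the values back as plain tuples instead of Cell objects.

```python
from openpyxl import Workbook

# Made-up rows in the same shape that ReadTxt returns
header = ('#', 'X', 'Y', 'Z', 'CHARGE', 'MIN DIST', 'ATOMIC VOL')
rows = [
    (1.0, 0.0, 0.0, 0.0, 1.5, 0.9, 10.0),
    (2.0, 1.0, 1.0, 1.0, 2.5, 0.8, 11.0),
]

wb = Workbook()
sheet = wb.create_sheet("12")   # sheet named after the job folder
sheet.append(header)            # append() writes to the next empty row
for row in rows:                # iterate directly instead of range(len(...))
    sheet.append(row)

# Read back as plain tuples, one per row
written = list(sheet.iter_rows(values_only=True))
```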
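3. Tidy up and add some status messages
Add messages that report lines in the wrong format, so they can be sent to a log file. That makes it easier to track problems down later, although ideally nothing goes wrong at all.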
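The simplest way to get these print() messages into a log file is shell redirection, e.g. `python script.py > run.log`. For timestamps and levels, Python's standard logging module can write to a file directly. Below is a minimal sketch; the logger name dat2xlsx and the log file name are placeholders.

```python
import logging


def make_logger(log_path, name="dat2xlsx"):
    """Return a logger that writes timestamped messages to `log_path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't also pass messages to the root logger
    # Drop handlers from an earlier call so messages are not written twice
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Suppose `log = make_logger('run.log')` is created at module level. The print() in ReadTxt could then become `log.warning('Wrong format in line %d of %s', num, file)`.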
The final code looks like this:
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# A script to read text (.dat) files and transfer them into one .xlsx file
from openpyxl import Workbook, load_workbook


# Read from text (.dat)
def ReadTxt(file):
    ls = list()
    with open(file, 'r', encoding='utf-8-sig') as f:
        # num counts lines so that error messages can point at them
        for num, line in enumerate(f, start=1):
            # Check whether the given line can be split into 7 numeric columns
            try:
                # Split the line on whitespace (at most 7 splits)
                lists = line.split(None, 7)
                co1 = lists[0]
                co2 = lists[1]
                co3 = lists[2]
                co4 = lists[3]
                co5 = lists[4]
                co6 = lists[5]
                co7 = lists[6]
                # Skip the title line, whose first field is "#"
                if co1 != "#":
                    # Store the remaining rows as tuples of floats
                    ls.append((float(co1), float(co2), float(co3), float(co4),
                               float(co5), float(co6), float(co7)))
            except (IndexError, ValueError):
                # Too few columns, or a field that is not a number
                print('Wrong format in line ' + str(num) + '!')
    # Return the rows as a list of tuples
    return ls


# Write into xlsx
def Write_Excel(path, sheet_name, value):
    # Open the existing workbook at the given path
    # (load_workbook() does not create the file; the main block creates it first)
    wb = load_workbook(path)
    # Create a new worksheet in this workbook, named as given
    sheet = wb.create_sheet(sheet_name)
    # sheet.column_dimensions['B'].width = 115
    # (optional) set cell format, width, height...
    # Write the list into the worksheet row by row
    for row in value:
        sheet.append(row)
    # Save the workbook back to the same path
    wb.save(path)
    print("Current txt " + sheet_name + " has been written, Tadaaaaaa!")


# Remove the default empty sheet
def Remove_empty(path):
    wb = load_workbook(path)
    ws = wb['Sheet']
    wb.remove(ws)
    wb.save(path)
    print('Empty sheet has been removed successfully')


# Main func
if __name__ == '__main__':
    # .xlsx file path where we want to generate this file
    book_name_xlsx = r'/path/sum.xlsx'
    # Create an empty workbook (it contains one sheet named 'Sheet') and save it
    wb = Workbook()
    wb.save(book_name_xlsx)
    # 12 to 20 is the range of interlayer distances of interest; range() stops before 21
    for name in range(12, 21):
        # Use the job directory name as the sheet name
        sheet_name_xlsx = str(name)
        # Read the .dat file in that job directory; /path is the parent folder of the tasks
        art = ReadTxt(r'/path/' + sheet_name_xlsx + '/ACF.dat')
        # Insert the title row
        art.insert(0, ('#', 'X', 'Y', 'Z', 'CHARGE', 'MIN DIST', 'ATOMIC VOL'))
        Write_Excel(book_name_xlsx, sheet_name_xlsx, art)
    # Remove the empty default sheet
    Remove_empty(book_name_xlsx)
```
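One possible refinement of the script above: Write_Excel() reloads and re-saves the whole workbook for every folder, and Remove_empty() then deletes the default sheet. The sketch below keeps the workbook in memory and renames the default sheet instead of removing it. It also saves only once, at the end. dat_files_to_workbook is a hypothetical name.

```python
from openpyxl import Workbook


def dat_files_to_workbook(sheets, out_path):
    """Write several tables into one .xlsx, creating and saving it only once.

    `sheets` is an iterable of (sheet_name, rows) pairs, where rows is a list
    of tuples such as ReadTxt() returns. Hypothetical helper.
    """
    wb = Workbook()
    first = True
    for sheet_name, rows in sheets:
        if first:
            # Reuse the default 'Sheet' instead of deleting it afterwards
            ws = wb.active
            ws.title = sheet_name
            first = False
        else:
            ws = wb.create_sheet(sheet_name)
        for row in rows:
            ws.append(row)
    wb.save(out_path)
    return wb
```

With the functions above, it could be called as `dat_files_to_workbook(((str(n), [header] + ReadTxt('/path/' + str(n) + '/ACF.dat')) for n in range(12, 21)), '/path/sum.xlsx')`, where header is the title tuple from the main block.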
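This is the final, still somewhat rough version. More features could be added, but the overall framework is roughly this: it uses the openpyxl package to create an .xlsx file and write into it. I hope it helps anyone stuck doing mechanical, repetitive work by hand.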
A summary of some pitfalls I ran into along the way:
How split works:
https://blog.csdn.net/stenwaves/article/details/81988203
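A quick demonstration of the split behaviour that matters here:

```python
line = "  1   0.0000   0.5000   1.0000   6.2000   1.1000   10.5000\n"

# sep=None splits on runs of any whitespace and drops leading/trailing blanks
fields = line.split()              # 7 fields, no empty strings, no '\n'

# maxsplit=7 allows at most 7 splits, i.e. at most 8 pieces;
# the 8th piece holds the untouched remainder of the line
limited = "a b c d e f g h i".split(None, 7)
print(limited)  # → ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h i']

# Splitting on a single space instead keeps empty strings between spaces
single = "1  2".split(" ")
print(single)   # → ['1', '', '2']
```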
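How range() works:
range(start, stop[, step]) takes a start, a stop and an optional step (default 1). The values run from start up to, but not including, stop, so with the default step the last value is stop - 1.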
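For example, the folder range used in the script:

```python
# range(start, stop[, step]) stops *before* stop
folders = list(range(12, 21))
print(folders)                   # → [12, 13, 14, 15, 16, 17, 18, 19, 20]
print(list(range(12, 21, 2)))    # → [12, 14, 16, 18, 20]

# Folder names as strings, as used for the sheet names
names = [str(n) for n in range(12, 21)]
```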
Working with Excel in Python:
https://blog.csdn.net/qq_45037797/article/details/94882848
https://blog.csdn.net/weixin_43094965/article/details/82226263
https://blog.csdn.net/weixin_33835690/article/details/88736400
https://blog.csdn.net/bananaooo/article/details/79413742