The goal of this chapter is to learn some simple data processing. We are given some text data that needs to be read in and converted into lists. The data in those lists is then normalized to a uniform format and finally sorted.
The data used in this chapter can be obtained here: Get the data
Data processing
Unoptimized code
# Normalize a time string to the uniform form mins.secs
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

# Read each file and turn its recorded times into a list
with open('james.txt') as jaf:
    data = jaf.readline()
    james = data.strip().split(',')
with open('julie.txt') as juf:
    data = juf.readline()
    julie = data.strip().split(',')
with open('mikey.txt') as mif:
    data = mif.readline()
    mikey = data.strip().split(',')
with open('sarah.txt') as saf:
    data = saf.readline()
    sarah = data.strip().split(',')

clean_james = []
clean_julie = []
clean_mikey = []
clean_sarah = []
# --------- start of the bloated part ---------
for each_t in james:
    clean_james.append(sanitize(each_t))
for each_t in julie:
    clean_julie.append(sanitize(each_t))
for each_t in mikey:
    clean_mikey.append(sanitize(each_t))
for each_t in sarah:
    clean_sarah.append(sanitize(each_t))
print(sorted(clean_james))
print(sorted(clean_julie))
print(sorted(clean_mikey))
print(sorted(clean_sarah))
# --------- end of the bloated part ---------
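To see what `sanitize` does on its own, here is a minimal check on a few sample strings. The values in `samples` are hypothetical and only imitate the three separators (`-`, `:`, `.`) that appear in the data files.

```python
def sanitize(time_string):
    """Turn '2-34', '2:34' or '2.34' into the uniform 'mins.secs' form '2.34'."""
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        # Already in mins.secs form (or unrecognized): return unchanged
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

# Hypothetical sample values, one per separator style
samples = ['2-34', '3:21', '2.34', '2:01']
print([sanitize(t) for t in samples])  # ['2.34', '3.21', '2.34', '2.01']
```

Every time ends up as a string in the same `mins.secs` form, which is what lets the later `sorted()` calls compare them consistently.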
Optimized code
def sanitize(time_str):
    if '-' in time_str:
        splitter = '-'
    elif ':' in time_str:
        splitter = ':'
    else:
        return time_str
    (mins, secs) = time_str.split(splitter)
    return mins + '.' + secs

# Extract the file-reading code into a function
def get_coach_data(filename):
    try:
        with open(filename) as file:
            data = file.readline()
        return data.strip().split(',')
    except IOError as err:
        print('File error: ' + str(err))
        return None

# Remove duplicate entries from a list, keeping the order of first occurrence
def clean_data(data):
    unique = []
    for item in data:
        if item not in unique:
            unique.append(item)
    return unique

# get_coach_data returns None on a read error; the code below assumes all four files were read
james = get_coach_data('james.txt')
julie = get_coach_data('julie.txt')
mikey = get_coach_data('mikey.txt')
sarah = get_coach_data('sarah.txt')

james_format = [sanitize(t) for t in james]
julie_format = [sanitize(t) for t in julie]
mikey_format = [sanitize(t) for t in mikey]
sarah_format = [sanitize(t) for t in sarah]

clean_james = clean_data(james_format)
clean_julie = clean_data(julie_format)
clean_mikey = clean_data(mikey_format)
clean_sarah = clean_data(sarah_format)

# Print the first three entries after sorting, i.e. the three smallest times
print(sorted(clean_james)[0:3])
print(sorted(clean_julie)[0:3])
print(sorted(clean_mikey)[0:3])
print(sorted(clean_sarah)[0:3])
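Because the real data files are not reproduced here, the following sketch runs the same pipeline against a temporary file. The record line `sample_line` and the file names are hypothetical stand-ins for `james.txt`. It also shows that `get_coach_data` returns `None` for a missing file, so the caller should check for `None` before processing the result.

```python
import os
import tempfile

def sanitize(time_str):
    if '-' in time_str:
        splitter = '-'
    elif ':' in time_str:
        splitter = ':'
    else:
        return time_str
    (mins, secs) = time_str.split(splitter)
    return mins + '.' + secs

def get_coach_data(filename):
    """Read the first line of a file and split it on commas; None on I/O error."""
    try:
        with open(filename) as f:
            data = f.readline()
        return data.strip().split(',')
    except IOError as err:
        print('File error: ' + str(err))
        return None

def clean_data(data):
    """Drop duplicates, keeping the order of first occurrence."""
    unique = []
    for item in data:
        if item not in unique:
            unique.append(item)
    return unique

# Hypothetical record line in the same comma-separated style as the chapter's files
sample_line = '2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    with open(path, 'w') as f:
        f.write(sample_line)
    times = get_coach_data(path)
    # A file that does not exist: the error is printed and None is returned
    missing = get_coach_data(os.path.join(tmp, 'no_such_file.txt'))

if times is not None:
    top3 = sorted(clean_data([sanitize(t) for t in times]))[0:3]
    print(top3)   # ['2.01', '2.22', '2.34']
print(missing)    # None
```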
Two ways to sort
In-place sorting: data.sort()
This sorts the data in the specified order and replaces the original data with the sorted data, so the original order is lost. The call itself returns None.
Copied sorting: sorted(data)
This sorts the data in the specified order and returns a sorted copy. The original data is kept unchanged; only the copy is sorted.
>>> data = [6,3,1,2,5,4]
>>> data
[6, 3, 1, 2, 5, 4]
>>> data.sort() # sort the data in place
>>> data
[1, 2, 3, 4, 5, 6] # the original order has changed
>>>
>>> data = [6,3,1,2,5,4]
>>> data_sort = sorted(data) # copied sort: returns a sorted copy
>>> data
[6, 3, 1, 2, 5, 4] # the original order is preserved
>>> data_sort
[1, 2, 3, 4, 5, 6]
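Both forms accept the standard `key` and `reverse` arguments, which is how the "specified order" is chosen. This matters for the chapter's data, because the times are strings and are compared character by character. The sketch below uses hypothetical times where one value has two digits before the dot to show the difference. It also shows that `sort()` returns `None`.

```python
times = ['2.34', '10.05', '2.01', '9.58']  # hypothetical time strings

as_strings = sorted(times)             # character comparison: '1' < '2' < '9'
as_numbers = sorted(times, key=float)  # compare by numeric value instead
print(as_strings)   # ['10.05', '2.01', '2.34', '9.58']
print(as_numbers)   # ['2.01', '2.34', '9.58', '10.05']
print(times)        # ['2.34', '10.05', '2.01', '9.58'] (sorted() left it alone)

result = times.sort(reverse=True)      # in place, descending string order
print(result)       # None
print(times)        # ['9.58', '2.34', '2.01', '10.05']
```

With times that all have the same `m.ss` shape, as after `sanitize`, string order and numeric order agree. `key=float` only becomes necessary when the widths differ.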
List comprehension
Usage
[expression for variable in list]  or  [expression for variable in list if condition]
Examples
# Convert minutes to seconds
>>> mins = [1,2,3]
>>> secs = [m * 60 for m in mins]
>>> secs
[60, 120, 180]
# Square each number in the list
>>> data = [1,2,3,4]
>>> data_square = [num * num for num in data]
>>> data_square
[1, 4, 9, 16]
# You can also add a condition to filter the items in the list
>>> result = [num * 2 for num in data if num > 2]
>>> result
[6, 8]
# You can also add more for clauses:
>>> result = [[x,y] for x in range(2) for y in range(2)]
>>> result
[[0, 0], [0, 1], [1, 0], [1, 1]]
>>>
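Each comprehension above is shorthand for a for loop that appends to an empty list, the same pattern used in the "bloated part" of the unoptimized code. The sketch below writes out the loop form next to each comprehension. It shows that the `if` clause becomes an `if` inside the loop and that multiple `for` clauses nest in the order they are written.

```python
# Simple mapping
mins = [1, 2, 3]
secs_loop = []
for m in mins:
    secs_loop.append(m * 60)
secs_comp = [m * 60 for m in mins]
print(secs_loop == secs_comp)  # True

# Mapping with a filter condition
data = [1, 2, 3, 4]
filtered_loop = []
for num in data:
    if num > 2:
        filtered_loop.append(num * 2)
filtered_comp = [num * 2 for num in data if num > 2]
print(filtered_loop == filtered_comp)  # True

# Two for clauses: the first one is the outer loop
pairs_loop = []
for x in range(2):
    for y in range(2):
        pairs_loop.append([x, y])
pairs_comp = [[x, y] for x in range(2) for y in range(2)]
print(pairs_loop == pairs_comp)  # True
```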
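Set: unordered, no duplicates
Initialization
1. Create an empty set (note that {} creates an empty dict, not a set):
distances = set()
2. Provide the set with a list of data enclosed in curly braces:
>>> distances = {10.6,11,8,10.6,"two",7}
>>> distances
{8, 10.6, 11, 'two', 7} # duplicates are filtered out automatically; the display order may differ because a set is unordered
3. Build the set from an existing list:
>>> numbers = [2,2,3,5,6]
>>> distances = set(numbers)
>>> distances
{2, 3, 5, 6}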
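Because a set drops duplicates automatically, it can take over the job of the hand-written `clean_data` function from the optimized code. Wrapping the set in `sorted()` restores a predictable order. The list `times` below is hypothetical sample data that has already been sanitized.

```python
def clean_data(data):
    """The loop-based de-duplication from the optimized code."""
    unique = []
    for item in data:
        if item not in unique:
            unique.append(item)
    return unique

times = ['2.34', '3.21', '2.34', '2.01', '2.01', '3.10']  # hypothetical values

via_loop = sorted(clean_data(times))[0:3]
via_set = sorted(set(times))[0:3]  # set removes duplicates, sorted restores order
print(via_loop)  # ['2.01', '2.34', '3.10']
print(via_set)   # ['2.01', '2.34', '3.10']

print(type({}))     # <class 'dict'> (empty braces do not make a set)
print(type(set()))  # <class 'set'>
```

The set version is shorter, but it does not preserve first-occurrence order. That is harmless here because the result is sorted immediately afterwards.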
Miscellaneous notes
List slicing
Slicing is mainly used to get part of a list: L[x:y] returns a new list containing the elements of L from offset x up to offset y (including x, excluding y), for example:
>>> [1,2,3,4,5,6][2:5]
[3, 4, 5]
If an offset is left out, the first one defaults to the start of the list and the second to the end:
>>> [1,2,3,4,5,6][:]
[1, 2, 3, 4, 5, 6]
Doing this makes a shallow copy of the original list.
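"Shallow" means that the outer list is new but the items inside it are the same objects. The sketch below uses a hypothetical nested list to make the difference visible. Appending to the copy leaves the original alone, but mutating a shared inner list shows up in both.

```python
original = [[1, 2], [3, 4]]
shallow = original[:]      # new outer list, same inner lists

print(shallow == original)  # True  (equal contents)
print(shallow is original)  # False (a different list object)

shallow.append([5, 6])
print(original)             # [[1, 2], [3, 4]] (the outer list is independent)

shallow[0].append(99)
print(original)             # [[1, 2, 99], [3, 4]] (the inner lists are shared)
```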
A slice actually accepts a third value, the step, which defaults to 1. Here the step is changed to 2:
>>> [1,2,3,4,5,6][::2]
[1, 3, 5]
What happens if the step is negative?
>>> [1,2,3,4,5,6][::-2]
[6, 4, 2]
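A negative step walks the list backwards from the end. With a step of -2 it takes every second element going backwards, and a step of -1 ([::-1]) reverses the whole list.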
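A few more slices on the same list show how negative steps combine with explicit offsets. With a negative step, the start offset must be to the right of the stop offset. Like every slice, these return new lists and leave `nums` unchanged.

```python
nums = [1, 2, 3, 4, 5, 6]

print(nums[::-1])    # [6, 5, 4, 3, 2, 1] (full reversal)
print(nums[::-2])    # [6, 4, 2] (backwards, every second item)
print(nums[4:1:-1])  # [5, 4, 3] (from offset 4 down to, but excluding, offset 1)
print(nums[-3:])     # [4, 5, 6] (negative offsets count from the end)

print(nums[::-1] == list(reversed(nums)))  # True
print(nums)          # [1, 2, 3, 4, 5, 6] (the original is untouched)
```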
Factory functions
A factory function creates a new data item of a particular type. For example, set() is a factory function because it creates a new set.
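The built-ins list() and dict() can be read the same way: each call builds a fresh object of that type, either empty or from an iterable you pass in. A small sketch follows. The names and values in `pairs` are hypothetical.

```python
empty_set = set()                 # a new, empty set
letters = set('hello')            # a new set built from an iterable
nums = list(range(4))             # a new list built from an iterable
pairs = dict([('james', '2.01'), ('julie', '2.11')])  # a new dict from key/value pairs

print(len(empty_set))    # 0
print(sorted(letters))   # ['e', 'h', 'l', 'o'] (the duplicate 'l' was dropped)
print(nums)              # [0, 1, 2, 3]
print(pairs['julie'])    # 2.11
```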