Official documentation:
https://docs.python.org/3/library/collections.html#module-collections
The following docstring is quoted from the Python source code of the collections module:
'''This module implements specialized container datatypes providing
alternatives to Python's general purpose built-in containers, dict,
list, set, and tuple.
* namedtuple   factory function for creating tuple subclasses with named fields
* deque        list-like container with fast appends and pops on either end
* ChainMap     dict-like class for creating a single view of multiple mappings
* Counter      dict subclass for counting hashable objects
* OrderedDict  dict subclass that remembers the order entries were added
* defaultdict  dict subclass that calls a factory function to supply missing values
* UserDict     wrapper around dictionary objects for easier dict subclassing
* UserList     wrapper around list objects for easier list subclassing
* UserString   wrapper around string objects for easier string subclassing
'''
namedtuple
We know that a tuple can represent an immutable collection. For example, the 2D coordinates of a point can be written as:
>>> p = (1, 2)
However, looking at (1, 2), it is hard to tell that this tuple is meant to represent a coordinate.
This is where namedtuple comes in:
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(1, 2)
>>> p.x
1
>>> p.y
2
Similarly, to represent a circle by its center coordinates and radius, you can also define it with namedtuple. The general form is namedtuple('TypeName', [list of field names]):
Circle = namedtuple('Circle', ['x', 'y', 'r'])
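A namedtuple instance supports more than attribute access. The sketch below uses the Circle type defined above to show some of this. The sample values (0, 0, 5) are arbitrary. `_fields`, `_replace()` and `_asdict()` are documented parts of the namedtuple API:

```python
from collections import namedtuple

# Same definition as in the text; the values below are illustrative.
Circle = namedtuple('Circle', ['x', 'y', 'r'])

c = Circle(0, 0, 5)
print(c)                 # Circle(x=0, y=0, r=5)
print(c.r, c[2])         # 5 5  (attribute access and index access agree)
x, y, r = c              # ordinary tuple unpacking still works
print(isinstance(c, tuple))   # True
print(Circle._fields)    # ('x', 'y', 'r')

# Fields are read-only; _replace() returns a new instance instead.
bigger = c._replace(r=10)
print(bigger)                  # Circle(x=0, y=0, r=10)
print(dict(bigger._asdict()))  # {'x': 0, 'y': 0, 'r': 10}

try:
    c.r = 7
except AttributeError:
    print('Circle fields cannot be reassigned')
```

Because a namedtuple is still a tuple, it can be passed anywhere a plain tuple is expected.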
deque
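A list gives fast access to elements by index, but inserting and deleting elements is slow. A list is stored linearly as a contiguous array, so inserting or removing an element near the front forces every later element to shift. With large amounts of data, this makes insertion and deletion inefficient.
deque is a double-ended queue that supports efficient insertion and removal at both ends, which makes it well suited to queues and stacks: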
>>> from collections import deque
>>> q = deque(['a', 'b', 'c'])
>>> q.append('x')
>>> q.appendleft('y')
>>> q
deque(['y', 'a', 'b', 'c', 'x'])
In addition to the list-style append() and pop(), deque supports appendleft() and popleft(), so elements can be added to or removed from the front very efficiently.
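The sketch below uses a deque both as a FIFO queue and as a stack. Appends and pops at either end of a deque take roughly constant time, whereas list.insert(0, x) and list.pop(0) must shift every remaining element. The `maxlen` argument and `rotate()` shown at the end are documented deque features the text does not mention. The task names are made up:

```python
from collections import deque

q = deque(['a', 'b', 'c'])
q.append('x')        # add to the right end
q.appendleft('y')    # add to the left end
print(q)             # deque(['y', 'a', 'b', 'c', 'x'])

print(q.pop())       # x  (removed from the right end)
print(q.popleft())   # y  (removed from the left end)
print(q)             # deque(['a', 'b', 'c'])

# FIFO queue: append on the right, popleft from the left.
tasks = deque()
for t in ['task1', 'task2', 'task3']:
    tasks.append(t)
order = []
while tasks:
    order.append(tasks.popleft())
print(order)         # ['task1', 'task2', 'task3']

# Stack (LIFO): append and pop on the same (right) end.
stack = deque()
stack.append(1)
stack.append(2)
print(stack.pop())   # 2

# maxlen makes a bounded buffer: once full, each append drops an item from the other end.
last3 = deque(maxlen=3)
for n in range(6):
    last3.append(n)
print(last3)         # deque([3, 4, 5], maxlen=3)

# rotate(n) shifts every element n steps to the right.
d = deque([1, 2, 3, 4])
d.rotate(1)
print(d)             # deque([4, 1, 2, 3])
```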
OrderedDict
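Before Python 3.7, the keys of a dict were unordered, so we could not rely on any particular key order when iterating over a dict. Since Python 3.7, the built-in dict is guaranteed to preserve insertion order. OrderedDict still offers order-aware extras such as move_to_end() and order-sensitive equality.
To keep the keys in order on any Python version, you can use OrderedDict: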
>>> from collections import OrderedDict
>>> d = dict([('a', 1), ('b', 2), ('c', 3)])
>>> d  # on older Pythons, dict key order was arbitrary
{'a': 1, 'c': 3, 'b': 2}
>>> od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> od  # OrderedDict keys keep their insertion order
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
Note that OrderedDict orders keys by the order in which they were inserted, not by sorting the keys themselves:
>>> od = OrderedDict()
>>> od['z'] = 1
>>> od['y'] = 2
>>> od['x'] = 3
>>> list(od.keys())  # keys come back in insertion order
['z', 'y', 'x']
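The order OrderedDict remembers can also be changed and used directly. A sketch continuing the example above, using the documented methods move_to_end() and popitem(last=...). It also shows that equality between two OrderedDicts takes order into account:

```python
from collections import OrderedDict

od = OrderedDict()
od['z'] = 1
od['y'] = 2
od['x'] = 3

od.move_to_end('z')              # move an existing key to the end
print(list(od))                  # ['y', 'x', 'z']
od.move_to_end('x', last=False)  # or move it to the front
print(list(od))                  # ['x', 'y', 'z']

print(od.popitem(last=False))    # ('x', 3): pops from the front (FIFO)
print(od.popitem())              # ('z', 1): pops from the end (LIFO)
print(list(od))                  # ['y']

# Comparing two OrderedDicts is order-sensitive; comparing plain dicts is not.
a = OrderedDict([('a', 1), ('b', 2)])
b = OrderedDict([('b', 2), ('a', 1)])
print(a == b)                    # False
print(dict(a) == dict(b))        # True
```

These operations make OrderedDict a convenient base for structures such as LRU caches, where the least recently used entry sits at the front.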
defaultdict
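Given the values [11, 22, 33, 44, 55, 66, 77, 88, 99, 90...], store every value greater than 66 under the first key of a dictionary, and every other value (66 or less) under the second key.
That is: {'k1': values greater than 66, 'k2': values less than or equal to 66}
Solution with a plain dict: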
values = [11, 22, 33, 44, 55, 66, 77, 88, 99, 90]
my_dict = {}
for value in values:
    if value > 66:
        # dict.has_key() exists only in Python 2; the `in` operator works everywhere
        if 'k1' in my_dict:
            my_dict['k1'].append(value)
        else:
            my_dict['k1'] = [value]
    else:
        if 'k2' in my_dict:
            my_dict['k2'].append(value)
        else:
            my_dict['k2'] = [value]
Solution with defaultdict:
from collections import defaultdict

values = [11, 22, 33, 44, 55, 66, 77, 88, 99, 90]
# A missing key is created automatically, with list() (an empty list) as its value
my_dict = defaultdict(list)
for value in values:
    if value > 66:
        my_dict['k1'].append(value)
    else:
        my_dict['k2'].append(value)
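defaultdict calls its default_factory (here `list`) whenever a missing key is read with `[]`. It stores the result under that key and returns it. The sketch below shows the resulting dictionary, a counting variant with `int`, which lookups create keys, and the plain-dict equivalent using setdefault(). The names `counts` and `plain` and the inputs 'hello', 'k3' and 'k4' are illustrative only:

```python
from collections import defaultdict

values = [11, 22, 33, 44, 55, 66, 77, 88, 99, 90]
my_dict = defaultdict(list)
for value in values:
    if value > 66:
        my_dict['k1'].append(value)
    else:
        my_dict['k2'].append(value)
# 'k2' was created first because the first value (11) is not greater than 66.
print(dict(my_dict))
# {'k2': [11, 22, 33, 44, 55, 66], 'k1': [77, 88, 99, 90]}

# Any zero-argument callable works as the factory; int() returns 0, so this counts.
counts = defaultdict(int)
for ch in 'hello':
    counts[ch] += 1
print(dict(counts))          # {'h': 1, 'e': 1, 'l': 2, 'o': 1}

# Only [] lookups create missing keys; `in` and .get() leave the dict unchanged.
print('k3' in my_dict)       # False
_ = my_dict['k3']            # inserts 'k3' -> []
print('k3' in my_dict)       # True
print(my_dict.get('k4'))     # None, and 'k4' is still absent

# Plain-dict equivalent without defaultdict, using setdefault().
plain = {}
for value in values:
    plain.setdefault('k1' if value > 66 else 'k2', []).append(value)
print(plain == {'k1': [77, 88, 99, 90], 'k2': [11, 22, 33, 44, 55, 66]})   # True
```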
Counter
The Counter class keeps track of how many times each value occurs. It is a dict subclass in which the elements are stored as keys and their counts as values. A count can be any integer, including zero and negative numbers. Counter is similar to the bags or multisets found in other languages.
Output: Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1})
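The code that produced the output above is not shown. The sketch below assumes a string input chosen to give the same counts, then tries a few other documented Counter operations: indexing a missing element, most_common(), update(), subtract(), and the arithmetic operators:

```python
from collections import Counter

# Assumed input: the original input is not shown; this string just contains
# five a's, four b's, three c's, two d's and one e.
c = Counter('aaaaabbbbcccdde')
print(c)                  # Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1})
print(c['a'])             # 5
print(c['z'])             # 0  (a missing element counts as zero; no KeyError)
print(c.most_common(2))   # [('a', 5), ('b', 4)]

# Counts can be increased or decreased in place, even below zero.
c.update('ab')            # a: 5 -> 6, b: 4 -> 5
c.subtract('eee')         # e: 1 -> -2
print(c['e'])             # -2

# Binary '-' and unary '+' keep only positive counts.
print(Counter('aab') - Counter('abb'))   # Counter({'a': 1})
positive = +c             # drops 'e', whose count is negative
print(positive)           # Counter({'a': 6, 'b': 5, 'c': 3, 'd': 2})
```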
Counting the total number of words in an article
fname = input('Please enter your filename:')
with open(fname, 'r') as f:
    num_words = 0
    for line in f:
        words = line.split()        # split on any run of whitespace
        num_words += len(words)
print('Num_words:', num_words)
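The same count can be written as a single sum() over the lines. In this sketch, `count_words` is a hypothetical helper name. io.StringIO stands in for a real file, so the example runs without one:

```python
import io

def count_words(lines):
    """Hypothetical helper: total number of whitespace-separated words."""
    return sum(len(line.split()) for line in lines)

# io.StringIO stands in for open(fname); both yield the text one line at a time.
sample = io.StringIO("the quick brown fox\njumps over\n\nthe lazy dog\n")
total = count_words(sample)
print('Num_words:', total)   # Num_words: 9
```

Iterating over the file line by line, as both versions do, keeps memory use low even for large files.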
Counting the frequency of each word in an article
from collections import Counter

fname = input('please enter a filename:')
c = Counter()
with open(fname, 'r', encoding='utf-8') as f:
    for line in f:
        # split() with no argument splits on any run of whitespace and drops
        # the trailing newline, so empty strings and 'word\n' are not counted
        c.update(line.split())
print(c.most_common())  # (word, count) pairs, most frequent first
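The version below normalizes case and ignores punctuation, so that 'The' and 'the.' count as the same word. The regular expression that defines a "word" is an assumption, so adjust it for your text. `word_frequencies` is a hypothetical helper name, and io.StringIO again stands in for a file:

```python
import io
import re
from collections import Counter

def word_frequencies(lines):
    """Hypothetical helper: case-insensitive word counts.

    Assumption: a "word" is a run of letters, digits, underscores or apostrophes.
    """
    counts = Counter()
    for line in lines:
        counts.update(re.findall(r"[\w']+", line.lower()))
    return counts

sample = io.StringIO("The cat sat.\nThe cat ran, the dog sat!\n")
freq = word_frequencies(sample)
print(freq.most_common(3))   # [('the', 3), ('cat', 2), ('sat', 2)]
```

most_common() lists elements with equal counts in the order they were first encountered, which is why 'cat' comes before 'sat'.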