今天看了哈希表原理姻氨,理解了個(gè)大概:
python 的 字典是基于哈希表實(shí)現(xiàn)的,之所以字典的存取操作的時(shí)間復(fù)雜度為O1也得益于哈希表穆壕。
我們知道,數(shù)組的時(shí)間復(fù)雜度為O1其屏,是因?yàn)閿?shù)組實(shí)現(xiàn)了以索引存取值喇勋,這樣在存取數(shù)據(jù)的時(shí)候,只需要
將索引(內(nèi)存地址)拿到偎行,就能很快地將數(shù)據(jù)進(jìn)行存取操作川背,一步到位。當(dāng)然插入和刪除因?yàn)橐婕叭吭?所以時(shí)間復(fù)雜度是另一個(gè)級(jí)別蛤袒。
而字典熄云,其實(shí)就是一個(gè)變相的數(shù)組。它將我們要存儲(chǔ)的字符串先轉(zhuǎn)化為 數(shù)字 索引妙真,然后再存到列表里缴允。
這里 可以想象,由于同樣的字符串轉(zhuǎn)化為數(shù)字索引的時(shí)候(如果是用同一種算法)珍德,可定會(huì)發(fā)生重復(fù)练般,即
碰撞。這里不寫(xiě)解決碰撞的方法锈候,本例中 思路是將每個(gè)字符串存進(jìn)表的時(shí)候薄料,掛上一個(gè)獨(dú)一無(wú)二的id。
talk is cheap晴及,代碼實(shí)現(xiàn)如下:
import random
import time
class Hastable(object):
def __init__(self):
self.table_size = 100007
self.table = [0] * self.table_size
def add(self, key, value):
index = 1
f = 1
for s in key:
index += ord(s) * f
f *= 10
index = index % self.table_size
data = [key, value]
if self.table[index] == 0:
self.table[index] = [data]
else:
self.table[index].append(data)
def get(self, key, defult=None):
index = 1
f = 1
for k in key:
index += ord(k) * f
f *= 10
index = index % self.table_size
if isinstance(self.table[index], list):
for ls in self.table[index]:
if ls[0] == key:
return ls[1]
else:
return defult
# -----------------以下為測(cè)試時(shí)間-----------
ls = []
def ran():
seed = 'abcdefghijklmnopqrstuvwxyz'
s = ''
for i in range(5):
random_index = random.randint(0, len(seed) - 1)
s += seed[random_index]
ls.append(s)
def ye(n):
i = 0
while i < n:
ran()
i += 1
def test():
import uuid
t1 = time.time()
print('list add start', t1)
ye(500000)
t2 = time.time()
print('list add end', t2)
print('total time', t2 - t1)
print("=====================")
t3 = time.time()
print('Hastable add start', t3)
ht = Hastable()
for key in ls:
value = uuid.uuid4()
# print('value', value)
ht.add(key, value)
t4 = time.time()
print('Hastable add total', t4 - t3)
print("=====================")
t5 = time.time()
print('Hastable get start', t5)
for key in ls:
v = ht.get(key)
t6 = time.time()
print('Hastable get total', t6 - t5)
print("=====================")
t7 = time.time()
print('list get start', t7)
for i in range(len(ls)):
ls.__getitem__(i)
t8 = time.time()
print('list get total', t8 - t7)
print('all time', t8 - t1)
if __name__ == '__main__':
test()
輸出如下:
list add start 1527833095.2639012
list add end 1527833102.5443177
total time 7.280416488647461
=====================
Hastable add start 1527833102.5443177
Hastable add total 6.015344142913818
=====================
Hastable get start 1527833108.5596619
Hastable get total 1.8111035823822021
=====================
list get start 1527833110.3707654
list get total 0.07000398635864258
all time 15.176868200302124
[Finished in 15.7s]