HashMap Source Code Analysis
HashMap is an implementation of the Map interface whose underlying data structure is a hash table. Suppose an array were long enough, and a function existed that mapped the key of every stored value to a unique array index; then any element could be read or written in O(1) time. In practice, resources and storage are finite, and no function can fully guarantee one index per key, but the hash table is built on exactly this idea.
Two concepts are central to a hash table. The first is the hash function, which maps a key to an array index. The second is the collision: because no perfect hash function can be designed, two different keys will sometimes hash to the same index. How collisions are resolved is key to a hash table's design.
Before going further into HashMap, let us look at the official documentation:
Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
In short: it is backed by a hash table, is unsynchronized, permits null keys and null values, and so on.
Underlying structure
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
transient Node<K,V>[] table;
...
/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash). This field is used to make iterators on Collection-views of
* the HashMap fail-fast. (See ConcurrentModificationException).
*/
transient int modCount;
int threshold;
final float loadFactor;
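table is the array that, as discussed above, would ideally be infinitely large. It is not allocated when the HashMap is created; allocation is deferred until first use. HashMap requires the length of table to be a power of two (2^n); the reason for this requirement is explained below.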
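modCount records the number of structural modifications made to the HashMap. If a structural modification happens during iteration, modCount no longer matches the value the iterator recorded when it was created, and a ConcurrentModificationException is thrown. This is HashMap's fail-fast mechanism. Note that a put whose key already exists, so that only the value mapped to that key is overwritten, does not count as a structural modification.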
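The fail-fast behaviour can be observed from outside the class. The following is a minimal sketch, not part of the JDK source; the class name FailFastDemo and its two helper methods are made up for illustration. It adds a new key during iteration (a structural modification) and, separately, overwrites existing keys during iteration (not a structural modification).

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class FailFastDemo {
    // Returns true if adding a new mapping while iterating makes the iterator throw.
    static boolean structuralChangeThrows() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                map.put("new-" + key, 0); // new key: modCount is incremented
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Returns true if overwriting values of existing keys while iterating is tolerated.
    static boolean overwriteIsAllowed() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.put(key, 99); // key already present: only the value changes
            }
        } catch (ConcurrentModificationException e) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("structural change throws: " + structuralChangeThrows());
        System.out.println("overwrite allowed: " + overwriteIsAllowed());
    }
}
```

The JDK documents fail-fast as a best-effort mechanism for detecting bugs, so programs should not rely on the exception for correctness.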
loadFactor is the load factor (0.75 by default), and threshold is the maximum number of Nodes (key-value pairs) the HashMap can hold before it resizes: threshold = length * loadFactor. In other words, for a given array length, a larger load factor allows more key-value pairs to be stored.
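As a quick arithmetic check of the relation threshold = length * loadFactor, here is a small sketch. The helper threshold(...) is a hypothetical function written for this example; it is not a method of HashMap.

```java
class ThresholdDemo {
    // Hypothetical helper mirroring threshold = length * loadFactor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // With the default load factor 0.75: 16 -> 12, 32 -> 24, 64 -> 48
        for (int capacity = 16; capacity <= 64; capacity <<= 1) {
            System.out.println(capacity + " -> " + threshold(capacity, 0.75f));
        }
    }
}
```

With the default settings, a table of 16 bins is therefore resized when the 13th mapping is added.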
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
static final int MAXIMUM_CAPACITY = 1 << 30; //2^30
static final float DEFAULT_LOAD_FACTOR = 0.75f;
/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
*/
static final int TREEIFY_THRESHOLD = 8;
/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
static final int UNTREEIFY_THRESHOLD = 6;
/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
*/
static final int MIN_TREEIFY_CAPACITY = 64;
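DEFAULT_INITIAL_CAPACITY and MAXIMUM_CAPACITY are the default initial capacity and the maximum capacity, respectively.
DEFAULT_LOAD_FACTOR is the default load factor. When the number of key-value pairs in the map (not the number of occupied bins) exceeds the product of the capacity and the load factor, the table is resized.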
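HashMap resolves collisions by separate chaining. Before JDK 8 each bin was simply a singly linked list, so hash collisions could be disastrous for lookups: in the worst case the HashMap degenerates into a single linked list and lookup time degrades from O(1) to O(n). In JDK 8, a list that grows too long is converted into a red-black tree, which bounds the worst-case lookup time at O(log n). A red-black tree node takes considerably more space than a plain node, so the conversion normally happens only in fairly extreme cases. The conversion is controlled by TREEIFY_THRESHOLD, UNTREEIFY_THRESHOLD and MIN_TREEIFY_CAPACITY: a bin is converted to a tree when an element is added to a bin that already holds at least TREEIFY_THRESHOLD nodes, provided the table capacity is at least MIN_TREEIFY_CAPACITY (otherwise the table is resized instead), and a tree bin that is split during a resize and left with at most UNTREEIFY_THRESHOLD nodes is converted back to a list.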
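To make the worst case concrete, the sketch below uses a deliberately bad key type whose hashCode() is constant, so every key lands in the same bin. BadKey and CollisionDemo are hypothetical names invented for this example. The map stays correct; only lookup cost is affected. Whether the bin is internally a list or a tree cannot be seen through the public API, so the example checks correctness only.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Hypothetical key type: every instance reports the same hash code,
    // so all keys collide into a single bin.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        // Comparable keys let a tree bin order colliding keys by compareTo.
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Builds a map in which all 'count' keys share one bin.
    static Map<BadKey, Integer> fill(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(1000);
        System.out.println(map.size());
        System.out.println(map.get(new BadKey(777)));
    }
}
```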
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
Node(int hash, K key, V value, Node<K,V> next) {
this.hash = hash;
this.key = key;
this.value = value;
this.next = next;
}
...
}
HashMap does not store only values; it stores both key and value in the array, wrapped in this static Node class. A Node holds the key, the value, a reference to the next node, and the key's hash value, which is cached to avoid recomputing it. A Node is in fact just a node in a singly linked list.
Initialization
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
this.loadFactor = loadFactor;
this.threshold = tableSizeFor(initialCapacity);
}
public HashMap(int initialCapacity) {
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
public HashMap(Map<? extends K, ? extends V> m) {
this.loadFactor = DEFAULT_LOAD_FACTOR;
putMapEntries(m, false);
}
/**
* Returns a power of two size for the given target capacity.
*/
static final int tableSizeFor(int cap) {
int n = cap - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
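As mentioned above, HashMap must keep its capacity at a power of two (2^n). How is this guaranteed? The key is the tableSizeFor() method. As also covered in the earlier article on ArrayDeque, five rounds of unsigned right shift plus bitwise OR copy the highest set bit into every lower position, producing a number of the form 2^n - 1, as shown below. Adding 1 then gives 2^n. The initial cap - 1 ensures that a value that is already a power of two maps to itself rather than to the next power.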
0 0 0 0 1 ? ? ? ? ? //n
0 0 0 0 1 1 ? ? ? ? //n |= n >>> 1;
0 0 0 0 1 1 1 1 ? ? //n |= n >>> 2;
0 0 0 0 1 1 1 1 1 1 //n |= n >>> 4;
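To try the rounding on concrete values, the method can be copied into a standalone class. The sketch below reuses the body of tableSizeFor from the listing above; the wrapper class TableSizeDemo and the sample inputs are chosen only for illustration.

```java
class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same logic as the tableSizeFor method quoted above.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 16, 17, 1000};
        for (int cap : samples) {
            // 0 -> 1, 1 -> 1, 16 -> 16, 17 -> 32, 1000 -> 1024
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```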
Hash computation
Designing a hash function with a uniform distribution is hard, and it is not our concern here. The hashCode() values returned by String and by the wrapper classes of the primitive types in Java are already well distributed, so we can use them directly.
To map the value returned by hashCode() to an array index, the usual idea is the modulo operation: if the array length is length, then hashCode() % length gives the position of the bin. HashMap, however, does not do it this way.
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
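Because the length of table must be a power of two (2^n), the binary form of (length - 1) is 0...011...11. A bitwise AND with it keeps only the low bits of h, which for non-negative h is exactly h % length (for negative h, Java's % would yield a negative remainder, whereas the mask always yields a valid index). So the index can be computed with the & bit operation instead of the % remainder operation, as follows: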
(n - 1) & hash
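After this mapping, however, only the low bits take effect and the high bits are ignored, which makes collisions more likely. The hash function therefore XORs the high 16 bits into the low 16 bits, bringing in the high-bit information and reducing the probability of collisions.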
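The effect of mixing in the high bits can be seen with hash codes that differ only above bit 15. In this sketch, spread and indexFor are illustrative names for the two expressions discussed above, h ^ (h >>> 16) and (n - 1) & hash; they are not methods of HashMap.

```java
class HashSpreadDemo {
    // The transform applied by HashMap.hash() to a non-null key's hashCode().
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Index of the bin in a table of length n (n must be a power of two).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int[] hashCodes = {0x10000, 0x20000, 0x30000};
        for (int h : hashCodes) {
            // Without spreading every index is 0; with spreading they are 1, 2, 3.
            System.out.println(indexFor(h, 16) + " vs " + indexFor(spread(h), 16));
        }
        // For a negative hash the mask still gives a valid index, unlike %.
        System.out.println(indexFor(-1, 16) + " vs " + (-1 % 16));
    }
}
```

For a power-of-two n, (n - 1) & hash agrees with Math.floorMod(hash, n), which is why the mask is safe for negative hash values as well.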
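The put method
The put method roughly works as follows:
- Hash the key's hashCode(), then compute the index
- If there is no collision, put the node directly into the bin
- If there is a collision, append the node to the linked list in the bin
- If collisions make the list too long (at least TREEIFY_THRESHOLD nodes), convert the list into a red-black tree
- If the key already exists, replace the old value (keys stay unique)
- If size exceeds load factor * current capacity, resize
The code is as follows: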
public V put(K key, V value) {
// hash the key's hashCode()
return putVal(hash(key), key, value, false, true);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
// create the table if it is null or empty
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
// compute the index; if the bin is empty, place a new node there
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
// the first node in the bin already holds this key
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
// the bin is a tree
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
// the bin is a linked list
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
// write the value into the existing mapping
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
// resize once size exceeds load factor * current capacity
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
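The return value and the onlyIfAbsent flag of putVal are visible through the public API: put returns the previous value (or null), and putIfAbsent leaves an existing non-null value untouched. A small sketch follows; the class PutDemo and its demo() helper are hypothetical and exist only for this example.

```java
import java.util.HashMap;
import java.util.Map;

class PutDemo {
    // Returns {result of first put, result of second put,
    //          result of putIfAbsent, value finally stored}.
    static Integer[] demo() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);          // no previous mapping: null
        Integer second = map.put("a", 2);         // overwrites, returns old value 1
        Integer third = map.putIfAbsent("a", 3);  // key present: returns 2, keeps 2
        return new Integer[] {first, second, third, map.get("a")};
    }

    public static void main(String[] args) {
        for (Integer result : demo()) {
            System.out.println(result);
        }
    }
}
```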
The get method
The approach is roughly:
- Compute the hash of the key, then the index
- Check the first node in the bin; if it matches, it is a direct hit
- If there is a collision, find the matching entry with key.equals(k):
in a tree, search the tree with key.equals(k), O(log n);
in a linked list, search the list with key.equals(k), O(n).
The implementation is as follows:
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
// direct hit on the first node
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
// miss: search the rest of the bin
if ((e = first.next) != null) {
// look up in the tree
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
// look up in the linked list
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
Because HashMap permits null values, a null returned by get(Object key) does not mean the map has no mapping for that key; the key may simply be mapped to null. The containsKey method distinguishes the two cases.
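The ambiguity of a null result is easy to reproduce. In the sketch below, NullValueDemo and sample() are names made up for illustration; the map contains one key mapped to null and no mapping for the key "absent".

```java
import java.util.HashMap;
import java.util.Map;

class NullValueDemo {
    // Builds a map where "present" is explicitly mapped to null.
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("present", null);
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        // get() cannot tell the two cases apart: both print null
        System.out.println(map.get("present"));
        System.out.println(map.get("absent"));
        // containsKey() can: true, then false
        System.out.println(map.containsKey("present"));
        System.out.println(map.containsKey("absent"));
    }
}
```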
The resize method
During a put, if the size of the HashMap exceeds load factor * current capacity, a resize takes place. Put simply, resize doubles the number of bins, recomputes each node's index, and places the nodes into the new bins.
Initializes or doubles table size. If null, allocates in accord with initial capacity target held in field threshold. Otherwise, because we are using power-of-two expansion, the elements from each bin must either stay at same index, or move with a power of two offset in the new table.
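Roughly: a resize happens when the limit is exceeded, and because the table grows by powers of two (its length doubles), each element either stays at its original index or moves by a power-of-two offset.
How should this be understood? Take growing from 16 (mask 0x0F) to 32 (mask 0x1F) as an example:
[Figure: index computation before and after the resize]
Since n doubles, the mask n - 1 covers one more high bit, so when the index is recomputed it changes as follows:
[Figure: change in the index after the resize]
So when the HashMap grows there is no need to recompute the hash; it is enough to check whether the newly exposed bit of the original hash is 0 or 1. If it is 0 the index is unchanged; if it is 1 the index becomes "original index + oldCap".
[Figure: resize from 16 to 32]
This design is quite clever: it saves the time of recomputing hash values, and because the newly exposed bit can be regarded as random, the resize spreads the previously colliding nodes evenly across the new buckets.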
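A worked example of the index rule, assuming a table growing from 16 to 32: hashes 5 and 21 share index 5 in the old table; after the resize, 5 stays at index 5 (bit 4 is 0) and 21 moves to 5 + 16 = 21 (bit 4 is 1). The helper newIndex below is a hypothetical function written for this sketch, not part of HashMap.

```java
class ResizeIndexDemo {
    // New index after doubling, derived only from the old index and one bit of the hash.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21};
        for (int hash : hashes) {
            int recomputed = hash & (2 * oldCap - 1); // full recomputation with the new mask
            // 5: 5 -> 5, 21: 5 -> 21; both approaches agree
            System.out.println(hash + ": " + (hash & (oldCap - 1)) + " -> "
                    + newIndex(hash, oldCap) + " (recomputed " + recomputed + ")");
        }
    }
}
```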
Note: the resize implementation in Java 7 reverses the order of the linked list, whereas Java 8 preserves it.
The implementation is as follows:
final Node<K,V>[] resize() {
Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
if (oldCap > 0) {
// already at the maximum capacity: stop growing and accept the collisions
if (oldCap >= MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return oldTab;
}
// below the maximum: double the capacity
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
newThr = oldThr << 1; // double threshold
}
else if (oldThr > 0) // initial capacity was placed in threshold
newCap = oldThr;
else { // zero initial threshold signifies using defaults
newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
// compute the new resize threshold
if (newThr == 0) {
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
threshold = newThr;
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
if (oldTab != null) {
// move every bucket into the new table
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
if ((e = oldTab[j]) != null) {
oldTab[j] = null;
if (e.next == null)
newTab[e.hash & (newCap - 1)] = e;
else if (e instanceof TreeNode)
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else { // preserve order
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
next = e.next;
// stays at the original index
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
// moves to original index + oldCap
else {
if (hiTail == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);
// put the low list into the bucket at the original index
if (loTail != null) {
loTail.next = null;
newTab[j] = loHead;
}
// put the high list into the bucket at original index + oldCap
if (hiTail != null) {
hiTail.next = null;
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
}
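The lo/hi split in the loop above can be sketched in isolation. This simplified version is an illustration only: it works on a plain list of hash values instead of linked Node objects, and the class SplitDemo and method split are hypothetical names. It shows that both resulting lists keep the relative order of the original bin, which is the "preserve order" property noted in the code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // Splits the hashes of one old bin into {lo, hi}, keeping relative order.
    // lo stays at the old index; hi moves to old index + oldCap.
    static List<List<Integer>> split(List<Integer> binHashes, int oldCap) {
        List<Integer> lo = new ArrayList<>();
        List<Integer> hi = new ArrayList<>();
        for (int hash : binHashes) {
            if ((hash & oldCap) == 0) {
                lo.add(hash);
            } else {
                hi.add(hash);
            }
        }
        return Arrays.asList(lo, hi);
    }

    public static void main(String[] args) {
        // All four hashes share index 5 in a table of 16.
        List<List<Integer>> parts = split(Arrays.asList(5, 21, 37, 53), 16);
        System.out.println("lo (index 5):  " + parts.get(0)); // [5, 37]
        System.out.println("hi (index 21): " + parts.get(1)); // [21, 53]
    }
}
```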
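Summary
HashMap is a hash bucket array implemented as an array of entries (Node<K,V>[] in JDK 8); taking a key's hash value modulo the array size gives the array index.
On insertion, if two keys fall into the same bucket (for example, hash values 1 and 17 both map to the bucket at index 1 modulo 16), we call it a hash collision.
The JDK resolves collisions by chaining: each entry has a next field, so multiple entries are stored as a singly linked list. To look up a key whose hash is 17, first locate the bucket, then traverse the list in the bucket, comparing the hash value first and then the key for each element.
JDK 8 adds a threshold, 8 by default: when the number of entries in one bucket reaches it (and the table has at least 64 buckets), the bucket is stored as a red-black tree instead of a singly linked list to speed up key lookup.
Of course, it is best when a bucket holds only one element, so that no comparison along a chain is needed. So by default, when the number of entries exceeds 75% of the number of buckets, collisions are considered serious enough that the bucket array is doubled and all existing entries are redistributed. Resizing is not cheap, so it is best to supply an estimated capacity up front.
Taking the modulus with an AND operation (hash & (arrayLength - 1)) is faster, which is why the array size is always a power of two; an arbitrary initial value such as 17 is rounded up to 32. By default the size is 16 when the first element is inserted.
iterator() walks the bucket array in order, so the iteration order looks random.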
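Since resizing is costly, a map whose approximate size is known can be created with enough capacity up front. The sketch below assumes the default load factor of 0.75; initialCapacityFor is a hypothetical helper, not a JDK method, and the rounding shown is one reasonable choice rather than the only one.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    // Smallest initial capacity whose threshold (capacity * 0.75)
    // is at least the expected number of entries.
    static int initialCapacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / 0.75);
    }

    public static void main(String[] args) {
        int expected = 1000;
        // 1334 is requested; HashMap rounds it up to the next power of two.
        Map<Integer, String> map = new HashMap<>(initialCapacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(initialCapacityFor(expected) + " requested, " + map.size() + " stored");
    }
}
```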