[toc]
Whether at a big-name company or an obscure small one, HashMap is an unavoidable interview topic. Generally, if you can discuss HashMap for more than half an hour, the offer is pretty much secure. Let's walk through the HashMap source code and see why interviewers are so fond of it.
1. Class Structure and Important Fields
1.1 Basic Class Structure
HashMap is essentially a hash table. An earlier article on common strategies for resolving hash collisions covered the main techniques: open addressing, separate chaining, rehashing, and a common overflow area. When discussing ThreadLocal we also saw that ThreadLocalMap resolves collisions with linear probing, a form of open addressing; see the article "java中的reference(四): WeakReference的应用--ThreadLocal源码分析". HashMap, for its part, is the most representative and most widely used application of separate chaining. In JDK 1.7 and earlier it used linked lists exclusively; JDK 1.8 is more elaborate: a bin starts as a linked list and is converted to a red-black tree once it reaches a certain length.
Let's first look at HashMap's basic structure with IDEA's Diagrams feature:
As shown, HashMap extends AbstractMap and implements the Map, Cloneable, and Serializable interfaces.
The source is as follows:
/**
* Hash table based implementation of the <tt>Map</tt> interface. This
* implementation provides all of the optional map operations, and permits
* <tt>null</tt> values and the <tt>null</tt> key. (The <tt>HashMap</tt>
* class is roughly equivalent to <tt>Hashtable</tt>, except that it is
* unsynchronized and permits nulls.) This class makes no guarantees as to
* the order of the map; in particular, it does not guarantee that the order
* will remain constant over time.
*
* <p>This implementation provides constant-time performance for the basic
* operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
* disperses the elements properly among the buckets. Iteration over
* collection views requires time proportional to the "capacity" of the
* <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
* of key-value mappings). Thus, it's very important not to set the initial
* capacity too high (or the load factor too low) if iteration performance is
* important.
*
* <p>An instance of <tt>HashMap</tt> has two parameters that affect its
* performance: <i>initial capacity</i> and <i>load factor</i>. The
* <i>capacity</i> is the number of buckets in the hash table, and the initial
* capacity is simply the capacity at the time the hash table is created. The
* <i>load factor</i> is a measure of how full the hash table is allowed to
* get before its capacity is automatically increased. When the number of
* entries in the hash table exceeds the product of the load factor and the
* current capacity, the hash table is <i>rehashed</i> (that is, internal data
* structures are rebuilt) so that the hash table has approximately twice the
* number of buckets.
*
* <p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs. Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>). The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations. If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.
*
* <p>If many mappings are to be stored in a <tt>HashMap</tt>
* instance, creating it with a sufficiently large capacity will allow
* the mappings to be stored more efficiently than letting it perform
* automatic rehashing as needed to grow the table. Note that using
* many keys with the same {@code hashCode()} is a sure way to slow
* down performance of any hash table. To ameliorate impact, when keys
* are {@link Comparable}, this class may use comparison order among
* keys to help break ties.
*
* <p><strong>Note that this implementation is not synchronized.</strong>
* If multiple threads access a hash map concurrently, and at least one of
* the threads modifies the map structurally, it <i>must</i> be
* synchronized externally. (A structural modification is any operation
* that adds or deletes one or more mappings; merely changing the value
* associated with a key that an instance already contains is not a
* structural modification.) This is typically accomplished by
* synchronizing on some object that naturally encapsulates the map.
*
* If no such object exists, the map should be "wrapped" using the
* {@link Collections#synchronizedMap Collections.synchronizedMap}
* method. This is best done at creation time, to prevent accidental
* unsynchronized access to the map:<pre>
* Map m = Collections.synchronizedMap(new HashMap(...));</pre>
*
* <p>The iterators returned by all of this class's "collection view methods"
* are <i>fail-fast</i>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* <tt>remove</tt> method, the iterator will throw a
* {@link ConcurrentModificationException}. Thus, in the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the
* future.
*
* <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification. Fail-fast iterators
* throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>
*
* <p>This class is a member of the
* <a href="{@docRoot}/../technotes/guides/collections/index.html">
* Java Collections Framework</a>.
*
* @param <K> the type of keys maintained by this map
* @param <V> the type of mapped values
*
* @author Doug Lea
* @author Josh Bloch
* @author Arthur van Hoff
* @author Neal Gafter
* @see Object#hashCode()
* @see Collection
* @see Map
* @see TreeMap
* @see Hashtable
* @since 1.2
*/
public class HashMap<K,V> extends AbstractMap<K,V>
implements Map<K,V>, Cloneable, Serializable {
}
The gist of the comment:
HashMap is a hash-table-based implementation of the Map interface. It provides all of the optional map operations and permits null keys and values. Apart from being unsynchronized and permitting nulls, HashMap is roughly equivalent to Hashtable. It makes no guarantees about iteration order, nor that the order will remain constant over time.
The implementation provides constant-time get and put, assuming the hash function disperses the elements properly among the buckets. Iterating over the collection views takes time proportional to the capacity (number of buckets) plus the size (number of mappings), so if iteration performance matters, don't set the initial capacity too high or the load factor too low.
One HashMap constructor takes two parameters: the initial capacity and the load factor. The capacity is the number of buckets in the hash table; the initial capacity is simply the capacity at the time the table is created. The load factor is a measure of how full the table is allowed to get before it is grown automatically. When the number of entries exceeds the product of the load factor and the current capacity, the table is rehashed (its internal structures are rebuilt) to roughly twice the number of buckets.
The default load factor of 0.75 is a good trade-off between time and space. Higher values reduce the space overhead but raise the lookup cost, which is reflected in most operations, including get and put. When choosing an initial capacity, consider the expected number of entries and the load factor so as to minimize the number of rehash operations: if the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur.
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity is more efficient than letting it grow through automatic rehashing as the table fills. Note that many keys sharing the same hashCode() is a sure way to slow any hash table down; to ease the impact, when keys are Comparable, the class may use their comparison order to break ties.
Note that HashMap is not synchronized. If multiple threads access it concurrently and at least one of them modifies it structurally, it must be synchronized externally. (A structural modification is any operation that adds or removes mappings; merely changing the value associated with an existing key is not one.) Synchronization is typically achieved by synchronizing on some object that naturally encapsulates the map.
If no such object exists, the map should be wrapped with Collections.synchronizedMap, best done at creation time to prevent accidental unsynchronized access:
Map m = Collections.synchronizedMap(new HashMap(...));
The iterators returned by all of the class's collection-view methods are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator throws a ConcurrentModificationException. In the face of concurrent modification it fails quickly and cleanly rather than risking arbitrary, non-deterministic behavior at some undetermined future time.
Note that fail-fast behavior cannot be guaranteed: generally speaking, it is impossible to make hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis, so it would be wrong for a program to depend on this exception for correctness; fail-fast behavior should be used only to detect bugs.
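The fail-fast behavior is easy to observe with a short sketch (the class and method names below are illustrative, not from the JDK source):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    // Structurally modify the map while iterating over one of its views:
    // the iterator's next call to next() detects the modCount change.
    static boolean triggersCme() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.remove("b"); // structural modification, not via the iterator's own remove()
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(triggersCme()); // true in practice, though the check is only best-effort
    }
}
```

Removing through the iterator itself (Iterator.remove) would be the legal way to do this, since it updates the iterator's expected modCount.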
1.2 Fields and Constants
1.2.1 Constants
The first things to understand in the HashMap source are the constants it defines; they are crucial to how the class works. See the following source:
private static final long serialVersionUID = 362498820763181265L;
/*
* Implementation notes.
*
* This map usually acts as a binned (bucketed) hash table, but
* when bins get too large, they are transformed into bins of
* TreeNodes, each structured similarly to those in
* java.util.TreeMap. Most methods try to use normal bins, but
* relay to TreeNode methods when applicable (simply by checking
* instanceof a node). Bins of TreeNodes may be traversed and
* used like any others, but additionally support faster lookup
* when overpopulated. However, since the vast majority of bins in
* normal use are not overpopulated, checking for existence of
* tree bins may be delayed in the course of table methods.
*
* Tree bins (i.e., bins whose elements are all TreeNodes) are
* ordered primarily by hashCode, but in the case of ties, if two
* elements are of the same "class C implements Comparable<C>",
* type then their compareTo method is used for ordering. (We
* conservatively check generic types via reflection to validate
* this -- see method comparableClassFor). The added complexity
* of tree bins is worthwhile in providing worst-case O(log n)
* operations when keys either have distinct hashes or are
* orderable, Thus, performance degrades gracefully under
* accidental or malicious usages in which hashCode() methods
* return values that are poorly distributed, as well as those in
* which many keys share a hashCode, so long as they are also
* Comparable. (If neither of these apply, we may waste about a
* factor of two in time and space compared to taking no
* precautions. But the only known cases stem from poor user
* programming practices that are already so slow that this makes
* little difference.)
*
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. In
* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
*
* The root of a tree bin is normally its first node. However,
* sometimes (currently only upon Iterator.remove), the root might
* be elsewhere, but can be recovered following parent links
* (method TreeNode.root()).
*
* All applicable internal methods accept a hash code as an
* argument (as normally supplied from a public method), allowing
* them to call each other without recomputing user hashCodes.
* Most internal methods also accept a "tab" argument, that is
* normally the current table, but may be a new or old one when
* resizing or converting.
*
* When bin lists are treeified, split, or untreeified, we keep
* them in the same relative access/traversal order (i.e., field
* Node.next) to better preserve locality, and to slightly
* simplify handling of splits and traversals that invoke
* iterator.remove. When using comparators on insertion, to keep a
* total ordering (or as close as is required here) across
* rebalancings, we compare classes and identityHashCodes as
* tie-breakers.
*
* The use and transitions among plain vs tree modes is
* complicated by the existence of subclass LinkedHashMap. See
* below for hook methods defined to be invoked upon insertion,
* removal and access that allow LinkedHashMap internals to
* otherwise remain independent of these mechanics. (This also
* requires that a map instance be passed to some utility methods
* that may create new nodes.)
*
* The concurrent-programming-like SSA-based coding style helps
* avoid aliasing errors amid all of the twisty pointer operations.
*/
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
/**
* The load factor used when none specified in constructor.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;
/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
*/
static final int TREEIFY_THRESHOLD = 8;
/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
static final int UNTREEIFY_THRESHOLD = 6;
/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
*/
static final int MIN_TREEIFY_CAPACITY = 64;
The block above contains a very important comment; its gist:
Implementation notes. This map usually acts as a binned (bucketed) hash table. When a bin gets too large, it is transformed into a bin of TreeNodes, each structured much like those in java.util.TreeMap. Most methods use the normal bin structure, switching to the TreeNode methods when applicable (simply by checking instanceof on a node). Tree bins can be traversed and used like any others, but additionally support faster lookup when overpopulated. However, since the vast majority of bins in normal use are not overpopulated, checking for the existence of tree bins may be deferred in the course of the table methods.
In a tree bin every element is a TreeNode, ordered primarily by hashCode; in the case of ties, if two elements implement the same Comparable type, their compareTo method is used for ordering (this is conservatively verified via reflection; see the method comparableClassFor). The added complexity of tree bins is worthwhile because it provides worst-case O(log n) operations when keys either have distinct hashes or are orderable. Performance therefore degrades gracefully under accidental or malicious usage in which hashCode() returns poorly distributed values, or in which many keys share a hashCode, as long as those keys are also Comparable. (If neither applies, we may waste about a factor of two in time and space compared to taking no precautions, but the only known cases stem from poor programming practices that are already so slow this makes little difference.)
Because a TreeNode is about twice the size of a regular node, tree bins are used only when a bin holds enough nodes to warrant it (see TREEIFY_THRESHOLD), and when they become too small (through removal or resizing) they are converted back to plain bins, i.e. linked lists. With well-distributed hash codes, tree bins are rarely used. Ideally, under random hash codes, the number of nodes in a bin follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with large variance because of resizing granularity. Ignoring variance, the expected frequency of lists of size k is:
exp(-0.5) * pow(0.5, k) / factorial(k)
The values are tabulated below:
次數(shù) | 概率 |
---|---|
0 | 0.60653066 |
1 | 0.30326533 |
2 | 0.07581633 |
3 | 0.01263606 |
4 | 0.00157952 |
5 | 0.00015795 |
6 | 0.00001316 |
7 | 0.00000094 |
8 | 0.00000006 |
之后都低于千萬(wàn)分之一。
The root of a tree bin is normally its first node. However, sometimes (currently only upon Iterator.remove) the root may be elsewhere, but it can be recovered by following parent links (method TreeNode.root()).
All applicable internal methods accept a hash code as an argument (normally supplied by a public method), allowing them to call each other without recomputing user hash codes. Most internal methods also accept a "tab" argument, which is normally the current table, but may be a new or old one when resizing or converting.
When bin lists are treeified, split, or untreeified, they are kept in the same relative access/traversal order (the Node.next field) to better preserve locality and to slightly simplify the handling of splits and traversals that invoke iterator.remove. When comparators are used on insertion, classes and identityHashCodes are compared as tie-breakers to keep a total ordering (or as close as required here) across rebalancings.
The use of, and transitions between, plain and tree modes are complicated by the existence of the subclass LinkedHashMap. Hook methods are defined to be invoked upon insertion, removal, and access, allowing LinkedHashMap's internals to otherwise remain independent of these mechanics. (This also requires that a map instance be passed to some utility methods that may create new nodes.)
The concurrent-programming-like SSA-based coding style helps avoid aliasing errors amid all the twisty pointer operations.
This comment is very important: among other things, it explains why a bin is treeified once its length exceeds 8. The other important constants are summarized below:
Constant | Value | Description |
---|---|---|
DEFAULT_INITIAL_CAPACITY | 1<<4 = 16 | The default initial capacity; must be a power of two. |
MAXIMUM_CAPACITY | 1<<30 = 1073741824 | The maximum capacity, used if a higher value is implicitly specified by either of the constructors with arguments; must be a power of two <= 1<<30. |
DEFAULT_LOAD_FACTOR | 0.75 | The load factor used when none is specified in the constructor. |
TREEIFY_THRESHOLD | 8 | The bin length threshold for converting a linked list to a red-black tree. Must be greater than 2 and should be at least 8 to mesh with the assumptions about converting back to a plain bin on shrinkage. |
UNTREEIFY_THRESHOLD | 6 | The threshold for converting a red-black tree back to a linked list as removals shrink a (split) bin. Must be less than TREEIFY_THRESHOLD and at most 6 to mesh with shrinkage detection under removal. |
MIN_TREEIFY_CAPACITY | 64 | The smallest table capacity for which bins may be treeified; below it, the table is resized instead when a bin has too many nodes. Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between the resizing and treeification thresholds. |
1.2.2 Fields
The main fields are as follows:
/* ---------------- Fields -------------- */
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
transient Node<K,V>[] table;
/**
* Holds cached entrySet(). Note that AbstractMap fields are used
* for keySet() and values().
*/
transient Set<Map.Entry<K,V>> entrySet;
/**
* The number of key-value mappings contained in this map.
*/
transient int size;
/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash). This field is used to make iterators on Collection-views of
* the HashMap fail-fast. (See ConcurrentModificationException).
*/
transient int modCount;
/**
* The next size value at which to resize (capacity * load factor).
*
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
/**
* The load factor for the hash table.
*
* @serial
*/
final float loadFactor;
1.2.2.1 table
transient Node<K,V>[] table;
table is an array of Nodes, initialized on first use and resized as necessary. When allocated, its length is always a power of two. (Length zero is also tolerated in some operations, to allow bootstrapping mechanics that are currently not needed.)
Also note that table is declared transient: HashMap implements its own custom serialization rather than relying on default object serialization.
1.2.2.2 entrySet
transient Set<Map.Entry<K,V>> entrySet;
entrySet is effectively a cache of the map's entry view. Note that the fields of AbstractMap are used for keySet() and values().
Its element type is Map.Entry, an interface; Node is in fact one of its implementations.
1.2.2.3 size
transient int size
The number of key-value mappings contained in the map.
1.2.2.4 modCount
transient int modCount
The number of times this HashMap has been structurally modified. Structural modifications are those that change the number of mappings or otherwise modify the internal structure (e.g., a rehash). This field makes iterators over the map's collection views fail-fast, throwing ConcurrentModificationException.
1.2.2.5 threshold
int threshold;
The next size at which the table is resized, equal to the current capacity times the load factor.
If the table has not been allocated yet, this field holds the initial array capacity, or zero to signify DEFAULT_INITIAL_CAPACITY.
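With the default settings, the relationship is easy to check (the class below is purely illustrative):

```java
public class ThresholdDemo {
    // threshold = capacity * loadFactor; exceeding it triggers a resize
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Defaults: DEFAULT_INITIAL_CAPACITY = 16, DEFAULT_LOAD_FACTOR = 0.75f
        System.out.println(threshold(16, 0.75f)); // 12: the 13th mapping causes a resize to 32
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```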
1.2.2.6 loadFactor
final float loadFactor;
The hash table's load factor; being final, it can only be assigned once.
1.3 Important Inner Classes
Reading the HashMap source carefully, we find a number of inner classes, as we have also seen in other source code. Many of them share a name with an outside class but are implemented differently internally. Let's analyze them one by one.
1.3.1 Node
Node implements the Map.Entry interface, which defines methods such as getKey, getValue, setValue, equals, and hashCode, and also provides static factory methods for comparators (comparingByKey, comparingByValue).
Node is the basic bin node used for most entries; TreeNode is a subclass of it, and in LinkedHashMap the Entry class is also a Node subclass.
/**
* Basic hash bin node, used for most entries. (See below for
* TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
*/
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
Node(int hash, K key, V value, Node<K,V> next) {
this.hash = hash;
this.key = key;
this.value = value;
this.next = next;
}
public final K getKey() { return key; }
public final V getValue() { return value; }
public final String toString() { return key + "=" + value; }
public final int hashCode() {
return Objects.hashCode(key) ^ Objects.hashCode(value);
}
public final V setValue(V newValue) {
V oldValue = value;
value = newValue;
return oldValue;
}
public final boolean equals(Object o) {
if (o == this)
return true;
if (o instanceof Map.Entry) {
Map.Entry<?,?> e = (Map.Entry<?,?>)o;
if (Objects.equals(key, e.getKey()) &&
Objects.equals(value, e.getValue()))
return true;
}
return false;
}
}
Node is the basic element of the linked list inside a bucket. Its main fields are the generic key and value; since it forms a linked list, it also keeps a next pointer to the following element.
Note its hashCode method:
Objects.hashCode(key) ^ Objects.hashCode(value)
That is, when key and value have equal hash codes (for example, when they are the same object), the Node's hashCode is 0.
It also implements equals, which returns true when the two entries' keys and values are both equal (by == or equals).
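The XOR rule comes from the Map.Entry contract, so it can be observed with AbstractMap.SimpleEntry just as well (a sketch of the contract, not of HashMap internals):

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Objects;

public class EntryHashDemo {
    // Map.Entry specifies hashCode() as keyHash ^ valueHash
    static int entryHash(Object key, Object value) {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> same = new AbstractMap.SimpleEntry<>("x", "x");
        System.out.println(same.hashCode());     // 0: equal hash codes cancel out under XOR
        System.out.println(entryHash("x", "y")); // nonzero in general
    }
}
```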
1.3.2 TreeNode
TreeNode is the node type that makes up the tree after a bin is treeified. Note that TreeNode extends LinkedHashMap.Entry, which in turn extends Node. TreeNode is fairly involved, so its operations are covered in detail later.
TreeNode's inheritance structure is as follows:
1.3.3 View Inner Classes: KeySet, Values, EntrySet
HashMap contains a series of view inner classes: KeySet, Values, and EntrySet.
1.3.3.1 KeySet
HashMap provides a view over all of its keys. KeySet extends AbstractSet; the underlying elements are still the Map.Entry<K,V> nodes of the hash table above, i.e. Node or TreeNode.
/**
* Returns a {@link Set} view of the keys contained in this map.
* The set is backed by the map, so changes to the map are
* reflected in the set, and vice-versa. If the map is modified
* while an iteration over the set is in progress (except through
* the iterator's own <tt>remove</tt> operation), the results of
* the iteration are undefined. The set supports element removal,
* which removes the corresponding mapping from the map, via the
* <tt>Iterator.remove</tt>, <tt>Set.remove</tt>,
* <tt>removeAll</tt>, <tt>retainAll</tt>, and <tt>clear</tt>
* operations. It does not support the <tt>add</tt> or <tt>addAll</tt>
* operations.
*
* @return a set view of the keys contained in this map
*/
public Set<K> keySet() {
Set<K> ks = keySet;
if (ks == null) {
ks = new KeySet();
keySet = ks;
}
return ks;
}
final class KeySet extends AbstractSet<K> {
public final int size() { return size; }
public final void clear() { HashMap.this.clear(); }
public final Iterator<K> iterator() { return new KeyIterator(); }
public final boolean contains(Object o) { return containsKey(o); }
public final boolean remove(Object key) {
return removeNode(hash(key), key, null, false, true) != null;
}
public final Spliterator<K> spliterator() {
return new KeySpliterator<>(HashMap.this, 0, -1, 0, 0);
}
public final void forEach(Consumer<? super K> action) {
Node<K,V>[] tab;
if (action == null)
throw new NullPointerException();
if (size > 0 && (tab = table) != null) {
int mc = modCount;
for (int i = 0; i < tab.length; ++i) {
for (Node<K,V> e = tab[i]; e != null; e = e.next)
action.accept(e.key);
}
if (modCount != mc)
throw new ConcurrentModificationException();
}
}
}
The gist of the comment: the method returns a view of the keys contained in the map. Note that it is only a view: changes to the map are reflected in the set, and vice versa. Elements can be removed through the view, which removes the corresponding mapping from the map, via Iterator.remove, Set.remove, removeAll, retainAll, and clear; the add and addAll operations are not supported.
As the source shows, the supported operations are just in-class wrappers around the map's own fields and methods.
The most basic use of KeySet is to obtain an iterator via keySet() and iterate over the keys. Looking at the source, KeySet.contains and EntrySet.contains both operate on the same underlying table; the difference shows up in forEach, where accept is passed only the key:
action.accept(e.key);
This is the main difference between KeySet, Values, and EntrySet.
So when traversing a HashMap we can work with the EntrySet directly, rather than, as many people assume, first iterating the keys and then calling get for each one.
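The two traversal styles side by side; both visit every mapping, but the keySet variant pays an extra get() lookup per key (names below are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class TraversalDemo {
    // One pass over the table: key and value come from the same Node
    static int sumViaEntrySet(Map<String, Integer> map) {
        int sum = 0;
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            sum += e.getValue();
        }
        return sum;
    }

    // Works too, but each get() re-hashes the key and walks its bin again
    static int sumViaKeySet(Map<String, Integer> map) {
        int sum = 0;
        for (String key : map.keySet()) {
            sum += map.get(key);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        System.out.println(sumViaEntrySet(map)); // 6
        System.out.println(sumViaKeySet(map));   // 6
    }
}
```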
1.3.3.2 Values
Values works the same way as KeySet; only the accept call in its forEach differs.
/**
* Returns a {@link Collection} view of the values contained in this map.
* The collection is backed by the map, so changes to the map are
* reflected in the collection, and vice-versa. If the map is
* modified while an iteration over the collection is in progress
* (except through the iterator's own <tt>remove</tt> operation),
* the results of the iteration are undefined. The collection
* supports element removal, which removes the corresponding
* mapping from the map, via the <tt>Iterator.remove</tt>,
* <tt>Collection.remove</tt>, <tt>removeAll</tt>,
* <tt>retainAll</tt> and <tt>clear</tt> operations. It does not
* support the <tt>add</tt> or <tt>addAll</tt> operations.
*
* @return a view of the values contained in this map
*/
public Collection<V> values() {
Collection<V> vs = values;
if (vs == null) {
vs = new Values();
values = vs;
}
return vs;
}
final class Values extends AbstractCollection<V> {
public final int size() { return size; }
public final void clear() { HashMap.this.clear(); }
public final Iterator<V> iterator() { return new ValueIterator(); }
public final boolean contains(Object o) { return containsValue(o); }
public final Spliterator<V> spliterator() {
return new ValueSpliterator<>(HashMap.this, 0, -1, 0, 0);
}
public final void forEach(Consumer<? super V> action) {
Node<K,V>[] tab;
if (action == null)
throw new NullPointerException();
if (size > 0 && (tab = table) != null) {
int mc = modCount;
for (int i = 0; i < tab.length; ++i) {
for (Node<K,V> e = tab[i]; e != null; e = e.next)
action.accept(e.value);
}
if (modCount != mc)
throw new ConcurrentModificationException();
}
}
}
In its forEach we can see:
action.accept(e.value);
which passes the value. Note that Values has no remove method.
1.3.3.3 EntrySet
EntrySet works on the same principle as KeySet and Values:
/**
* Returns a {@link Set} view of the mappings contained in this map.
* The set is backed by the map, so changes to the map are
* reflected in the set, and vice-versa. If the map is modified
* while an iteration over the set is in progress (except through
* the iterator's own <tt>remove</tt> operation, or through the
* <tt>setValue</tt> operation on a map entry returned by the
* iterator) the results of the iteration are undefined. The set
* supports element removal, which removes the corresponding
* mapping from the map, via the <tt>Iterator.remove</tt>,
* <tt>Set.remove</tt>, <tt>removeAll</tt>, <tt>retainAll</tt> and
* <tt>clear</tt> operations. It does not support the
* <tt>add</tt> or <tt>addAll</tt> operations.
*
* @return a set view of the mappings contained in this map
*/
public Set<Map.Entry<K,V>> entrySet() {
Set<Map.Entry<K,V>> es;
return (es = entrySet) == null ? (entrySet = new EntrySet()) : es;
}
final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
public final int size() { return size; }
public final void clear() { HashMap.this.clear(); }
public final Iterator<Map.Entry<K,V>> iterator() {
return new EntryIterator();
}
public final boolean contains(Object o) {
if (!(o instanceof Map.Entry))
return false;
Map.Entry<?,?> e = (Map.Entry<?,?>) o;
Object key = e.getKey();
Node<K,V> candidate = getNode(hash(key), key);
return candidate != null && candidate.equals(e);
}
public final boolean remove(Object o) {
if (o instanceof Map.Entry) {
Map.Entry<?,?> e = (Map.Entry<?,?>) o;
Object key = e.getKey();
Object value = e.getValue();
return removeNode(hash(key), key, value, true, true) != null;
}
return false;
}
public final Spliterator<Map.Entry<K,V>> spliterator() {
return new EntrySpliterator<>(HashMap.this, 0, -1, 0, 0);
}
public final void forEach(Consumer<? super Map.Entry<K,V>> action) {
Node<K,V>[] tab;
if (action == null)
throw new NullPointerException();
if (size > 0 && (tab = table) != null) {
int mc = modCount;
for (int i = 0; i < tab.length; ++i) {
for (Node<K,V> e = tab[i]; e != null; e = e.next)
action.accept(e);
}
if (modCount != mc)
throw new ConcurrentModificationException();
}
}
}
Entries can be removed through this view. In its forEach:
action.accept(e);
the whole entry object is passed.
1.3.4 Iterators: HashIterator and the Parallel HashMapSpliterator
The remaining inner classes of HashMap are all iterator-related. In single-threaded code we use Iterator. As noted earlier, HashMap effectively exposes its internal table through the three views KeySet, Values, and EntrySet, which are served by KeyIterator, ValueIterator, and EntryIterator respectively. Given how widely HashMap is used, traversal may need to be parallelized for speed, so HashMap also provides the parallel spliterators KeySpliterator, ValueSpliterator, and EntrySpliterator.
These will be analyzed in detail in a later article; for reasons of space, this one focuses on HashMap's fundamentals.
2. Basic Principles
2.1 The Basic Structure of HashMap
From Part 1 we know HashMap's basic concepts and operations. Two terms in particular need to be understood:
- bucket: an index position (slot) in HashMap's backing array, i.e. one element of the table array. As the figure below shows, HashMap takes each Node key's hashCode and reduces it modulo the table size to select the bucket.
- bin: when the keys of several elements map to the same bucket, the colliding entries are chained together as a linked list or a red-black tree; that list or tree is called a bin.
The figure above only illustrates HashMap's basic composition; the red-black tree and the total number of buckets are not meant to be realistic.
From the figure we can see that a HashMap is essentially an array of Nodes combined with linked lists / red-black trees.
A bin turns from a linked list into a red-black tree only when the list length reaches 8 and the table capacity is at least 64; if the capacity is still below 64, treeifyBin resizes the table instead of converting. Note these conditions carefully: reaching length 8 alone does not convert the bin, it may simply trigger a resize, and while the capacity is below 64 a bin can grow longer than 8.
As the following code shows:
import java.util.HashMap;

public class TreeifyDemo {
    public static void main(String[] args) {
        HashMap<CustomerKey, Integer> map = new HashMap<>();
        map.put(new CustomerKey(1), 1);
        map.put(new CustomerKey(17), 17);
        map.put(new CustomerKey(33), 33);
        map.put(new CustomerKey(49), 49);
        map.put(new CustomerKey(65), 65);
        map.put(new CustomerKey(81), 81);
        map.put(new CustomerKey(97), 97);
        map.put(new CustomerKey(113), 113);
        map.put(new CustomerKey(129), 129);
        map.size(); // breakpoint here: inspect the table in a debugger
    }

    private static class CustomerKey {
        int key;

        public CustomerKey(int key) {
            this.key = key;
        }

        @Override
        public int hashCode() {
            return 1; // force every key into the same bucket
        }
    }
}
We give every key the same hashCode of 1. By the conventional rule of thumb, the Nodes should turn into a red-black tree once the 8th element arrives. In fact, the result is still one long linked list:
So the condition for converting a list to a tree must be stated precisely: the list length must reach TREEIFY_THRESHOLD (8) during an insertion, and the table capacity must already be at least MIN_TREEIFY_CAPACITY (64); otherwise treeifyBin simply resizes the table:
(binCount >= TREEIFY_THRESHOLD) && (table.length >= MIN_TREEIFY_CAPACITY)
2.2 Bit Operations in HashMap
2.2.1 Resizing
When size exceeds threshold, the table is grown, and the new capacity is computed with a shift:
newCap = oldCap << 1
So the table capacity is always a power of two: it starts at 16, and each resize doubles it. Note that HashMap provides no shrinking mechanism; the table can only grow, never shrink.
2.2.2 Computing the Bucket
Another crucial detail is how HashMap computes the bucket. Normally we could use the modulo operator %, but anyone familiar with how computers work knows that bitwise operations are the fastest. In a computer system, when b is a power of two:
// when b is a power of two
a % b = a & ( b - 1 )
Since HashMap starts at a capacity of 16 and always grows by a left shift, its capacity always satisfies this rule, so the bucket is computed as:
first = tab[(n - 1) & hash]
first is the head node of the bucket for the given hash; (n - 1) & hash yields the bucket index directly.
As an illustration, take the hash values 189 and 205 with n = 16:
Both work out to bucket 13.
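These two values can be checked directly (the helper names are mine):

```java
public class BucketIndexDemo {
    // index = (n - 1) & hash; equivalent to hash % n when n is a power of two
    static int bucket(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        System.out.println(bucket(189, 16)); // 13
        System.out.println(bucket(205, 16)); // 13
        System.out.println(189 % 16);        // 13: the mask agrees with % for powers of two
    }
}
```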
2.2.3 split
We know HashMap grows its table; how are the existing elements redistributed afterwards? If growth followed no pattern, say adding 1 each time so the table went from 8 to 9 slots, there would be no shortcut: every node would have to be re-hashed with a fresh modulo, which would clearly be inefficient for HashMap. But HashMap always doubles its capacity, and that gives a pattern to exploit: the nodes of each bucket split into exactly two groups. The "low" group stays at the original bucket index, while the "high" group moves to oldCap + index, the original index plus the old capacity.
The test distinguishing high from low is remarkably simple:
if ((e.hash & bit) == 0)
where bit is the old capacity.
Continuing the earlier example, after resizing to 32 (mask n - 1 = 31) the index simply gains one more bit, so the high position is easily obtained as 13 + 16 = 29. The two hashes differ only in that newly added bit, which is why they end up in different buckets; only that new bit needs to be examined:
With an old capacity of 16:
ANDing each hash against the old capacity yields either 0 or, here, 16. For generality, no matter how large the table has grown, a result of 0 means the node keeps its low (original) index; otherwise it is moved to the high index.
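Applying the (e.hash & oldCap) test to the same two hashes (a sketch of resize()'s split logic, not the JDK code itself):

```java
public class SplitDemo {
    // (hash & oldCap) == 0  -> node stays at its old index (the "low" list);
    // otherwise it moves to oldIndex + oldCap (the "high" list).
    static int newIndex(int hash, int oldCap) {
        int oldIndex = (oldCap - 1) & hash;
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        System.out.println(newIndex(205, 16)); // 13: bit 16 is clear, stays low
        System.out.println(newIndex(189, 16)); // 29: bit 16 is set, moves to 13 + 16
        System.out.println((32 - 1) & 189);    // 29: same as recomputing against the new mask
    }
}
```

Note that the split test agrees with simply recomputing (newCap - 1) & hash, but needs only one AND per node.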
HashMap truly deserves its reputation as masterful code: resizing, indexing, and splitting were all designed for efficiency from the very start.
2.2.4 Why DEFAULT_INITIAL_CAPACITY Is 16
This question also comes up frequently in interviews. It actually probes understanding of low-level computing fundamentals, and it has real depth.
As the three sections above show, HashMap relies on bit operations for performance in resizing, indexing, and splitting. This requires the capacity to be a power of two; if it were not, the split and bucket computations would break after the first resize.
So the initial capacity must be one of the powers of two: 2, 4, 8, 16, 32, and so on.
But why 16 specifically? There is no documented rationale; it is presumably an empirical value. Too small, and the map would resize almost immediately; too large, and space would be wasted. 16 sits comfortably in between.
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
The comment also states that this number must be a power of two.
3.Constructors
Now let's analyze HashMap's constructors:
3.1 HashMap()
The no-argument constructor is the one we use most:
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
It only sets the default load factor, 0.75; the default capacity of 16 is applied later, when the table is first allocated.
3.2 HashMap(int initialCapacity)
Hashmap還提供了指定初始化容量的構(gòu)造函數(shù):
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and the default load factor (0.75).
*
* @param initialCapacity the initial capacity.
* @throws IllegalArgumentException if the initial capacity is negative.
*/
public HashMap(int initialCapacity) {
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
It uses the default load factor of 0.75.
3.3 HashMap(int initialCapacity, float loadFactor)
Both the initial capacity and the load factor can be specified:
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param initialCapacity the initial capacity
* @param loadFactor the load factor
* @throws IllegalArgumentException if the initial capacity is negative
* or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
this.loadFactor = loadFactor;
this.threshold = tableSizeFor(initialCapacity);
}
The initialCapacity is validated to lie between 0 and MAXIMUM_CAPACITY: a negative value throws IllegalArgumentException, and anything above MAXIMUM_CAPACITY is clamped to it. The loadFactor must be positive and must not be NaN. Finally, tableSizeFor computes the threshold from the requested capacity.
3.4 HashMap(Map<? extends K, ? extends V> m)
A HashMap can also be constructed directly from another Map.
/**
* Constructs a new <tt>HashMap</tt> with the same mappings as the
* specified <tt>Map</tt>. The <tt>HashMap</tt> is created with
* default load factor (0.75) and an initial capacity sufficient to
* hold the mappings in the specified <tt>Map</tt>.
*
* @param m the map whose mappings are to be placed in this map
* @throws NullPointerException if the specified map is null
*/
public HashMap(Map<? extends K, ? extends V> m) {
this.loadFactor = DEFAULT_LOAD_FACTOR;
putMapEntries(m, false);
}
This constructor delegates to putMapEntries, which inserts all of the source map's entries.
/**
* Implements Map.putAll and Map constructor.
*
* @param m the map
* @param evict false when initially constructing this map, else
* true (relayed to method afterNodeInsertion).
*/
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
int s = m.size();
//only proceed when the source map is non-empty
if (s > 0) {
//table not yet allocated: pre-size before inserting
if (table == null) { // pre-size
//compute the capacity needed to hold s mappings at the current load factor
float ft = ((float)s / loadFactor) + 1.0F;
int t = ((ft < (float)MAXIMUM_CAPACITY) ?
(int)ft : MAXIMUM_CAPACITY);
if (t > threshold)
threshold = tableSizeFor(t);
//the pre-sized capacity is stashed in threshold until the first resize
}
//if the incoming size exceeds the threshold, resize first
else if (s > threshold)
resize();
//insert every entry
for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
K key = e.getKey();
V value = e.getValue();
putVal(hash(key), key, value, false, evict);
}
}
}
putMapEntries is also the internal implementation of putAll, which means putAll is equivalent to the copy constructor.
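That equivalence is easy to observe from the outside (a minimal sketch using only the public API):

```java
import java.util.HashMap;
import java.util.Map;

public class CopyDemo {
    public static void main(String[] args) {
        Map<String, Integer> src = new HashMap<>();
        src.put("a", 1);
        src.put("b", 2);

        HashMap<String, Integer> viaCtor = new HashMap<>(src);   // copy constructor
        HashMap<String, Integer> viaPutAll = new HashMap<>();
        viaPutAll.putAll(src);                                   // same putMapEntries path

        System.out.println(viaCtor.equals(viaPutAll)); // true: identical mappings
    }
}
```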
4.Key methods
4.1 tableSizeFor
This method finds the smallest power of two greater than or equal to the given cap.
static final int tableSizeFor(int cap) {
int n = cap - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
The code is rather cryptic; tracing the computation with cap = 10,
the result comes out to 16.
Tracing it again for the maximum value shows the same pattern:
the trick is to smear the highest set bit into every lower position, turning them all to 1, and then add 1.
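The method can be run in isolation to confirm the trace (the body below is copied verbatim from JDK 8's HashMap; only the wrapper class is mine):

```java
public class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // From java.util.HashMap (JDK 8): smallest power of two >= cap
    public static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16: already a power of two
        System.out.println(tableSizeFor(17)); // 32
    }
}
```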
4.2 get
get is the most fundamental method of HashMap.
/**
* Returns the value to which the specified key is mapped,
* or {@code null} if this map contains no mapping for the key.
*
* <p>More formally, if this map contains a mapping from a key
* {@code k} to a value {@code v} such that {@code (key==null ? k==null :
* key.equals(k))}, then this method returns {@code v}; otherwise
* it returns {@code null}. (There can be at most one such mapping.)
*
* <p>A return value of {@code null} does not <i>necessarily</i>
* indicate that the map contains no mapping for the key; it's also
* possible that the map explicitly maps the key to {@code null}.
* The {@link #containsKey containsKey} operation may be used to
* distinguish these two cases.
*
* @see #put(Object, Object)
*/
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
Two methods are hidden inside get: getNode and hash.
/**
* Implements Map.get and related methods.
*
* @param hash hash for key
* @param key the key
* @return the node, or null if none
*/
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
//compute the bucket index with a bitwise AND; this is the key step
(first = tab[(n - 1) & hash]) != null) {
if (first.hash == hash && // always check first node
//same hash: check whether the keys are actually equal
((k = first.key) == key || (key != null && key.equals(k))))
return first;
//no match on the first node: walk the chain (linked list or red-black tree)
if ((e = first.next) != null) {
//a TreeNode bin is searched with the red-black tree lookup
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
//linear scan of the linked list
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
The core of get is searching the linked list or red-black tree by hash. For a tree bin the red-black tree lookup is used; since a red-black tree is an ordered tree, it is significantly faster than a full scan of a linked list.
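One subtlety the javadoc above points out is worth a quick demonstration: because HashMap permits null values, get returning null is ambiguous, and containsKey is needed to tell the two cases apart (a minimal sketch using only the public API):

```java
import java.util.HashMap;

public class NullValueDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put("a", null);  // HashMap permits null values (and one null key)

        System.out.println(map.get("a"));         // null, but the key exists
        System.out.println(map.get("b"));         // null, key absent
        System.out.println(map.containsKey("a")); // true
        System.out.println(map.containsKey("b")); // false
    }
}
```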
4.3 hash
We should also pay attention to the hash method:
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
The gist of the comment: compute the hashCode and spread the higher bits down into the lower ones. Because the table length is always a power of two, the bucket-index calculation we saw earlier never touches the bits above the table size; they simply do not participate. So a cheap compromise is needed that lets the high bits influence the result, spreading the data more evenly across the table.
Let's see why. The earlier examples demonstrated how the index is computed.
If a set of keys varies mainly in the high bits while the low bits barely change, then no matter how much the table grows, they all land in the same bucket. Such data certainly exists in practice, and it defeats HashMap's goal of uniform distribution: the plain index calculation depends only on the low bits. With n = 2 the index depends on the lowest bit; with n = 4, on the lowest 2 bits; n = 8, the lowest 3; n = 16, the lowest 4; n = 32, the lowest 5.
In real use the table rarely grows large enough for the high bits to matter, so in practice the bucket is determined by the low bits alone, which invites an uneven distribution. To avoid this, HashMap applies one more optimization: it mixes the high half into the low half with an XOR so that as many bits as possible participate.
After this operation the low bits of the new value have been influenced by the high bits, and the resulting distribution is more uniform.
That is why the >>> 16 shift exists: it brings the high 16 bits down. And why XOR rather than & or |? Because XOR yields 0 and 1 with equal probability, whereas & is biased towards 0 and | towards 1,
so it does not skew the result.
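The effect can be seen with two hashes that differ only above the low 16 bits (a minimal sketch; the `spread` method reproduces the actual HashMap hash step, the class name and sample values are mine):

```java
public class HashSpread {
    // The spreading step from java.util.HashMap (JDK 8)
    public static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h1 = 0x10001;  // differs from h2 only in bits 16-17
        int h2 = 0x20001;
        int n = 16;
        // Without spreading, both collapse into the same bucket:
        System.out.println(((n - 1) & h1) + " " + ((n - 1) & h2));               // 1 1
        // With spreading, the high bits now influence the index:
        System.out.println(((n - 1) & spread(h1)) + " " + ((n - 1) & spread(h2))); // 0 3
    }
}
```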
From this we can see that the authors of HashMap went to extraordinary lengths for performance; only by considering all these cases can code this efficient be written. It brings to mind the Disruptor framework, whose source is also well worth reading.
4.4 put
put is likewise one of the most important methods of HashMap.
/**
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with <tt>key</tt>, or
* <tt>null</tt> if there was no mapping for <tt>key</tt>.
* (A <tt>null</tt> return can also indicate that the map
* previously associated <tt>null</tt> with <tt>key</tt>.)
*/
public V put(K key, V value) {
//the hash used here is the spread (high-bits-mixed) hash from above
return putVal(hash(key), key, value, false, true);
}
Under the hood it uses the putVal method.
/**
* Implements Map.put and related methods.
*
* @param hash hash for key
* @param key the key
* @param value the value to put
* @param onlyIfAbsent if true, don't change existing value
* @param evict if false, the table is in creation mode.
* @return previous value, or null if none
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
//if the table is null or empty, resize() performs the initial allocation
//(easy to verify by constructing a HashMap with capacity 0)
n = (tab = resize()).length;
//compute the bucket index with &; an empty bucket gets the new node directly
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
//otherwise handle the collision via the linked list or red-black tree
Node<K,V> e; K k;
//does the first node already hold this key?
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
//TreeNode bin: insert via the red-black tree
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
//otherwise walk the linked list
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
//treeify the bin once the chain length reaches TREEIFY_THRESHOLD
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
// at this point e is non-null iff the key already existed; update its value here
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
// callback for subclasses (e.g. LinkedHashMap access ordering)
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
if (++size > threshold)
resize();
//post-insertion callback (used by LinkedHashMap for eviction)
afterNodeInsertion(evict);
return null;
}
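Note the return path above: when the key already exists, putVal returns the old value; otherwise it returns null. This contract is observable from the public API (a minimal usage sketch):

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("k", 1)); // null: no previous mapping
        System.out.println(map.put("k", 2)); // 1: the replaced value
        System.out.println(map.get("k"));    // 2
    }
}
```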
4.5 resize
resize grows the HashMap's table. It is a crucial method: whenever the threshold is exceeded, resize is called. Note that when a HashMap is constructed with an explicit capacity, the table is not actually allocated, it stays null; only the threshold is computed from the given cap.
/**
* Initializes or doubles table size. If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
*
* @return the table
*/
final Node<K,V>[] resize() {
Node<K,V>[] oldTab = table;
//oldCap is the current table length
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
//the table has already been allocated
if (oldCap > 0) {
//at maximum capacity: pin threshold to Integer.MAX_VALUE and stop growing
if (oldCap >= MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return oldTab;
}
//newCap is twice oldCap
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
//newThr also doubles
newThr = oldThr << 1; // double threshold
}
//oldCap == 0 but oldThr > 0: first put after a sizing constructor, where the capacity was stashed in threshold; adopt it as newCap
else if (oldThr > 0) // initial capacity was placed in threshold
newCap = oldThr;
else {
//otherwise use the defaults
// zero initial threshold signifies using defaults
newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
//if newThr is still 0, derive it from newCap and the load factor
if (newThr == 0) {
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
threshold = newThr;
@SuppressWarnings({"rawtypes","unchecked"})
//only here is the new array actually created
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
//if the old table is non-empty, migrate its entries
if (oldTab != null) {
//iterate over every bucket of the old table
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
if ((e = oldTab[j]) != null) {
oldTab[j] = null;
//single node: recompute its index in the new table directly
if (e.next == null)
newTab[e.hash & (newCap - 1)] = e;
//TreeNode bin: split the tree
else if (e instanceof TreeNode)
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
//otherwise split the linked list
else { // preserve order
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
//walk the chain
do {
next = e.next;
//e.hash & oldCap decides the high/low half:
//0 means the node stays in the low half
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
//otherwise it goes to the high half
else {
if (hiTail == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);
//attach the non-empty low and high lists to their buckets
if (loTail != null) {
loTail.next = null;
newTab[j] = loHead;
}
if (hiTail != null) {
hiTail.next = null;
//the high half lands at j + oldCap
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
}
The list split here again uses the high/low-bit trick, e.hash & oldCap. The TreeNode section later covers this in more detail as well.
4.6 treeifyBin
This method converts a bin's linked list into a red-black tree.
/**
* Replaces all linked nodes in bin at index for given hash unless
* table is too small, in which case resizes instead.
*/
final void treeifyBin(Node<K,V>[] tab, int hash) {
int n, index; Node<K,V> e;
//if the table is still smaller than MIN_TREEIFY_CAPACITY, resize instead of treeifying
if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
resize();
//otherwise locate the bucket
else if ((e = tab[index = (n - 1) & hash]) != null) {
TreeNode<K,V> hd = null, tl = null;
//convert each Node in the chain into a TreeNode, keeping list order
do {
TreeNode<K,V> p = replacementTreeNode(e, null);
if (tl == null)
hd = p;
else {
p.prev = tl;
tl.next = p;
}
tl = p;
} while ((e = e.next) != null);
if ((tab[index] = hd) != null)
//then build the red-black tree
hd.treeify(tab);
}
}
// For treeifyBin
TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
return new TreeNode<>(p.hash, p.key, p.value, next);
}
Here the chain is first converted, in list order, into TreeNode nodes; at this point it is still the same linked list, only the element type has been swapped. Then TreeNode's treeify method is called. The resulting structure therefore has both linked-list and red-black-tree properties: the list view is used when splitting and traversing, the tree view when searching.
Suppose we have a red-black tree like the following; note the values denote insertion order, the point being to illustrate the list structure, not actual keys.
The linked list still exists on top of this red-black tree, threaded through the next pointers.
4.7 remove
remove deletes the given key from the HashMap's table.
/**
* Removes the mapping for the specified key from this map if present.
*
* @param key key whose mapping is to be removed from the map
* @return the previous value associated with <tt>key</tt>, or
* <tt>null</tt> if there was no mapping for <tt>key</tt>.
* (A <tt>null</tt> return can also indicate that the map
* previously associated <tt>null</tt> with <tt>key</tt>.)
*/
public V remove(Object key) {
Node<K,V> e;
//hash(key) applies the same high-bit spreading described earlier
return (e = removeNode(hash(key), key, null, false, true)) == null ?
null : e.value;
}
It delegates to the removeNode method:
/**
* Implements Map.remove and related methods.
*
* @param hash hash for key
* @param key the key
* @param value the value to match if matchValue, else ignored
* @param matchValue if true only remove if value is equal
* @param movable if false do not move other nodes while removing
* @return the node, or null if none
*/
final Node<K,V> removeNode(int hash, Object key, Object value,
boolean matchValue, boolean movable) {
Node<K,V>[] tab; Node<K,V> p; int n, index;
if ((tab = table) != null && (n = tab.length) > 0 &&
//compute the bucket index with &
(p = tab[index = (n - 1) & hash]) != null) {
Node<K,V> node = null, e; K k; V v;
//does the first node match?
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
node = p;
//otherwise check whether there is a chain to search
else if ((e = p.next) != null) {
//TreeNode bin: locate the node via the red-black tree
if (p instanceof TreeNode)
node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
else {
//otherwise scan the linked list for the node
do {
if (e.hash == hash &&
((k = e.key) == key ||
(key != null && key.equals(k)))) {
node = e;
break;
}
p = e;
} while ((e = e.next) != null);
}
}
if (node != null && (!matchValue || (v = node.value) == value ||
(value != null && value.equals(v)))) {
if (node instanceof TreeNode)
((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
else if (node == p)
tab[index] = node.next;
else
p.next = node.next;
++modCount;
--size;
//removal callback (used by LinkedHashMap)
afterNodeRemoval(node);
return node;
}
}
return null;
}
Because HashMap never shrinks its table, remove is comparatively simple. At most, a tree bin that has become too small (around the untreeify threshold of 6) is converted back into a linked list.
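Like put, remove's return value tells you whether the key was present (a minimal usage sketch):

```java
import java.util.HashMap;

public class RemoveDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("k", 1);
        System.out.println(map.remove("k")); // 1: the removed value
        System.out.println(map.remove("k")); // null: key no longer present
        System.out.println(map.size());      // 0
    }
}
```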
4.8 clear
Compared with the previous methods, clear is trivial:
/**
* Removes all of the mappings from this map.
* The map will be empty after this call returns.
*/
public void clear() {
Node<K,V>[] tab;
modCount++;
if ((tab = table) != null && size > 0) {
size = 0;
for (int i = 0; i < tab.length; ++i)
tab[i] = null;
}
}
clear simply walks the table and nulls out every bucket, making all entries eligible for garbage collection; the table's capacity, however, never decreases.
4.9 containsValue
/**
* Returns <tt>true</tt> if this map maps one or more keys to the
* specified value.
*
* @param value value whose presence in this map is to be tested
* @return <tt>true</tt> if this map maps one or more keys to the
* specified value
*/
public boolean containsValue(Object value) {
Node<K,V>[] tab; V v;
if ((tab = table) != null && size > 0) {
for (int i = 0; i < tab.length; ++i) {
for (Node<K,V> e = tab[i]; e != null; e = e.next) {
if ((v = e.value) == value ||
(value != null && value.equals(v)))
return true;
}
}
}
return false;
}
containsValue is also straightforward: a two-level loop over the table and each bucket's chain. Nothing special here.
5.Summary
This article has walked through the HashMap source in full, showing the many efforts made to squeeze out performance. To close, let's revisit the key points in Q&A form.
- 1.What is HashMap's basic structure?
See section 2.1: an array plus linked lists / red-black trees. Note that a tree bin still carries a linked-list layer. If this comes up in an interview, start from TreeNode's inheritance chain: TreeNode extends LinkedHashMap.Entry, which extends Node, which implements Map.Entry, with fields added at each level; a TreeNode is roughly twice the size of a Node. The residual list links inside TreeNode are exactly what makes split efficient.
- 2.Why is HashMap's initial capacity 16?
A heavyweight interview question (famously asked at Alibaba), covered in detail in part 2. Briefly: HashMap relies on bitwise operations throughout resizing, splitting, and index calculation, which forces the table length to be a power of two. When a capacity is specified, the table is not created immediately; it is allocated during resize based on the threshold, and tableSizeFor only rounds the requested cap up to the next power of two. Among the candidates 2, 4, 8, 16, 32, and so on, larger values waste space and smaller ones force constant resizing, so 16 is the compromise. It is worth walking through how tableSizeFor works here as well.
- 3.Why is the treeify threshold 8?
Another frequent interview question. Per the source comment, under a reasonably dispersed hash function the number of nodes in a bin follows a Poisson distribution; combined with HashMap's parameters, the probability of k nodes is:
exp(-0.5) * pow(0.5, k) / factorial(k)
At k = 8 the probability is already below one in ten million, so 8 is generally considered the right point to convert to a tree.
- 4.Which bitwise operations does HashMap use, and what for?
Summarized in part 2. First, resizing by left shift, directly doubling the capacity. Second, computing the bucket index as hashcode & (size - 1). Third, splitting on resize into high and low halves via hashcode & oldSize: 0 means low, non-zero means high, at index + oldSize. Fourth, the hash method's high/low mixing, hashcode ^ (hashcode >>> 16). See the corresponding sections above for details.
- 5.What are the conditions for treeification?
A chain longer than 8 alone does not turn a bin into a red-black tree: the table capacity must also have reached 64 (MIN_TREEIFY_CAPACITY); otherwise a resize is triggered instead. See the earlier sections for details.
These are the conclusions drawn from reading the HashMap source. It is code that rewards repeated reading.