抽絲剝繭—Go 哈希 Map 的鬼魅神功

Go 語言中的哈希 Map 是江湖上極厲害的一門武功,其入門簡單蛋欣，即便是掌握到了 2丐巫、3 層也具有四兩撥千斤的神奇功效.因此成為江湖人士競相研習(xí)的技藝锨侯，風(fēng)頭一時無兩.
但即便是成名已久的高手,也鮮有能修煉到最高層的.
本文不僅介紹了哈希 Map 基本的使用方式,還深入源碼介紹了哈希 Map 的至高心法.希望本文有助于你對 Go 哈希 Map 的理解臻于化境. ## 哈希表
Go 語言中的 Map,又稱為 Hash map(哈希表) 是使用頻率極高的一種數(shù)據(jù)結(jié)構(gòu)饺谬，重要程度高到令人發(fā)指捂刺。
哈希表的原理是將多個 key/value 對分散存儲在 buckets(桶) 中。給定一個 key商蕴，哈希算法會計算出鍵值對存儲的位置叠萍。時常會通過兩步完成,偽代碼如圖所示: hash = hashfunc(key) index = hash % array_size
在此偽代碼中,第一步計算通過 hash 算法計算 key 的 hash 值,其結(jié)果與桶的數(shù)量無關(guān)。
接著通過執(zhí)行取模運算得到0 - array_size?1 之間的 index 序號绪商。
在實踐中苛谷，我們時常將 Map 看做 o(1) 時間復(fù)雜度的操作,通過一個鍵 key 快速尋找其唯一對應(yīng)的 value。 ## Map 基本操作 ### Map 的聲明與初始化首先格郁，來看一看 map 的基本使用方式腹殿。map 聲明的第一種方式如下 var hash map[T]T 其并未對 map 進行初始化的操作,其值為 nil,因此一旦進行hash[key]=alue這樣的賦值操作就會報錯。 panic(plainError("assignment to entry in nil map")) 比較意外的是 Go 語言允許對為 nil 的 map 進行訪問:hash["Go"],雖然其結(jié)果顯然毫無意義.

map 的第二種聲明方式通過 make 進行例书。make 的第二個參數(shù)中代表初始化創(chuàng)建 map 的長度锣尉。當(dāng) NUMBER 為空時，代表默認(rèn)長度為 0.

var hash = make(map[T]T,NUMBER)

此種方式可以正常的對 map 進行訪問與賦值在 map 初始化時决采，還具有字面量形式初始化的方式自沧。其在創(chuàng)建 map 時即在其中添加了元素。

var country = map[string]string{
    "China":  "Beijing",
    "Japan":  "Tokyo",
    "India":  "New Delhi",
    "France": "Paris",
    "Italy":  "Rome",
}

rating := map[string]float64{"c": 5, "Go": 4.5, "Python": 4.5, "C++": 3}

Map 的訪問

map 可以進行兩種形式的訪問:

v  := hash[key]

以及

v,ok := map[key]

當(dāng)返回 2 個參數(shù)時，第 2 個參數(shù)代表當(dāng)前 key 在 map 中是否存在拇厢。不用驚訝于為什么同樣的訪問可以即返回一個值又返回兩個值爱谁，這是在編譯時做到的，后面會介紹孝偎。

Map 的賦值

map 的賦值語法相對簡單

hash[key] = value

其代表將 value 與給 map1 哈希表中的 key 綁定在一起

Map 的刪除

map 的刪除需要用到 delete访敌，其是 Go 語言中的關(guān)鍵字，用于進行 map 的刪除操作衣盾，形如:

delete(hash,key)

可以對相同的 key 進行多次的刪除操作寺旺，而不會報錯

關(guān)于 map 中的 key

很容易理解,如果 map 中的 key 都沒有辦法比較是否相同,那么就不能作為 map 的 key。關(guān)于 Go 語言中的可比較性势决，直接閱讀官方文檔即可:https://golang.org/ref/spec#Comparison_operators 下面簡單列出一些類型的可比較性

布爾值是可比較的
整數(shù)值可比較的
浮點值是可比較的
復(fù)數(shù)值是可比較的
字符串值是可比較
指針值是可比較的阻塑。如果兩個指針值指向相同的變量，或者兩個指針的值均為nil徽龟，則它們相等叮姑。
通道值是可比較的。如果兩個通道值是由相同的make調(diào)用創(chuàng)建的据悔，或者兩個值都為nil传透。
接口值是可比較的。如果兩個接口值具有相同的動態(tài)類型和相等的動態(tài)值极颓，或者兩個接口值都為nil朱盐，則它們相等。
如果結(jié)構(gòu)的所有字段都是可比較的菠隆，則它們的值是可比較的兵琳。
如果數(shù)組元素類型的值可比較，則數(shù)組值可比較骇径。如果兩個數(shù)組的對應(yīng)元素相等躯肌，則它們相等。
切片破衔、函數(shù)清女、map是不可比較的。

關(guān)于 map 并發(fā)沖突問題

和其他語言有些不同的是晰筛，map 并不支持并發(fā)的讀寫,因此下面的操作是錯誤的 aa := make(map[int]int) go func() { for{ aa[0] = 5 } }() go func() { for{ _ = aa[1] } }() 報錯: fatal error: concurrent map read and map write
Go 語言只支持并發(fā)的讀取 Map.因此下面的函數(shù)是沒有問題的 aa := make(map[int]int) go func() { for{ _ = aa[2] } }() go func() { for{ _ = aa[1] } }()

Go 語言為什么不支持并發(fā)的讀寫嫡丙，是一個頻繁被提起的問題。我們可以在 Go 官方文檔Frequently Asked Questions找到問題的答案（https://golang.org/doc/faq#atomic_maps）


Map被設(shè)計為不需要從多個goroutine安全訪問读第，在實際情況下曙博，Map可能是某些已經(jīng)同步的較大數(shù)據(jù)結(jié)構(gòu)或計算的一部分。
因此怜瞒，要求所有Map操作都互斥將減慢大多數(shù)程序的速度父泳，而只會增加少數(shù)程序的安全性。

即這樣做的目的是為了大多數(shù)情況下的效率。

Map 在運行時

介紹了 Map 的基本操作,本節(jié)介紹一下 Map 在運行時的行為尘吗，以便能夠深入 Map 內(nèi)部的實現(xiàn)機制逝她。
明白了 Map 的實現(xiàn)機制，有助于更加靈活的使用 Map 和進行深層次的調(diào)優(yōu)過程睬捶。由于代碼里面的邏輯關(guān)系關(guān)聯(lián)比較復(fù)雜。本節(jié)會首先用多張圖片幫助讀者有一個抽象的理解近刘。 Go 語言 Map 的底層實現(xiàn)如下所示:

// A header for a Go map.
type hmap struct {
    count     int // # live cells == size of map.  Must be first (used by len() builtin)
    flags     uint8
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    hash0     uint32 // hash seed

    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

    extra *mapextra // optional fields
}

其中：

count 代表 Map 中元素的數(shù)量.
flags 代表當(dāng)前 Map 的狀態(tài)（是否處于正在寫入的狀態(tài)等).
B 對數(shù)形式表示的當(dāng)前 Map 中桶的數(shù)量, 2^B = Buckets size.
noverflow 為 Map 中溢出桶的數(shù)量.當(dāng)溢出桶太多的時候擒贸，Map 會進行same-size map growth.其實質(zhì)是為了避免溢出桶過大導(dǎo)致的內(nèi)存泄露問題.
hash0 代表生成 hash 的隨機數(shù)種子.
buckets 指向了當(dāng)前 Map 對應(yīng)的桶的指針.
oldbuckets 是在 Map 進行擴容的時候存儲舊桶的.當(dāng)所有的舊桶中的數(shù)據(jù)都已經(jīng)轉(zhuǎn)移到了新桶,則清空挡逼。
nevacuate 在擴容的時候使用养渴。用于標(biāo)記當(dāng)前舊桶中小于 nevacuate 的桶都已經(jīng)轉(zhuǎn)移到了新桶.
extra 存儲 Map 中的溢出桶

代表桶的bmap結(jié)構(gòu)在運行時只列出了其首個字段: 即一個固定長度為 8 的數(shù)組。此字段順序存儲 key 的哈希值的前 8 位.

type bmap struct {
    tophash [bucketCnt]uint8
}

可能會有疑問痢艺，桶中存儲的 key 和 value 值哪里去了案淋？這是因為 Map 在編譯時即確定了 map 中 key,value,桶的大小座韵。因此在運行時僅僅通過指針操作即可找到特定位置的元素。桶本身在存儲的 tophash 字段之后踢京，會存儲 key 數(shù)組以及 value 數(shù)組

type bmap struct {
    tophash [bucketCnt]uint8
    key   [bucketCnt]T
    value [bucketCnt]T
    ....
}

image

Go 語言選擇將 key 與 value 分開存儲而不是 key/value/key/value 的形式誉碴，是為了在字節(jié)對齊的時候能夠壓縮空間。

在進行hash[key]此類的的 Map 訪問操作時瓣距，會首先找到桶的位置黔帕，如下為偽代碼操作.

hash = hashfunc(key)
index = hash % array_size

接著遍歷 tophash 數(shù)組,如果數(shù)組中找到了相同的 hash,那么就可以接著通過指針的尋址操作找到 key 與 value 值

image

= value賦值操作時,當(dāng)指定桶中的數(shù)據(jù)超過了8個，并不會直接就新開辟一個新桶,而是會將數(shù)據(jù)放置到溢出桶中每個桶的最后還存儲了overflow` 即溢出桶的指針

在正常情況下蹈丸，數(shù)據(jù)是很少會跑到溢出桶里面去的成黄。同理，我們也可以知道逻杖，在 Map 的查找操作時奋岁，如果 key 的 hash 在指定桶的 tophash 數(shù)組中不存在，還會遍歷溢出桶中的數(shù)據(jù)荸百。

image

后面我們會看到闻伶，如果一開始初始化 map 的數(shù)量比較大。則 map 提前創(chuàng)建好一些溢出桶存儲在extra *mapextra 字段.

type mapextra struct {
    overflow    *[]*bmap
    oldoverflow *[]*bmap
    nextOverflow *bmap
}

這樣當(dāng)出現(xiàn)溢出現(xiàn)象是管搪，就可以用提前創(chuàng)建好的桶而不用申請額外的內(nèi)存空間虾攻。當(dāng)預(yù)分配的溢出桶使用完了，溢出桶才會新建更鲁。

當(dāng)發(fā)生以下兩種情況之一霎箍，map 會進行重建：

當(dāng) Map 超過了負(fù)載因子大小
當(dāng)溢出桶的數(shù)量過多

在哈希表中都有負(fù)載因子的概念

負(fù)載因子 = 哈希表中元素數(shù)量 / 桶的數(shù)量

因此隨著負(fù)載因子的增大，意味著越多的元素會分配到同一個桶中澡为。此時其效率會減慢漂坏。
試想如果桶的數(shù)量只有 1 個，此時負(fù)載因子到達最大,此時的搜索效率就成了遍歷數(shù)組。在 Go 語言中的負(fù)載因子為 6.5顶别。
當(dāng)超過了其大小后谷徙，Mpa 會進行擴容，增大兩倍于舊表的大小驯绎。
舊桶的數(shù)據(jù)會首先存到oldbuckets 字段完慧，并想辦法分散的轉(zhuǎn)移到新桶中。

image

map 的重建還存在第二種情況剩失，即溢出桶的數(shù)量太多屈尼。這時只會新建和原來的 map 具有相同大小的桶。進行這樣same size的重建為了是防止溢出桶的數(shù)量可能緩慢增長導(dǎo)致的內(nèi)存泄露.

當(dāng)進行 map 的 delete 操作時, 和賦值操作類似拴孤，會找到指定的桶脾歧，如果存在指定的 key,那么就釋放掉 key 與 value 引用的內(nèi)存。同時 tophash 中指定位置會存儲emptyOne,代表當(dāng)前位置是空的演熟。
同時在刪除操作時鞭执，會探測到是否當(dāng)前要刪除的元素之后都是空的。如果是芒粹，tophash 會存儲為emptyRest. 這樣做的好處是在做查找操作時兄纺，遇到 emptyRest 可以直接退出，因為后面的元素都是空的是辕。

image

Map 深入

上一節(jié)用多張圖片解釋了 Map 的實現(xiàn)原理囤热，本節(jié)會繼續(xù)深入 Go 語言源碼解釋 Map 的具體實現(xiàn)細節(jié)。問題掌握得有多細致获三，理解得就有多透徹旁蔼。

Map 深入: make 初始化

如果我們使用 make 關(guān)鍵字初始化 Map,在 typecheck1 類型檢查階段,節(jié)點 Node 的 op 操作變?yōu)?code>OMAKEMAP,如果指定了 make map 的長度,則會將長度常量值類型轉(zhuǎn)換為 TINT 類型.如果未指定長度，則長度為 0疙教。nodintconst(0)

func typecheck1(n *Node, top int) (res *Node) {
        ...
        case TMAP:
            if i < len(args) {
                l = args[i]
                i++
                l = typecheck(l, ctxExpr)
                l = defaultlit(l, types.Types[TINT])
                if l.Type == nil {
                    n.Type = nil
                    return n
                }
                if !checkmake(t, "size", l) {
                    n.Type = nil
                    return n
                }
                n.Left = l
            } else {
                n.Left = nodintconst(0)
            }
            n.Op = OMAKEMAP

如果 make 的第二個參數(shù)不是整數(shù)棺聊，則會在類型檢查時報錯。

if !checkmake(t, "size", l) {
        n.Type = nil
        return n
}

func checkmake(t *types.Type, arg string, n *Node) bool {
    if !n.Type.IsInteger() && n.Type.Etype != TIDEAL {
        yyerror("non-integer %s argument in make(%v) - %v", arg, t, n.Type)
        return false
    }
}

最后會指定在運行時調(diào)用 runtime.makemap*函數(shù)

func walkexpr(n *Node, init *Nodes) *Node {
           fnname := "makemap64"
            argtype := types.Types[TINT64]
            // Type checking guarantees that TIDEAL hint is positive and fits in an int.
            // See checkmake call in TMAP case of OMAKE case in OpSwitch in typecheck1 function.
            // The case of hint overflow when converting TUINT or TUINTPTR to TINT
            // will be handled by the negative range checks in makemap during runtime.
            if hint.Type.IsKind(TIDEAL) || maxintval[hint.Type.Etype].Cmp(maxintval[TUINT]) <= 0 {
                fnname = "makemap"
                argtype = types.Types[TINT]
            }
            fn := syslook(fnname)
            fn = substArgTypes(fn, hmapType, t.Key(), t.Elem())
            n = mkcall1(fn, n.Type, init, typename(n.Type), conv(hint, argtype), h)
}

不管是 makemap64 還是 makemap,最后都調(diào)用了 makemap 函數(shù)

func makemap64(t *maptype, hint int64, h *hmap) *hmap {
    if int64(int(hint)) != hint {
        hint = 0
    }
    return makemap(t, int(hint), h)
}

保證創(chuàng)建 map 的長度不能超過 int 大小
```
if int64(int(hint)) != hint {
    hint = 0
}
```
makemap 函數(shù)會計算出需要的桶的數(shù)量,即 log2(N),并調(diào)用makeBucketArray函數(shù)生成桶和溢出桶
如果初始化時生成了溢出桶,會放置到 map 的extra字段里去

func makemap(t *maptype, hint int, h *hmap) *hmap {
    ...
    B := uint8(0)
    for overLoadFactor(hint, B) {
        B++
    }
    h.B = B

    if h.B != 0 {
        var nextOverflow *bmap
        h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
        if nextOverflow != nil {
            h.extra = new(mapextra)
            h.extra.nextOverflow = nextOverflow
        }
    }

    return h
}

makeBucketArray 會為 Map 申請內(nèi)存大小贞谓，這里需要注意的是限佩，如果 map 的數(shù)量大于了2^4，則會在初始化的時候生成溢出桶裸弦。溢出桶的大小為 2^(b-4),b 為桶的大小祟同。

func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
    if b >= 4 {
        nbuckets += bucketShift(b - 4)
        sz := t.bucket.size * nbuckets
        up := roundupsize(sz)
        if up != sz {
            nbuckets = up / t.bucket.size
        }
    }

    if dirtyalloc == nil {
        buckets = newarray(t.bucket, int(nbuckets))
    } else {
        buckets = dirtyalloc
        size := t.bucket.size * nbuckets
        if t.bucket.ptrdata != 0 {
            memclrHasPointers(buckets, size)
        } else {
            memclrNoHeapPointers(buckets, size)
        }
    }

    if base != nbuckets {
        nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
        last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
        last.setoverflow(t, (*bmap)(buckets))
    }
    return buckets, nextOverflow
}

Map 深入: 字面量初始化

如果是采取了字面量初始化的方式，其最終任然是需要轉(zhuǎn)換為make操作,其長度是字面量的長度理疙。其編譯時的核心邏輯位于:

func anylit(n *Node, var_ *Node, init *Nodes){
    ...
case OMAPLIT:
        if !t.IsMap() {
            Fatalf("anylit: not map")
        }
            maplit(n, var_, init)

}

func maplit(n *Node, m *Node, init *Nodes) {
    a := nod(OMAKE, nil, nil)
    a.Esc = n.Esc
    a.List.Set2(typenod(n.Type), nodintconst(int64(n.List.Len())))
    if len(entries) > 25 {
        ...
    }
    ...
}

唯一值得一提的是,如果字面量的個數(shù)大于 25 個,編譯時會構(gòu)建一個數(shù)組循環(huán)添加

entries := n.List.Slice()
if len(entries) > 25 {
    // loop adding structure elements to map
    // for i = 0; i < len(vstatk); i++ {
    //  map[vstatk[i]] = vstate[i]
    // }
}

如果字面量的個數(shù)小于 25 個,編譯時會指定會采取直接添加的方式賦值

for _, r := range entries {
    map[key] = value
}

Map 深入: map 訪問

前面介紹過,對 map 的訪問晕城，具有兩種形式。一種是返回單個值

v := hash[key]

一種是返回多個返回值

v, ok := hash[key]

Go 語言沒有函數(shù)重載的概念窖贤，決定返回一個值還是兩個值很明顯只能夠在編譯時完成砖顷。對于 v:= rating["Go"]rating["Go"] 會在編譯時解析為一個 node贰锁，其中左邊 type 為 ONAME,存儲名字:,右邊 type 為 OLITERAL,存儲"Go",節(jié)點的 op 為"OINDEXMAP" 根據(jù)hash[key] 位于賦值號的左邊或右邊，決定要執(zhí)行訪問還是賦值的操作滤蝠。訪問操作會在運行時調(diào)用運行 mapaccess1_XXX 函數(shù),賦值操作會在運行時調(diào)用 mapassign_XXX 函數(shù).

if n.IndexMapLValue() {
            // This m[k] expression is on the left-hand side of an assignment.
            fast := mapfast(t)
            if fast == mapslow {
                // standard version takes key by reference.
                // orderexpr made sure key is addressable.
                key = nod(OADDR, key, nil)
            }
            n = mkcall1(mapfn(mapassign[fast], t), nil, init, typename(t), map_, key)
        } else {
            // m[k] is not the target of an assignment.
            fast := mapfast(t)
            if fast == mapslow {
                // standard version takes key by reference.
                // orderexpr made sure key is addressable.
                key = nod(OADDR, key, nil)
            }

            if w := t.Elem().Width; w <= 1024 { // 1024 must match runtime/map.go:maxZero
                n = mkcall1(mapfn(mapaccess1[fast], t), types.NewPtr(t.Elem()), init, typename(t), map_, key)
            } else {
                z := zeroaddr(w)
                n = mkcall1(mapfn("mapaccess1_fat", t), types.NewPtr(t.Elem()), init, typename(t), map_, key, z)
            }
        }

Go 編譯器根據(jù) map 中的 key 類型和大小選擇不同的 mapaccess1_XXX 函數(shù)進行加速,但是他們在查找邏輯上都是相同的

func mapfast(t *types.Type) int {
    // Check runtime/map.go:maxElemSize before changing.
    if t.Elem().Width > 128 {
        return mapslow
    }
    switch algtype(t.Key()) {
    case AMEM32:
        if !t.Key().HasHeapPointer() {
            return mapfast32
        }
        if Widthptr == 4 {
            return mapfast32ptr
        }
        Fatalf("small pointer %v", t.Key())
    case AMEM64:
        if !t.Key().HasHeapPointer() {
            return mapfast64
        }
        if Widthptr == 8 {
            return mapfast64ptr
        }
        // Two-word object, at least one of which is a pointer.
        // Use the slow path.
    case ASTRING:
        return mapfaststr
    }
    return mapslow
}

func mkmapnames(base string, ptr string) mapnames {
    return mapnames{base, base + "_fast32", base + "_fast32" + ptr, base + "_fast64", base + "_fast64" + ptr, base + "_faststr"}
}

var mapaccess1 = mkmapnames("mapaccess1", "")

最終會在運行時會調(diào)用 mapaccess1_XXXX 的函數(shù)豌熄。

而對于v, ok := hash[key]類型的 map 訪問則有所不同。在編譯時的 op 操作為 OAS2MAPR.會將其轉(zhuǎn)換為在運行時調(diào)用的 mapaccess2_XXXX 前綴的函數(shù)物咳。其偽代碼如下:

//   var,b = mapaccess2*(t, m, i)
//   v = *var

需要注意锣险，如果采用_, ok := hash[key]形式，則不用對第一個參數(shù)進行賦值操作.
在運行時,會根據(jù) key 值以及 hash 種子計算 hash 值:alg.hash(key, uintptr(h.hash0)).
接著 bucketMask 計算出當(dāng)前桶的個數(shù)-1. m := bucketMask(h.B)
Go 語言采用了一種簡單的方式hash&m計算出此 key 應(yīng)該位于哪一個桶中.獲取到桶的位置后览闰，tophash(hash)即可計算出 hash 的前 8 位.
接著此 hash 挨個與存儲在桶中的 tophash 進行對比囱持。如果有 hash 值相同的話.會找到其對應(yīng)的 key 值，查看 key 值是否相同焕济。如果 key 值也相同，即說明查找到了結(jié)果盔几，返回 value 哦晴弃。

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    alg := t.key.alg
    hash := alg.hash(key, uintptr(h.hash0))
    m := bucketMask(h.B)
    b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
    top := tophash(hash)
bucketloop:
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                if b.tophash[i] == emptyRest {
                    break bucketloop
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey() {
                k = *((*unsafe.Pointer)(k))
            }
            if alg.equal(key, k) {
                e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                if t.indirectelem() {
                    e = *((*unsafe.Pointer)(e))
                }
                return e
            }
        }
    }
    return unsafe.Pointer(&zeroVal[0])
}

函數(shù) mapaccess2 的邏輯幾乎是類似的，只是其會返回第二個參數(shù)逊拍，表明 value 值是否存在于桶中.

func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {
    alg := t.key.alg
    hash := alg.hash(key, uintptr(h.hash0))
    m := bucketMask(h.B)
    b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
    top := tophash(hash)
bucketloop:
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                if b.tophash[i] == emptyRest {
                    break bucketloop
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey() {
                k = *((*unsafe.Pointer)(k))
            }
            if alg.equal(key, k) {
                e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                if t.indirectelem() {
                    e = *((*unsafe.Pointer)(e))
                }
                return e, true
            }
        }
    }
    return unsafe.Pointer(&zeroVal[0]), false
}

Map 深入: 賦值操作

和訪問的情況的比較類似, 最終會調(diào)用運行時 mapassign*函數(shù)上鞠。
賦值操作，map 必須已經(jīng)進行了初始化芯丧。

if h == nil {
    panic(plainError("assignment to entry in nil map"))
}

同時要注意芍阎，由于 Map 不支持并發(fā)的讀寫操作，因此還會檢測是否有協(xié)程在訪問此 Map,如果是缨恒，即會報錯谴咸。

if h.flags&hashWriting != 0 {
    throw("concurrent map writes")
}

和訪問操作一樣,會計算 key 的 hash 值

alg := t.key.alg
hash := alg.hash(key, uintptr(h.hash0))

標(biāo)記當(dāng)前 map 已經(jīng)是寫入狀態(tài)

h.flags ^= hashWriting

如果當(dāng)前沒有桶，還會常見一個新桶骗露。所以初始化的時候還是定一個長度吧岭佳。

if h.buckets == nil {
    h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
}

接著找到當(dāng)前 key 對應(yīng)的桶

bucket := hash & bucketMask(h.B)

如果發(fā)現(xiàn)，當(dāng)前的 map 正好在重建萧锉，還沒有重建完珊随。會優(yōu)先完成重建過程，重建的細節(jié)后面會介紹柿隙。

if h.growing() {
    growWork(t, h, bucket)
}

計算 tophash

b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
top := tophash(hash)

開始尋找是否有對應(yīng)的 hash叶洞，如果找到了，判斷 key 是否相同禀崖，如果相同衩辟，會找到對應(yīng)的 value 的位置在后面進行賦值

for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                if isEmpty(b.tophash[i]) && inserti == nil {
                    inserti = &b.tophash[i]
                    insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
                    elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                }
                if b.tophash[i] == emptyRest {
                    break bucketloop
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey() {
                k = *((*unsafe.Pointer)(k))
            }
            if !alg.equal(key, k) {
                continue
            }
            // already have a mapping for key. Update it.
            if t.needkeyupdate() {
                typedmemmove(t.key, k, key)
            }
            elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
            goto done
        }
        ovf := b.overflow(t)
        if ovf == nil {
            break
        }
        b = ovf
    }

要注意的是,如果 tophash 沒找到，還會去溢出桶里尋找是否存在指定的 hash
如果也不存在帆焕，會選擇往第一個空元素中插入數(shù)據(jù) inserti惭婿、insertk 會記錄此空元素的位置不恭，

if isEmpty(b.tophash[i]) && inserti == nil {
    inserti = &b.tophash[i]
    insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
    elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
}

在賦值之前，還需要判斷 Map 是否需要重建

if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
        hashGrow(t, h)
        goto again // Growing the table invalidates everything, so try again
    }

沒有問題后财饥，就會執(zhí)行最后的操作换吧，將新的 key 與 value 值存入數(shù)組中
這里需要注意一點是,如果桶中已經(jīng)沒有了空元素。這時候我們申請一個新的桶給到這個桶钥星。

if inserti == nil {
    // all current buckets are full, allocate a new one.
    newb := h.newoverflow(t, b)
    inserti = &newb.tophash[0]
    insertk = add(unsafe.Pointer(newb), dataOffset)
    elem = add(insertk, bucketCnt*uintptr(t.keysize))
}

申請的新桶一開始是來自于 map 中extra字段初始化時存儲的多余溢出桶沾瓦。如果這些多余的溢出桶都用完了才會申請新的內(nèi)存。一個桶的溢出桶可能會進行延展

image

 *hmap, key unsafe.Pointer) unsafe.Pointer {
    var inserti *uint8
    var insertk unsafe.Pointer
    var elem unsafe.Pointer
bucketloop:
    for {

    // Did not find mapping for key. Allocate new cell & add entry.

    // If we hit the max load factor or we have too many overflow buckets,
    // and we're not already in the middle of growing, start growing.

    if inserti == nil {
        // all current buckets are full, allocate a new one.
        newb := h.newoverflow(t, b)
        inserti = &newb.tophash[0]
        insertk = add(unsafe.Pointer(newb), dataOffset)
        elem = add(insertk, bucketCnt*uintptr(t.keysize))
    }

    // store new key/elem at insert position
    if t.indirectkey() {
        kmem := newobject(t.key)
        *(*unsafe.Pointer)(insertk) = kmem
        insertk = kmem
    }
    if t.indirectelem() {
        vmem := newobject(t.elem)
        *(*unsafe.Pointer)(elem) = vmem
    }
    typedmemmove(t.key, insertk, key)
    *inserti = top
    h.count++

done:
    if h.flags&hashWriting == 0 {
        throw("concurrent map writes")
    }
    h.flags &^= hashWriting
    if t.indirectelem() {
        elem = *((*unsafe.Pointer)(elem))
    }
    return elem
}

Map 深入: Map 重建

當(dāng)發(fā)生以下兩種情況之一谦炒，map 會進行重建：

當(dāng) Map 超過了負(fù)載因子大小 6.5
當(dāng)溢出桶的數(shù)量過多重建時需要調(diào)用 hashGrow 函數(shù),如果是負(fù)載因子超載贯莺，會進行雙倍重建。

bigger := uint8(1)
if !overLoadFactor(h.count+1, h.B) {
    bigger = 0
    h.flags |= sameSizeGrow
}

當(dāng)溢出桶的數(shù)量過多,則會進行等量重建宁改。新桶會會存儲到buckets字段,舊桶會存儲到oldbuckets字段缕探。
map 中 extra 字段的溢出桶也同理的進行了轉(zhuǎn)移。


if h.extra != nil && h.extra.overflow != nil {
    // Promote current overflow buckets to the old generation.
    if h.extra.oldoverflow != nil {
        throw("oldoverflow is not nil")
    }
    h.extra.oldoverflow = h.extra.overflow
    h.extra.overflow = nil
}

hashGrow 代碼一覽

func hashGrow(t *maptype, h *hmap) {
    // If we've hit the load factor, get bigger.
    // Otherwise, there are too many overflow buckets,
    // so keep the same number of buckets and "grow" laterally.
    bigger := uint8(1)
    if !overLoadFactor(h.count+1, h.B) {
        bigger = 0
        h.flags |= sameSizeGrow
    }
    oldbuckets := h.buckets
    newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

    flags := h.flags &^ (iterator | oldIterator)
    if h.flags&iterator != 0 {
        flags |= oldIterator
    }
    // commit the grow (atomic wrt gc)
    h.B += bigger
    h.flags = flags
    h.oldbuckets = oldbuckets
    h.buckets = newbuckets
    h.nevacuate = 0
    h.noverflow = 0

    if h.extra != nil && h.extra.overflow != nil {
        // Promote current overflow buckets to the old generation.
        if h.extra.oldoverflow != nil {
            throw("oldoverflow is not nil")
        }
        h.extra.oldoverflow = h.extra.overflow
        h.extra.overflow = nil
    }
    if nextOverflow != nil {
        if h.extra == nil {
            h.extra = new(mapextra)
        }
        h.extra.nextOverflow = nextOverflow
    }

    // the actual copying of the hash table data is done incrementally
    // by growWork() and evacuate().
}

要注意的是, 在這里并沒有進行實際的將舊桶數(shù)據(jù)轉(zhuǎn)移到新桶的過程还蹲。數(shù)據(jù)轉(zhuǎn)移遵循了copy on write(寫時復(fù)制) 的規(guī)則爹耗。只有在真正賦值的時候，會選擇是否需要進行數(shù)據(jù)轉(zhuǎn)移谜喊。核心邏輯位于函數(shù)growWork and evacuate

bucket := hash & bucketMask(h.B)
if h.growing() {
    growWork(t, h, bucket)
}

在進行寫時復(fù)制的時候潭兽，意味著并不是所有的數(shù)據(jù)都會一次性的進行轉(zhuǎn)移，而只會轉(zhuǎn)移當(dāng)前需要的這個舊桶斗遏。 bucket := hash & bucketMask(h.B)得到了當(dāng)前新桶所在的位置山卦，而要轉(zhuǎn)移的舊桶的位置位于bucket&h.oldbucketmask() xy [2]evacDst 用于存儲要轉(zhuǎn)移到新桶的位置

如果是雙倍重建,那么舊桶轉(zhuǎn)移到新桶的位置總是相距舊桶的數(shù)量.

image

如果是等量重建，則簡單的直接轉(zhuǎn)移即可

image

到哪一些新桶.

其中有一個非常重要的原則是：如果此數(shù)據(jù)計算完 hash 后,hash & bucketMask <= 舊桶的大小 意味著這個數(shù)據(jù)必須轉(zhuǎn)移到和舊桶位置完全對應(yīng)的新桶中去.理由是現(xiàn)在當(dāng)前 key 所在新桶的序號與舊桶是完全相同的诵次。

newbit := h.noldbuckets()
if hash&newbit != 0 {
    useY = 1
}

func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
    ...
    // Unlink the overflow buckets & clear key/elem to help GC.
        if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
            b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
            // Preserve b.tophash because the evacuation
            // state is maintained there.
            ptr := add(b, dataOffset)
            n := uintptr(t.bucketsize) - dataOffset
            memclrHasPointers(ptr, n)
        }
    }

    if oldbucket == h.nevacuate {
        advanceEvacuationMark(h, t, newbit)
    }
}

Map 深入: delete

刪除的邏輯在之前介紹過账蓉，是比較簡單的。
核心邏輯位于func mapdelete(t *maptype, h *hmap, key unsafe.Pointer)
同樣需要計算出 hash 的前 8 位藻懒、指定的桶等剔猿。
同樣會一直尋找是否有相同的 key，如果找不到,會一直查找當(dāng)前桶的溢出桶下去,知道到達末尾...
如果查找到了指定的 key,則會清空數(shù)據(jù)嬉荆，hash 位設(shè)置為emptyOne. 如果發(fā)現(xiàn)后面沒有元素归敬，則會設(shè)置為emptyRest,并循環(huán)向上檢查前一個元素是否為空。

for {
    b.tophash[i] = emptyRest
    if i == 0 {
        if b == bOrig {
            break // beginning of initial bucket, we're done.
        }
        // Find previous bucket, continue at its last entry.
        c := b
        for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
        }
        i = bucketCnt - 1
    } else {
        i--
    }
    if b.tophash[i] != emptyOne {
        break
    }
}

delete 代碼一覽

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
    alg := t.key.alg
    hash := alg.hash(key, uintptr(h.hash0))
    h.flags ^= hashWriting

    bucket := hash & bucketMask(h.B)
    if h.growing() {
        growWork(t, h, bucket)
    }
    b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
    bOrig := b
    top := tophash(hash)
search:
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                if b.tophash[i] == emptyRest {
                    break search
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            k2 := k
            if t.indirectkey() {
                k2 = *((*unsafe.Pointer)(k2))
            }
            if !alg.equal(key, k2) {
                continue
            }
                if t.indirectkey() {
                *(*unsafe.Pointer)(k) = nil
            } else if t.key.ptrdata != 0 {
                memclrHasPointers(k, t.key.size)
            }
            e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
            if t.indirectelem() {
                *(*unsafe.Pointer)(e) = nil
            } else if t.elem.ptrdata != 0 {
                memclrHasPointers(e, t.elem.size)
            } else {
                memclrNoHeapPointers(e, t.elem.size)
            }
            b.tophash[i] = emptyOne
                if i == bucketCnt-1 {
                if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
                    goto notLast
                }
            } else {
                if b.tophash[i+1] != emptyRest {
                    goto notLast
                }
            }
            for {
                b.tophash[i] = emptyRest
                if i == 0 {
                    if b == bOrig {
                        break // beginning of initial bucket, we're done.
                    }
                    c := b
                    for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
                    }
                    i = bucketCnt - 1
                } else {
                    i--
                }
                if b.tophash[i] != emptyOne {
                    break
                }
            }
        notLast:
            h.count--
            break search
        }
    }
    if h.flags&hashWriting == 0 {
        throw("concurrent map writes")
    }
    h.flags &^= hashWriting
}

總結(jié)

本文首先介紹了 Go 語言 Map 增刪查改的基本用法
接著介紹了 Map 使用中 key 的特性,只有可比較的類型才能夠作為 Map 中的 key
接著介紹了 map 禁止并發(fā)讀寫的設(shè)計原因,即為了大部分程序的效率而犧牲了小部分程序的安全鄙早。
最后我們深入源碼介紹了 map 編譯和運行時的具體細節(jié)汪茧。Map 有多種初始化的方式,如果指定了長度 N,在初始化時會生成桶。桶的數(shù)量為 log2(N).如果 map 的長度大于了2^4限番，則會在初始化的時候生成溢出桶舱污。溢出桶的大小為 2^(b-4),b 為桶的大小。
在涉及到訪問弥虐、賦值扩灯、刪除操作時,都會首先計算數(shù)據(jù)的 hash 值,接著簡單的&運算計算出數(shù)據(jù)存儲在桶中的位置媚赖。接著會根據(jù) hash 的前 8 位與存儲在桶中的 hash、key 進行比較珠插，完成最后的賦值與訪問操作惧磺。如果數(shù)據(jù)放不下了還會申請放置到溢出桶中
當(dāng) Map 超過了負(fù)載因子大小會進行雙倍重建,溢出桶太大會進行等量重建。數(shù)據(jù)的轉(zhuǎn)移采取了寫時復(fù)制的策略捻撑，即在用到時才會將舊桶的數(shù)據(jù)打散放入到新桶中磨隘。
因此，可以看出 Map 是簡單高效的 kv 存儲的利器,它非彻嘶迹快但是卻快不到極致番捂。理論上來說，我們總是可以根據(jù)我們數(shù)據(jù)的特點設(shè)計出更好的哈希函數(shù)以及映射機制江解。
Map 的重建的過程提示我們可以評估放入到 Map 的數(shù)據(jù)大小设预，并在初始化時指定
Map 在實踐中極少成為性能的瓶頸，但是卻容易寫出并發(fā)沖突的程序犁河。這提示我們進行合理的設(shè)計以及進行race檢查絮缅。
see you~