幾種常見的hash函數(shù)

概覽

最近在看redis源碼，發(fā)現(xiàn)redis采用了幾種不同的算法來計算Hash Code;因此打算借此整理下JDK中的實現(xiàn)，加深理解;

Redis

Thomas Wang's 32 bit Mix Function

關(guān)于該算法的具體內(nèi)容运悲，可以參考這篇文章,算法源碼如下：

public int hash32shift(int key)
{
  key = ~key + (key << 15); // key = (key << 15) - key - 1;
  key = key ^ (key >>> 12);
  key = key + (key << 2);
  key = key ^ (key >>> 4);
  key = key * 2057; // key = (key + (key << 3)) + (key << 11);
  key = key ^ (key >>> 16);
  return key;
}

可以看到它是計算int類型的hash code,返回結(jié)果也是int類型;另外它還提供了64bit的算法涩咖，有興趣的可以自行查閱:
根據(jù)Thomas Wang所描述，由于上述代碼可以利用CPU的native指令经柴，在HP 9000 workstations機器上只需要11個時鐘周期，速度很快墩朦；
比較費解這邊的常量值是怎么確定的坯认，無奈算法基礎(chǔ)比較差，如果有了解的同學(xué)氓涣，歡迎答疑牛哺；

Austin Appleby's MurmurHash2

MurmurHash是一種非加密型哈希函數(shù)，由Austin Appleby在2008年發(fā)明劳吠，并且有多個變種;該算法的作者2011去了google,開發(fā)出來了新的算法CityHash引润；

unsigned int dictGenHashFunction(const void *key, int len) {
    /* 'm' and 'r' are mixing constants generated offline.
     They're not really 'magic', they just happen to work well.  */
    uint32_t seed = dict_hash_function_seed;
    const uint32_t m = 0x5bd1e995;
    const int r = 24;

    /* Initialize the hash to a 'random' value */
    uint32_t h = seed ^ len;

    /* Mix 4 bytes at a time into the hash */
    const unsigned char *data = (const unsigned char *)key;

    while(len >= 4) {
        uint32_t k = *(uint32_t*)data;

        k *= m;
        k ^= k >> r;
        k *= m;

        h *= m;
        h ^= k;

        data += 4;
        len -= 4;
    }

    /* Handle the last few bytes of the input array  */
    switch(len) {
    case 3: h ^= data[2] << 16;
    case 2: h ^= data[1] << 8;
    case 1: h ^= data[0]; h *= m;
    };

    /* Do a few final mixes of the hash to ensure the last few
     * bytes are well-incorporated. */
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;

    return (unsigned int)h;
}

Murmur可以計算字符串的hash code,基本思想就是把key分成n組，每組4個字符痒玩，把這4個字符看成是一個uint_32淳附，進行n次運算，得到一個h蠢古，然會在對h進行處理奴曙，得到一個相對離散的hash code;

DJB Hash

unsigned int DJBHash(char *str)    
{    
    unsigned int hash = 5381;    
     
    while (*str){    
        hash = ((hash << 5) + hash) + (*str++); /* times 33 */    
    }    
    hash &= ~(1 << 31); /* strip the highest bit */    
    return hash;    
}

Redis對其進行了部分調(diào)整，不區(qū)分大小寫：

unsigned int dictGenCaseHashFunction(const unsigned char *buf, int len) {
    unsigned int hash = (unsigned int)dict_hash_function_seed;

    while (len--)
        hash = ((hash << 5) + hash) + (tolower(*buf++)); /* hash * 33 + c */
    return hash;
}

JDK

Austin Appleby's MurmurHash3

從JDK7開始草讶，JDK引入了一種新的計算Hash code的算法洽糟，可以在如下的集合對象中使用:

HashMap
Hashtable
HashSet
LinkedHashMap
LinkedHashSet
WeakHashMap
ConcurrentHashMap

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

可以看到只有key為字符類型，而且hashSeed不為0時才采用新的哈希算法堕战；
sun.misc.Hashing.stringHash32實際上調(diào)用的是String.hash32方法:

int hash32() {
    int h = hash32;
    //h==0表示未計算過hash code
    //可以看到hash code只會計算一次
    if (0 == h) {
       //HASHING_SEED是根據(jù)時間戳等計算出來的隨機數(shù)
       h = sun.misc.Hashing.murmur3_32(HASHING_SEED, value, 0, value.length);

       //確保結(jié)果非0坤溃，避免重新計算
       h = (0 != h) ? h : 1;

       hash32 = h;
    }

    return h;
}

可以看到最終調(diào)用的是MurmurHash3算法，那么什么情況下hashSeed不為0呢嘱丢？

final boolean initHashSeedAsNeeded(int capacity) {
    boolean currentAltHashing = hashSeed != 0;
    boolean useAltHashing = sun.misc.VM.isBooted() &&
            (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    boolean switching = currentAltHashing ^ useAltHashing;
    if (switching) {
        hashSeed = useAltHashing
            ? sun.misc.Hashing.randomHashSeed(this)
            : 0;
    }
    return switching;
}

可以看到是否采用新算法和Holder.ALTERNATIVE_HASHING_THRESHOLD有關(guān)薪介；實際上JDK定義了一個新的屬性jdk.map.althashing.threshold（默認值－1），只有當(dāng)系統(tǒng)容量大于該屬性值時越驻，才會采用新的算法汁政；因此可以將通過-Djdk.map.althashing.threshold＝0設(shè)置該屬性為0道偷，啟用新的哈希函數(shù)；

String.hashCode

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

Object.hashCode

之前曾經(jīng)在Java對象結(jié)構(gòu) 中介紹了Java的MarkWord,其中記錄了對象的hashCode;如果對象計算過hashCode烂完，則會存儲到對象頭中；如果是第一次計算則是調(diào)用

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = intptr_t(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = intptr_t(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

可以看到诵棵，這邊提供了多種不同的實現(xiàn)抠蚣，具體采用哪種，取決于hashCode的值履澳，在Linux機器上,hashCode默認為0:

hashCode.png

因此是通過os::random計算hashCode,可以參考Java對象結(jié)構(gòu) 嘶窄；

其它

Character、Integer距贷、Long柄冲、Double的hashCode方法返回包裝的基本類型;

最后編輯于：2017.12.05 05:30:16

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者

人面猴
序言：七十年代末，一起剝皮案震驚了整個濱河市忠蝗，隨后出現(xiàn)的幾起案子现横，更是在濱河造成了極大的恐慌，老刑警劉巖阁最，帶你破解...
沈念sama閱讀 206,378評論 6贊 481
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件戒祠，死亡現(xiàn)場離奇詭異，居然都是意外死亡速种，警方通過查閱死者的電腦和手機姜盈，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 88,356評論 2贊 382
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進店門，熙熙樓的掌柜王于貴愁眉苦臉地迎上來配阵，“玉大人馏颂，你說我怎么就攤上這事∑灏” “怎么了救拉？”我有些...
開封第一講書人閱讀 152,702評論 0贊 342
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵，是天一觀的道長瘫拣。經(jīng)常有香客問我近上，道長，這世上最難降的妖魔是什么拂铡？我笑而不...
開封第一講書人閱讀 55,259評論 1贊 279
?港島之戀（遺憾婚禮）
正文為了忘掉前任壹无，我火速辦了婚禮，結(jié)果婚禮上感帅，老公的妹妹穿的比我還像新娘斗锭。我一直安慰自己，他們只是感情好失球，可當(dāng)我...
茶點故事閱讀 64,263評論 5贊 371
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布岖是。她就那樣靜靜地躺著帮毁，像睡著了一般。火紅的嫁衣襯著肌膚如雪豺撑。梳的紋絲不亂的頭發(fā)上烈疚，一...
開封第一講書人閱讀 49,036評論 1贊 285
城市分裂傳說
那天，我揣著相機與錄音聪轿，去河邊找鬼爷肝。笑死，一個胖子當(dāng)著我的面吹牛陆错，可吹牛的內(nèi)容都是我干的灯抛。我是一名探鬼主播，決...
沈念sama閱讀 38,349評論 3贊 400
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼音瓷，長吁一口氣：“原來是場噩夢啊……” “哼对嚼！你這毒婦竟也來了？” 一聲冷哼從身側(cè)響起绳慎，我...
開封第一講書人閱讀 36,979評論 0贊 259
萬榮殺人案實錄
序言：老撾萬榮一對情侶失蹤纵竖，失蹤者是張志新（化名）和其女友劉穎，沒想到半個月后杏愤，有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體磨确，經(jīng)...
沈念sama閱讀 43,469評論 1贊 300
?護林員之死
正文獨居荒郊野嶺守林人離奇死亡，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點故事閱讀 35,938評論 2贊 323
?白月光啟示錄
正文我和宋清朗相戀三年声邦，在試婚紗的時候發(fā)現(xiàn)自己被綠了乏奥。大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
茶點故事閱讀 38,059評論 1贊 333
活死人
序言：一個原本活蹦亂跳的男人離奇死亡亥曹，死狀恐怖邓了，靈堂內(nèi)的尸體忽然破棺而出，到底是詐尸還是另有隱情媳瞪，我是刑警寧澤骗炉，帶...
沈念sama閱讀 33,703評論 4贊 323
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布，位于F島的核電站蛇受，受9級特大地震影響句葵，放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜兢仰，卻給世界環(huán)境...
茶點故事閱讀 39,257評論 3贊 307
男人毒藥：我在死后第九天來索命
文/蒙蒙一乍丈、第九天我趴在偏房一處隱蔽的房頂上張望。院中可真熱鬧把将，春花似錦轻专、人聲如沸。這莊子的主人今日做“春日...
開封第一講書人閱讀 30,262評論 0贊 19
一樁弒父案请垛，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽催训。三九已至，卻和暖如春宗收，著一層夾襖步出監(jiān)牢的瞬間漫拭，已是汗流浹背。一陣腳步聲響...
開封第一講書人閱讀 31,485評論 1贊 262
情欲美人皮
我被黑心中介騙來泰國打工混稽，沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留采驻，地道東北人。一個月前我還...
沈念sama閱讀 45,501評論 2贊 354
代替公主和親
正文我出身青樓荚坞，卻偏偏與公主長得像挑宠，于是被迫代替她去往敵國和親菲盾。傳聞我的和親對象是個殘疾皇子颓影，可洞房花燭夜當(dāng)晚...
茶點故事閱讀 42,792評論 2贊 345