Hash 函數(shù)

1. hash

https://stackoverflow.com/questions/44695684/structured-bindings-when-something-looks-like-a-reference-and-behaves-similarly

https://stackoverflow.com/questions/9729390/how-to-use-unordered-set-with-custom-types

https://stackoverflow.com/questions/4948780/magic-number-in-boosthash-combine

http://www.burtleburtle.net/bob/hash/doobs.html

http://blog.reverberate.org/2012/01/state-of-hash-functions-2012.html

https://www.oschina.net/translate/state-of-hash-functions

https://en.wikipedia.org/wiki/Jenkins_hash_function

https://sites.google.com/site/murmurhash/

https://github.com/google/cityhash

http://burtleburtle.net/bob/hash/spooky.html

2. Boost helper function

This can be difficult; the XOR/bit-shifting method above is probably not a bad start. For a slightly better start, you may use the hash_value and hash_combine function template from the Boost library. The former acts in a similar way as std::hash for standard types (recently also including tuples and other useful standard types); the latter helps you combine individual hash values into one. Here is a rewrite of the hash function that uses the Boost helper functions:

#include <boost/functional/hash.hpp>
?
struct KeyHasher
{
std::size_t operator()(const Key& k) const
{
 using boost::hash_value;
 using boost::hash_combine;
?
 // Start with a hash value of 0    .
 std::size_t seed = 0;
?
 // Modify 'seed' by XORing and bit-shifting in
 // one member of 'Key' after the other:
 hash_combine(seed,hash_value(k.first));
 hash_combine(seed,hash_value(k.second));
 hash_combine(seed,hash_value(k.third));
?
 // Return the result.
 return seed;
}
};

And here’s a rewrite that doesn’t use boost, yet uses good method of combining the hashes:

namespace std
{
 template <>
 struct hash<Key>
 {
 size_t operator()( const Key& k ) const
 {
 // Compute individual hash values for first, second and third
 // http://stackoverflow.com/a/1646913/126995
 size_t res = 17;
 res = res * 31 + hash<string>()( k.first );
 res = res * 31 + hash<string>()( k.second );
 res = res * 31 + hash<int>()( k.third );
 return res;
 }
 };
}

The standard library contains specialisations of std::hash<T> for the fundamental types, for pointers and for std::string (or rather, for all specializations of std::basic_string).

Unfortunately the library does not contain the following vital new-from-old combination function, which is however part of Boost, and which you should copy into your code:

template <class T>
inline void hash_combine(std::size_t & seed, const T & v)
{
 std::hash<T> hasher;
 seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

With this function, you can hash pairs, tuples, arrays, and any sort of range of elements that are themselves hashable. Browse the Boost sources for many examples and useful implementations. And obviously you can use this function to create a hash function for your own types. For example, here's hashing a pair:

template<typename S, typename T> struct pair_hash<std::pair<S, T>>
{
 inline std::size_t operator()(const std::pair<S, T> & v) const
 {
 std::size_t seed = 0;
 hash_combine(seed, v.first);
 hash_combine(seed, v.second);
 return seed;
 }
};

Please be aware, though, that hash-combining does not produce good hash values. The results have very poor statistic qualities (e.g. it is very easy to create hash collisions). Good hashing needs to be able to see all the raw input bits, and cannot be factored through partial hashes. (That's why there isn't a better solution in the current standard library; nobody has been able to come up with a satisfactory design.)

The magic number is supposed to be 32 random bits, where each is equally likely to be 0 or 1, and with no simple correlation between the bits. A common way to find a string of such bits is to use the binary expansion of an irrational number; in this case, that number is the reciprocal of the golden ratio:

phi = (1 + sqrt(5)) / 2
2^32 / phi = 0x9e3779b9

So including this number "randomly" changes each bit of the seed; as you say, this means that consecutive values will be far apart. Including the shifted versions of the old seed makes sure that, even if hash_value() has a fairly small range of values, differences will soon be spread across all the bits.

3. Bob Jenkins' Functions

3.1. one_at_a_time

Jenkins's one_at_a_time hash is adapted here from a WWW page by Bob Jenkins,[1] which is an expanded version of his Dr. Dobb's article.[2] It was originally created to fulfill certain requirements described by Colin Plumb, a cryptographer, but was ultimately not put to use.[3]

 uint32_t jenkins_one_at_a_time_hash(const uint8_t* key, size_t length) {
 size_t i = 0;
 uint32_t hash = 0;
 while (i != length) {
 hash += key[i++];
 hash += hash << 10;
 hash ^= hash >> 6;
 }
 hash += hash << 3;
 hash ^= hash >> 11;
 hash += hash << 15;
 return hash;
}

Sample hash values for one_at_a_time hash function.

one_at_a_time("a", 1)
0xca2e9442
one_at_a_time("The quick brown fox jumps over the lazy dog", 43)
0x519e91f5
image

Avalanche behavior of Jenkins One-at-a-time hash over 3-byte keys

The avalanche behavior of this hash is shown on the right.

Each of the 24 rows corresponds to a single bit in the 3-byte input key, and each of the 32 columns corresponds to a bit in the output hash. Colors are chosen by how well the input key bit affects the given output hash bit: a green square indicates good mixing behavior, a yellow square weak mixing behavior, and red would indicate no mixing. Only a few bits in the last byte of the input key are weakly mixed to a minority of bits in the output hash.

The standard implementation of the Perl programming language includes Jenkins's one-at-a-time hash or a hardened variant of it, along with SipHash, and uses Jenkins's one-at-a-time hash by default.[4][5]

3.2. lookup2

The lookup2 function was an interim successor to one-at-a-time. It is the function referred to as "My Hash" in the 1997 Dr. Dobbs journal article, though it has been obsoleted by subsequent functions that Jenkins has released. Applications of this hash function are found in:

  • the SPIN model checker, for probabilistic error detection. In a paper about this program, researchers Dillinger and Manolios note that lookup2 is "a popular choice among implementers of hash tables and Bloom filters". They study lookup2 and a simple extension of it that produces 96-bit rather than 32-bit hash values.[6]

  • The Netfilter firewall component of Linux,[7] where it replaced an earlier hash function that was too sensitive to collisions. The resulting system, however, was shown to still be sensitive to hash flooding attacks, even when the Jenkins hash is randomized using a secret key.[8]

  • The program that solved the game of kalah used the Jenkins hash function, instead of the Zobrist hashing technique more commonly used for this type of problem; the reasons for this choice were the speed of Jenkins' function on the small representations of kalah boards, as well as the fact that the basic rule of kalah can radically alter the board, negating the benefit of Zobrist's incremental computation of hash functions.[9]

3.3. lookup3

The lookup3 function consumes input in 12 byte (96 bit) chunks.[10] It may be appropriate when speed is more important than simplicity. Note, though, that any speed improvement from the use of this hash is only likely to be useful for large keys, and that the increased complexity may also have speed consequences such as preventing an optimizing compiler from inlining the hash function.

3.4. SpookyHash

In 2011 Jenkins released a new 128-bit hash function called SpookyHash.[11] SpookyHash is significantly faster than lookup3.

Example for V2 (little-endian x64):

The short method for less than 192 bytes (43 bytes):

 Hash128("The quick brown fox jumps over the lazy dog")
 2b12e846aa0693c71d367e742407341b

The standard method for more than 191 bytes (219 bytes):

 Hash128("The quick brown fox jumps over the lazy dog The quick brown fox jumps over the lazy dog The quick brown fox jumps over the lazy dog The quick brown fox jumps over the lazy dog The quick brown fox jumps over the lazy dog")
 f1b71c6ac5af39e7b69363a60dd29c49

4. MurmurHash

5. CityHash 和 SpookyHash

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末摊滔,一起剝皮案震驚了整個濱河市暑诸,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌,老刑警劉巖,帶你破解...
    沈念sama閱讀 218,546評論 6 507
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件充包,死亡現(xiàn)場離奇詭異娃豹,居然都是意外死亡,警方通過查閱死者的電腦和手機(jī)订框,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,224評論 3 395
  • 文/潘曉璐 我一進(jìn)店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來兜叨,“玉大人穿扳,你說我怎么就攤上這事」酰” “怎么了矛物?”我有些...
    開封第一講書人閱讀 164,911評論 0 354
  • 文/不壞的土叔 我叫張陵,是天一觀的道長跪但。 經(jīng)常有香客問我履羞,道長,這世上最難降的妖魔是什么屡久? 我笑而不...
    開封第一講書人閱讀 58,737評論 1 294
  • 正文 為了忘掉前任忆首,我火速辦了婚禮,結(jié)果婚禮上被环,老公的妹妹穿的比我還像新娘糙及。我一直安慰自己,他們只是感情好筛欢,可當(dāng)我...
    茶點故事閱讀 67,753評論 6 392
  • 文/花漫 我一把揭開白布浸锨。 她就那樣靜靜地躺著,像睡著了一般版姑。 火紅的嫁衣襯著肌膚如雪柱搜。 梳的紋絲不亂的頭發(fā)上,一...
    開封第一講書人閱讀 51,598評論 1 305
  • 那天漠酿,我揣著相機(jī)與錄音冯凹,去河邊找鬼。 笑死炒嘲,一個胖子當(dāng)著我的面吹牛宇姚,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播夫凸,決...
    沈念sama閱讀 40,338評論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼浑劳,長吁一口氣:“原來是場噩夢啊……” “哼!你這毒婦竟也來了夭拌?” 一聲冷哼從身側(cè)響起魔熏,我...
    開封第一講書人閱讀 39,249評論 0 276
  • 序言:老撾萬榮一對情侶失蹤衷咽,失蹤者是張志新(化名)和其女友劉穎,沒想到半個月后蒜绽,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體镶骗,經(jīng)...
    沈念sama閱讀 45,696評論 1 314
  • 正文 獨居荒郊野嶺守林人離奇死亡,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 37,888評論 3 336
  • 正文 我和宋清朗相戀三年躲雅,在試婚紗的時候發(fā)現(xiàn)自己被綠了鼎姊。 大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點故事閱讀 40,013評論 1 348
  • 序言:一個原本活蹦亂跳的男人離奇死亡相赁,死狀恐怖相寇,靈堂內(nèi)的尸體忽然破棺而出,到底是詐尸還是另有隱情钮科,我是刑警寧澤唤衫,帶...
    沈念sama閱讀 35,731評論 5 346
  • 正文 年R本政府宣布,位于F島的核電站绵脯,受9級特大地震影響佳励,放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜蛆挫,卻給世界環(huán)境...
    茶點故事閱讀 41,348評論 3 330
  • 文/蒙蒙 一植兰、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧璃吧,春花似錦、人聲如沸废境。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,929評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽噩凹。三九已至巴元,卻和暖如春,著一層夾襖步出監(jiān)牢的瞬間驮宴,已是汗流浹背逮刨。 一陣腳步聲響...
    開封第一講書人閱讀 33,048評論 1 270
  • 我被黑心中介騙來泰國打工, 沒想到剛下飛機(jī)就差點兒被人妖公主榨干…… 1. 我叫王不留堵泽,地道東北人修己。 一個月前我還...
    沈念sama閱讀 48,203評論 3 370
  • 正文 我出身青樓,卻偏偏與公主長得像迎罗,于是被迫代替她去往敵國和親睬愤。 傳聞我的和親對象是個殘疾皇子,可洞房花燭夜當(dāng)晚...
    茶點故事閱讀 44,960評論 2 355

推薦閱讀更多精彩內(nèi)容