Java作業(yè) | COM S 228, Fall 2020 Programming Project 5

本次Java作業(yè)是實(shí)現(xiàn)一個完整的哈希表

COM S 228, Fall 2020Programming Project 5

Generating a Perfect Hash Table Implementation

Problem Overview

Graphs are the most general of the common data structures; consider that a scalar object is a degenerate list,which is a degenerate tree, which is itself a degenerate graph, or in other words, every data structure we’ve discussed this semester can be represented as a graph. Due to their extreme generality, it’s rare that you’ll use a generic “graph” data structure. The limited set of things you might do with a list make two generic list implementations—an array based- and a linked-structure—good enough for most implementations, and it’s only if you are doing, e.g., low-level system programming that you might have a good reason to write a custom, one-off list. But there is so much variation in graphs that it is perhaps impossible to write a good and useful generic graph implementation.

Hash tables provide a means to get constant-time look-up performance on a collection. Usually, the data comes to us at runtime, often in an on-line fashion. In this case, we take what foreknowledge we can to inform best practices in hash table construction, choosing a hash function, etc., to build a data structure with as close to optimal performance as possible; however, if the keys in the table are known beforehand, it’s always possible to construct a perfect hash table, a table in which every key maps to a unique slot.There are many algorithms to construct such a table and the associated hash function; almost all of them are implemented using graphs. For this project, you will be implementing a perfect hash table construction algorithm1?

?This algorithm generates a pair of tables of random numbers. Keys are hashed with the values in each these tables, giving two integers (one for each table). These integers are then taken as node indices in a graph, with an edge between them. So long as the resultant graph—after hashing all of the keys—contains no cycles, the algorithm is ready to generate the hash table as a function of the tables (the T tables) used in the graph generation and a function g() on those tables.

Algorithm Example

NOTE: This algorithm looks pretty intimidating at first glance. With some careful reading and workingthrough of examples, you should find it’s actually fairly straightforward. That said, you don’t necessarily have to understand it at all! If you find yourself getting anxious about all this “scary-looking stuff”, jump down to the Requirements sections, read that, then come back here with a more relaxed mindset. It is very common for professional programmers to be required to implement solutions to problems that they don’t actually understand, so if you find yourself doing that, here or elsewhere, please know that it is normal. This is not to advocate for ignorance, however–it’s always better to understand something than not to understand it–only to recognize that sometimes the cost-benefit of understanding is too high! The example below was generated by hand, and uses an optimization that would be very difficult to implement in program control. We note that our original keys (the number names from “one” to “ten”) 1The algorithm is presented in: Zbigniew J. Czech, George Havas, and Bohdan S. Majewski, “An optimal algorithm for generating minimal perfect has functions”, Information Processing Letters, 43(5):257-264, October 1992. are unambiguously differentiable by the first two letters. Thus, we can simplify the presentation by using only the first two letters of the keys. In your Java program, you will always use the entirety of every word; however, the method that hashes the words is already implemented for you, so you don’t need to concern yourself with it unless you are endeavoring to fully understand the implementation.Our keys and the “sub-keys” that we’re using in this example follow:

Key Minimum Unique Sub-key

one on

two tw

three th

four fo

five fi

six si

seven se

eight ei

nine ni

ten te

A modulus must be chosen, giving a maximum number in our tables, and a maximum number of nodes in the resultant graph. The paper authors choose a value that is one more than twice the number of keys, so we do too. This choice has implications on the runtime of the algorithm (smaller moduli increase the probability of cycles in the resultant graph, forcing a restart of the algorithm). If the modulus is too small, the algorithm will enter an infinite loop.

Next, we populate our tables, T1 and T2, with random numbers less than the modulus. Each of the two tables will have the same number of rows as the length l of the sub-key. The i

th row, 0 <= i <= l ? 1,defines a separate mapping from letters at the i

th position (increasing from left to right) within the sub-key to random numbers.

T1

e f h i n o s t w

8 15 10 2 10 19 13 0 13

3 5 0 5 5 6 19 5 6

T2

e f h i n o s t w

20 2 4 19 11 15 3 8 11

16 4 2 6 19 10 18 4 18

These tables define a function that we will use to transform our keys into nodes in a graph. For each key,

we apply the function twice, once each for T1 and T2. Index the tables by the letter position and the letter

value, add the numbers found in the tables, and return a result modulo the modulus2

. We get two values per

key, one for T1, and one for T2. These values are taken as node IDs in our graph, and an edge runs between

them corresponding with the index of the key in the input vector, so “one” corresponds with index zero,

“two” with index one, etc.

2

If you look carefully at the hash function defined in the project source template, you will notice that we always use tables

of size 4 × 64. We’re doing things slightly differently there, to reduce code complexity. Instead of indexing with letter position,

we index with letter position mod 4, and instead of letter value, we use letter value mod 64. The example in this document uses

a simpler (and better) approach that is easy to do when hand tuning, but difficult to automate. The fundamental structure of the

algorithm is unchanged in either case.

Applying our table to the first key, “one”, and recall that we are using only the first two letters in this

example (and that we always use the entirety of every word in our program code), we have T1(0, o) ==

19 and T1(1, n) == 5. 19 + 5 ≡ 3 mod 21, so one end of our “one” edge is node 3. The other end is

given by T2(0, o) == 15 and T2(1, n) == 19, and 15 + 19 ≡ 13 mod 21. So the “one” edge runs from

node 3 to node 13.

The following table gives the calculation for the entire input set:

key T1(0,letter) T1(1,letter) sum mod 21 T2(0,letter) T2(1,letter) sum mod 21

one 19 5 3 15 19 13

two 0 6 6 8 18 5

three 0 0 0 8 2 10

four 15 6 0 2 10 12

five 15 5 20 2 6 8

six 13 5 18 3 6 9

seven 13 3 16 3 16 19

eight 8 5 13 20 6 5

nine 10 5 15 11 6 17

ten 0 3 3 8 16 3

The T1 sum column is also called u(key) and the T2 sum is v(key). We’ll be referring to them by these

names later in the algorithm.

Now our graph is built. The next step is to check if the graph is suitable to build our hash table. A

graph is suitable if it contains no cycles. Cycle detection is usually done with a depth first search, but in this

example, we’ll inspect by eye; in program control we would call a method for this check, one that you will

be implementing!

Look at the final row in the above table and note that the “ten” edge is a self loop; it both begins and

ends on node 3. Therefore, we need to discard this graph and the associated tables. We generate a new set

of random tables:

T1

e f h i n o s t w

1 13 16 3 3 7 10 1 17

19 11 17 6 20 3 0 17 14

T2

e f h i n o s t w

7 5 6 13 20 8 19 10 16

8 3 0 1 13 0 17 14 4

and calculate a new graph:

key T1(0,letter) T1(1,letter) sum mod 21 T2(0,letter) T2(1,letter) sum mod 21

one 7 20 6 8 13 0

two 1 14 15 10 4 14

three 1 17 18 10 0 10

four 13 3 16 5 0 5

five 13 6 19 5 1 6

six 10 6 16 19 1 20

seven 10 19 8 19 8 6

eight 1 6 7 7 1 8

nine 3 6 9 20 1 0

ten 1 19 20 10 8 18

Applying our cycle-detecting eyeballs, we see that this graph has no loops:

9

0

6

19

8

7

15

14

5

16

20

18

10

four (3)

nine (8)

one (0)

five (4)

seven (6)

eight (7)

two (1)

three (2)

six (5)

ten (9)

The final step is to fill the array that defines the g() function (as named in the paper; “g” does not stand

for “graph”). g() maps edges in our graph to indices in our original input array. We do this by traversing

the connected components of the graph. We always start with the lowest-numbered, unvisited vertex. When

traversing through a node where you have a choice of which direction to travel next, the choice does not

matter.

We start with node zero and assign g(0) = 0. Then we traverse the neighbors of node 0; since neighbor

order doesn’t matter, let’s do node 9 next. g(9) = 8 ? g(0) mod 10 ≡ 8, where the 8 in the subtraction

comes from the edge value and 10 is the number of keys in the input vector. Going back to node 1 and

heading the other direction, we have g(6) = 0 ? g(0) mod 10 ≡ 0; again, the 0 in the subtraction is the

edge value. Continuing to node 8, g(8) = 6 ? g(6) mod 10 ≡ 6. Node 7, g(7) = 7 ? g(8) mod 10 ≡ 1.

And node 19, g(19) = 4 ? g(6) mod 10 ≡ 4.

This completes the first connected subgraph. Let’s move to the next subgraph. The lowest-numbered,

unvisited node is node 5. We assign g(5) = 0, and proceed as above. Node 16, g(16) = 3 ? g(5)

mod 10 ≡ 3. Node 20, g(20) = 5 ? g(16) mod 10 ≡ 2. Node 18, g(18) = 9 ? g(20) mod 10 ≡ 7. And

node 10 completes this subgraph, g(10) = 2 ? g(18) mod 10 ≡ 5.

There is one more subgraph, containing nodes 14 and 15. Start with the smaller, g(14) = 0, and

g(15) = 1 ? g(14) mod 10 ≡ 1.

And here is g:

index 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

g 0 0 0 0 0 0 0 1 6 8 5 0 0 0 0 1 3 0 7 4 2

Positions in this array that were not assigned in the last step do not matter. We choose to fill them in with

zeros.

As stated above, u(key) is the first “sum mod 21” in the final graph calculation table above, and v(key)

is the second such column. Our hash function is given by:

HASH(key) = (g (u (key)) + g (v (key))) mod 10

thus:

HASH(“seven”) = g(8) + g(6) mod 10

= 6 + 0 mod 10

= 6

which is the seventh element, since arrays are zero indexed; and

HASH(“three”) = g(18) + g(10) mod 10

= 7 + 5 mod 10

= 12 mod 10

= 2

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個濱河市莉给,隨后出現(xiàn)的幾起案子兄纺,更是在濱河造成了極大的恐慌,老刑警劉巖,帶你破解...
    沈念sama閱讀 217,907評論 6 506
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場離奇詭異核行,居然都是意外死亡,警方通過查閱死者的電腦和手機(jī)蹬耘,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 92,987評論 3 395
  • 文/潘曉璐 我一進(jìn)店門钮科,熙熙樓的掌柜王于貴愁眉苦臉地迎上來,“玉大人婆赠,你說我怎么就攤上這事。” “怎么了休里?”我有些...
    開封第一講書人閱讀 164,298評論 0 354
  • 文/不壞的土叔 我叫張陵蛆挫,是天一觀的道長。 經(jīng)常有香客問我妙黍,道長悴侵,這世上最難降的妖魔是什么? 我笑而不...
    開封第一講書人閱讀 58,586評論 1 293
  • 正文 為了忘掉前任拭嫁,我火速辦了婚禮可免,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘做粤。我一直安慰自己浇借,他們只是感情好,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,633評論 6 392
  • 文/花漫 我一把揭開白布怕品。 她就那樣靜靜地躺著妇垢,像睡著了一般。 火紅的嫁衣襯著肌膚如雪肉康。 梳的紋絲不亂的頭發(fā)上闯估,一...
    開封第一講書人閱讀 51,488評論 1 302
  • 那天,我揣著相機(jī)與錄音吼和,去河邊找鬼涨薪。 笑死,一個胖子當(dāng)著我的面吹牛炫乓,可吹牛的內(nèi)容都是我干的刚夺。 我是一名探鬼主播,決...
    沈念sama閱讀 40,275評論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼厢岂,長吁一口氣:“原來是場噩夢啊……” “哼光督!你這毒婦竟也來了?” 一聲冷哼從身側(cè)響起塔粒,我...
    開封第一講書人閱讀 39,176評論 0 276
  • 序言:老撾萬榮一對情侶失蹤结借,失蹤者是張志新(化名)和其女友劉穎,沒想到半個月后卒茬,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體船老,經(jīng)...
    沈念sama閱讀 45,619評論 1 314
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,819評論 3 336
  • 正文 我和宋清朗相戀三年圃酵,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了柳畔。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點(diǎn)故事閱讀 39,932評論 1 348
  • 序言:一個原本活蹦亂跳的男人離奇死亡郭赐,死狀恐怖薪韩,靈堂內(nèi)的尸體忽然破棺而出,到底是詐尸還是另有隱情,我是刑警寧澤俘陷,帶...
    沈念sama閱讀 35,655評論 5 346
  • 正文 年R本政府宣布罗捎,位于F島的核電站,受9級特大地震影響拉盾,放射性物質(zhì)發(fā)生泄漏桨菜。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,265評論 3 329
  • 文/蒙蒙 一捉偏、第九天 我趴在偏房一處隱蔽的房頂上張望倒得。 院中可真熱鬧,春花似錦夭禽、人聲如沸霞掺。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,871評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽根悼。三九已至,卻和暖如春蜀撑,著一層夾襖步出監(jiān)牢的瞬間挤巡,已是汗流浹背。 一陣腳步聲響...
    開封第一講書人閱讀 32,994評論 1 269
  • 我被黑心中介騙來泰國打工酷麦, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留矿卑,地道東北人。 一個月前我還...
    沈念sama閱讀 48,095評論 3 370
  • 正文 我出身青樓沃饶,卻偏偏與公主長得像母廷,于是被迫代替她去往敵國和親。 傳聞我的和親對象是個殘疾皇子糊肤,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 44,884評論 2 354