Caffeine使用一個(gè)ConcurrencyHashMap來(lái)保存所有數(shù)據(jù)匪补,那它的過(guò)期淘汰策略采用的什么的方式與數(shù)據(jù)結(jié)構(gòu)呢?其中寫過(guò)期是使用writeOrderDeque,這個(gè)比較簡(jiǎn)單無(wú)需多說(shuō),而讀過(guò)期相對(duì)復(fù)雜很多对室,使用W-TinyLFU的結(jié)構(gòu)與算法参滴。
網(wǎng)絡(luò)上有很多文章介紹W-TinyLFU結(jié)構(gòu)的强岸,大家可以去查一下,這里主要是從源碼來(lái)分析砾赔,總的來(lái)說(shuō)它使用了三個(gè)雙端隊(duì)列:accessOrderEdenDeque,accessOrderProbationDeque,accessOrderProtectedDeque,使用雙端隊(duì)列的原因是支持LRU算法比較方便蝌箍。
accessOrderEdenDeque屬于eden區(qū),緩存1%的數(shù)據(jù)暴心,其余的99%緩存在main區(qū)妓盲。
accessOrderProbationDeque屬于main區(qū),緩存main內(nèi)數(shù)據(jù)的20%专普,這部分是屬于冷數(shù)據(jù)悯衬,即將補(bǔ)淘汰。
accessOrderProtectedDeque屬于main區(qū)脆诉,緩存main內(nèi)數(shù)據(jù)的80%甚亭,這部分是屬于熱數(shù)據(jù)贷币,是整個(gè)緩存的主存區(qū)。
我們先看一下淘汰方法入口:
void evictEntries() {
if (!evicts()) {
return;
}
//先從edn區(qū)淘汰
int candidates = evictFromEden();
//eden淘汰后的數(shù)據(jù)進(jìn)入main區(qū)亏狰,然后再?gòu)膍ain區(qū)淘汰
evictFromMain(candidates);
}
accessOrderEdenDeque對(duì)應(yīng)W-TinyLFU的W(window)役纹,這里保存的是最新寫入數(shù)據(jù)的引用,它使用LRU淘汰暇唾,這里面的數(shù)據(jù)主要是應(yīng)對(duì)突發(fā)流量的問(wèn)題促脉,淘汰后的數(shù)據(jù)進(jìn)入
accessOrderProbationDeque.代碼如下:
int evictFromEden() {
int candidates = 0;
Node<K, V> node = accessOrderEdenDeque().peek();
while (edenWeightedSize() > edenMaximum()) {
// The pending operations will adjust the size to reflect the correct weight
if (node == null) {
break;
}
Node<K, V> next = node.getNextInAccessOrder();
if (node.getWeight() != 0) {
node.makeMainProbation();
//先從eden區(qū)移除
accessOrderEdenDeque().remove(node);
//移除的數(shù)據(jù)加入到main區(qū)的probation隊(duì)列
accessOrderProbationDeque().add(node);
candidates++;
lazySetEdenWeightedSize(edenWeightedSize() - node.getPolicyWeight());
}
node = next;
}
return candidates;
}
數(shù)據(jù)進(jìn)入probation隊(duì)列后,繼續(xù)執(zhí)行以下代碼:
void evictFromMain(int candidates) {
int victimQueue = PROBATION;
Node<K, V> victim = accessOrderProbationDeque().peekFirst();
Node<K, V> candidate = accessOrderProbationDeque().peekLast();
while (weightedSize() > maximum()) {
// Stop trying to evict candidates and always prefer the victim
if (candidates == 0) {
candidate = null;
}
// Try evicting from the protected and eden queues
if ((candidate == null) && (victim == null)) {
if (victimQueue == PROBATION) {
victim = accessOrderProtectedDeque().peekFirst();
victimQueue = PROTECTED;
continue;
} else if (victimQueue == PROTECTED) {
victim = accessOrderEdenDeque().peekFirst();
victimQueue = EDEN;
continue;
}
// The pending operations will adjust the size to reflect the correct weight
break;
}
// Skip over entries with zero weight
if ((victim != null) && (victim.getPolicyWeight() == 0)) {
victim = victim.getNextInAccessOrder();
continue;
} else if ((candidate != null) && (candidate.getPolicyWeight() == 0)) {
candidate = candidate.getPreviousInAccessOrder();
candidates--;
continue;
}
// Evict immediately if only one of the entries is present
if (victim == null) {
candidates--;
Node<K, V> evict = candidate;
candidate = candidate.getPreviousInAccessOrder();
evictEntry(evict, RemovalCause.SIZE, 0L);
continue;
} else if (candidate == null) {
Node<K, V> evict = victim;
victim = victim.getNextInAccessOrder();
evictEntry(evict, RemovalCause.SIZE, 0L);
continue;
}
// Evict immediately if an entry was collected
K victimKey = victim.getKey();
K candidateKey = candidate.getKey();
if (victimKey == null) {
Node<K, V> evict = victim;
victim = victim.getNextInAccessOrder();
evictEntry(evict, RemovalCause.COLLECTED, 0L);
continue;
} else if (candidateKey == null) {
candidates--;
Node<K, V> evict = candidate;
candidate = candidate.getPreviousInAccessOrder();
evictEntry(evict, RemovalCause.COLLECTED, 0L);
continue;
}
// Evict immediately if the candidate's weight exceeds the maximum
if (candidate.getPolicyWeight() > maximum()) {
candidates--;
Node<K, V> evict = candidate;
candidate = candidate.getPreviousInAccessOrder();
evictEntry(evict, RemovalCause.SIZE, 0L);
continue;
}
// Evict the entry with the lowest frequency
candidates--;
//最核心算法在這里:從probation的頭尾取出兩個(gè)node進(jìn)行比較頻率策州,頻率更小者將被remove
if (admit(candidateKey, victimKey)) {
Node<K, V> evict = victim;
victim = victim.getNextInAccessOrder();
evictEntry(evict, RemovalCause.SIZE, 0L);
candidate = candidate.getPreviousInAccessOrder();
} else {
Node<K, V> evict = candidate;
candidate = candidate.getPreviousInAccessOrder();
evictEntry(evict, RemovalCause.SIZE, 0L);
}
}
}
上面的代碼邏輯是從probation的頭尾取出兩個(gè)node進(jìn)行比較頻率瘸味,頻率更小者將被remove,其中尾部元素就是上一部分從eden中淘汰出來(lái)的元素够挂,如果將兩步邏輯合并起來(lái)講是這樣的: 在eden隊(duì)列通過(guò)lru淘汰出來(lái)的”候選者“與probation隊(duì)列通過(guò)lru淘汰出來(lái)的“被驅(qū)逐者“進(jìn)行頻率比較旁仿,失敗者將被從cache中真正移除。下面看一下它的比較邏輯admit:
boolean admit(K candidateKey, K victimKey) {
int victimFreq = frequencySketch().frequency(victimKey);
int candidateFreq = frequencySketch().frequency(candidateKey);
//如果候選者的頻率高就淘汰被驅(qū)逐者
if (candidateFreq > victimFreq) {
return true;
//如果被驅(qū)逐者比候選者的頻率高孽糖,并且候選者頻率小于等于5則淘汰者
} else if (candidateFreq <= 5) {
// The maximum frequency is 15 and halved to 7 after a reset to age the history. An attack
// exploits that a hot candidate is rejected in favor of a hot victim. The threshold of a warm
// candidate reduces the number of random acceptances to minimize the impact on the hit rate.
return false;
}
//隨機(jī)淘汰
int random = ThreadLocalRandom.current().nextInt();
return ((random & 127) == 0);
}
從frequencySketch取出候選者與被驅(qū)逐者的頻率枯冈,如果候選者的頻率高就淘汰被驅(qū)逐者,如果被驅(qū)逐者比候選者的頻率高办悟,并且候選者頻率小于等于5則淘汰者尘奏,如果前面兩個(gè)條件都不滿足則隨機(jī)淘汰。
整個(gè)過(guò)程中你是不是發(fā)現(xiàn)protectedDeque并沒(méi)有什么作用病蛉,那它是怎么作為主存區(qū)來(lái)保存大部分?jǐn)?shù)據(jù)的呢?
//onAccess方法觸發(fā)該方法
void reorderProbation(Node<K, V> node) {
if (!accessOrderProbationDeque().contains(node)) {
// Ignore stale accesses for an entry that is no longer present
return;
} else if (node.getPolicyWeight() > mainProtectedMaximum()) {
return;
}
long mainProtectedWeightedSize = mainProtectedWeightedSize() + node.getPolicyWeight();
//先從probation中移除
accessOrderProbationDeque().remove(node);
//加入到protected中
accessOrderProtectedDeque().add(node);
node.makeMainProtected();
long mainProtectedMaximum = mainProtectedMaximum();
//從protected中移除
while (mainProtectedWeightedSize > mainProtectedMaximum) {
Node<K, V> demoted = accessOrderProtectedDeque().pollFirst();
if (demoted == null) {
break;
}
demoted.makeMainProbation();
//加入到probation中
accessOrderProbationDeque().add(demoted);
mainProtectedWeightedSize -= node.getPolicyWeight();
}
lazySetMainProtectedWeightedSize(mainProtectedWeightedSize);
}
當(dāng)數(shù)據(jù)被訪問(wèn)時(shí)并且該數(shù)據(jù)在probation中炫加,這個(gè)數(shù)據(jù)就會(huì)移動(dòng)到protected中去,同時(shí)通過(guò)lru從protected中淘汰一個(gè)數(shù)據(jù)進(jìn)入到probation中铺然。
這樣數(shù)據(jù)流轉(zhuǎn)的邏輯全部通了:新數(shù)據(jù)都會(huì)進(jìn)入到eden中俗孝,通過(guò)lru淘汰到probation,并與probation中通過(guò)lru淘汰的數(shù)據(jù)進(jìn)行使用頻率pk,如果勝利了就繼續(xù)留在probation中,如果失敗了就會(huì)被直接淘汰探熔,當(dāng)這條數(shù)據(jù)被訪問(wèn)了驹针,則移動(dòng)到protected烘挫。當(dāng)其它數(shù)據(jù)被訪問(wèn)了诀艰,則它可能會(huì)從protected中通過(guò)lru淘汰到probation中。
TinyLFU
傳統(tǒng)LFU一般使用key-value形式來(lái)記錄每個(gè)key的頻率饮六,優(yōu)點(diǎn)是數(shù)據(jù)結(jié)構(gòu)非常簡(jiǎn)單其垄,并且能跟緩存本身的數(shù)據(jù)結(jié)構(gòu)復(fù)用,增加一個(gè)屬性記錄頻率就行了卤橄,它的缺點(diǎn)也比較明顯就是頻率這個(gè)屬性會(huì)占用很大的空間绿满,但如果改用壓縮方式存儲(chǔ)頻率呢? 頻率占用空間肯定可以減少,但會(huì)引出另外一個(gè)問(wèn)題:怎么從壓縮后的數(shù)據(jù)里獲得對(duì)應(yīng)key的頻率呢?
TinyLFU的解決方案是類似位圖的方法窟扑,將key取hash值獲得它的位下標(biāo)喇颁,然后用這個(gè)下標(biāo)來(lái)找頻率漏健,但位圖只有0、1兩個(gè)值橘霎,那頻率明顯可能會(huì)非常大蔫浆,這要怎么處理呢? 另外使用位圖需要預(yù)占非常大的空間,這個(gè)問(wèn)題怎么解決呢?
TinyLFU根據(jù)最大數(shù)據(jù)量設(shè)置生成一個(gè)long數(shù)組姐叁,然后將頻率值保存在其中的四個(gè)long的4個(gè)bit位中(4個(gè)bit位不會(huì)大于15)瓦盛,取頻率值時(shí)則取四個(gè)中的最小一個(gè)。
Caffeine認(rèn)為頻率大于15已經(jīng)很高了外潜,是屬于熱數(shù)據(jù)原环,所以它只需要4個(gè)bit位來(lái)保存,long有8個(gè)字節(jié)64位处窥,這樣可以保存16個(gè)頻率嘱吗。取hash值的后左移兩位,然后加上hash四次滔驾,這樣可以利用到16個(gè)中的13個(gè)柜与,利用率挺高的,或許有更好的算法能將16個(gè)都利用到嵌灰。
public void increment(@Nonnull E e) {
if (isNotInitialized()) {
return;
}
int hash = spread(e.hashCode());
int start = (hash & 3) << 2;
// Loop unrolling improves throughput by 5m ops/s
int index0 = indexOf(hash, 0); //indexOf也是一種hash方法弄匕,不過(guò)會(huì)通過(guò)tableMask來(lái)限制范圍
int index1 = indexOf(hash, 1);
int index2 = indexOf(hash, 2);
int index3 = indexOf(hash, 3);
boolean added = incrementAt(index0, start);
added |= incrementAt(index1, start + 1);
added |= incrementAt(index2, start + 2);
added |= incrementAt(index3, start + 3);
//當(dāng)數(shù)據(jù)寫入次數(shù)達(dá)到數(shù)據(jù)長(zhǎng)度時(shí)就重置
if (added && (++size == sampleSize)) {
reset();
}
}
給對(duì)應(yīng)位置的bit位四位的Int值加1:
boolean incrementAt(int i, int j) {
int offset = j << 2;
long mask = (0xfL << offset);
//當(dāng)已達(dá)到15時(shí),次數(shù)不再增加
if ((table[i] & mask) != mask) {
table[i] += (1L << offset);
return true;
}
return false;
}
獲得值的方法也是通過(guò)四次hash來(lái)獲得沽瞭,然后取最小值:
public int frequency(@Nonnull E e) {
if (isNotInitialized()) {
return 0;
}
int hash = spread(e.hashCode());
int start = (hash & 3) << 2;
int frequency = Integer.MAX_VALUE;
//四次hash
for (int i = 0; i < 4; i++) {
int index = indexOf(hash, i);
//獲得bit位四位的Int值
int count = (int) ((table[index] >>> ((start + i) << 2)) & 0xfL);
//取最小值
frequency = Math.min(frequency, count);
}
return frequency;
}
當(dāng)數(shù)據(jù)寫入次數(shù)達(dá)到數(shù)據(jù)長(zhǎng)度時(shí)就會(huì)將次數(shù)減半迁匠,一些冷數(shù)據(jù)在這個(gè)過(guò)程中將歸0,這樣會(huì)使hash沖突降低:
void reset() {
int count = 0;
for (int i = 0; i < table.length; i++) {
count += Long.bitCount(table[i] & ONE_MASK);
table[i] = (table[i] >>> 1) & RESET_MASK;
}
size = (size >>> 1) - (count >>> 2);
}
關(guān)注個(gè)人微信公眾號(hào):肌肉碼農(nóng)驹溃,回復(fù)“好友”討論問(wèn)題城丧。