2. LevelDB源碼剖析之基礎(chǔ)部件-AtomicPointer靡努、Arena、Slice

在分析LevelDB各種實現(xiàn)細節(jié)之前晓折，先來了解一下LevelDB的各個基礎(chǔ)部件惑朦。

2.1 AtomicPointer

LevelDB有一個port目錄，port目錄下所有實現(xiàn)都是平臺相關(guān)的漓概，而所有在port之外的代碼則是平臺無關(guān)的漾月。這就保證了LevelDB的跨平臺特性，而AtomicPointer也在其中胃珍。
當(dāng)然梁肿，跨平臺只是AtomicPointer的附加屬性蜓陌，其根本目的在于實現(xiàn)原子指針，代碼如下：

class AtomicPointer
{
private:
  void *rep_;

public:
  AtomicPointer() {}
  explicit AtomicPointer(void *p) : rep_(p) {}
  inline void *NoBarrier_Load() const { return rep_; }
  inline void NoBarrier_Store(void *v) { rep_ = v; }
  inline void *Acquire_Load() const
  {
    void *result = rep_;
    MemoryBarrier();
    return result;
  }
  inline void Release_Store(void *v)
  {
    MemoryBarrier();
    rep_ = v;
  }

剛剛提到AtomicPointer用于實現(xiàn)原子指針的描述是有偏頗的栈雳，考慮如下兩個問題：

代碼中NoBarrier_Store/NoBarrier_Load操作只是最簡單的指針操作护奈，那么這些操作是原子的么缔莲？
Acquire_Load/Release_Store操作增加了MemoryBarrier操作哥纫，其作用是什么？又如何保證原子性呢痴奏？

2.1.1 指針操作的原子性

《Intel? 64 and IA-32 Architectures
Software Developer’s Manual》8.1.1節(jié)描述如下：

8.1.1 Guaranteed Atomic Operations
The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically:
? Reading or writing a byte
? Reading or writing a word aligned on a 16-bit boundary
? Reading or writing a doubleword aligned on a 32-bit boundary
The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:
? Reading or writing a quadword aligned on a 64-bit boundary
? 16-bit accesses to uncached memory locations that fit within a 32-bit data bus
The P6 family processors (and newer processors since) guarantee that the following additional memory operation will always be carried out atomically:
? Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line
Accesses to cacheable memory that are split across cache lines and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel? Atom?, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided.

簡單來說蛀骇，在不跨越cacheline情況下，Intel處理器保證指針操作的原子性读拆；跨域cacheline情況下擅憔，部分處理器提供了原子保證。在通常情況下檐晕，C++ new出來的指針及對象內(nèi)部數(shù)據(jù)都是cacheline對其的暑诸，但如果使用 align 1 byte或者采用c++ placement new等特性時可能出現(xiàn)指針對象跨越cacheline的情況。
在LevelDB中辟灰，指針操作是cacheline對齊的个榕，因此問題一種NoBarrier_*的指針操作本身是原子的。那么芥喇，為何還需要Acqiure_Load和Release_Store呢西采？來看下一節(jié)。

2.1.2 Memory Barrier

CPU可以保證指針操作的原子性继控，但編譯器械馆、CPU指令優(yōu)化--重排序(reorder)可能導(dǎo)致指令亂序，在多線程情況下程序運行結(jié)果不符合預(yù)期武通。關(guān)于重排序說明如下:

單核單線程時霹崎，重排序保證單核單線程下程序運行結(jié)果一致。
單核多線程時冶忱，編譯器reorder可能導(dǎo)致運行結(jié)果不一致尾菇。參見《memory-ordering-at-compile-time》。
多核多線程時朗和，編譯器reorder错沽、CPU reorder將導(dǎo)致運行結(jié)果不一致。參見《memory-reordering-caught-in-the-act》眶拉。

避免編譯器Reorder通常的做法是引入Compiler Barrier(或稱之為Memory Barrier)千埃，避免CPU Reorder通常的做法是引入CPU Barrier(或稱之為Full Memory Barrier)。LevelDB引入的是Memory Barrier忆植，必然只是為了解決編譯器Reorder問題放可。
不同處理器支持的Memory Barrier指令不同谒臼，有些甚至不支持Memory Barrier，對于此類場景LevelDB采用C++ 11標(biāo)準(zhǔn)庫中的std::atomic<T>實現(xiàn)耀里。以x86下的MemoryBarrier為例：

inline void MemoryBarrier()
{
  // See http://gcc.gnu.org/ml/gcc/2003-04/msg01180.html for a discussion on
  // this idiom. Also see http://en.wikipedia.org/wiki/Memory_ordering.
  asm volatile(""
               :
               :
               : "memory");
}

volatile表示阻止編譯器對該值進行優(yōu)化蜈缤，強制變量使用精確內(nèi)存地址（非 cache或register），memory表示對內(nèi)存有修改操作冯挎，需要重新讀入底哥。
AtomicPointer只解決了編譯器重排序問題，對CPU重排序并未做任何防護房官，這是否意味著Leveldb在多核環(huán)境下運行是有問題的呢?? 實際上不然趾徽，因為Leveldb做了一個隱含保證：所有的AtomicPointer都是多讀單寫的，CPU重排序只有在并發(fā)寫場景下才會有問題翰守。

最后需要說明的是孵奶，如果AtomicPointer中不是inline函數(shù)(顯示指定非inline，避免編譯器優(yōu)化為inline)蜡峰，我們并不需要采用Memory Barrier了袁，因為函數(shù)調(diào)用本身就是一種Memory Barrier。引用《memory-ordering-at-compile-time》中相關(guān)描述：

In fact, the majority of function calls act as compiler barriers, whether they contain their own compiler barrier or not. This excludes inline functions, functions declared with thepure attribute, and cases where link-time code generation is used. Other than those cases, a call to an external function is even stronger than a compiler barrier, since the compiler has no idea what the function’s side effects will be. It must forget any assumptions it made about memory that is potentially visible to that function.

當(dāng)然湿颅，這并不是說作者多此一舉载绿，采用inline+Memory Barrier將獲取更好的性能、并解除了對編譯器依賴肖爵。
至此卢鹦，我們分別回答了文章開始提出的兩個問題，總結(jié)如下：

問：代碼中NoBarrier_Store/NoBarrier_Load操作只是最簡單的指針操作劝堪，那么這些操作是原子的么冀自？
答：在不跨越cacheline情況下，Intel處理器保證指針操作的原子性秒啦；跨域cacheline情況下熬粗，部分處理器提供了原子保證。LevelDB場景下不存在跨cacheline場景余境，因此這部分操作是原子的驻呐。
問：Acquire_Load/Release_Store操作增加了MemoryBarrier操作，其作用是什么芳来？又如何保證原子性呢含末？
答：增加Memory Barrier是為了避免編譯器重排序，保證MemoryBarrier前的全部操作真正在Memory Barrier前執(zhí)行即舌。

再來追加提出幾個問題佣盒，相信解答這幾個問題后，你對AtomicPointer會有一個完整的理解：

問：為何要設(shè)計這樣兩組操作顽聂？
答：性能肥惭。NoBarrier_Store/NoBarrier_Load的性能要優(yōu)于Acquire_Load/Release_Store盯仪，但Acquire_Load/Release_Store可以避免編譯器優(yōu)化，由此保證load/store時指針里面的數(shù)據(jù)一定是最新的蜜葱。
問：LevelDB代碼中如何選擇何時使用何種操作全景？
答：時刻小心。在任意一個用到指針的場景牵囤，結(jié)合上下文+并發(fā)考量選擇合適的load/store方法爸黄。當(dāng)然，一個比較保守的做法是奔浅，所有的場景下都使用帶Memory Barrier的load/store方法馆纳，僅當(dāng)確定可以使用NoBarrier的load/store方法才將其替換掉诗良。

2.2 Arena

Arena用于內(nèi)存管理汹桦，其存在的價值在于：

提高程序性能。減少Heap調(diào)用次數(shù)鉴裹，由Arena統(tǒng)一分配后返回到應(yīng)用層舞骆。
降低程序復(fù)雜度。分配后無需執(zhí)行dealloc径荔，當(dāng)Arena對象釋放時督禽，統(tǒng)一釋放由其創(chuàng)建的所有內(nèi)存。
便于內(nèi)存統(tǒng)計总处。如Arena分配的整體內(nèi)存大小等信息狈惫。

class Arena
{
public:
  Arena();
  ~Arena();

  // Return a pointer to a newly allocated memory block of "bytes" bytes.
  char *Allocate(size_t bytes);

  // Allocate memory with the normal alignment guarantees provided by malloc
  char *AllocateAligned(size_t bytes);

  // Returns an estimate of the total memory usage of data allocated
  // by the arena.
  size_t MemoryUsage() const
  {
    return reinterpret_cast<uintptr_t>(memory_usage_.NoBarrier_Load());
  }

private:
  char *AllocateFallback(size_t bytes);
  char *AllocateNewBlock(size_t block_bytes);

  // Allocation state
  char *alloc_ptr_;              //當(dāng)前block當(dāng)前位置指針
  size_t alloc_bytes_remaining_; //當(dāng)前block可用內(nèi)存大小

  // Array of new[] allocated memory blocks
  std::vector<char *> blocks_; //創(chuàng)建的全部內(nèi)存塊

  // Total memory usage of the arena.
  port::AtomicPointer memory_usage_; //目前為止分配的內(nèi)存總量

  // No copying allowed
  Arena(const Arena &);
  void operator=(const Arena &);
};

Arena為LevelDB定制的內(nèi)存管理器，并不保證線程安全鹦马，消費者為MemTable胧谈、SkipList，有幾個小技巧值得學(xué)習(xí)荸频。

2.2.1 非邊界對齊內(nèi)存分配

函數(shù)定義：

inline char *Arena::Allocate(size_t bytes)
{
  // The semantics of what to return are a bit messy if we allow
  // 0-byte allocations, so we disallow them here (we don't need
  // them for our internal use).
  assert(bytes > 0);
  //優(yōu)先從已分配內(nèi)存中做二次分配
  if (bytes <= alloc_bytes_remaining_)
  {
    char *result = alloc_ptr_;
    alloc_ptr_ += bytes;
    alloc_bytes_remaining_ -= bytes;
    return result;
  }

  //僅當(dāng)現(xiàn)有內(nèi)存不足時菱肖，從操作系統(tǒng)中分配
  return AllocateFallback(bytes);
}

唯一消費者：

void MemTable::Add(SequenceNumber s, ValueType type,
                   const Slice& key,
                   const Slice& value) {
  // Format of an entry is concatenation of:
  //  key_size     : varint32 of internal_key.size()
  //  key bytes    : char[internal_key.size()]
  //  value_size   : varint32 of value.size()
  //  value bytes  : char[value.size()]
  size_t key_size = key.size();
  size_t val_size = value.size();
  size_t internal_key_size = key_size + 8;
  const size_t encoded_len =
      VarintLength(internal_key_size) + internal_key_size +
      VarintLength(val_size) + val_size;
  
  // 分配數(shù)據(jù)區(qū)
  char* buf = arena_.Allocate(encoded_len);
  char* p = EncodeVarint32(buf, internal_key_size);
  memcpy(p, key.data(), key_size);
  p += key_size;
  EncodeFixed64(p, (s << 8) | type);
  p += 8;
  p = EncodeVarint32(p, val_size);
  memcpy(p, value.data(), val_size);
  assert((p + val_size) - buf == encoded_len);
  table_.Insert(buf);
}

allocate函數(shù)出現(xiàn)有幾個目的:

分配數(shù)據(jù)區(qū)。唯一消費者MemTable中存儲的是數(shù)據(jù)對象旭从，而非數(shù)據(jù)結(jié)構(gòu)稳强。
性能優(yōu)化。包括采用inline形式和悦、預(yù)先分配4k內(nèi)存等退疫。
和AllocateAligned相比，更充分利用內(nèi)存鸽素，減少實際像OS申請內(nèi)存的次數(shù)褒繁。

2.2.2 邊界對齊的內(nèi)存分配

char *Arena::AllocateAligned(size_t bytes)
{
  //最小8字節(jié)對齊
  const int align = (sizeof(void *) > 8) ? sizeof(void *) : 8;
  assert((align & (align - 1)) == 0); // Pointer size should be a power of 2
  size_t current_mod = reinterpret_cast<uintptr_t>(alloc_ptr_) & (align - 1);
  size_t slop = (current_mod == 0 ? 0 : align - current_mod);
  size_t needed = bytes + slop;
  char *result;
  if (needed <= alloc_bytes_remaining_)
  {
    result = alloc_ptr_ + slop;
    alloc_ptr_ += needed;
    alloc_bytes_remaining_ -= needed;
  }
  else
  {
    // AllocateFallback always returned aligned memory
    result = AllocateFallback(bytes);
  }
  assert((reinterpret_cast<uintptr_t>(result) & (align - 1)) == 0);
  return result;
}

唯一消費者：

template<typename Key, class Comparator>
typename SkipList<Key,Comparator>::Node*
SkipList<Key,Comparator>::NewNode(const Key& key, int height) {
  char* mem = arena_->AllocateAligned(
      sizeof(Node) + sizeof(port::AtomicPointer) * (height - 1));
  return new (mem) Node(key);
}

總結(jié)：

AllocateAligned用于分配數(shù)據(jù)結(jié)構(gòu)對象。不采用Allocate是為了避免出現(xiàn)邊界不對其導(dǎo)致指針操作的非原子性付鹿。
性能優(yōu)化澜汤。包括采用inline形式蚜迅、預(yù)先分配4k內(nèi)存等。

2.3 Slice

Slice的含義和其名稱一致俊抵，代表了一個數(shù)據(jù)塊谁不，data_為數(shù)據(jù)地址，size_為數(shù)據(jù)長度徽诲。在LevelDB中一般用于傳遞Key刹帕、Value或編解碼處理后的數(shù)據(jù)塊。
Slice一般和Arena配合使用谎替，其僅保持了數(shù)據(jù)信息偷溺，并未擁有數(shù)據(jù)的所有權(quán)。而數(shù)據(jù)在Arena對象的整個聲明周期內(nèi)有效钱贯。
和string相比挫掏，Slice具有的明顯好處包括：避免不必要的拷貝動作、具有比string更豐富的語義(可包含任意內(nèi)容)秩命。

class Slice {
    public:
            ......
    private:
        const char* data_;
        size_t size_;
    };

2.4 總結(jié)

原子指針(AtomicPointer)是通用的工具類尉共，為了高性能犧牲了部分可讀性(不可避免)。

Arena和Slice是為LevelDB定制的數(shù)據(jù)結(jié)構(gòu)弃锐，通過Arena有效減少了實際內(nèi)存分配頻率袄友，但降低了內(nèi)存使用率。Slice則用于各個流程間數(shù)據(jù)傳遞霹菊，減少不必要的數(shù)據(jù)拷貝開銷剧蚣。

額外聊一點，DPDK(Data Plane Development Kit)也是對性能要求極高的開源框架旋廷，但定位和LevelDB完全不同鸠按。DPDK主要處理網(wǎng)絡(luò)數(shù)據(jù)包轉(zhuǎn)發(fā)，應(yīng)用于NFV場景柳洋。其對內(nèi)存的處理上采用了大頁內(nèi)存待诅、連續(xù)物理內(nèi)存等方式提升程序性能，但這要求其獨占一臺VM熊镣。LevelDB后續(xù)版本卑雁，如果性能上想進一步提升可以從這點上做些文章。

參考文章：
https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf
http://www.voidcn.com/blog/chj90220/article/p-6069844.html
http://www.pandademo.com/2016/03/atomicpointer-leveldb-source-dissect-2/
其他相關(guān)資料：
an-introduction-to-lock-free-programming
memory-ordering-at-compile-time
acquire-and-release-fences
memory-barriers-are-like-source-control-operations
memory-reordering-caught-in-the-act
acquire-and-release-semantics

轉(zhuǎn)載請注明：【隨安居士】http://www.reibang.com/p/3161784e7573

最后編輯于：2017.12.08 03:03:36

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者

人面猴
序言：七十年代末绪囱，一起剝皮案震驚了整個濱河市测蹲，隨后出現(xiàn)的幾起案子，更是在濱河造成了極大的恐慌鬼吵，老刑警劉巖扣甲，帶你破解...
沈念sama閱讀 222,104評論 6贊 515
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件，死亡現(xiàn)場離奇詭異，居然都是意外死亡琉挖，警方通過查閱死者的電腦和手機启泣，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 94,816評論 3贊 399
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進店門，熙熙樓的掌柜王于貴愁眉苦臉地迎上來示辈，“玉大人寥茫，你說我怎么就攤上這事》椋” “怎么了纱耻？”我有些...
開封第一講書人閱讀 168,697評論 0贊 360
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵，是天一觀的道長险耀。經(jīng)常有香客問我弄喘，道長，這世上最難降的妖魔是什么甩牺？我笑而不...
開封第一講書人閱讀 59,836評論 1贊 298
?港島之戀（遺憾婚禮）
正文為了忘掉前任蘑志，我火速辦了婚禮，結(jié)果婚禮上柴灯，老公的妹妹穿的比我還像新娘卖漫。我一直安慰自己，他們只是感情好赠群，可當(dāng)我...
茶點故事閱讀 68,851評論 6贊 397
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布。她就那樣靜靜地躺著旱幼，像睡著了一般查描。火紅的嫁衣襯著肌膚如雪。梳的紋絲不亂的頭發(fā)上柏卤，一...
開封第一講書人閱讀 52,441評論 1贊 310
城市分裂傳說
那天冬三，我揣著相機與錄音，去河邊找鬼缘缚。笑死勾笆，一個胖子當(dāng)著我的面吹牛，可吹牛的內(nèi)容都是我干的桥滨。我是一名探鬼主播窝爪，決...
沈念sama閱讀 40,992評論 3贊 421
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼，長吁一口氣：“原來是場噩夢啊……” “哼齐媒！你這毒婦竟也來了蒲每？” 一聲冷哼從身側(cè)響起，我...
開封第一講書人閱讀 39,899評論 0贊 276
萬榮殺人案實錄
序言：老撾萬榮一對情侶失蹤喻括，失蹤者是張志新（化名）和其女友劉穎邀杏，沒想到半個月后，有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體唬血，經(jīng)...
沈念sama閱讀 46,457評論 1贊 318
?護林員之死
正文獨居荒郊野嶺守林人離奇死亡望蜡，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點故事閱讀 38,529評論 3贊 341
?白月光啟示錄
正文我和宋清朗相戀三年唤崭，在試婚紗的時候發(fā)現(xiàn)自己被綠了。大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片脖律。...
茶點故事閱讀 40,664評論 1贊 352
活死人
序言：一個原本活蹦亂跳的男人離奇死亡浩姥，死狀恐怖，靈堂內(nèi)的尸體忽然破棺而出状您，到底是詐尸還是另有隱情勒叠，我是刑警寧澤，帶...
沈念sama閱讀 36,346評論 5贊 350
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布膏孟，位于F島的核電站眯分，受9級特大地震影響，放射性物質(zhì)發(fā)生泄漏柒桑。R本人自食惡果不足惜弊决，卻給世界環(huán)境...
茶點故事閱讀 42,025評論 3贊 334
男人毒藥：我在死后第九天來索命
文/蒙蒙一、第九天我趴在偏房一處隱蔽的房頂上張望魁淳。院中可真熱鬧飘诗，春花似錦、人聲如沸。這莊子的主人今日做“春日...
開封第一講書人閱讀 32,511評論 0贊 24
一樁弒父案娱挨，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽掘猿。三九已至，卻和暖如春，著一層夾襖步出監(jiān)牢的瞬間，已是汗流浹背。一陣腳步聲響...
開封第一講書人閱讀 33,611評論 1贊 272
情欲美人皮
我被黑心中介騙來泰國打工喳瓣，沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留，地道東北人赞别。一個月前我還...
沈念sama閱讀 49,081評論 3贊 377
代替公主和親
正文我出身青樓畏陕，卻偏偏與公主長得像，于是被迫代替她去往敵國和親仿滔。傳聞我的和親對象是個殘疾皇子惠毁，可洞房花燭夜當(dāng)晚...
茶點故事閱讀 45,675評論 2贊 359

2. LevelDB源碼剖析之基礎(chǔ)部件-AtomicPointer、Arena搪哪、Slice