Dynamo

Summary:

Dynamo is a highly available key-value storage engine that Amazon's core services use to provide an "always-on" experience.

This is achieved through extensive use of object versioning and application-assisted conflict resolution.

ACID properties:

Dynamo targets applications that operate with weaker consistency if this results in high availability. Dynamo does not provide any isolation guarantees and permits only single key updates.

This was my first time learning about service level agreements (SLAs). An SLA is a contract in which a client (here, an internal user of Dynamo) and a service provider agree on several system-related characteristics. An example of a simple SLA is a service guaranteeing that it will provide a response within 300ms for 99.9% of its requests for a peak client load of 500 requests per second.
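To make the 99.9% figure concrete, here is a minimal sketch of how such a latency target could be checked against a batch of measured response times. The function name `meets_sla` and the sample data are made up for illustration; they are not from the paper.

```python
def meets_sla(latencies_ms, threshold_ms=300.0, fraction=0.999):
    """Return True if at least `fraction` of the requests finished
    within `threshold_ms` milliseconds."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    within = sum(1 for t in latencies_ms if t <= threshold_ms)
    return within / len(latencies_ms) >= fraction


# Hypothetical measurements: 10,000 requests, a handful of them slow.
good_batch = [20.0] * 9995 + [450.0] * 5    # 99.95% within 300ms
bad_batch = [20.0] * 9980 + [450.0] * 20    # 99.80% within 300ms

print(meets_sla(good_batch))
print(meets_sla(bad_batch))
```

The point of stating the target at the 99.9th percentile rather than as an average is that a few very slow requests cannot hide behind many fast ones.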

One of the main design considerations for Dynamo is to give services control over their system properties, such as durability and consistency, and to let services make their own tradeoffs between functionality, performance and cost-effectiveness.

Dynamo is designed to be an eventually consistent data store; that is, all updates reach all replicas eventually.

An important design consideration is deciding when to resolve update conflicts. Dynamo targets the design space of an "always writeable" data store (i.e., a data store that is highly available for writes).

This design choice makes a lot of sense: Amazon needs writes to succeed whenever a customer checks out, so an update must never be rejected on the customer's side.

Who performs conflict resolution?

This can be done by the data store or by the application. If the data store does it, the choices are limited to simple policies such as "last write wins". The application, on the other hand, can pick the conflict resolution method best suited to its clients' experience. For instance, the application that maintains customer shopping carts can choose to "merge" the conflicting versions and return a single unified shopping cart.
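As a rough illustration of application-side merging, the sketch below unions the items of divergent cart versions. The cart representation (a dict from item name to quantity) and the "keep the larger quantity" rule are assumptions of this sketch, not something the paper specifies.

```python
def merge_carts(versions):
    """Merge divergent shopping cart versions into one cart.

    Each version is a dict mapping item name -> quantity. The merged
    cart contains every item seen in any version; when the same item
    appears more than once, the larger quantity is kept.
    """
    merged = {}
    for cart in versions:
        for item, qty in cart.items():
            merged[item] = max(merged.get(item, 0), qty)
    return merged


# Two replicas accepted different writes during a partition.
cart_a = {"book": 1, "pen": 2}
cart_b = {"book": 1, "lamp": 1}

print(merge_carts([cart_a, cart_b]))
```

A union-style merge like this never loses an added item, but an item that was removed in only one version comes back after the merge.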

Compared to P2P storage systems, Dynamo does not focus on the problem of data integrity and security, because it is built for a trusted environment. And unlike Bigtable, which stores a multi-dimensional sorted map, Dynamo stores only key-value pairs.

This allows Dynamo to focus on high availability, where updates are not rejected even in the wake of network partitions or server failures.


Dynamo uses consistent hashing so that it can scale incrementally. In consistent hashing, the output of a hash function is treated as a fixed circular space or "ring" (i.e. the largest hash value wraps around to the smallest hash value). Each node in the system is assigned a random value within this space, which represents its "position" on the ring. Thus, each node becomes responsible for the region of the ring between it and its predecessor node. The principal advantage of consistent hashing is that the departure or arrival of a node only affects its immediate neighbors, and other nodes remain unaffected (which matches the incremental scalability design goal).

But it is not perfect: because positions are random, data and load can end up distributed non-uniformly. To address these issues, Dynamo uses a variant of consistent hashing:
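The ring described above can be sketched in a few lines. This is a simplified illustration, not Dynamo's implementation: the class name `Ring`, the node names, and the use of MD5 are all choices made for this sketch, and node positions are derived by hashing the node name instead of drawing a random value so that the example is reproducible.

```python
import bisect
import hashlib


def ring_hash(value):
    """Map a string to a position on the ring (a large integer)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes=()):
        self._positions = []   # sorted node positions on the ring
        self._owner = {}       # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        pos = ring_hash(node)
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def lookup(self, key):
        """Return the first node at or after the key's position,
        wrapping around to the start of the ring if necessary."""
        pos = ring_hash(key)
        i = bisect.bisect_left(self._positions, pos) % len(self._positions)
        return self._owner[self._positions[i]]


ring = Ring(["node-A", "node-B", "node-C"])
keys = [f"key-{i}" for i in range(1000)]

before = {k: ring.lookup(k) for k in keys}
ring.add("node-D")
after = {k: ring.lookup(k) for k in keys}

# Every key that changed owner now belongs to the new node;
# keys on all other parts of the ring stay where they were.
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
print(all(after[k] == "node-D" for k in moved))
```

The last check is the property the text describes: adding a node only takes over part of one neighbor's range and leaves the rest of the ring untouched.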

instead of mapping a node to a single point in the circle, each node gets assigned to multiple points in the ring. To this end, Dynamo uses the concept of “virtual nodes”. A virtual node looks like a single node in the system, but each node can be responsible for more than one virtual node. Effectively, when a new node is added to the system, it is assigned multiple positions (henceforth, “tokens”) in the ring.
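A small sketch of the token idea follows, under the same assumptions as any toy ring: MD5 is used as the hash, tokens are derived from a hypothetical `"<node>#<index>"` string instead of being random, and the helper names (`build_ring`, `lookup`, `load`) are invented for this example.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(value):
    """Map a string to a position on the ring (a large integer)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def build_ring(nodes, tokens_per_node):
    """Place `tokens_per_node` tokens on the ring for every node.

    Returns two parallel lists: sorted token positions and the
    physical node that owns each token."""
    entries = sorted(
        (ring_hash(f"{node}#{t}"), node)
        for node in nodes
        for t in range(tokens_per_node)
    )
    positions = [pos for pos, _ in entries]
    owners = [node for _, node in entries]
    return positions, owners


def lookup(ring, key):
    """Return the physical node owning the first token at or after the key."""
    positions, owners = ring
    i = bisect.bisect_left(positions, ring_hash(key)) % len(positions)
    return owners[i]


def load(nodes, tokens_per_node, num_keys=10000):
    """Count how many of `num_keys` synthetic keys land on each node."""
    ring = build_ring(nodes, tokens_per_node)
    return Counter(lookup(ring, f"key-{i}") for i in range(num_keys))


nodes = ["node-A", "node-B", "node-C", "node-D"]
print("1 token per node:   ", dict(load(nodes, 1)))
print("100 tokens per node:", dict(load(nodes, 100)))
```

Running it with one token per node versus many tokens per node lets you compare how evenly the keys are spread in each case.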

Version control

Dynamo uses vector clocks [12] in order to capture causality between different versions of the same object. A vector clock is effectively a list of (node, counter) pairs.
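Here is a minimal sketch of how (node, counter) pairs can be compared to decide whether one version supersedes another or whether the two conflict. Clocks are represented as plain dicts, and the node names Sx, Sy and Sz are illustrative; none of the function names come from Dynamo itself.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    new_clock = dict(clock)
    new_clock[node] = new_clock.get(node, 0) + 1
    return new_clock


def descends(a, b):
    """True if clock `a` has seen every event recorded in clock `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify the causal relationship between two clocks."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"


d1 = increment({}, "Sx")   # {Sx: 1}
d2 = increment(d1, "Sx")   # {Sx: 2}
d3 = increment(d2, "Sy")   # {Sx: 2, Sy: 1}
d4 = increment(d2, "Sz")   # {Sx: 2, Sz: 1}

print(compare(d2, d1))   # d2 was written after d1 on the same node
print(compare(d3, d4))   # written on different nodes from the same parent
```

When two versions are concurrent, neither can be discarded automatically, which is where the application-assisted reconciliation mentioned earlier comes in.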

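Vector clocks are what make eventual consistency workable: because they record which versions descend from which, Dynamo can keep accepting writes even when the update rate is high, and reconcile divergent versions afterwards.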

Dynamo uses a consistency protocol similar to those used in quorum systems.

This protocol has two key configurable values: R and W. R is the minimum number of nodes that must participate in a successful read operation. W is the minimum number of nodes that must participate in a successful write operation.

Setting R and W such that R + W > N yields a quorum-like system.

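N is the number of replicas kept for each key; requests for a key go to the N highest-ranked reachable nodes in that key's preference list.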
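The reason R + W > N matters is that any set of R nodes must then share at least one node with any set of W nodes, so a read always touches a replica that took part in the latest successful write. The sketch below checks this by brute force for small N; the function names and the sample (N, R, W) values are illustrative.

```python
from itertools import combinations


def is_quorum(n, r, w):
    """True if the configuration satisfies R + W > N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


def always_overlap(n, r, w):
    """Brute-force check: does every possible read set of size R
    intersect every possible write set of size W?"""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3), (5, 2, 3)]:
    print((n, r, w), is_quorum(n, r, w), always_overlap(n, r, w))
```

For every configuration the two columns agree: the read and write sets are guaranteed to overlap exactly when R + W > N.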

Implementation

Dynamo can use several storage engines, including Berkeley Database (BDB) Transactional Data Store, BDB Java Edition, and MySQL. The main reason for designing a pluggable persistence component is to choose the storage engine best suited to an application's access patterns. For example, BDB can handle objects on the order of tens of kilobytes, whereas MySQL can handle larger objects.

The majority of Dynamo's production instances use BDB Transactional Data Store.

I didn't expect BDB to have an open-source edition as well... it seems most software can be traced back to some origin.

All communications are implemented using Java NIO channels.

Fine tuning

The main advantage of Dynamo is that its client applications can tune the values of N, R and W to achieve their desired levels of performance, availability and durability. For instance, the value of N determines the durability of each object. A typical value of N used by Dynamo’s users is 3.
