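Key points:
- Makes full use of multicore hardware
- Actor model: no shared memory, lock-free
- Each actor is pinned to one thread on one core
- Keys are assigned by consistent hashing to different actors on different servers
- Hot keys use multi-master replication and are served by several actors in parallel; the number of replicas has to be chosen to fit the workload
- Actors (both local and across the network) synchronize through periodic broadcast, and only the final state of local updates is sent when a key changes several times within one broadcast period
- Concurrent modifications are merged with lattice composition; an actor merges into its local state when it receives a broadcast message
- Eventual consistency is guaranteed
- Compared with Redis, single-thread performance is about the same; the advantages are mainly scalability and multi-replica concurrency for hot keys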
For more detail, read the paper linked at the end of this post. Below are some excerpts from the paper.
Anna is a new key-value store: a partitioned, multi-mastered system that achieves high performance and elasticity via wait-free execution and coordination-free consistency.
Our design rests on a simple architecture of coordination-free actors that perform state update via merge of lattice-based composite data structures.
Goal
providing excellent performance on a single multicore machine, while scaling up elastically to geo-distributed cloud deployment.
Requirements
- partition (shard) the key space, not only across nodes at cloud scale but also across cores for high performance
- for workload scaling, employ multi-master replication to concurrently serve puts and gets against a single key from multiple threads
- wait-free execution, meaning that each thread is always doing useful work (serving requests), and never waiting for other threads for reasons of consistency or semantics
- coordination-free consistency models
Design
Coordination-free Actors
The coordination-free actor model bests state-of-the-art lock-free shared-memory implementations while scaling smoothly and making repartitioning for elasticity extremely responsive.
Anna uses lattice composition to maintain the consistency of replicated state. Lattices are resilient to message re-ordering and duplication, allowing Anna to employ asynchronous multi-master replication without need for any waiting.
Anna combines asynchronous multi-master replication with lattice-based state management to remain scalable across both low and high conflict workloads while still guaranteeing consistency.
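To make the "resilient to re-ordering and duplication" claim concrete, here is a minimal sketch of lattice composition. The names `LWW`, `merge_maps` and `apply_all` are hypothetical and not taken from Anna's code base; the only assumption is that each value carries a timestamp and that merge picks the larger (timestamp, payload) pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWW:
    """Last-writer-wins value (hypothetical name): a timestamp plus a payload.

    Ties on the timestamp are broken by the payload so merge stays commutative.
    """
    timestamp: int
    payload: str

    def merge(self, other: "LWW") -> "LWW":
        return max(self, other, key=lambda v: (v.timestamp, v.payload))


def merge_maps(a: dict, b: dict) -> dict:
    """Map lattice: union of keys, merging values that exist on both sides."""
    out = dict(a)
    for key, val in b.items():
        out[key] = out[key].merge(val) if key in out else val
    return out


def apply_all(updates):
    """Fold a sequence of update messages into one replica state."""
    state = {}
    for update in updates:
        state = merge_maps(state, update)
    return state


u1 = {"x": LWW(1, "a")}
u2 = {"x": LWW(2, "b"), "y": LWW(1, "c")}
u3 = {"y": LWW(3, "d")}

replica_a = apply_all([u1, u2, u3])
replica_b = apply_all([u3, u1, u2, u1])  # reordered, with u1 delivered twice
print(replica_a == replica_b)
```

Because the merge is commutative, associative and idempotent, both replicas end in the same state regardless of delivery order, which is why no coordination is needed on the request path.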
Multi Master Replication
In multi-master replication, a key is replicated on multiple actors, each of which can read and update its own local copy.
In a coordination-free approach, on the other hand, each actor can process a request locally without introducing any inter-actor communication on the critical path. Updates are periodically communicated to other actors when a timer is triggered or when the actor experiences a reduction in request load.
Unlike synchronous multi-master and single-master replication, a coordination-free multi-master scheme could lead to inconsistencies between replicas, because replicas may observe and process messages in different orders.
Rader: Keys are distributed across servers and actors by consistent hashing.
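A rough sketch of how a key could be mapped to one or more actors with consistent hashing. The `HashRing` class, the `server:core` naming and the virtual-node count are all assumptions for illustration; the paper excerpts above do not specify these details.

```python
import bisect
import hashlib


def _hash(s: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, actors, vnodes=64):
        self._ring = sorted(
            (_hash(f"{actor}#{i}"), actor)
            for actor in actors
            for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    def owners(self, key: str, replicas: int = 1):
        """Return the first `replicas` distinct actors clockwise from the key."""
        start = bisect.bisect_right(self._points, _hash(key))
        found = []
        for i in range(len(self._ring)):
            actor = self._ring[(start + i) % len(self._ring)][1]
            if actor not in found:
                found.append(actor)
                if len(found) == replicas:
                    break
        return found


# Two servers with four actors (cores) each.
actors = [f"server{s}:core{c}" for s in range(2) for c in range(4)]
ring = HashRing(actors)
print(ring.owners("user:42"))              # single owner for a cold key
print(ring.owners("user:42", replicas=3))  # three masters for a hot key
```

With this layout, raising the replication factor of a key only adds actors to its owner list, and removing an actor only moves the keys that actor owned.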
Periodic Multicast
Anna's actors perform updates against their local state in parallel without synchronizing, and periodically exchange state via multicast.
Anna employs simple eventual consistency, and threads are set to multicast every 100 milliseconds.
On a single machine, actors update their local state, then write the updates to a shared buffer and multicast the address of the updates in the buffer to other actors.
On different machines, updates need to be serialized (e.g. with protobuf) and then broadcast over TCP.
Rader: Anna is eventually consistent, which means there is a time window during which the local states of the actors are out of sync.
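The update-locally-then-multicast cycle can be sketched as follows. This is a single-process simulation under stated assumptions: the `Actor` class and its method names are hypothetical, timestamps are supplied by the caller, values are strings, and `flush` stands in for the 100 ms timer. It shows the staleness window and the fact that only the final state of a key is sent per period.

```python
class Actor:
    """Hypothetical actor holding a private store and a set of changed keys."""

    def __init__(self, name):
        self.name = name
        self.store = {}      # key -> (timestamp, value)
        self.dirty = set()   # keys changed locally since the last multicast

    def _merge(self, key, incoming):
        # Last-writer-wins merge: keep the larger (timestamp, value) tuple.
        current = self.store.get(key)
        if current is None or incoming > current:
            self.store[key] = incoming
            return True
        return False

    def put(self, key, timestamp, value):
        if self._merge(key, (timestamp, value)):
            self.dirty.add(key)

    def get(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

    def flush(self):
        """Build one multicast message containing only the latest state."""
        message = {key: self.store[key] for key in self.dirty}
        self.dirty.clear()
        return message

    def receive(self, message):
        for key, entry in message.items():
            self._merge(key, entry)


a, b = Actor("a"), Actor("b")
a.put("k", 1, "v1")
a.put("k", 2, "v2")
a.put("k", 3, "v3")          # three local writes within one period
b.put("k", 2, "other")
print(a.get("k"), b.get("k"))  # replicas disagree until the next multicast

msg_a, msg_b = a.flush(), b.flush()
print(msg_a)                   # one entry for "k", not three
b.receive(msg_a)
a.receive(msg_b)
print(a.get("k") == b.get("k"))
```

Coalescing to the final state keeps multicast traffic proportional to the number of changed keys rather than the number of writes.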
Results
Better performance than shared-memory models
Anna indeed achieves wait-free execution: the vast majority of CPU time (90%) is spent processing requests without many cache misses, while overheads of lattice merge and multicast are small. In short, Anna’s Coordination-free actor model addresses the heart of the scalability limitations of multi-core KVS systems.
TBB and Masstree spend 92% - 95% of the CPU time on atomic instructions under high contention, and only 4% - 7% of the CPU time is devoted to request handling. As a result, the TBB hash map and Masstree perform 50× slower than Anna (rep= 1) and 700× slower than Anna (full replication).
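Rader: Under heavy update contention, shared-memory models spend the vast majority of CPU time on atomic operations. Whether the implementation is lock-based or lock-free, maintaining cache coherence is the bottleneck. Redis is single-threaded, so it does not have this problem.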
Be careful with replication
For systems that support multi-master replication, having a high replication factor under low contention workloads can hurt performance. Instead, we want to dynamically monitor the data’s contention level and selectively replicate the highly contended keys across threads.
Rader: Under low contention, choose the number of replicas carefully; too many replicas hurt performance.
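One possible way to act on this, sketched under loud assumptions: request share within a sampling window is used as a stand-in for contention, and `choose_replication` and its `hot_fraction` threshold are hypothetical. The excerpt only states the goal of selective replication, not a mechanism.

```python
from collections import Counter


def choose_replication(access_log, num_threads, hot_fraction=0.1):
    """Map each observed key to a replication factor.

    hot_fraction is a hypothetical threshold: a key that receives at least
    this share of the requests in the window is replicated on every thread,
    all other keys keep a single copy.
    """
    counts = Counter(access_log)
    total = sum(counts.values())
    return {
        key: num_threads if n / total >= hot_fraction else 1
        for key, n in counts.items()
    }


# A skewed window: one key takes 90 of 100 requests.
log = ["hot"] * 90 + [f"k{i}" for i in range(10)]
plan = choose_replication(log, num_threads=8)
print(plan["hot"], plan["k0"])
```

A real policy would likely also consider write ratio and the cost of the extra multicast traffic, since each additional replica adds merge work for every update.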
Comparison with Redis Cluster
Anna can significantly outperform Redis Cluster by replicating hot keys under high contention, and can match the performance of Redis Cluster under low contention.
Rader: Under low contention, performance is about the same as Redis; under high contention, performance can be improved by replicating hot keys.