本篇文章是對Exactly once is NOT exactly the same翻譯和分析架诞,對流式計算中衡量準確性的三類語義進行了初步的理解。
1婿牍、Stream Processing Engines
Distributed event stream processing has become an increasingly hot topic in the area of Big Data. Notable Stream Processing Engines (SPEs)
include Apache Storm, Apache Flink, Heron, Apache Kafka (Kafka Streams), and Apache Spark (Spark Streaming). One of the most notable and widely discussed features of SPEs is their processing semantics, with “exactly-once” being one of the most sought after and many SPEs claiming to provide “exactly-once” processing semantics.
There exists a lot of misunderstanding and ambiguity, however, surrounding what exactly “exactly-once” is, what it entails, and what it really means when individual SPEs claim to provide it. The label “exactly-once” for describing processing semantics is also very misleading. In this blog post, I’ll discuss how “exactly-once” processing semantics differ across many popular SPEs andwhy “exactly-once” can be better described as **effectively-once**
. I’ll also explore the tradeoffs(權(quán)衡) between common techniques used to achieve what is often called “exactly-once.”
注意:精確一次更準確的說法應該是有效一次侈贷。
2、Background
Stream processing, sometimes referred to as event processing, can be succinctly described as continuous processing of an unbounded series of data or events.(流處理等脂,有時也稱為事件處理俏蛮,可以簡單地描述為對一系列無界數(shù)據(jù)或事件的連續(xù)處理) A stream- or event-processing application can be more or less described as a directed graph and often, but not always, as a directed acyclic graph (DAG).(流或事件處理應用程序可以或多或少地描述為有向圖,也可以經(jīng)常(但不總是)描述為有向無環(huán)圖(DAG)) In such a graph, each edge represents a flow of data or events and each vertex represents an operator that uses application-defined logic to process data or events from adjacent(鄰近的) edges.(每個邊表示一個數(shù)據(jù)流或事件流上遥,每個頂點表示一個操作符搏屑,該操作符使用應用程序定義的邏輯來處理來自相鄰邊的數(shù)據(jù)或事件) There are two special types of vertices, commonly referenced as sources and sinks. Sources consume external data/events and inject them into the application while sinks typically gather results produced by the application.(有兩種特殊類型的頂點,通常稱為 sources 和 sinks粉楚。sources讀取外部數(shù)據(jù)/事件到應用程序中辣恋,而 sinks 通常會收集應用程序生成的結(jié)果) Figure 1 below depicts an example a streaming application.
Figure 1. A typical Heron processing topology
An SPE that executes a stream/event processing application usually allows users to specify a reliability mode or processing semantics that indicates which guarantees it will provide for data processing across the entirety of the application graph.(執(zhí)行流/事件處理應用程序的SPE通常允許用戶指定一種可靠性模式或處理語義,該可靠性模式或處理語義指示哪些保證它將為整個應用程序圖提供數(shù)據(jù)處理) These guarantees are meaningful since you can always assume the possibility of failures via network, machines, etc. that can result in data loss.(這些保證是有意義的模软,因為您總是可以假設通過網(wǎng)絡伟骨、機器等可能導致數(shù)據(jù)丟失的故障的可能性) Three modes/labels, at-most-once, at-least-once, and exactly-once, are generally used to describe the data processing semantics that the SPE should provide to the application.
Here’s a loose definition of those different processing semantics(通常使用三種模式/標簽(最多一次,至少一次和完全一次)來描述SPE應該提供給應用程序的數(shù)據(jù)處理語義)
3燃异、At-most-once(最多一次)
This is essentially a “best effort” approach(這本質(zhì)上是一種“盡力而為”的方法). Data or events are guaranteed to be processed at most once by all operators in the application.(應用程序中的所有操作符保證最多一次處理數(shù)據(jù)或event) This means that no additional attempts will be made to retry or retransmit events if it was lost before the streaming application can fully process it(這意味著携狭,如果流應用程序在完全處理事件之前丟失了事件,則不會再嘗試重試或重新傳輸事件). Figure 2 illustrates an example of this.
Figure 2. At-most-once processing semantics
4回俐、At-least-once(至少一次)
Data or events are guaranteed to be processed at least once by all operators in the application graph(這意味著逛腿,如果流應用程序在完全處理事件之前丟失了事件稀并,則不會再嘗試重試或重新傳輸事件). This usually means an event will be replayed or retransmitted from the source if the event is lost before the streaming application fully processed it.(這通常意味著如果事件在流應用程序完全處理它之前丟失,則將從源重播或重新傳輸該事件) Since it can be retransmitted, however, an event can sometimes be processed more than once, thus the at-least-once term.(然而单默,由于可以將其重新傳輸碘举,因此事件有時可以進行多次處理,因此搁廓,至少一次) Figure 3 illustrates an example of this. In this case, the first operator initially fails to process an event, then succeeds upon retry, then succeeds upon a second retry that turns out to have been unnecessary.(圖3舉例說明了這一點引颈。在這種情況下,第一個操作符最初無法處理事件境蜕,然后重試成功线欲,然后再重試成功,最后發(fā)現(xiàn)沒有必要進行第二次重試)
Figure 3. At-least-once processing semantics
5汽摹、Exactly-once(精確一次)
Events are guaranteed to be processed “exactly once” by all operators in the stream application, even in the event of various failures.(即使在發(fā)生各種故障的情況下,流應用程序中的所有操作符也保證只處理一次事件)
Two popular mechanisms(機制) are typically used to achieve “exactly-once” processing semantics.
Distributed snapshot/state checkpointing
(分布式快照或者是狀態(tài)檢查點)At-least-once event delivery plus message deduplication
(至少一次事件傳遞以及消息重復數(shù)據(jù)刪除)
5.1苦锨、Distributed snapshot/state checkpointing
The distributed snapshot/state checkpointing method of achieving “exactly-once” is inspired by the Chandy-Lamport distributed snapshot algorithm.1(實現(xiàn)“完全一次”的分布式快照/狀態(tài)檢查點方法是受Chandy-Lamport分布式快照算法啟發(fā)的逼泣。) With this mechanism, all the state for each operator in the streaming application is periodically checkpointed, and in the event of a failure anywhere in the system, all the state of for every operator is rolled back to the most recent globally consistent checkpoint.(流應用程序中每個操作符的所有狀態(tài)都是定期檢查點的,當系統(tǒng)中任何地方出現(xiàn)故障時舟舒,每個操作符的所有狀態(tài)都回滾到最近的全局一致檢查點) During the rollback, all processing will be paused.(在回滾期間拉庶,所有處理將被暫停) Sources are also reset to the correct offset corresponding to the most recent checkpoint.(source也被重置為與最近的檢查點相對應的正確偏移量) The whole streaming application is basically rewound to its most recent consistent state and processing can then restart from that state(整個流處理應用程序基本上被重新恢復到最近的一致狀態(tài),然后處理可以從該狀態(tài)重新啟動). Figure 4 below illustrates the basics of this mechanism.
Figure 4. Distributed snapshot
In Figure 4,
the streaming application is working normally at T1 and the state is checkpointed.(流應用程序在T1下正常工作秃励,狀態(tài)為檢查點氏仗。)
At time T2, however, the operator fails to process an incoming datum. At this point, the state value of S = 4 has been saved to durable storage, while the state value S = 12 is held in the operator’s memory.(但是在T2時,Operator不能處理輸入的數(shù)據(jù)夺鲜。此時皆尔,狀態(tài)值S = 4被保存到持久存儲中,而狀態(tài)值S = 12保存在操作符的內(nèi)存中币励。)
In order to overcome this discrepancy, at time T3 the processing graph rewinds the state to S = 4 and “replays” each successive state in the stream up to the most recent, processing each datum(基準點). (為了克服這種差異慷蠕,在時間T3處,處理圖將狀態(tài)后退到S = 4食呻,并將流中的每個連續(xù)狀態(tài)“重播”到最新的流炕,處理每個數(shù)據(jù)的狀態(tài)。)
The end result is that some data have been processed multiple times, but that’s okay because the resulting state is the same no matter how many rollbacks have been performed.(最終的結(jié)果是仅胞,有些數(shù)據(jù)已經(jīng)處理了多次每辟,但是這沒有關(guān)系,因為不管執(zhí)行了多少回滾干旧,結(jié)果狀態(tài)都是相同的渠欺。)
5.2、At-least-once event delivery plus message deduplication
Another method used to achieve “exactly-once” is through implementing at-least-once event delivery in conjunction with event deduplication on a per-operator basis.(另一種實現(xiàn)精確一次的方法是在每個operation的基礎上實現(xiàn)至少一次事件交付和事件重復數(shù)據(jù)刪除) SPEs utilizing this approach will replay failed events for further attempts at processing and remove duplicated events for every operator prior to the events entering the user defined logic in the operator.(使用此方法的spe將重播失敗事件莱革,以便進一步嘗試處理峻堰,并在事件進入操作符中用戶定義的邏輯之前刪除每個操作符的重復事件讹开。) This mechanism requires that a transaction log be maintained for every operator to track which events it has already processed.(該機制要求為每個操作符維護一個事務日志,以跟蹤它已經(jīng)處理的事件) SPEs that utilize a mechanism like such are Google’s MillWheel2 and Apache Kafka Streams. Figure 5 illustrates the gist of this mechanism.
Figure 5. At-least-once delivery plus deduplication
6捐名、Is exactly-once really exactly-once?
Now let’s reexamine what the “exactly-once” processing semantics really guarantees to the end user. The label “exactly-once” is misleading in describing what is done exactly once.(現(xiàn)在讓我們重新審視『精確一次』處理語義真正對最終用戶的保證旦万。『精確一次』這個術(shù)語在描述正好處理一次時會讓人產(chǎn)生誤導)
Some might think that “exactly-once” describes the guarantee to event processing in which each event in the stream is processed only once.(有些人可能認為『精確一次』描述了事件處理的保證镶蹋,其中流中的每個事件只被處理一次)
In reality, there is no SPE that can guarantee exactly-once processing. To guarantee that the user-defined logic in each operator only executes once per event is impossible in the face of arbitrary failures, because partial execution of user code is an ever-present possibility.(實際上成艘,沒有SPE可以保證精確的一次處理。要保證每個操作符中的用戶定義邏輯只針對每個事件執(zhí)行一次是不可能的贺归,因為隨時都可能出現(xiàn)部分執(zhí)行用戶代碼的情況淆两。)
So what does SPEs guarantee when they claim “exactly-once” processing semantics? If user logic cannot be guaranteed to be executed exactly once then what is executed exactly once? When SPEs claim “exactly-once” processing semantics, what they’re actually saying is that they can guarantee that updates to state managed by the SPE are committed only once to a durable backend store.(那么,當引擎聲明『精確一次』處理語義時拂酣,它們能保證什么呢秋冰?如果不能保證用戶邏輯只執(zhí)行一次,那么什么邏輯只執(zhí)行一次婶熬?當引擎聲明『精確一次』處理語義時剑勾,它們實際上是在說,它們可以保證引擎管理的狀態(tài)更新只提交一次到持久的后端存儲)
Both mechanisms described above use a durable backend store as a source of truth that can hold the state of every operator and automatically commit updates to it. (上面描述的兩種機制都使用持久的后端存儲作為真實性的來源赵颅,可以保存每個算子的狀態(tài)并自動向其提交更新)
For mechanism 1 (distributed snapshot/state checkpointing), this durable backend state is used to hold the globally consistent state checkpoints (checkpointed state for every operator) for the streaming application.(對于機制 1 (分布式快照 / 狀態(tài)檢查點)虽另,此持久后端狀態(tài)用于保存流應用程序的全局一致狀態(tài)檢查點(每個算子的檢查點狀態(tài))
For mechanism 2 (at-least-once event delivery plus deduplication), the durable backend state is used to store the state of every operator as well as a transaction log for every operator that tracks all the events it has already fully processed.(對于機制 2 (至少一次事件傳遞加上重復數(shù)據(jù)刪除),持久后端狀態(tài)用于存儲每個算子的狀態(tài)以及每個算子的事務日志饺谬,該日志跟蹤它已經(jīng)完全處理的所有事件)
The committing of state or applying updates to the durable backend that is the source of truth can be described as occurring exactly-once.(提交狀態(tài)或?qū)ψ鳛檎鎸崄碓吹某志煤蠖藨酶驴梢员幻枋鰹榍『冒l(fā)生一次) Computing the state update/change, i.e. processing the event that is executing arbitrary user -defined logic on the event, however, can happen more than once if failures occur, as mentioned above(如上所述捂刺,計算狀態(tài)更新/更改,即處理在事件上執(zhí)行任意用戶定義的邏輯的事件募寨,如果發(fā)生故障族展,則可能會發(fā)生多次(如上所述)). In other words, the processing of an event can happen more than once but the effect of that processing is only reflected once in the durable backend state store.(換句話說,事件的處理可以發(fā)生多次拔鹰,但是處理的效果只在持久后端狀態(tài)存儲中反映一次) Here at Streamlio, we’ve decided that effectively-once is the best term for describing these processing semantics.
7苛谷、Distributed snapshot versus at-least-once event delivery plus deduplication
From a semantic point of view, both the distributed snapshot and at-least-once event delivery plus deduplication mechanisms provide that same guarantee. Due to differences in implementation between the two mechanisms, however, there are significant performance differences.(從語義的角度來看,分布式快照和至少一次事件交付加上重復數(shù)據(jù)刪除機制都提供了相同的保證格郁。但是腹殿,由于這兩種機制的實現(xiàn)方式不同,性能也有很大差異例书。)
The performance overhead of mechanism 1 (distributed snapshot/state checkpointing) on top of the SPE can be minimal since the SPE is essentially sending a few special events alongside regular events through all the operators in the streaming application, while state checkpointing can be performed asynchronously in the background.(在SPE之上的機制1(分布式快照/狀態(tài)檢查點)的性能開銷可以最小化锣尉,因為SPE本質(zhì)上通過流式處理程序中的所有運算符發(fā)送一些特殊事件以及常規(guī)事件,而狀態(tài)檢查點在后臺可以異步執(zhí)行) For large streaming applications, however, failures may happen more frequently, causing the SPE to need to pause the application and roll back the state of all operators, which will in turn impact performance(然而决采,對于大型流應用程序自沧,故障可能發(fā)生得更頻繁,導致SPE需要暫停應用程序并回滾所有操作符的狀態(tài),這反過來又會影響性能). The larger the streaming application, the more likely and thus more frequently failures can occur, and in turn, the more significantly the performance of the streaming application will be impacted(流應用程序越大拇厢,故障發(fā)生的可能性就越大爱谁,因此故障發(fā)生的頻率也就越高,反過來孝偎,流應用程序的性能受到的影響也就越大). However, again, this mechanism is very non-intrusive and demands minimal additional resources impact to run.(但是访敌,這種機制是非侵入性的,并且運行時對額外資源的影響很小衣盾。)
Mechanism 2 (at-least-once event delivery plus deduplication) may require a lot more resources, especially storage(機制2(至少一次事件交付加上重復數(shù)據(jù)刪除)可能需要更多的資源寺旺,特別是存儲。). With this mechanism, the SPE would need to be able to track every tuple that has been fully processed by every instance of an operator to perform deduplication as well as perform the deduplication itself for every event.(使用這種機制势决,SPE將需要能夠跟蹤操作符的每個實例已經(jīng)完全處理過的每個元組阻塑,以便執(zhí)行重復數(shù)據(jù)刪除,以及為每個事件執(zhí)行重復數(shù)據(jù)刪除本身果复。) This can amount to a huge amount of data to keep track of, especially if the streaming application is large or if there are many applications running. There is also performance overhead associated with every event at every operator to perform the deduplication(每個操作員執(zhí)行重復數(shù)據(jù)消除的每個事件都會產(chǎn)生性能開銷). With this mechanism, however, the performance of the streaming application is less likely to be impacted by the size of the application.(流式應用不太可能受到應用大小的影響)
- 1陈莽、With mechanism 1, a global pause and state rollback needs to occur if any failures occur on any operator; (使用機制1,如果任何操作符發(fā)生任何故障虽抄,則需要執(zhí)行全局暫停和狀態(tài)回滾)
- 2传透、with mechanism 2, the effects of a failure are much more localized. When a failure occurs in an operator, events that might have not been fully processed are just replayed/retransmitted from an upstream source.(對于機制2,故障的影響更加局限极颓。當操作員發(fā)生故障時,可能只是從上游源重放/重傳了可能尚未完全處理的事件) The performance impact is isolated to where the failure happened in the streaming application and will cause little impact to the performance of other operators in the streaming application.(性能影響被隔離到流應用程序中發(fā)生故障的地方群嗤,對流應用程序中其他操作符的性能影響很胁ぢ ) The pros and cons of both mechanisms from a performance standpoint are listed in the tables below.(從性能的角度來看,這兩種機制的優(yōu)缺點列在下面的表中狂秘。)
Distributed snapshot/state checkpointing
Pros優(yōu)點 | Cons缺點 |
---|---|
Little performance and resource overhead (性能和資源開銷很少) | Larger impact to performance when recovering from failures (從故障中恢復對性能的影響更大) |
Potential impact to performance increases as topology gets larger (隨著拓撲變大骇径,對性能的潛在影響會增加) |
At-least-once delivery plus deduplication
Pros(優(yōu)點) | Cons(缺點) |
---|---|
Performance impact of failures are localized (故障對性能的影響已本地化) | Potentially need large amounts of storage and infrastructure to support (潛在需要大量的存儲和基礎架構(gòu)來支持) |
Impact of failures does not necessarily increase with the size of the topology (故障的影響不一定隨拓撲的大小而增加) | Performance overhead for every event at every operator (每個操作員的每個事件的性能開銷) |
Though there are differences between the distributed snapshot and at-least-once event delivery plus deduplication mechanisms from a theoretical point of view, both can be reduced to at-least-once processing plus idempotency.(盡管從理論上講,分布式快照和至少一次事件交付加上重復數(shù)據(jù)刪除機制之間存在差異者春,但兩者都可以簡化為至少一次處理加上冪等性) For both mechanisms, events will be replayed/retransmitted when failures occur (implementing at-least-once), and through state rollback or event deduplication, operators essentially become idempotent when updating internally managed state.(對于這兩種機制破衔,都將在發(fā)生故障時(至少一次實現(xiàn))重播/重傳事件,并且通過狀態(tài)回滾或事件重復數(shù)據(jù)刪除钱烟,操作員在更新內(nèi)部管理狀態(tài)時實質(zhì)上將成為冪等晰筛。)
8、Conclusion
In this blog post, I hope to have convinced you that the term “exactly-once” is very misleading. Providing “exactly-once” processing semantics really means that distinct updates to the state of an operator that is managed by the stream processing engine are only reflected once(提供精確的一次處理語義實際上意味著對由流處理引擎管理的操作符狀態(tài)的不同更新只反映一次). “Exactly-once” by no means guarantees that processing of an event, i.e. execution of arbitrary user-defined logic, will happen only once.
(確切地說拴袭,一次并不能保證事件的處理读第,即任意用戶定義邏輯的執(zhí)行只發(fā)生一次) Here at Streamlio, we prefer the term effectively once for this guarantee because processing is not necessarily guaranteed to occur once but the effect on the SPE-managed state is reflected once(在Streamlio中,對于這種保證拥刻,我們更傾向于使用“有效一次”這個術(shù)語怜瞒,因為處理并不一定保證只發(fā)生一次,但是對spe管理狀態(tài)的影響只反映一次). Two popular mechanisms, distributed snapshot and dessage deduplication, are used to implement exactly/effectively-once processing semantics. (兩種流行的機制般哼,即分布式快照和dessage重復數(shù)據(jù)刪除吴汪,用于實現(xiàn)精確/有效的一次處理語義)Both mechanisms provide the same semantic guarantees to message processing and state updates but there are nonetheless differences in performance(這兩種機制為消息處理和狀態(tài)更新提供了相同的語義保證惠窄,但在性能上仍然存在差異). This post is not meant to convince you that either mechanism is superior to the other, as each has its pros and cons.
References
1. Chandy, K. Mani and Leslie Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems (TOCS) 3.1 (1985): 63-75.
2. Akidau, Tyler, et al. MillWheel: Fault-tolerant stream processing at internet scale. Proceedings of the VLDB Endowment 6.11 (2013): 1033-1044.