Abstract
Serverless computing is an excellent fit for big data processing because it can scale quickly and cheaply to thousands of parallel functions. Existing serverless platforms isolate functions in ephemeral, stateless containers, preventing them from directly sharing memory. This forces users to duplicate and serialize data repeatedly, adding unnecessary performance and resource costs. We believe that a new lightweight isolation approach is needed, which supports sharing memory directly between functions and reduces resource overheads.
We introduce Faaslets, a new isolation abstraction for high-performance serverless computing. Faaslets isolate the memory of executed functions using software-fault isolation (SFI), as provided by WebAssembly, while allowing memory regions to be shared between functions in the same address space. Faaslets can thus avoid expensive data movement when functions are co-located on the same machine. Our runtime for Faaslets, FAASM, isolates other resources, e.g. CPU and network, using standard Linux cgroups, and provides a low-level POSIX host interface for networking, file system access, and dynamic loading. To reduce initialization times, FAASM restores Faaslets from already-initialized snapshots. We compare FAASM to a standard container-based platform and show that, when training a machine learning model, it achieves a 2× speed-up with 10× less memory; for serving machine learning inference, FAASM doubles the throughput and reduces tail latency by 90%.
1 Introduction
Serverless computing is becoming a popular way to deploy data-intensive applications. A function-as-a-service (FaaS) model decomposes computation into many functions, which can effectively exploit the massive parallelism of clouds. Prior work has shown how serverless can support map/reduce-style jobs [42, 69], machine learning training [17, 18] and inference [40], and linear algebra computation [73, 88]. As a result, an increasing number of applications, implemented in diverse programming languages, are being migrated to serverless platforms.
Existing platforms such as Google Cloud Functions [32], IBM Cloud Functions [39], Azure Functions [50], and AWS Lambda [5] isolate functions in ephemeral, stateless containers. The use of containers as an isolation mechanism introduces two challenges for data-intensive applications: data access overheads and the container resource footprint.
Data access overheads are caused by the stateless nature of the container-based approach, which forces state to be maintained externally, e.g. in object stores such as Amazon S3 [6], or passed between function invocations. Both options incur costs due to duplicated data in each function, repeated serialization, and regular network transfers. As a result, current applications adopt an inefficient “data-shipping architecture”, i.e. moving data to the computation and not vice versa; such architectures were abandoned by the data management community many decades ago [36]. These overheads are compounded as the number of functions increases, reducing the benefit of unlimited parallelism, which is what makes serverless computing attractive in the first place.
The container resource footprint is particularly relevant because of the high-volume and short-lived nature of serverless workloads. Despite containers having a smaller memory and CPU overhead than other mechanisms such as virtual machines (VMs), there remains an impedance mismatch between the execution of individual short-running functions and the process-based isolation of containers. Containers have start-up latencies in the hundreds of milliseconds to several seconds, leading to the cold-start problem in today’s serverless platforms [36, 83]. The large memory footprint of containers limits scalability—while technically capped at the process limit of a machine, the maximum number of containers is usually limited by the amount of available memory, with only a few thousand containers supported on a machine with 16 GB of RAM [51].
Current data-intensive serverless applications have addressed these problems individually but never both together; instead, they either exacerbate the container resource overhead or break the serverless model. Some systems avoid data movement costs by maintaining state in long-lived VMs or services, such as ExCamera [30], Shredder [92], and Cirrus [18], thus introducing non-serverless components. To address the performance overhead of containers, systems typically increase the level of trust in users’ code and weaken isolation guarantees. PyWren [42] reuses containers to execute multiple functions; Crucial [12] shares a single instance of the Java virtual machine (JVM) between functions; SAND [1] executes multiple functions in long-lived containers, which also run an additional message-passing service; and Cloudburst [75] takes a similar approach, introducing a local key-value-store cache. Provisioning containers to execute multiple functions and extra services amplifies resource overheads and breaks the fine-grained elastic scaling inherent to serverless. While several of these systems reduce data access overheads with local storage, none provides shared memory between functions, so data must still be duplicated in separate process memories.
Other systems reduce the resource footprint by moving away from containers and VMs. Terrarium [28] and Cloudflare Workers [22] employ software-based isolation using WebAssembly and V8 Isolates, respectively; Krustlet [54] replicates containers using WebAssembly for memory safety; and SEUSS [16] demonstrates serverless unikernels. While these approaches have a reduced resource footprint, they do not address data access overheads, and the use of software-based isolation alone does not isolate resources.
We observe that serverless computing can better support data-intensive applications with a new isolation abstraction that (i) provides strong memory and resource isolation between functions, yet (ii) supports efficient state sharing. Data should be co-located with functions and accessed directly, minimizing data-shipping. Furthermore, this new isolation abstraction must (iii) allow scaling state across multiple hosts; (iv) have a low memory footprint, permitting many instances on one machine; (v) exhibit fast instantiation times; and (vi) support multiple programming languages to facilitate the porting of existing applications.
In this paper, we describe Faaslets, a new lightweight isolation abstraction for data-intensive serverless computing. Faaslets support stateful functions with efficient shared memory access and are executed by our FAASM distributed serverless runtime. Faaslets have the following properties, summarising our contributions:
(1) Faaslets achieve lightweight isolation. Faaslets rely on software-fault isolation (SFI) [82], which restricts functions to accessing only their own memory. A function associated with a Faaslet, together with its library and language runtime dependencies, is compiled to WebAssembly [35]. The FAASM runtime then executes multiple Faaslets, each with a dedicated thread, within a single address space. For resource isolation, the CPU cycles of each thread are constrained using Linux cgroups [79], and network access is limited using network namespaces [79] and traffic shaping. Many Faaslets can therefore be executed efficiently and safely on a single machine.
(2) Faaslets support efficient local/global state access. Since Faaslets share the same address space, they can efficiently access shared memory regions that hold local state. This allows the co-location of data and functions and avoids serialization overheads. Faaslets use a two-tier state architecture: a local tier provides in-memory sharing, and a global tier supports distributed access to state across hosts. The FAASM runtime provides a state management API to Faaslets that gives fine-grained control over the state in both tiers. Faaslets also support stateful applications with different consistency requirements between the two tiers.
(3) Faaslets have fast initialization times. To reduce cold-start time when a Faaslet executes for the first time, it is launched from a suspended state. The FAASM runtime pre-initializes a Faaslet ahead of time and snapshots its memory to obtain a Proto-Faaslet, which can be restored in hundreds of microseconds. Proto-Faaslets are used to create fresh Faaslet instances quickly, e.g. avoiding the time to initialize a language runtime. While existing work on snapshots for serverless takes a single-machine approach [1, 16, 25, 61], Proto-Faaslets support cross-host restores and are OS-independent.
(4) Faaslets support a flexible host interface. Faaslets interact with the host environment through a set of POSIX-like calls for networking, file I/O, global state access, and library loading/linking. This allows them to support dynamic language runtimes and facilitates the porting of existing applications; CPython, for example, is ported by changing fewer than 10 lines of code. The host interface provides just enough virtualization to ensure isolation while adding negligible overhead.
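The two-tier state architecture described in (2) can be illustrated with a minimal sketch. All names below (GlobalTier, LocalTier, pull, push) are hypothetical and are not the FAASM state API; a plain dictionary stands in for the distributed global store, and the consistency behaviour shown (explicit push) is one assumed policy among several possible ones.

```python
class GlobalTier:
    """Stand-in for a store reachable from every host (hypothetical)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return bytes(self._store[key])

    def set(self, key, value):
        self._store[key] = bytes(value)


class LocalTier:
    """Per-host tier: one in-memory buffer per key, shared by co-located functions."""

    def __init__(self, global_tier):
        self._global = global_tier
        self._buffers = {}

    def pull(self, key):
        # Fetch from the global tier only on the first access on this host.
        if key not in self._buffers:
            self._buffers[key] = bytearray(self._global.get(key))
        # Every caller on this host gets a view of the same buffer (no copy).
        return memoryview(self._buffers[key])

    def push(self, key):
        # Explicitly publish the local replica to the global tier.
        self._global.set(key, self._buffers[key])


global_tier = GlobalTier()
global_tier.set("weights", b"\x00\x00\x00\x00")

host_a = LocalTier(global_tier)
view_f1 = host_a.pull("weights")               # function 1 on host A
view_f2 = host_a.pull("weights")               # function 2 on host A, same memory
view_f1[0] = 7                                 # a write by function 1 ...
local_seen = view_f2[0]                        # ... is visible to function 2 at once
global_before = global_tier.get("weights")[0]  # but not yet visible globally
host_a.push("weights")
global_after = global_tier.get("weights")[0]
```

In this sketch, co-located functions exchange data through the shared buffer without serialization, while other hosts observe the update only after an explicit push.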
The FAASM runtime uses the LLVM compiler toolchain to translate applications to WebAssembly and supports functions written in a range of programming languages, including C/C++, Python, TypeScript, and JavaScript. It integrates with existing serverless platforms, and we describe its use with Knative [33], a state-of-the-art platform based on Kubernetes.
To evaluate FAASM’s performance, we consider a number of workloads and compare against a container-based serverless deployment. When training a machine learning model with SGD [68], we show that FAASM achieves a 60% improvement in run time, a 70% reduction in network transfers, and a 90% reduction in memory usage; for machine learning inference using TensorFlow Lite [78] and MobileNet [37], FAASM achieves over a 200% increase in maximum throughput and a 90% reduction in tail latency. We also show that FAASM executes a distributed linear algebra job for matrix multiplication using Python/NumPy with negligible performance overhead and a 13% reduction in network transfers.
2 Isolation vs. Sharing in Serverless
Sharing memory is fundamentally at odds with the goal of isolation; providing shared access to in-memory state in a multi-tenant serverless environment is therefore a challenge.
Table 1 contrasts containers and VMs with other potential serverless isolation options, namely unikernels [16], in which minimal VM images are used to pack tasks densely on a hypervisor, and software-fault isolation (SFI) [82], which provides lightweight memory safety through static analysis, instrumentation, and runtime traps. The table lists whether each option fulfills three key functional requirements: memory safety, resource isolation, and sharing of in-memory state. A fourth requirement is the ability to share a filesystem between functions, which is important for legacy code and for reducing duplication through shared files.
The table also compares these options on a set of non-functional requirements: low initialization time for fast elasticity, small memory footprint for scalability and efficiency, and support for a range of programming languages.
Containers offer an acceptable balance of features if one sacrifices efficient state sharing—as such they are used by many serverless platforms [32, 39, 50]. Amazon uses Firecracker [4], a “micro VM” based on KVM with similar properties to containers, e.g. initialization times in the hundreds of milliseconds and memory overheads of megabytes.
Containers and VMs compare poorly to unikernels and SFI on initialization times and memory footprint because of their level of virtualization. They both provide complete virtualized POSIX environments, and VMs also virtualize hardware. Unikernels minimize their levels of virtualization, while SFI provides none. Many unikernel implementations, however, lack the maturity required for production serverless platforms, e.g. missing the required tooling and a way for non-expert users to deploy custom images. SFI alone cannot provide resource isolation, as it purely focuses on memory safety. It also does not define a way to perform isolated interactions with the underlying host. Crucially, as with containers and VMs, neither unikernels nor SFI can share state efficiently, with no way to express shared memory regions between compartments.
2.1 Improving on Containers
Serverless functions in containers typically share state via external storage and duplicate data across function instances. Data access and serialization introduce network and compute overheads; duplication bloats the memory footprint of containers, which is already of the order of megabytes [51]. Containers contribute hundreds of milliseconds up to seconds in cold-start latencies [83], incurred on initial requests and when scaling. Existing work has tried to mitigate these drawbacks by recycling containers between functions, introducing static VMs, reducing storage latency, and optimizing initialization.
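The difference between shipping serialized state to each function and sharing it in memory can be illustrated with a small sketch. This is not FAASM code: pickle stands in for whatever serialization format a platform uses, and two memoryview objects over one buffer stand in for two functions that share a memory region.

```python
import pickle

state = bytearray(1_000_000)          # 1 MB of function state

# Data shipping: each function receives its own deserialized copy.
wire = pickle.dumps(state)            # serialized form sent over the network
copy_f1 = pickle.loads(wire)
copy_f2 = pickle.loads(wire)
copy_f1[0] = 1
copies_diverge = copy_f2[0] == 0      # the write is invisible to the other copy
shipped_bytes = len(wire) * 2         # bytes moved to serve two functions

# Shared memory: both functions hold a view onto the same buffer.
view_f1 = memoryview(state)
view_f2 = memoryview(state)
view_f1[0] = 1
views_agree = view_f2[0] == 1         # the write is visible without any copy
```

With data shipping, both the bytes transferred and the memory held grow with the number of functions; with a shared region, they stay constant for co-located functions.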
Recycling containers avoids initialization overheads and allows data caching but sacrifices isolation and multi-tenancy. PyWren [42] and its descendants, Numpywren [73], IBMPywren [69], and Locus [66], use recycled containers, with long-lived AWS Lambda functions that dynamically load and execute Python functions. Crucial [12] takes a similar approach, running multiple functions in the same JVM. SAND [1] and Cloudburst [75] provide only process isolation between functions of the same application and place them in shared long-running containers, with at least one additional background storage process. Using containers for multiple functions and supplementary long-running services requires over-provisioned memory to ensure capacity both for concurrent executions and for peak usage. This is at odds with the idea of fine-grained scaling in serverless.
Adding static VMs to handle external storage improves performance but breaks the serverless paradigm. Cirrus [18] uses large VM instances to run a custom storage backend; Shredder [92] uses a single long-running VM for both storage and function execution; ExCamera [30] uses long-running VMs to coordinate a pool of functions. Either the user or provider must scale these VMs to match the elasticity and parallelism of functions, which adds complexity and cost.
Reducing the latency of auto-scaled storage can improve performance within the serverless paradigm. Pocket [43] provides ephemeral serverless storage; cloud providers also offer managed external state, such as AWS Step Functions [3], Azure Durable Functions [53], and IBM Composer [8]. Such approaches, however, do not address the data-shipping problem and its associated network and memory overheads.
Container initialization times have been reduced to mitigate the cold-start problem, which can contribute several seconds of latency with standard containers [36, 72, 83]. SOCK [61] improves the container boot process to achieve cold starts in the low hundreds of milliseconds; Catalyzer [25] and SEUSS [16] demonstrate snapshot and restore in VMs and unikernels to achieve millisecond serverless cold starts. Although such reductions are promising, the resource overhead and restrictions on sharing memory in the underlying mechanisms still remain.
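The general snapshot-and-restore idea can be sketched as follows. This is a simplified illustration and not the mechanism of any of the cited systems: a byte array stands in for an instance's memory, cold_start is a hypothetical stand-in for expensive initialization, and restoring is modelled as a plain copy of the pre-initialized image.

```python
def cold_start():
    """Stand-in for expensive initialization, e.g. loading a language runtime."""
    memory = bytearray(64 * 1024)
    for i in range(len(memory)):
        memory[i] = (i * 31) % 251
    return memory


def restore(snapshot):
    """Create a fresh instance by copying the pre-initialized memory image."""
    return bytearray(snapshot)


snapshot = bytes(cold_start())        # taken once, ahead of time

instance_a = restore(snapshot)
instance_b = restore(snapshot)
instance_a[0] = 255                   # instances do not affect each other ...
isolated = instance_b[0] == snapshot[0]
identical_start = restore(snapshot) == cold_start()   # ... and start from the same state
```

The initialization cost is paid once when the snapshot is taken; each later instance pays only for copying the image.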
2.2 Potential of Software-based Isolation
Software-based isolation offers memory safety with initialization times and memory overheads up to two orders of magnitude lower than containers and VMs. For this reason, it is an attractive starting point for serverless isolation. However, software-based isolation alone does not support resource isolation or efficient in-memory state sharing.
Software-based isolation has been used in several existing edge and serverless computing systems, but none addresses these shortcomings. Fastly’s Terrarium [28] and Cloudflare Workers [22] provide memory safety with WebAssembly [35] and V8 Isolates [34], respectively, but neither isolates CPU or network use, and both rely on data shipping for state access. Shredder [92] also uses V8 Isolates to run code on a storage server, but it does not address resource isolation and relies on co-locating state and functions on a single host, which makes it ill-suited to the level of scale required in serverless platforms. Boucher et al. [14] show microsecond initialization times for Rust microservices, but do not address isolation or state sharing. Krustlet [54] is a recent prototype using WebAssembly to replace Docker in Kubernetes, which could be integrated with Knative [33]. It focuses, however, on replicating container-based isolation, and so fails to meet our requirement for in-memory sharing.
Our final non-functional requirement is for multi-language support, which is not met by language-specific approaches to software-based isolation [11, 27]. Portable Native Client [23] provides multi-language software-based isolation by targeting a portable intermediate representation, LLVM IR, and hence meets this requirement. Portable Native Client has now been deprecated, with WebAssembly as its successor [35].
WebAssembly offers strong memory safety guarantees by constraining memory access to a single linear byte array, referenced with offsets from zero. This enables efficient bounds checking at both compile- and runtime, with runtime checks backed by traps. These traps (and others for referencing invalid functions) are implemented as part of WebAssembly runtimes [87]. The security guarantees of WebAssembly are well established in existing literature, which covers formal verification [84], taint tracking [31], and dynamic analysis [45]. WebAssembly offers mature support for languages with an LLVM front-end such as C, C++, C#, Go, and Rust [49], while toolchains exist for Typescript [10] and Swift [77]. Java bytecode can also be converted [7], and further language support is possible by compiling language runtimes to WebAssembly, e.g. Python, JavaScript, and Ruby. Although WebAssembly is restricted to a 32-bit address space, 64-bit support is in development.
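The linear-memory model can be illustrated with a minimal sketch. The class and method names are hypothetical, and real WebAssembly runtimes implement these checks in compiled code rather than in an interpreter like this; the sketch only shows why zero-based offsets into a single byte array make every access checkable with one range test.

```python
class Trap(Exception):
    """Raised when an access falls outside the linear memory."""


class LinearMemory:
    def __init__(self, size):
        self._bytes = bytearray(size)

    def _check(self, offset, length):
        # Offsets are relative to zero, so one range check covers every access.
        if offset < 0 or length < 0 or offset + length > len(self._bytes):
            raise Trap(f"out-of-bounds access at offset {offset}, length {length}")

    def store(self, offset, data):
        self._check(offset, len(data))
        self._bytes[offset:offset + len(data)] = data

    def load(self, offset, length):
        self._check(offset, length)
        return bytes(self._bytes[offset:offset + length])


memory = LinearMemory(65536)          # one 64 KiB WebAssembly page
memory.store(0, b"faaslet")
value = memory.load(0, 7)

try:
    memory.load(65530, 16)            # would read past the end of the array
    trapped = False
except Trap:
    trapped = True
```

Because a function can only name memory by offset into its own array, it has no way to express an address that belongs to another function.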
The WebAssembly specification does not yet include mechanisms for sharing memory, so on its own it cannot meet our requirements. There is a proposal to add a form of synchronized shared memory to WebAssembly [85], but it is not well suited to sharing serverless state dynamically because it requires compile-time knowledge of all shared regions. It also lacks an associated programming model and provides only local memory synchronization.
The properties of software-based isolation highlight a compelling alternative to containers, VMs, and unikernels, but none of these approaches meet all of our requirements. We, therefore, propose a new isolation approach to enable efficient serverless computing for big data.