從C++和OCaml程序員的視角看Rust (Part 2)

本文來(lái)源地址

SEPTEMBER 29, 2017

by GUILLAUME ENDIGNOUX
@GEndignoux

This post is the second of my series about Rust compared to other languages. After a first post about the basics of the language, I’ll now discuss memory management, a cornerstone of Rust’s safety guarantees. You will hopefully get a clear idea about how moves, borrows and other lifetimes work!

Despite the title, in this post, I will mostly compare Rust to C++, because there isn’t much memory management to do in OCaml, which is garbage-collected.


Allocation and memory access

Memory management is a core aspect of programming. The fundamental reason is that most complex algorithms require a variable amount of memory, depending on their input. Therefore, we obtain such memory at runtime with allocators. Most of the time, we also want to deallocate this memory once we are done using it, to avoid exhausting memory resources (otherwise we have a memory leak). But we must make sure that we do not access the memory again after deallocation, because this memory may have been reallocated for other unrelated objects (use-after-free bug).

Several strategies have been implemented to manage memory resources.

  • In C, memory management is manual with the malloc() and free() functions. This gives full control to the programmer, but the code is tedious and repetitive, and a single mistake can lead to (security) bugs (memory leak, use-after-free, double-free, etc.).
  • In C++, the <abbr title="Resource acquisition is initialization" style="box-sizing: border-box; margin: 0px; padding: 0px; border-bottom: 1px dotted; cursor: help;">RAII</abbr> paradigm automates memory (and more generally resources) management with constructors and destructors. If destructors are properly implemented, the resources acquired by a variable are released as soon as the variable goes out-of-scope. The destructing code is automatically added by the compiler, and this ensures that neither memory leaks nor use-after-freehappen.
  • In most modern languages (e.g. Java, Python, OCaml), a garbage collector (<abbr title="Garbage collector" style="box-sizing: border-box; margin: 0px; padding: 0px; border-bottom: 1px dotted; cursor: help;">GC</abbr>) runs from time to time and looks for unreachable memory by traversing all allocated memory.

Although garbage collection avoids pitfalls of manual memory management and is easy to use for the programmer (there is essentially nothing to do), it introduces some overhead (due to memory traversal) and unpredictability. There is no way to know when the <abbr title="Garbage collector" style="box-sizing: border-box; margin: 0px; padding: 0px; border-bottom: 1px dotted; cursor: help;">GC</abbr> will run. But every time it runs there is some delay before the program resumes, which can be problematic for highly-interactive scenarios. Also, <abbr title="Garbage collector" style="box-sizing: border-box; margin: 0px; padding: 0px; border-bottom: 1px dotted; cursor: help;">GC</abbr> only cares about memory, so other resources (e.g. files, network connections) must be managed by other means, e.g. back with the manual strategy, or with finalizers.

Note for the curious: In OCaml, the native integer type has one bit reserved by the <abbr title="Garbage collector" style="box-sizing: border-box; margin: 0px; padding: 0px; border-bottom: 1px dotted; cursor: help;">GC</abbr>, to avoid wrapping the value into a box. So int is a 63-bit integer (on 64-bit architectures), i.e. in range <math><semantics><annotation encoding="application/x-tex">[-2^{62}, 2^{62}-1]</annotation></semantics></math>[?262,262?1].

Another problem is unsafe access to memory. In C (and C++), because you have pointers with unrestricted arithmetic operations, you can (accidentally) access unallocated memory. This often boils down to access an array index i larger than the array’s length l, i.e. an out-of-bound access (or buffer overflow). This can be avoided at the cost of a comparison i < l, as implemented in C++’s std::vector::at() function. In OCaml, there are no pointers, and array access always checks bounds, so you don’t have these safety problems.

In Rust, the <abbr title="Resource acquisition is initialization" style="box-sizing: border-box; margin: 0px; padding: 0px; border-bottom: 1px dotted; cursor: help;">RAII</abbr> strategy of allocation is chosen for its predictability and performance properties, and its safety is enhanced compared to C++. There is no direct access to pointers (unless you explicitly write unsafe code), and we will now see that safe access is enforced by the compiler thanks to more static checks!

Ownership

Rust goes beyond <abbr title="Resource acquisition is initialization" style="box-sizing: border-box; margin: 0px; padding: 0px; border-bottom: 1px dotted; cursor: help;">RAII</abbr>, by introducing ownership rules that are enforced statically at compile time.

Some objects are not copyable, which means that only one variable can have ownership of a given object. This is, for example, the case of heap-allocated objects (the Box type), that implement deallocation in their destructor. Indeed, if such an object could be copied verbatim, then each copy would call the deallocation and we would obtain a double-free bug.

Instead, ownership can be transferred via a move. This is similar to std::move() in C++11, but in Rust it’s automatic and the compiler prevents access to the moved-from variable.

// C++14
auto a = std::make_unique<int>(1);
// The compiler forbids a copy "b = a", we need to move explicitly.
auto b = std::move(a);
std::cout << "b = " << *b << std::endl;
// The following is accepted by the compiler, despite "a" being in an unspecified state at runtime.
std::cout << "a = " << *a << std::endl;

// Rust
let a = Box::new(1);
// The box is moved from "a" to "b" and ownership is transfered.
let b = a;
println!("b = {}", b);
// This does not compile because "a" has lost ownership.
println!("a = {}", a);
// error[E0382]: use of moved value: `a`
//  --> src/main.rs:7:24
//   |
// 4 |     let b = a;
//   |         - value moved here
// ...
// 7 |     println!("a = {}", a);
//   |                        ^ value used here after move
//   |
//   = note: move occurs because `a` has type `std::boxed::Box<i32>`, which does not implement the `Copy` trait

Although transferring ownership is safe, we often want to have several variables pointing to the same object, in particular for function calls. Typically, the calling function wants to give access to some data to the called function (via an argument), but needs to access it after the call, so transferring ownership to the called function does not work.

For this purpose, Rust introduces a concept of references, that borrow objects from their owner. Similarly to C/C++, a reference can be obtained with the & operator, and the corresponding reference type is &T.

// Type annotations are optional, written here for clarity.
let a : i32 = 1;
let b : &i32 = &a;
println!("a = {}", a);
println!("b = {}", b);

Borrows and mutability

By default, Rust variables are immutable. If you want to borrow a variable to modify it, you need to use a mutable borrow, as follows.

// The original variable must be mutable to obtain a mutable borrow
let mut a = 1;
{
    let b = &mut a; // Mutable borrow
    *b += 1; // Use "*" to dereference
}
println!("a = {}", a);
// a = 2

Here, you can notice that we used a temporary scope to operate on b. This is because Rust has strict rules about borrows, enforced by the so-called borrow checker. These rules can be formalized as follows.

  • (Paraphrased from the Rust book) You may have one or the other of these two kinds of borrows, but not both at the same time:
    • one or more immutable references (&T) to a resource,
    • exactly one mutable reference (&mut T).
  • When a variable is mutably borrowed, it cannot be read. Likewise, when a variable is (mutably or immutably) borrowed, it cannot be modified.

To summarize, as long as a variable is mutably borrowed, nothing else can access this variable. This rules may seem strict but are here to avoid race-conditions. Indeed, ? threads without data races ? is one of the main goals of Rust.

But these rules also avoid data races in single-threaded programs! For example, in C++ you can obtain a reference to a vector cell, and then resize this vector, obtaining a dangling reference…

// C++
std::vector<int> v(10, 0); // Vector of length 10.
int& x = v[9];
v.resize(5); // The resize invalidates reference "x".
// Undefined behavior.
std::cout << "x = " << x << std::endl;

In Rust, this would violate the third rule: v cannot be modified as long as reference x is in scope. This example might seem obvious, but if you hide references behind iterators and wrap this code in a more complex program, it becomes hard to debug in practice!

Warning: Despite the name and the type syntax, references in Rust really are safe pointers. Contrary to C++ references, they can be rebound to another object, as long as they are declared mutable (mut). Note that this is orthogonal to a mutable borrow, for which the referred-to object is mutable.

let a = 1;
let b = 2;
let mut r = &a;
println!("r = {}", r);
r = &b;
println!("r = {}", r);
// r = 1
// r = 2

Lifetimes

Beyond the basic rules on mutability and borrows, Rust introduces the concept of lifetimes, to avoid dangling references. A typical example of dangling reference is to return a reference to a temporary.

fn bad() -> &i32 {
    let a : i32 = 1;
    return &a;
}
// error[E0106]: missing lifetime specifier
//  --> src/main.rs:1:13
//   |
// 1 | fn bad() -> &i32 {
//   |             ^ expected lifetime parameter
//   |

This does not compile, and Rust requests a ? lifetime specifier ? for the return type. Returning a reference is not forbidden in itself, and there are indeed valid reasons to do it. For example, you may want to access an array cell or some internal data in an object, to modify it or to read it without an expensive copy.

Let’s take a more concrete example with a Point structure over integers, and define a getx() function that returns a reference to the x coordinate (i.e. takes an argument of type &Point and returns a &i32). The returned reference to the x coordinate will logically have the same lifetime as the Point parameter because x is part of the point. But we don’t know exactly what this lifetime is, and in fact, this varies for each call to the function getx(), depending on the parameter. So we use a generic lifetime 'a (or any word preceded by an apostrophe) and write the function as follows.

#[derive(Debug)]
struct Point { x: i32, y: i32 }

fn getx<'a>(p : &'a Point) -> &'a i32 {
    return &p.x;
}

fn main() {
    let p = Point {x: 1, y: 2};
    let px = getx(&p);
    println!("p.x = {}", px);
    println!("p = {:?}", p);
}
// p.x = 1
// p = Point { x: 1, y: 2 }

Here, the diamond syntax getx<'a> means that getx() is parametrized by a generic lifetime 'a. This syntax is similar to templates in C++. Then, &'a Point represents a reference to Point which has lifetime 'a, and the result &'a i32 is a reference to i32 which has the same lifetime 'a.

In fact, I lied a bit to you because in practice you don’t need to explicit lifetimes for simple functions such as getx(). This is due to lifetime elision rules, that allows the programmer to remove lifetime annotations in non-ambiguous cases, where the compiler can figure them out itself. For example, if there is only one reference parameter, a reference result can only come from this parameter (reference to temporaries have an invalid lifetime), so the compiler infers that they share a common lifetime 'a. So you can rewrite getx() as follows.

fn getx(p : &Point) -> &i32 {
    return &p.x;
}

Lifetimes in structures

Thanks to elision rules, you probably won’t have to write lifetimes in functions, unless you write complex stuff. However, a common case where you won’t avoid lifetimes is if you use references in structures.

To give a more concrete example, let’s consider a common exercise in functional programming: writing a linked list. Typically, a list is either empty or contains a pointer to the first node. In turn, each node contains a value and a pointer to the remaining of the list, which is itself a list. Let’s try to formalize this.

enum List { Nil(), Next(Node) }
struct Node { value: i32, next: List }
// error[E0072]: recursive type `List` has infinite size
//  --> src/main.rs:1:1
//   |
// 1 | enum List { Nil(), Next(Node) }
//   | ^^^^^^^^^               ----- recursive without indirection
//   | |
//   | recursive type has infinite size
//   |
//   = help: insert indirection (e.g., a `Box`, `Rc`, or `&`) at some point to make `List` representable

As we can see, if we don’t use any kind of indirection and put the next node directly inside the current node, we obtain a recursive structure and the compiler detects that it is not representable. Instead, let’s try to use a reference to the next node.

enum List { Nil(), Next(&Node) }
// error[E0106]: missing lifetime specifier
//  --> src/main.rs:1:25
//   |
// 1 | enum List { Nil(), Next(&Node) }
//   |                         ^ expected lifetime parameter

Now, we are back to lifetimes. You cannot write a reference inside a structure (or enumeration) without specifying its lifetime.This makes sense because such a reference points to another object, which must be alive at least as long as the current object exists (otherwise we have a dangling reference).

So let’s try with a generic lifetime 'a.

enum List { Nil(), Next(&'a Node) }
// error[E0261]: use of undeclared lifetime name `'a`
//  --> src/main.rs:1:26
//   |
// 1 | enum List { Nil(), Next(&'a Node) }
//   |                          ^^ undeclared lifetime

This lifetime must be declared. For this, we add it to the List enumeration, using the diamond notation. The following means that the List has some lifetime 'a, and that the reference to the next Node must have at least this lifetime 'a.

enum List<'a> { Nil(), Next(&'a Node) }

Now, we rewrite the Node structure to also declare and use the lifetime 'a.

struct Node<'a> { value: i32, next: List<'a> }
// error[E0106]: missing lifetime specifier
//  --> src/main.rs:1:33
//   |
// 1 | enum List<'a> { Nil(), Next(&'a Node) }
//   |                                 ^^^^ expected lifetime parameter

By doing this, we also need to propagate the lifetime to the Node instance in the List definition. The following code finally works and implements our linked list!

#[derive(Debug)]
enum List<'a> { Nil(), Next(&'a Node<'a>) }
#[derive(Debug)]
struct Node<'a> { value: i32, next: List<'a> }

fn main() {
    let n1 = Node {value: 1, next: List::Nil()};
    let n2 = Node {value: 2, next: List::Next(&n1)};
    println!("node = {:?}", n2);
}
// node = Node { value: 2, next: Next(Node { value: 1, next: Nil }) }

Now, let’s see if the borrow checker really checks lifetimes. In the following example, we replace the next node of n2 by a node n3which does not live long enough, due to a temporary scope. We can verify that the borrow checker complains!

fn main() {
    let n1 = Node {value: 1, next: List::Nil()};
    let mut n2 = Node {value: 2, next: List::Next(&n1)};
    {
        let n3 = Node {value: 3, next: List::Nil()};
        n2.next = List::Next(&n3);
    }
    println!("node = {:?}", n2);
}
// error[E0597]: `n3` does not live long enough
//   --> src/main.rs:12:5
//    |
// 11 |         n2.next = List::Next(&n3);
//    |                               -- borrow occurs here
// 12 |     }
//    |     ^ `n3` dropped here while still borrowed
// 13 |     println!("node = {:?}", n2);
// 14 | }
//    | - borrowed value needs to live until here

Compare this to pointers in C/C++, for which the compiler does not complain about such unsafe code!

// DO NOT DO THIS!
struct Node {
    Node(int v) : value(v), next(nullptr) {}
    Node(int v, Node* n) : value(v), next(n) {}

    int value;
    Node* next;
}

void print(Node n);

int main() {
    Node n1(1);
    Node n2(2, &n1);
    {
        Node n3(3);
        n2.next = &n3;
    }
    // n2.next is now invalid but the compiler does not complain...
    print(n2);
}

In summary, the borrow checker verifies mutable borrows and lifetimes, to ensure safe access to all references. There are more interesting and powerful features about lifetimes, such as bounds and coercion, but this post is already long enough so I invite the curious reader to have a look at them in the Rust book.

Conclusion

Memory management is a ubiquitous programming problem, and there is no simple solution. You can hide this complexity with a simple API (malloc/free) or an invisible garbage collector, but with complex programs, you can end up with security problems or performance issues. Instead, Rust chose to expose powerful abstractions that are not easy to understand at first glance but provide compile-time safety without compromising on performance.

There is still a lot to say about Rust, so get ready for my next post in this series, and more Rust projects! In the meantime, you can find me at RustFest this weekend.

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末瓶籽,一起剝皮案震驚了整個(gè)濱河市,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌怀挠,老刑警劉巖蔚龙,帶你破解...
    沈念sama閱讀 217,826評(píng)論 6 506
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件臭家,死亡現(xiàn)場(chǎng)離奇詭異并炮,居然都是意外死亡补君,警方通過(guò)查閱死者的電腦和手機(jī)坤溃,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 92,968評(píng)論 3 395
  • 文/潘曉璐 我一進(jìn)店門拍霜,熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái),“玉大人薪介,你說(shuō)我怎么就攤上這事祠饺。” “怎么了汁政?”我有些...
    開封第一講書人閱讀 164,234評(píng)論 0 354
  • 文/不壞的土叔 我叫張陵道偷,是天一觀的道長(zhǎng)。 經(jīng)常有香客問(wèn)我记劈,道長(zhǎng)勺鸦,這世上最難降的妖魔是什么? 我笑而不...
    開封第一講書人閱讀 58,562評(píng)論 1 293
  • 正文 為了忘掉前任目木,我火速辦了婚禮换途,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘刽射。我一直安慰自己军拟,他們只是感情好,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,611評(píng)論 6 392
  • 文/花漫 我一把揭開白布誓禁。 她就那樣靜靜地躺著懈息,像睡著了一般。 火紅的嫁衣襯著肌膚如雪摹恰。 梳的紋絲不亂的頭發(fā)上辫继,一...
    開封第一講書人閱讀 51,482評(píng)論 1 302
  • 那天阁最,我揣著相機(jī)與錄音,去河邊找鬼骇两。 笑死速种,一個(gè)胖子當(dāng)著我的面吹牛,可吹牛的內(nèi)容都是我干的低千。 我是一名探鬼主播配阵,決...
    沈念sama閱讀 40,271評(píng)論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼,長(zhǎng)吁一口氣:“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼示血!你這毒婦竟也來(lái)了棋傍?” 一聲冷哼從身側(cè)響起,我...
    開封第一講書人閱讀 39,166評(píng)論 0 276
  • 序言:老撾萬(wàn)榮一對(duì)情侶失蹤难审,失蹤者是張志新(化名)和其女友劉穎瘫拣,沒(méi)想到半個(gè)月后,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體告喊,經(jīng)...
    沈念sama閱讀 45,608評(píng)論 1 314
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡麸拄,尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,814評(píng)論 3 336
  • 正文 我和宋清朗相戀三年,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了黔姜。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片拢切。...
    茶點(diǎn)故事閱讀 39,926評(píng)論 1 348
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡,死狀恐怖秆吵,靈堂內(nèi)的尸體忽然破棺而出淮椰,到底是詐尸還是另有隱情,我是刑警寧澤纳寂,帶...
    沈念sama閱讀 35,644評(píng)論 5 346
  • 正文 年R本政府宣布善涨,位于F島的核電站裤纹,受9級(jí)特大地震影響,放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜嗽交,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,249評(píng)論 3 329
  • 文/蒙蒙 一韭畸、第九天 我趴在偏房一處隱蔽的房頂上張望笛质。 院中可真熱鬧炼邀,春花似錦、人聲如沸灯抛。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,866評(píng)論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽(yáng)对嚼。三九已至夹抗,卻和暖如春,著一層夾襖步出監(jiān)牢的瞬間纵竖,已是汗流浹背漠烧。 一陣腳步聲響...
    開封第一講書人閱讀 32,991評(píng)論 1 269
  • 我被黑心中介騙來(lái)泰國(guó)打工杏愤, 沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留,地道東北人已脓。 一個(gè)月前我還...
    沈念sama閱讀 48,063評(píng)論 3 370
  • 正文 我出身青樓珊楼,卻偏偏與公主長(zhǎng)得像,于是被迫代替她去往敵國(guó)和親度液。 傳聞我的和親對(duì)象是個(gè)殘疾皇子厕宗,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 44,871評(píng)論 2 354