References and borrowing - Programming Rust (2016)

Chapter 5. References and borrowing

All the pointer types we’ve seen so far—the simple Box<T> heap pointer, and the pointers internal to String and Vec values—are owning pointers: when the owner is dropped, the referent goes with it. Rust also has non-owning pointer types called references, which have no effect on their referents’ lifetimes.

In fact, it’s rather the opposite: references must never outlive their referents. Rust requires it to be apparent simply from inspecting the code that no reference will outlive the value it points to. To emphasize this, Rust refers to creating a reference to some value as “borrowing” the value: what you have borrowed, you must eventually return to its owner.

If you felt a moment of skepticism when reading the “requires it to be apparent” phrase there, you’re in excellent company. The references themselves are nothing special; under the hood, they’re just pointers. But the rules that keep them safe are novel to Rust; outside of research languages, you won’t have seen anything like them before. And although these rules are the part of Rust that requires the most effort to master, the breadth of classic, absolutely everyday bugs they prevent is surprising, and their effect on multi-threaded programming is exciting enough to keep you up late thinking about the possibilities. This is Rust’s radical wager, again.

As an example, let’s suppose we’re going to build a table of murderous Renaissance artists and their most celebrated works. Rust’s standard library includes a hash table type, so we can define our type like this:

use std::collections::HashMap;

type Table = HashMap<String, Vec<String>>;

In other words, this is a hash table which maps String values to Vec<String> values, taking the name of an artist to a list of the names of their works. You can iterate over the entries of a HashMap with a for loop, so we can write a function to print out a Table for debugging:

fn show(table: Table) {
    for (artist, works) in table {
        println!("works by {}:", artist);
        for work in works {
            println!(" {}", work);
        }
    }
}

Constructing and printing the table is straightforward:

let mut table = Table::new();
table.insert("Gesualdo".to_string(),
             vec!["many madrigals".to_string(),
                  "Tenebrae Responsoria".to_string()]);
table.insert("Caravaggio".to_string(),
             vec!["The Musicians".to_string(),
                  "The Calling of St. Matthew".to_string()]);
table.insert("Cellini".to_string(),
             vec!["Perseus with the head of Medusa".to_string(),
                  "a salt cellar".to_string()]);
show(table);

And it all works fine:

$ cargo run

Running `/home/jimb/rust/book/fragments/target/debug/fragments`

works by Gesualdo:

Tenebrae Responsoria

many madrigals

works by Cellini:

Perseus with the head of Medusa

a salt cellar

works by Caravaggio:

The Musicians

The Calling of St. Matthew

$

But if you’ve read the previous chapter’s section on moves, this definition for show should raise a few questions. In particular, HashMap is not Copy—it can’t be, since it owns a dynamically allocated table. So when the program calls show(table) above, the whole structure gets moved to the function, leaving the variable table uninitialized. If the calling code tries to use table now, it’ll run into trouble:

...

show(table);

assert_eq!(table["Gesualdo"][0], "many madrigals");

Rust complains that table isn’t available any more:

error: use of moved value: `table`

assert_eq!(table["Gesualdo"][0], "many madrigals");

^~~~~

note: `table` moved here because it has type `HashMap<String, Vec<String>>`,

which is non-copyable

show(table);

^~~~~

In fact, if we look into the definition of show, the outer for loop takes ownership of the hash table and consumes it entirely; and the inner for loop does the same to each of the vectors. (We saw this behavior earlier, in the “Liberté, égalité, fraternité” example.) Because of move semantics, we’ve completely destroyed the entire structure simply by trying to print it out. Thanks, Rust!

The right way to handle this is to use references. References come in two kinds:

§ A shared reference lets you read but not modify its referent. However, you can have as many shared references to a particular value at a time as you like. The expression &e yields a shared reference to e’s value; if e has the type T, then &e has the type &T. Shared references are Copy.

§ A mutable reference lets you both read and modify its referent. However, you may only have one mutable reference to a particular value active at a time. The expression &mut e yields a mutable reference to e’s value; you write its type as &mut T. Mutable references are not Copy.

You can think of the distinction between shared and mutable references as a way to enforce a “multiple readers or single writer” rule at compile time. This turns out to be essential to memory safety, for reasons we’ll go into later in the chapter.
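As a minimal sketch of that rule (the function and variable names here are illustrative, not from the example above), the following keeps several shared references alive at once, lets them end, and only then takes a single mutable reference:

```rust
// Illustrative helper: reads through two shared references,
// then modifies the value through one mutable reference.
fn readers_then_writer() -> i32 {
    let mut count = 10;
    {
        // Any number of shared references may coexist; each can only read.
        let r1 = &count;
        let r2 = &count;
        assert_eq!(*r1 + *r2, 20);
    } // r1 and r2 go out of scope here
    {
        // With no shared references left, one mutable reference is allowed.
        let m = &mut count;
        *m += 1;
    }
    count
}

fn main() {
    assert_eq!(readers_then_writer(), 11);
}
```

Taking the mutable reference while `r1` or `r2` is still in use would be rejected at compile time.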

The printing function in our example doesn’t need to modify the table, just read its contents. So the caller should be able to pass it a shared reference to the table, as follows:

show(&table);

References are non-owning pointers, so the table variable remains the owner of the entire structure; show has just borrowed it for a bit. Naturally, we’ll need to adjust the definition of show to match, but you’ll have to look closely to see the difference:

fn show(table: &Table) {
    for (artist, works) in table {
        println!("works by {}:", artist);
        for work in works {
            println!(" {}", work);
        }
    }
}

The type of table has changed from Table to &Table: instead of passing the table by value (and hence moving ownership into the function), we’re now passing a shared reference. That’s the only textual change. But how does this play out as we work through the body?

Whereas our original outer for loop took ownership of the HashMap and consumed it, in our new version it receives a shared reference to the HashMap. Iterating over a shared reference to a HashMap is defined to produce shared references to each entry’s key and value: artist has changed from a String to a &String, and works from a Vec<String> to a &Vec<String>.

The inner loop is changed similarly. Iterating over a shared reference to a vector is defined to produce shared references to its elements, so work is now a &String. No ownership changes hands anywhere in this function; it’s just passing around non-owning references.

Now, if we wanted to write a function to alphabetize the works of each artist, a shared reference doesn’t suffice, since shared references don’t permit modification. Instead, the sorting function needs to take a mutable reference to the table:

fn sort_works(table: &mut Table) {
    for (_artist, works) in table {
        works.sort();
    }
}

And we need to pass it one:

sort_works(&mut table);

This mutable borrow grants the sort_works function the ability to read and modify our structure, as required by the vectors’ sort method.
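Putting the pieces together, here is one way the whole program might look. The `build_table` helper is an assumption added so the sketch is self-contained. Everything else follows the definitions above.

```rust
use std::collections::HashMap;

type Table = HashMap<String, Vec<String>>;

// Borrows the table for reading; the caller keeps ownership.
fn show(table: &Table) {
    for (artist, works) in table {
        println!("works by {}:", artist);
        for work in works {
            println!("  {}", work);
        }
    }
}

// Borrows the table mutably so each vector can be sorted in place.
fn sort_works(table: &mut Table) {
    for (_artist, works) in table {
        works.sort();
    }
}

// Hypothetical helper, not part of the original example.
fn build_table() -> Table {
    let mut table = Table::new();
    table.insert("Gesualdo".to_string(),
                 vec!["many madrigals".to_string(),
                      "Tenebrae Responsoria".to_string()]);
    table.insert("Caravaggio".to_string(),
                 vec!["The Musicians".to_string(),
                      "The Calling of St. Matthew".to_string()]);
    table
}

fn main() {
    let mut table = build_table();
    show(&table);           // shared borrow: table is still ours afterwards
    sort_works(&mut table); // mutable borrow: the vectors are sorted in place
    show(&table);
    // Uppercase letters sort before lowercase ones in byte order.
    assert_eq!(table["Gesualdo"][0], "Tenebrae Responsoria");
}
```

Because `show` and `sort_works` only borrow, `table` remains usable in `main` after each call.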

References as values

The above example shows a pretty typical use for references: allowing functions to access or manipulate a structure without taking ownership. But references are more flexible than that, so let’s look at some very constrained examples to get a more detailed view of what’s going on.

Implicit dereferencing

Superficially, Rust references resemble C++ references: they’re both just pointers under the hood; and in the examples we showed, there was no need to explicitly dereference them. However, Rust references are really closer to C or C++ pointers. You must use the & operator to create references, and in the general case, dereferencing does require explicit use of the * operator:

let x = 10;

let r = &x;

assert!(*r == 10);

let mut y = 32;

let mr = &mut y;

*mr *= 32;

assert!(*mr == 1024);

So why were there no uses of * in the artist-handling code? The . operator automatically follows references for you, so you can omit the * in many cases:

let point = (1000, 729);

let r = &point;

assert_eq!(r.0, 1000);

In the above, the expression r.0 automatically dereferences r, as if you’d written (*r).0. This is why Rust has no analog to C and C++’s -> operator: the . operator handles that case itself. In fact, the . operator will follow as many references as you give it:

struct Point { x: i32, y: i32 }

let point = Point { x: 1000, y: 729 };

let r : &Point = &point;

let rr : &&Point = &r;

let rrr : &&&Point = &rr;

assert_eq!(rrr.y, 729);

(We’ve only written out the types here for clarity’s sake; there’s nothing here Rust couldn’t infer.) In memory, that code builds a structure like this:

[Figure: a reference to a reference to a reference, and the . operator]

So the expression rrr.y actually traverses three pointers to get to its target, as directed by the type of rrr, before fetching its y field.

Method calls automatically follow references as well:

let mut v = vec![1968, 1973];

let mut r : &mut Vec<i32> = &mut v;

let rr : &mut &mut Vec<i32> = &mut r;

rr.sort_by(|a, b| b.cmp(a)); // reverse sort order

assert_eq!(**rr, [1973, 1968]);

Even though the type of rr is &mut &mut Vec<i32>, we can still invoke methods available on Vec directly on rr, like sort_by.

Rust’s comparison operators “see through” any number of references as well, as long as both operands have the same type:

let x = 10;

let y = 10;

let rx = &x;

let ry = &y;

let rrx = &rx;

let rry = &ry;

assert!(rrx <= rry);

assert!(rrx == rry);

The final assertion here succeeds, even though rrx and rry point at different values (namely, rx and ry), because the == operator follows all the references and performs the comparison on their final targets, x and y. This is almost always the behavior you want, especially when writing generic functions. If you actually want to know whether two references point to the same object, you must cast the references to raw pointers, which the comparison operators will not automatically dereference:

assert!(rx as *const i32 != ry as *const i32);
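More recent versions of the standard library also provide std::ptr::eq, which performs the same address comparison without spelling out the casts. The sketch below assumes a toolchain new enough to include it, and the helper name same_object is made up for illustration:

```rust
// Illustrative helper: true only if both references point at the same i32.
fn same_object(a: &i32, b: &i32) -> bool {
    // References coerce to raw pointers here; only addresses are compared.
    std::ptr::eq(a, b)
}

fn main() {
    let x = 10;
    let y = 10;
    let rx = &x;
    let ry = &y;

    // Comparing the references compares the values they point to.
    assert!(rx == ry);

    // Comparing addresses distinguishes the two variables.
    assert!(!same_object(rx, ry));
    assert!(same_object(rx, &x));

    // The explicit cast gives the same answer.
    assert!(rx as *const i32 != ry as *const i32);
}
```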

Assigning references

Like a C or C++ pointer, and unlike a C++ reference, assigning to a Rust reference makes it point at a new value:

let x = 10;

let y = 20;

let mut r = &x;

if b { r = &y; }

assert!(*r == 10 || *r == 20);

The reference r initially points to x. But if b is true, the if expression will change r to point to y instead:

[Figure: a reference that has been repointed by assignment]
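To make the fragment runnable, the sketch below wraps it in a function that takes b as a parameter; the function name pick is an assumption for illustration:

```rust
// Illustrative wrapper around the fragment above.
fn pick(b: bool) -> i32 {
    let x = 10;
    let y = 20;
    let mut r = &x;     // r starts out pointing at x
    if b { r = &y; }    // assignment repoints r; x itself is untouched
    assert_eq!(x, 10);
    *r
}

fn main() {
    assert_eq!(pick(false), 10);
    assert_eq!(pick(true), 20);
}
```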

References to slices and trait objects

The references we’ve shown so far are all simple pointers. However, as explained in the sections on slices and trait objects in Chapter 3, some references are “fat pointers”: two-word values that include both a pointer and some additional information about its referent.

A reference to a slice of an array, vector, or string, written &[T] for some type T, or &str for a slice of a String, comprises a pointer to the first element included in the slice, and the length of the slice in elements. A reference to a trait object &Tr, for some trait Tr, comprises a pointer to a value that implements the trait Tr, and a pointer to an implementation of Tr’s methods appropriate for the referent’s true type.

Aside from carrying this extra data, slice and trait object references behave just like the other sorts of references we’ve shown so far in this chapter: they are non-owning pointers, which are not allowed to outlive their referents; they may be mutable or shared; and so on.
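One way to observe the difference is to compare reference sizes. This is a sketch, not a guarantee: the two-word layout is what current compilers produce on common platforms, the helper words_in is hypothetical, and the trait object type uses the newer `dyn Tr` spelling of what is written &Tr above.

```rust
use std::fmt::Display;
use std::mem::size_of;

// Hypothetical helper: size of a pointer-like type P, in machine words.
fn words_in<P>() -> usize {
    size_of::<P>() / size_of::<usize>()
}

fn main() {
    // An ordinary reference is a single pointer.
    assert_eq!(words_in::<&i32>(), 1);

    // Slice references carry a pointer plus a length.
    assert_eq!(words_in::<&[i32]>(), 2);
    assert_eq!(words_in::<&str>(), 2);

    // Trait object references carry a pointer plus a method-table pointer.
    assert_eq!(words_in::<&dyn Display>(), 2);

    // The extra word is what lets a slice reference know its own length.
    let v = vec![1, 2, 3, 4];
    let s: &[i32] = &v[1..3];
    assert_eq!(s.len(), 2);
}
```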

References are never null

Rust references are never null. There’s no analog to C’s NULL or C++’s nullptr; there is no default initial value for a reference (you can’t use a variable until it’s been initialized, regardless of its type); and Rust won’t convert integers to pointers (outside of unsafe code).

C and C++ code often uses a null pointer to indicate the absence of a value: for example, the malloc function either returns a pointer to a new block of memory, or a null pointer if there isn’t enough memory available to satisfy the request. In Rust, if you need a value that is either a reference to something or not, use the type Option<&T>. At the machine level, Rust represents None as a null pointer, and Some(r), where r is an &T value, as the non-zero address r holds, so Option<&T> is just as efficient as a nullable pointer would be in C or C++, even though it’s safer: its type requires you to check whether it’s None before you can use it.
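A short sketch of both points, using a hypothetical function first_negative that may or may not find something to point at:

```rust
use std::mem::size_of;

// Hypothetical example: a reference to the first negative element, if any.
fn first_negative(v: &[i32]) -> Option<&i32> {
    v.iter().find(|n| **n < 0)
}

fn main() {
    // Option<&T> takes no more space than a plain reference.
    assert_eq!(size_of::<Option<&i32>>(), size_of::<&i32>());

    let v = [3, -1, 4];
    // The type forces us to handle the "no value" case before dereferencing.
    match first_negative(&v) {
        Some(n) => assert_eq!(*n, -1),
        None => panic!("expected a negative element"),
    }
    assert!(first_negative(&[1, 2]).is_none());
}
```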

Borrowing references to arbitrary expressions

Whereas C and C++ only let you apply their & operator to certain kinds of expressions, Rust lets you borrow a reference to the value of any sort of expression at all:

fn factorial(n: usize) -> usize { (1..n+1).product() }

let r = &factorial(6);

assert_eq!(r + &1009, 1729);

In situations like this, Rust simply creates an anonymous variable to hold the expression’s value, and makes the reference point to that. The lifetime of this anonymous variable depends on what we do with the reference:

§ If the reference is being immediately assigned to a variable in a let statement (or is part of some struct or array that is being immediately assigned), then Rust makes the anonymous variable live as long as the variable the let initializes. In our example, Rust would do this for the referent of r.

§ Otherwise, the anonymous variable lives to the end of the enclosing statement. In our example, the anonymous variable created to hold 1009 lasts only to the end of the assert_eq! statement.

Reference safety

As we’ve presented them so far, references look pretty much like ordinary pointers in C or C++. But those are unsafe; how does Rust keep its references under control? Perhaps the best way to see the rules in action is to try to break them. We’ll start with the simplest example possible, and then add in interesting complications and explain how they work out.

Borrowing a local variable

Here’s a pretty obvious case. You can’t borrow a reference to a local and take it out of the local’s scope:

let r;
{
    let x = 1;
    r = &x;
}
assert_eq!(*r, 1); // bad: reads memory `x` used to occupy

The Rust compiler rejects this program, with a detailed error message:

error: `x` does not live long enough

r = &x;

^~

note: reference must be valid for the block suffix following statement 0 at ...

let r;

{

let x = 1;

r = &x;

}

assert_eq!(*r, 1); // bad: reads memory `x` used to occupy

...

note: ...but borrowed value is only valid for the block suffix following

statement 0 at ...

let x = 1;

r = &x;

}

The “block suffix following statement 0” phrase isn’t very clear. It generally refers to the portion of the program in which some variable is in scope, from the point of its declaration to the end of the block that contains it. Here, the error messages talk about the “block suffixes” of the lifetimes of r and x. The compiler’s complaint is that the reference r is still live when its referent x goes out of scope, making it a dangling pointer, which is verboten.

While it’s obvious to a human reader that this program is broken, it’s worth looking at how Rust itself reached that conclusion. Even this simple example shows the logical tools Rust uses to check much more complex code.

Rust tries to assign each reference in your program a lifetime that meets the constraints imposed by how the reference is used. A lifetime is some stretch of your program for which a reference could live: a lexical block, a statement, an expression, the scope of some variable, or the like.

Here’s one constraint which should seem pretty obvious: If you have a variable x, then a reference to x must not outlive x itself:

[Figure: permissible lifetimes for &x]

Beyond the point where x goes out of scope, the reference would be a dangling pointer. This is true even if x is some larger data structure instead of a simple i32, and you’ve borrowed a reference to some part of it: x owns the whole structure, so when x goes, every value it owns goes along with it.

Here’s another constraint: If you store a reference in a variable r, the reference must be good for the entire lifetime of r:

[Figure: permissible lifetimes for reference stored in r]

If the reference can’t live at least as long as r does, then at some point r will be a dangling pointer. As before, this is true even if r is some larger data structure that contains the reference; if you build a vector of references, say, all of them must have lifetimes that enclose the vector’s.

So we’ve got some situations that limit how long a reference’s lifetime can be, and others that limit how short it can be. Rust simply tries to find a lifetime for every reference that satisfies these constraints. For example, the following code fragment shows a lifetime that satisfies the constraints placed on it:

[Figure: a reference with a lifetime enclosing r's scope, but within x's scope]

Since we’ve borrowed a reference to x, the reference’s lifetime must not extend beyond x’s scope. Since we’ve stored it in r, its lifetime must cover r’s scope. Since the latter scope lies within the former, Rust can easily find a lifetime that meets the constraints.
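The arrangement described here, with r's scope nested inside x's, might look like the following. This is an assumed reconstruction for illustration, wrapped in a made-up function so it can be run:

```rust
// Illustrative: r's scope lies entirely within x's scope.
fn read_through_inner_reference() -> i32 {
    let x = 1;
    let value;
    {
        let r = &x;     // the reference's lifetime must cover r's scope...
        value = *r;
    }                   // ...which ends here, well before x goes away
    assert_eq!(x, 1);   // x is still valid and unchanged
    value
}

fn main() {
    assert_eq!(read_through_inner_reference(), 1);
}
```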

But in our original example, the constraints are contradictory:

[Figure: a reference with contradictory constraints on its lifetime]

There is simply no lifetime that is contained by x’s scope, and yet contains r’s scope. Rust recognizes this, and rejects the program.

This is the process Rust uses for all code. Data structures and function calls introduce new sorts of constraints, but the principle remains the same: first, understand the constraints arising from the way the program uses references; then, find lifetimes that satisfy those constraints. This is not so different from the process C and C++ programmers impose on themselves; the difference is that Rust knows the rules, and enforces them.

Lifetimes are entirely figments of Rust’s compile-time imagination. At run time, a reference is nothing but a pointer; its lifetime has been checked and discarded, and has no run-time representation.

Receiving references as parameters

When we pass a reference to a function, how does Rust make sure the function uses it safely? Suppose we have a function f that takes a reference and stores it in a global variable. We’ll need to make a few revisions to this, but here’s a first cut:

// This code has several problems, and doesn't compile.

static mut STASH: &i32;

fn f(p: &i32) { STASH = p; }

Rust’s equivalent of a global variable is called a static: it’s a value that’s created when the program starts, and which lasts until it terminates. (Like any other declaration, Rust’s module system controls where statics are visible, so they’re only “global” in their lifetime, not their visibility.) We cover statics in [Link to Come], but for now we’ll just call out a few rules that our code above doesn’t follow:

§ Every static must be initialized.

§ Mutable statics are inherently not thread-safe (after all, any thread can access a static at any time), and even in single-threaded programs, they can fall prey to other sorts of reentrancy problems. For these reasons, you may only access a mutable static within an unsafe block. In this example we’re not concerned with those particular problems, so we’ll just throw in an unsafe block and move on.

§ If you use a reference in a static’s type, you must explicitly write out its lifetime. Lifetime names in Rust are lower-case identifiers with a ' affixed to the front, like 'a or 'party. Our static STASH is alive for the program’s entire execution; Rust names this maximal lifetime 'static. In a reference type, the lifetime name goes between the & and the referent type, so STASH’s type must be &'static i32.

With those revisions made, we now have:

static mut STASH: &'static i32 = &128;

fn f(p: &i32) { // still not good enough
    unsafe {
        STASH = p;
    }
}

We’re almost done. To see the remaining problem, we need to write out a few things that Rust is helpfully letting us omit. The signature of f as written above is actually shorthand for the following:

fn f<'a>(p: &'a i32) { ... }

Here, the lifetime 'a is a “lifetime parameter” of f. When we write fn f<'a>, we’re defining a function that will work for any given lifetime 'a. So, the signature above says that f is a function that takes a reference to an i32 with any given lifetime 'a.

Since we must allow 'a to be any lifetime, things had better work out if it’s the smallest possible lifetime: one just enclosing the body of f. This assignment then becomes a point of contention:

STASH = p;

When we assign one reference to another, the source’s lifetime must be at least as long as the destination’s. But p’s lifetime is clearly not guaranteed to be as long as STASH’s, which is 'static, so Rust rejects our code:

error: cannot infer an appropriate lifetime for automatic coercion due to

conflicting requirements

STASH = p;

^~

note: first, the lifetime cannot outlive the lifetime 'a as defined on the

block at ...

fn f<'a>(p: &'a i32) {

unsafe {

STASH = p;

}

}

note: ...so that reference does not outlive borrowed content

STASH = p;

^~

note: but, the lifetime must be valid for the static lifetime...

note: ...so that expression is assignable (expected `&'static i32`, found `&i32`)

STASH = p;

^~

At this point it’s clear that our function can’t accept just any reference as an argument. But it ought to be able to accept a reference that has a 'static lifetime: storing such a reference in STASH can’t create a dangling pointer. And indeed, the following code compiles just fine:

static mut STASH: &'static i32 = &10;

fn f(p: &'static i32) {
    unsafe {
        STASH = p;
    }
}

This time, f’s signature spells out that p must be a reference with lifetime 'static, so there’s no longer any problem storing that in STASH. We can only apply f to references to other statics, but that’s the only thing that’s safe to do anyway.
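As a sketch of what such a call could look like, the code below adds a second static to point at and a small accessor. The names WORTH_POINTING_AT and stashed are assumptions for illustration, and the example is single-threaded, which is what makes the unsafe blocks tolerable here.

```rust
static mut STASH: &'static i32 = &10;

// Illustrative static for f's argument to refer to.
static WORTH_POINTING_AT: i32 = 1000;

fn f(p: &'static i32) {
    unsafe {
        STASH = p;
    }
}

// Illustrative accessor: copies the stored reference out by value,
// then reads the i32 it points to.
fn stashed() -> i32 {
    let p: &'static i32 = unsafe { STASH };
    *p
}

fn main() {
    assert_eq!(stashed(), 10);
    // A reference to a static has the 'static lifetime, so f accepts it.
    f(&WORTH_POINTING_AT);
    assert_eq!(stashed(), 1000);
}
```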

Take a step back, though, and notice what happened to f’s signature as we amended our way to correctness: the original f(p: &i32) ended up as f(p: &'static i32). In other words, we were unable to write a function that stashed a reference in a global variable without reflecting that intention in the function’s signature. In Rust, a function’s signature always exposes the body’s behavior.

Conversely, if we do see a function with a signature like g(p: &i32) (or with the lifetimes written out, g<'a>(p: &'a i32)), we can tell that it does not stash its argument p anywhere that will outlive the call. There’s no need to look into g’s definition; the signature alone tells us what g can and can’t do with its argument. This fact ends up being very useful when you’re trying to establish the safety of a call to the function.

Passing references as arguments

Now that we’ve shown how a function’s signature relates to its body, let’s examine how it relates to the function’s callers. Suppose you have the following code:

// This could be written more briefly: fn g(p: &i32),

// but let's write out the lifetimes for now.

fn g<'a>(p: &'a i32) { ... }

let x = 10;

g(&x);

From g’s signature alone, Rust knows it will not save p anywhere that might outlive the call: any lifetime that covers the call must work for 'a. So Rust chooses the smallest possible lifetime for &x: that of the call to g. This meets all constraints: it doesn’t outlive x, and covers the entire call to g. So this code passes muster.

Note that although g takes a lifetime parameter 'a, we didn’t need to mention it when calling g. In general, you only need to worry about lifetime parameters when defining functions and types; when using them, Rust infers the lifetimes for you.

What if we tried to pass &x to our function f from earlier, that stores its argument in a static?

fn f(p: &'static i32) { ... }

let x = 10;

f(&x);

This fails to compile: the reference &x must not outlive x, but by passing it to f we constrain it to live at least as long as 'static. There’s no way to satisfy everyone here, so Rust rejects the code.

Returning references

It’s common for a function to take a reference to some data structure, and then return a reference into some part of that structure. For example, here’s a function that returns a reference to the smallest element of a slice:

// v should have at least one element.
fn smallest(v: &[i32]) -> &i32 {
    let mut s = &v[0];
    for r in &v[1..] {
        if *r < *s { s = r; }
    }
    s
}

We’ve omitted lifetimes from that function’s signature in the usual way, but writing them out would give us:

fn smallest<'a>(v: &'a [i32]) -> &'a i32 { ... }

Suppose we call smallest like this:

let s;
{
    let parabola = [9, 4, 1, 0, 1, 4, 9];
    s = smallest(&parabola);
}
assert_eq!(*s, 0); // bad: points to element of dropped array

From smallest’s signature, we can see that its argument and return value must have the same lifetime, 'a. In our call, the argument &parabola must not outlive parabola itself; yet smallest’s return value must live at least as long as s. There’s no possible lifetime 'a that can satisfy both constraints, so Rust rejects the code:

error: `parabola` does not live long enough

s = smallest(&parabola);

^~~~~~~~

note: reference must be valid for the block suffix following statement 0 at...

let s;

{

let parabola = [9, 4, 1, 0, 1, 4, 9];

s = smallest(&parabola);

}

assert_eq!(*s, 0); // bad: points to element of dropped array

...

note: ...but borrowed value is only valid for the block suffix following

statement 0 at ...

let parabola = [9, 4, 1, 0, 1, 4, 9];

s = smallest(&parabola);

}

Lifetimes in function signatures let Rust assess the relationships between the references you pass to the function and those the function returns, and ensure they’re being used safely.
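For contrast, here is a sketch of a call that satisfies those constraints: the returned reference is used only while parabola is still in scope, so a suitable lifetime 'a exists.

```rust
// v should have at least one element.
fn smallest<'a>(v: &'a [i32]) -> &'a i32 {
    let mut s = &v[0];
    for r in &v[1..] {
        if *r < *s { s = r; }
    }
    s
}

fn main() {
    let parabola = [9, 4, 1, 0, 1, 4, 9];
    // s borrows from parabola and is used while parabola is still alive.
    let s = smallest(&parabola);
    assert_eq!(*s, 0);
}
```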

Structures containing references

How does Rust handle references stored in data structures? Here’s the same erroneous program we looked at earlier, except that we’ve put the reference inside a structure:

// This does not compile.
struct S {
    r: &i32
}

let s;
{
    let x = 10;
    s = S { r: &x };
}
assert_eq!(*s.r, 10); // bad: reads from dropped `x`

Rust is skeptical about our type S:

error: missing lifetime specifier

r: &i32

^~~~

Whenever a reference type appears inside another type’s definition, you must write out its lifetime, either declaring it 'static, or giving the type a lifetime parameter and using that. We’ll do the latter:

struct S<'a> {
    r: &'a i32
}

The amended definition reads: “S is a struct that, for any given lifetime 'a, has a field r holding a reference with lifetime 'a to an i32.”

Each time you create a value of the type S, Rust tries to decide what lifetime would work for that value’s lifetime parameter 'a. Consider the expression S { r: &x }: this initializes r with &x, constraining 'a to be no larger than x’s scope.

Recall that assigning a reference to a variable constrains the reference’s lifetime to cover the variable’s scope. This rule actually applies not just to references, but to any type that takes a lifetime parameter, like our struct S. So the assignment s = S { r: &x } constrains the lifetime 'a to be at least as long as that of s.

And now Rust has arrived at the same contradictory constraints as before: 'a must not outlive x, yet must live at least as long as s. The type’s lifetime parameter relates the containing value’s lifetime to those of the references it contains. In this case, 'a relates the lifetime of our S value to its r member, allowing Rust to detect the dangling pointer. Disaster averted!

Note that when a type has lifetime parameters, the lifetimes for each value of that type are distinct. If we had several values of type S running around in our example, each one would have its own independent 'a lifetime; the values wouldn’t necessarily constrain each other. (Naturally, if we assigned references back and forth between the values, that would introduce constraints following the usual rules.)
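A small sketch of a use of S that Rust accepts, with two values of type S whose lifetimes are chosen independently. The read helper is hypothetical, added only to give the struct something to do.

```rust
struct S<'a> {
    r: &'a i32
}

// Hypothetical helper: reads the i32 that s.r points to.
fn read<'a>(s: &S<'a>) -> i32 {
    *s.r
}

fn main() {
    let x = 10;
    let s1 = S { r: &x };       // 'a for s1 can span the rest of main
    {
        let y = 20;
        let s2 = S { r: &y };   // 'a for s2 need only last for this block
        assert_eq!(read(&s2), 20);
    }
    // s1 is unaffected by the shorter lifetime chosen for s2.
    assert_eq!(read(&s1), 10);
}
```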

How does a type with a lifetime parameter behave when placed inside some other type?

struct T {
    s: S // not adequate
}

Rust is skeptical, just as it was when we tried placing a reference in S without specifying its lifetime:

error: wrong number of lifetime parameters: expected 1, found 0

s: S

^~

We can’t leave off S’s lifetime parameter here: Rust needs to know how a T’s lifetime relates to that of the reference in its S, in order to apply the same checks to T that it does for S and plain references.

We could give s the 'static lifetime. This works:

struct T {
    s: S<'static>
}

With this definition, the s field may only borrow values that live for the entire execution of the program. That’s somewhat restrictive, but it does mean that a T can’t possibly borrow a local variable; there are no special constraints on a T’s lifetime.

The other approach would be to give T its own lifetime parameter, and pass that to S:

struct T<'a> {
    s: S<'a>
}

By taking a lifetime parameter 'a and using it in s’s type, we’ve allowed Rust to relate a T value’s lifetime to that of the reference its S holds.

We showed earlier how a function’s signature exposes what it does with the references we pass it. Now we’ve shown something similar about types: a type’s lifetime parameters always reveal whether it contains references with interesting (that is, non-'static) lifetimes, and what those lifetimes can be.

For example, suppose we have a parsing function that takes a slice of bytes, and returns a structure holding the results of the parse:

fn parseRecord<'i>(input: &'i [u8]) -> Record<'i> { ... }

Without looking into the definition of the Record type at all, we can tell that, if we receive a Record from parseRecord, whatever references it contains must point into the input buffer we passed in, and nowhere else (except perhaps at 'static values).
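To make this concrete, here is a sketch under invented assumptions: the fields of Record, the key=value input format, and the snake_case name parse_record (standing in for parseRecord) are all made up for illustration. Both fields borrow directly from the input buffer, which is exactly what the signature promises.

```rust
// Hypothetical record type: both fields are slices of the input buffer.
struct Record<'i> {
    key: &'i [u8],
    value: &'i [u8],
}

// Hypothetical parser: splits the input at the first '=' byte.
// If there is no '=', the whole input is the key and the value is empty.
fn parse_record<'i>(input: &'i [u8]) -> Record<'i> {
    match input.iter().position(|&b| b == b'=') {
        Some(eq) => Record { key: &input[..eq], value: &input[eq + 1..] },
        None => Record { key: input, value: &input[input.len()..] },
    }
}

fn main() {
    let buffer = b"artist=Cellini".to_vec();
    let record = parse_record(&buffer);
    assert_eq!(record.key, &b"artist"[..]);
    assert_eq!(record.value, &b"Cellini"[..]);
    // `record` cannot outlive `buffer`: Rust ties them together through 'i.
}
```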

Distinct lifetime parameters

Suppose you’ve defined a structure containing two references like this:

struct S<'a> {
    x: &'a i32,
    y: &'a i32
}

Both references use the same lifetime 'a. This could be a problem if your code wants to do something like this:

let x = 10;
let r;
{
    let y = 20;
    {
        let s = S { x: &x, y: &y };
        r = s.x;
    }
}

This code doesn’t create any dangling pointers. The reference to y stays in s, which goes out of scope before y does. The reference to x ends up in r, which doesn’t outlive x.

If you try to compile this, however, Rust will complain that y does not live long enough, even though it clearly does. Why is Rust worried? If you work through the code carefully, you can follow its reasoning:

§ Both members of S are references with the same lifetime 'a, so Rust must find a single lifetime that works for both s.x and s.y.

§ We assign r = s.x, requiring 'a to cover r’s lifetime.

§ We initialized s.y with &y, requiring 'a to be no longer than y’s lifetime.

These constraints are impossible to satisfy: no lifetime is shorter than y’s scope, but longer than r’s. Rust balks.

The problem arises because both references in S have the same lifetime 'a. Changing the definition of S to let each reference have a distinct lifetime fixes everything:

struct S<'a, 'b> {
    x: &'a i32,
    y: &'b i32
}

With this definition, s.x and s.y have independent lifetimes. What we do with s.x has no effect on what we store in s.y, so it’s easy to satisfy the constraints now: 'a can simply be r’s lifetime, and 'b can be s’s. (y’s lifetime would work too for 'b, but Rust tries to choose the smallest lifetime that works.) Everything ends up fine.
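Here is the earlier fragment again as a runnable sketch using the two-parameter definition. The wrapping function is an assumption for illustration. It also reads r after y has gone out of scope, because compilers with non-lexical lifetimes only object to the one-lifetime version when r is actually used that late.

```rust
struct S<'a, 'b> {
    x: &'a i32,
    y: &'b i32
}

// Illustrative wrapper: returns the value r points to after y is gone.
fn x_outlives_y() -> i32 {
    let x = 10;
    let r;
    {
        let y = 20;
        {
            let s = S { x: &x, y: &y };
            assert_eq!(*s.y, 20);
            r = s.x;    // only 'a has to cover r's lifetime
        }
    }                   // y is gone, but 'b never needed to reach this far
    *r
}

fn main() {
    assert_eq!(x_outlives_y(), 10);
}
```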

Newcomers to Rust often encounter difficulties of this sort: they know how to add lifetime parameters to their structures, and write lifetime names into their reference types, but don’t recognize when they’ve placed tighter constraints on the references than they really need. Often the stricter definition works fine in simple situations, but once they encounter an unlucky arrangement of lifetimes, Rust rejects their correct program, citing risks the programmer can see will never come to pass. If this happens to you, check whether your lifetime parameters aren’t entangling things you’d prefer to let vary independently.

Function signatures can have similar effects. Suppose we have a function like this:

fn f<'a>(r: &'a i32, s: &'a i32) -> &'a i32 { r } // perhaps too tight

Here, both reference parameters use the same lifetime 'a, which can unnecessarily constrain the caller in the same way we’ve shown above. When possible, let parameters’ lifetimes vary independently:

fn f<'a, 'b>(r: &'a i32, s: &'b i32) -> &'a i32 { r } // looser
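As a rough illustration of what the looser signature buys the caller, the sketch below passes a short-lived second argument and keeps using the result after that argument is gone. The caller, pick_outer, is a name we made up for this example.

```rust
fn f<'a, 'b>(r: &'a i32, _s: &'b i32) -> &'a i32 { r }

// Hypothetical caller: the second argument lives in an inner scope.
fn pick_outer() -> i32 {
    let x = 1;
    let result;
    {
        let y = 2;
        // The borrow of y only has to last for the call itself,
        // because 'b is independent of the returned lifetime 'a.
        result = f(&x, &y);
    }
    *result // still valid: result borrows only from x
}
```

With the single-lifetime version of f, the borrow of y would have to last as long as result does, and the caller would no longer compile.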

Sharing versus mutation

So far, we’ve discussed how Rust ensures no reference will ever point to a variable that has gone out of scope. But there are other ways to introduce dangling pointers. Here’s an easy case:

let v = vec![4, 8, 19, 27, 34, 10];

let r = &v;

let aside = v; // move vector to aside

r[0]; // bad: uses `v`, which is now uninitialized

The assignment to aside moves the vector, leaving v uninitialized, turning r into a dangling pointer:

a reference pointing to a vector that has been moved

The problem here is not that v goes out of scope while r still refers to it, but rather that v’s value gets moved elsewhere, leaving v uninitialized. Naturally, Rust catches the error:

error: cannot move out of `v` because it is borrowed

let aside = v;

^~~~~

note: borrow of `v` occurs here

let r = &v;

^~

Throughout its lifetime, a shared reference makes its referent read-only: you may not assign to the referent or move its value elsewhere. In the code above, r’s lifetime covers the attempt to move the vector, so Rust rejects the program. If you change the program as shown below, there’s no problem:

let v = vec![4, 8, 19, 27, 34, 10];

{

let r = &v;

r[0]; // okay: vector is still there

}

let aside = v;

In this version, r goes out of scope earlier, the reference’s lifetime ends before v is moved aside, and all is well.

Here’s a different way to wreak havoc. Suppose we have a handy function to extend a vector with the elements of a slice:

fn extend(vec: &mut Vec<f64>, slice: &[f64]) {

for elt in slice {

vec.push(*elt);

}

}

This is a slightly less flexible (and much less optimized) version of the standard library’s extend_from_slice method on vectors. We can use it to build up a vector from slices of other vectors or arrays:

let mut wave = Vec::new();

let head = vec![0.0, 1.0];

let tail = [0.0, -1.0];

extend(&mut wave, &head); // extend wave with another vector

extend(&mut wave, &tail); // extend wave with an array

assert_eq!(wave, vec![0.0, 1.0, 0.0, -1.0]);

So we’ve built up one period of a sine wave here. If we want to add on another undulation, can we append the vector to itself?

extend(&mut wave, &wave);

assert_eq!(wave, vec![0.0, 1.0, 0.0, -1.0,

0.0, 1.0, 0.0, -1.0]);

This may look fine on casual inspection. But remember that, when we add an element to a vector whose buffer is full, the vector must allocate a new buffer with more space. Suppose wave starts with space for four elements, and so must allocate a larger buffer when extend tries to add a fifth. Memory ends up looking like this:

extending a vector with a slice of itself

The extend function’s vec argument borrows wave (owned by the caller), which has allocated itself a new buffer with space for eight elements. But slice continues to point to the old four-element buffer, which has been dropped.

This sort of problem isn’t unique to Rust: modifying collections while pointing into them is delicate territory in many languages. In C++, the specification of std::vector cautions you that “reallocation [of the vector’s buffer] invalidates all the references, pointers, and iterators referring to the elements in the sequence.” Similarly, Java says, of modifying a java.util.Hashtable object:

[I]f the Hashtable is structurally modified at any time after the iterator is created, in any way except through the iterator’s own remove method, the iterator will throw a ConcurrentModificationException.

Rust reports the problem with our call to extend at compile time:

error: cannot borrow `wave` as immutable because it is also borrowed as mutable

extend(&mut wave, &wave);

^~~~

note: previous borrow of `wave` occurs here; the mutable borrow prevents

subsequent moves, borrows, or modification of `wave` until the borrow ends

extend(&mut wave, &wave);

^~~~

note: previous borrow ends here

extend(&mut wave, &wave);

^

In other words, we may borrow a mutable reference to the vector, and we may borrow a shared reference to its elements, but those two references’ lifetimes may not overlap. In our case, both references’ lifetimes cover the call to extend, so Rust rejects the code.
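If the goal is simply to double the vector, one way to keep the two borrows from overlapping is to read from a copy. The sketch below is only an illustration: the function names are ours, the clone costs an extra allocation, and the second variant assumes a toolchain recent enough to provide Vec::extend_from_within in the standard library.

```rust
fn extend(vec: &mut Vec<f64>, slice: &[f64]) {
    for elt in slice {
        vec.push(*elt);
    }
}

// Hypothetical helper: append a snapshot of wave to wave itself.
fn doubled_by_clone() -> Vec<f64> {
    let mut wave = vec![0.0, 1.0, 0.0, -1.0];
    let snapshot = wave.clone(); // the shared borrow is of the copy
    extend(&mut wave, &snapshot);
    wave
}

// Same result using the standard library method, with no temporary copy.
fn doubled_in_place() -> Vec<f64> {
    let mut wave = vec![0.0, 1.0, 0.0, -1.0];
    wave.extend_from_within(..); // copies the existing elements onto the end
    wave
}
```

In both versions only one reference to wave is live at a time, so the borrow checker has nothing to object to.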

The errors above both stem from violations of Rust’s rules for mutation and sharing:

§ Shared access is read-only access. Values borrowed by shared references are read-only. Across the lifetime of a shared reference, neither its referent, nor anything reachable from that referent, can be changed by anything. There exist no live mutable references to anything in that structure; its owner is held read-only; and so on. It’s really frozen.

§ Conversely, Mutable access is exclusive access. A value borrowed by a mutable reference is reachable exclusively via that reference. Across the lifetime of a mutable reference, there is no other usable path to its referent, or to any value reachable from there. The only references whose lifetimes may overlap with a mutable reference are those you borrow from the mutable reference itself.

Rust reported the extend example as a violation of the second rule: since we’ve borrowed a mutable reference to wave, that mutable reference must be the only way to reach the vector or its elements. The shared reference to the slice is itself another way to reach the elements, violating the second rule.

But Rust could also have treated our bug as a violation of the first rule: since we’ve borrowed a shared reference to wave’s elements, the elements and the Vec itself are all read-only. You can’t borrow a mutable reference to a read-only value.

Each kind of reference affects what we can do with the values along the owning path to the referent, and the values reachable from the referent:

effects of borrowing on values in an ownership tree

Note that in both cases, the path of ownership leading to the referent cannot be changed for the reference’s lifetime. For a shared borrow, the path is read-only; for a mutable borrow, it’s completely inaccessible. So there’s no way for the program to do anything that will invalidate the reference.

Paring these principles down to the simplest possible examples:

let mut x = 10;

let r1 = &x;

let r2 = &x; // okay: multiple shared borrows permitted

x += 10; // error: cannot assign to `x` because it is borrowed

let m = &mut x; // error: cannot borrow `x` as mutable because it is

// also borrowed as immutable

let mut y = 20;

let m1 = &mut y;

let m2 = &mut y; // error: cannot borrow as mutable more than once

let z = y; // error: cannot use `y` because it was mutably borrowed
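For contrast, here is a sketch of the same operations arranged so that no shared and mutable borrows overlap. The function name is hypothetical, and the explicit blocks are there only to make the end of each borrow obvious.

```rust
// Hypothetical helper showing borrows that take turns instead of overlapping.
fn shared_then_mutable() -> i32 {
    let mut x = 10;
    {
        let r1 = &x;
        let r2 = &x; // multiple shared borrows are fine
        assert_eq!(*r1 + *r2, 20);
    } // both shared borrows end here
    x += 10; // okay now: nothing is borrowing x
    {
        let m = &mut x; // exclusive access while m is live
        *m += 1;
    } // mutable borrow ends here
    x
}
```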

It is okay to reborrow a shared reference from a shared reference:

let mut w = (107, 109);

let r = &w;

let r0 = &r.0; // okay: reborrowing shared as shared

let m1 = &mut r.1; // error: can't reborrow shared as mutable

Reborrowing from a mutable reference is considered just another way of accessing the value through that reference, so it is permitted:

let mut v = (136, 139);

let m = &mut v;

let m0 = &mut m.0; // okay: reborrowing mutable from mutable

*m0 = 137;

let r1 = &m.1; // okay: reborrowing shared from mutable,

// and doesn't overlap with m0

v.1; // error: access through other paths still forbidden
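A version of this that compiles might look like the sketch below, where the direct access to v waits until the mutable reference and everything reborrowed from it have gone out of scope. The function name reborrow_parts is our own.

```rust
// Hypothetical helper: reborrow through m, then use v directly afterwards.
fn reborrow_parts() -> (i32, i32) {
    let mut v = (136, 139);
    let second;
    {
        let m = &mut v;
        {
            let m0 = &mut m.0; // reborrow mutable from mutable
            *m0 = 137;
        }
        let r1 = &m.1; // reborrow shared from mutable
        second = *r1;
    } // m and its reborrows end here
    (v.0, second) // v is directly accessible again
}
```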

These restrictions are pretty tight. Turning back to our attempted call extend(&mut wave, &wave), there’s no quick and easy way to fix up the code to work the way we’d like. And Rust applies these rules everywhere: if we borrow, say, a shared reference to a key in a HashMap, we can’t borrow a mutable reference to the HashMap until the shared reference’s lifetime ends.

But there’s good justification for this: designing containers to support unrestricted, simultaneous iteration and modification is difficult, and often precludes simpler, more efficient implementations. Java’s Hashtable and C++’s vector don’t bother, and neither Python dictionaries nor JavaScript objects define exactly how such access behaves. Other container types in JavaScript do, but require heavier implementations as a result. C++’s std::map container promises that inserting new entries doesn’t invalidate pointers to other entries in the map, but by making that promise, the standard precludes more cache-efficient designs like Rust’s BTreeMap, which stores multiple entries in each node of the tree.

Here’s another example of the kind of bug these rules catch. Consider the following C++ code, meant to manage a file descriptor. To keep things simple, we’re only going to show a constructor and a copying assignment operator, and we’re going to omit error handling:

struct File {

int descriptor;

File(int d) : descriptor(d) { }

File& operator=(const File &rhs) {

close(descriptor);

descriptor = dup(rhs.descriptor);

return *this;

}

};

The assignment operator is simple enough, but fails badly in a situation like this:

File f(open("foo.txt", ...));

...

f = f;

If we assign a File to itself, both rhs and *this are the same object, so operator= closes the very file descriptor it’s about to pass to dup. We destroy the same resource we were meant to copy.

In Rust, the analogous code would be:

struct File {

descriptor: i32

}

fn new_file(d: i32) -> File { File { descriptor: d } }

fn clone_from(this: &mut File, rhs: &File) {

close(this.descriptor);

this.descriptor = dup(rhs.descriptor);

}

This is not idiomatic Rust. There are excellent ways to give Rust types their own constructor functions and methods, which we describe [Link to Come], but the above definitions work for this example.

If we write the Rust code corresponding to the use of File above, we get:

let mut f = new_file(open("foo.txt", ...));

...

clone_from(&mut f, &f);

Rust, of course, refuses to even compile this code:

error: cannot borrow `f` as immutable because it is also borrowed as mutable

clone_from(&mut f, &f);

^~

note: previous borrow of `f` occurs here; the mutable borrow prevents

subsequent moves, borrows, or modification of `f` until the borrow ends

clone_from(&mut f, &f);

^~

note: previous borrow ends here

clone_from(&mut f, &f);

^

This should look familiar. It turns out that two classic C++ bugs, failure to cope with self-assignment and using invalidated iterators, are actually both the same underlying kind of bug! In both cases, code assumes it is modifying one value while consulting another, when in fact they’re both the same value. By requiring mutable access to be exclusive, Rust has fended off a wide class of everyday mistakes.

The immiscibility of shared and mutable references also really shines when writing concurrent code. A data race is only possible when some value is both shared between threads and mutable, which is exactly what Rust’s reference rules eliminate. A concurrent Rust program that avoids unsafe code is free of data races by construction. We’ll cover concurrency in detail in “Concurrency”.
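As a small taste of how this plays out, the sketch below hands two threads shared references into the same slice. It assumes a toolchain that provides std::thread::scope (Rust 1.63 or later), which is newer than the compiler this chapter's error messages come from; the function name parallel_sum is ours.

```rust
use std::thread;

// Hypothetical helper: sum a slice using two threads that only read it.
fn parallel_sum(data: &[i32]) -> i32 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        // Each thread holds only a shared reference, so neither can mutate data.
        let a = s.spawn(|| left.iter().sum::<i32>());
        let b = s.spawn(|| right.iter().sum::<i32>());
        a.join().unwrap() + b.join().unwrap()
    })
}
```

Because both closures capture shared references, any attempt to modify the slice from one of them would be rejected at compile time under the same rules shown above.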

RUST’S SHARED REFERENCES VERSUS C’S POINTERS TO CONST

On first inspection, Rust’s shared references seem to closely resemble C and C++’s pointers to const values. However, Rust’s rules for shared references are much stricter. For example, consider the following C code:

int x = 42; // int variable, not const

const int *p = &x; // pointer to const int

assert(*p == 42);

x++; // change variable directly

assert(*p == 43); // “constant” referent's value has changed

The fact that p is a const int * means that you can’t modify its referent via p itself: (*p)++ is forbidden. But you can also get at the referent directly as x, which is not const, and change its value that way. The C family’s const keyword has its uses, but constant it is not.

In Rust, a shared reference forbids all modifications to its referent, until its lifetime ends:

let mut x = 42; // non-const i32 variable

let p = &x; // shared reference to i32

assert_eq!(*p, 42);

x += 1; // error: cannot assign to x because it is borrowed

assert_eq!(*p, 42); // if you take out the assignment, this is true

To ensure a value is constant, we need to keep track of all possible paths to that value, and make sure that they either don’t permit modification, or cannot be used at all. C and C++ pointers are too unrestricted for the compiler to check this. Rust’s references are always tied to a particular lifetime, making it feasible to check them at compile time.

Reference counting: Rc and Arc


If we’d like to recreate the state of the Python program, we need to change the types to explicitly request reference counting. The following code places a reference count on the vector (but not the strings):

use std::rc::Rc;

let s = Rc::new(vec!["udon".to_string(), "ramen".to_string(), "soba".to_string()]);

let t = s.clone();

let u = s.clone();

For any type T, the type Rc<T> is a reference to a T with a reference count attached to the front. So, the program shown above builds a picture like this:

XXX Needed: Reference-counted Rust vector in memory
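One way to check this arrangement without a diagram is to ask Rc itself. The following sketch, with a function name of our own choosing, uses Rc::strong_count and Rc::ptr_eq to confirm that all three handles share a single vector.

```rust
use std::rc::Rc;

// Hypothetical helper: returns (reference count, same allocation?, vector length).
fn shared_noodles() -> (usize, bool, usize) {
    let s = Rc::new(vec![
        "udon".to_string(),
        "ramen".to_string(),
        "soba".to_string(),
    ]);
    let t = Rc::clone(&s); // bumps the count; does not copy the vector
    let u = Rc::clone(&s);
    (Rc::strong_count(&s), Rc::ptr_eq(&t, &u), u.len())
}
```

Rc::clone(&s) is equivalent to s.clone() here; it is written out this way only to make it obvious that the pointer, not the vector, is being cloned.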
