Basic types - Programming Rust (2016)

Chapter 3. Basic types

Rust’s types help the language meet several goals:

§ Safety: A program’s types provide enough information about its behavior to allow the compiler to ensure that the program is well defined.

§ Efficiency: The programmer has fine-grained control over how Rust programs represent values in memory, and can choose types she knows the processor will handle efficiently. Programs needn’t pay for generality or flexibility they don’t use.

§ Parsimony: Rust manages the above without requiring too much guidance from the programmer in the form of types written out in the code. Rust programs are usually less cluttered with types than the analogous C++ program would be.

Rather than using an interpreter or a just-in-time compiler, Rust is designed to use ahead-of-time compilation: the translation of your entire program to machine code is completed before it ever begins execution. Rust’s types help an ahead-of-time compiler choose good machine-level representations for the values your program operates on: representations whose performance you can predict, and which give you full access to the machine’s capabilities.

Rust is a statically typed language: without actually running the program, the compiler checks that every possible path of execution will use values only in ways consistent with their types. This allows Rust to catch many programming mistakes early, and is crucial to Rust’s safety guarantees.

Compared to a dynamically typed language like JavaScript or Python, Rust requires more planning from you up front: you must spell out the types of functions’ parameters and return values, members of struct types, and a few other places. However, two features of Rust make this less trouble than you might expect:

§ Given the types that you did spell out, Rust will infer most of the rest for you. In practice, there’s often only one type that will work for a given variable or expression; when this is the case, Rust lets you leave out the type. For example, you could spell out every type in a function, like this:

fn build_vector() -> Vec<i16> {
    let mut v: Vec<i16> = Vec::<i16>::new();
    v.push(10i16);
    v.push(20i16);
    return v;
}

But this is cluttered and repetitive. Given the function’s return type, it’s obvious that v must be a Vec<i16>, a vector of 16-bit signed integers; no other type would work. And from that it follows that each element of the vector must be an i16. This is exactly the sort of reasoning Rust’s type inference applies, allowing you to instead write:

fn build_vector() -> Vec<i16> {
    let mut v = Vec::new();
    v.push(10);
    v.push(20);
    return v;
}

These two definitions are exactly equivalent; Rust will generate the same machine code either way. Type inference gives back much of the legibility of dynamically typed languages, while still catching type errors at compile time.

§ Functions can be generic: when a function’s purpose and implementation are general enough, you can define it to work on any set of types that meet the necessary criteria. A single definition can cover an open-ended set of use cases.

In Python and JavaScript, all functions work this way naturally: a function can operate on any value that has the properties and methods the function will need. (This is the characteristic often called “duck typing”: if it quacks like a duck, it’s a duck.) But it’s exactly this flexibility that makes it so difficult for those languages to detect type errors early; testing is often the only way to catch such mistakes. Rust’s generic functions give the language a degree of the same flexibility, while still catching all type errors at compile time.

Despite their flexibility, generic functions are just as efficient as their non-generic counterparts. We’ll discuss generic functions in detail in [Link to Come].
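To give a rough idea of what a generic function looks like, here is a minimal sketch. The function largest and its bounds are our own illustrative choices, not something from the standard library: it accepts a slice of any element type T that can be compared (PartialOrd) and cheaply copied (Copy), and the compiler checks each call against those requirements.

```rust
// Hypothetical helper, for illustration only: returns the largest element
// of a slice, or None if the slice is empty.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter();
    let mut best = *iter.next()?; // an empty slice yields None here
    for &item in iter {
        if item > best {
            best = item;
        }
    }
    Some(best)
}

fn main() {
    // The same definition serves integers, floats, and characters.
    assert_eq!(largest(&[3, 17, 5]), Some(17));
    assert_eq!(largest(&[0.5, -2.0]), Some(0.5));
    assert_eq!(largest(&['q', 'a', 'z']), Some('z'));
    assert_eq!(largest::<char>(&[]), None);
}
```

Calling largest on a slice whose elements cannot be compared is rejected at compile time, rather than failing when the program runs.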

The rest of this chapter covers Rust’s types from the bottom up, starting with simple machine types like integers and floating-point values, and then showing how to compose them into more complex structures. Where appropriate, we’ll describe how Rust represents values of these types in memory, and their performance characteristics.

Here’s a summary of all Rust’s types, brought together in one place.

| Type | Description | Values |
|---|---|---|
| i8, i16, i32, i64, u8, u16, u32, u64 | signed and unsigned integers, of given bit width | -5i8, 0x400u16, 0o100i16, 20_922_789_888_000u64, b'*' (u8 byte literal), 42 (type is inferred) |
| isize, usize | signed and unsigned integers, size of address on target machine (32 or 64 bits) | -0b0101_0010isize, 0xffff_fc00usize, 137 (type is inferred) |
| f32, f64 | IEEE floating-point numbers, single and double precision | 3.14f32, 6.0221e23f64, 1.61803 (float type is inferred) |
| bool | Boolean | true, false |
| (char, u8, i32) | tuple: mixed types | ('%', 0x7f, -1) |
| () | unit (empty tuple) | () |
| struct S { x: f32, y: f32 } | structure with named fields | S { x: 120.0, y: 209.0 } |
| struct T (i32, char); | tuple-like structure | T(120, 'X') |
| struct E; | empty structure | E |
| enum Attend { OnTime, Late(u32) } | enumeration, algebraic data type | Late(5), OnTime |
| Box<Attend> | box: owning pointer that frees referent when dropped | Box::new(Late(15)) |
| &i32, &mut i32 | shared and mutable references: non-owning pointers that must not outlive their referent | &s.y, &mut v |
| char | Unicode character, 32 bits wide | '*', '\n', '字', '\x7f', '\u{CA0}' |
| String | UTF-8 string, dynamically sized | "ラーメン: ramen".to_string() |
| &str | reference to str: non-owning pointer to UTF-8 text | "そば: soba", &s[0..12] |
| [f64; 4], [u8; 256] | array, fixed length | [1.0, 0.0, 0.0, 1.0], [b' '; 256] |
| Vec<f64> | vector, varying length | vec![0.367, 2.718, 7.389] |
| &[u8], &mut [u8] | reference to slice: reference to a portion of an array or vector comprising pointer and length | &v[10..20], &mut a[..] |
| &Any, &mut Read | reference to trait object: comprises pointer to value and vtable of trait methods | value as &Any, &mut file as &mut Read |
| fn(&str, usize) -> isize | pointer to function (not a closure) | i32::saturating_add |

Most of these types are covered in this chapter, except for the following:

§ We give struct types their own chapter, [Chapter to Come].

§ We give enumerated types their own chapter, Chapter 7.

§ We describe trait objects in [Link to Come].

Machine types

The footing of Rust’s type system is a collection of fixed-width numeric types, chosen to match the types that almost all modern processors implement directly in hardware, and the boolean and character types.

The names of Rust’s numeric types follow a regular pattern, spelling out their width in bits, and the representation they use:

| size (bits) | unsigned integer | signed integer | floating-point |
|---|---|---|---|
| 8 | u8 | i8 | |
| 16 | u16 | i16 | |
| 32 | u32 | i32 | f32 |
| 64 | u64 | i64 | f64 |
| machine word | usize | isize | |

Table 3-1. Rust’s numeric types

Integer types

Rust’s unsigned integer types use their full range to represent positive values and zero:

| type | range |
|---|---|
| u8 | 0 to 2^8−1 (0 to 255) |
| u16 | 0 to 2^16−1 (0 to 65,535) |
| u32 | 0 to 2^32−1 (0 to 4,294,967,295) |
| u64 | 0 to 2^64−1 (0 to 18,446,744,073,709,551,615, about 18 quintillion) |
| usize | 0 to either 2^32−1 or 2^64−1 |

Rust’s signed integer types use the two’s complement representation, using the same bit patterns as the corresponding unsigned type to cover a range of positive and negative values:

| type | range |
|---|---|
| i8 | −2^7 to 2^7−1 (−128 to 127) |
| i16 | −2^15 to 2^15−1 (−32,768 to 32,767) |
| i32 | −2^31 to 2^31−1 (−2,147,483,648 to 2,147,483,647) |
| i64 | −2^63 to 2^63−1 (−9,223,372,036,854,775,808 to 9,223,372,036,854,775,807) |
| isize | either −2^31 to 2^31−1, or −2^63 to 2^63−1 |

Rust generally uses the u8 type for “byte” values. For example, reading data from a file or socket yields a stream of u8 values.

Unlike C and C++, Rust treats characters as distinct from the numeric types; a char is neither a u8 nor an i8. We describe Rust’s char type in its own section below.

The precision of the usize and isize types depends on the size of the address space on the target machine: they are 32 bits long on 32-bit architectures, and 64 bits long on 64-bit architectures. The usize type is analogous to the size_t type in C and C++. Rust requires array indices to be usize values. Values representing the sizes of arrays or vectors or counts of the number of elements in some data structure also generally have the usize type. The isize type is the signed analog of the usize type, similar to the ssize_t type in C and C++.

In debug builds, Rust checks for integer overflow in arithmetic.

let big_val = std::i32::MAX;

let x = big_val + 1; // panic: arithmetic operation overflowed

In a release build, this addition would wrap to a negative number (unlike C++, where signed integer overflow is undefined behavior). But unless you want to give up debug builds forever, it’s a bad idea to count on it. When you want wrapping arithmetic, use the methods:

let x = big_val.wrapping_add(1); // ok
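Wrapping is only one of several overflow policies the integer types offer as methods. The sketch below shows wrapping_add alongside checked_add, saturating_add, and overflowing_add, all of which exist on every integer type. The helper checked_sum is a hypothetical function of our own, included only to show how checked_add lets a caller notice overflow instead of panicking or wrapping.

```rust
// Hypothetical helper: sum a slice of i32 values, returning None if the
// running total ever overflows.
fn checked_sum(values: &[i32]) -> Option<i32> {
    let mut total: i32 = 0;
    for &v in values {
        total = total.checked_add(v)?; // bail out with None on overflow
    }
    Some(total)
}

fn main() {
    let big_val = i32::MAX;

    // Wraps around to the most negative value, in debug and release builds alike.
    assert_eq!(big_val.wrapping_add(1), i32::MIN);

    // Reports overflow as None instead of producing a number.
    assert_eq!(big_val.checked_add(1), None);

    // Clamps the result to the nearest representable value.
    assert_eq!(big_val.saturating_add(1), i32::MAX);

    // Returns the wrapped result together with an overflow flag.
    assert_eq!(big_val.overflowing_add(1), (i32::MIN, true));

    assert_eq!(checked_sum(&[1, 2, 3]), Some(6));
    assert_eq!(checked_sum(&[i32::MAX, 1]), None);
}
```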

Integer literals in Rust can take a suffix indicating their type: 42u8 is a u8 value, and 1729isize is an isize. You can omit the suffix on an integer literal, in which case Rust will try to infer a unique type for it from the context. If more than one type is possible, Rust defaults to i32, if that is among the possibilities. Otherwise, Rust reports the ambiguity as an error.

The prefixes 0x, 0o, and 0b designate hexadecimal, octal, and binary literals.

To make long numbers more legible, you can insert underscores among the digits. For example, you can write the largest u32 value as 4_294_967_295. The exact placement of the underscores is not significant; for example, this permits breaking hexadecimal or binary numbers into groups of four digits, which is often more natural than groups of three.

Some examples of integer literals:

| literal | type | decimal value |
|---|---|---|
| 116i8 | i8 | 116 |
| 0xcafeu32 | u32 | 51966 |
| 0b0010_1010 | inferred | 42 |
| 0o106 | inferred | 70 |
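The following short program is one way to check these rules for yourself. It uses std::mem::size_of_val to observe which type Rust chose for each literal; the variable names are arbitrary.

```rust
use std::mem::{size_of, size_of_val};

fn main() {
    // Prefixes select the base; underscores are ignored.
    assert_eq!(0xcafeu32, 51966);
    assert_eq!(0b0010_1010, 42);
    assert_eq!(0o106, 70);
    assert_eq!(4_294_967_295, u32::MAX);

    // A suffix fixes the type of the literal itself.
    let small = 116i8;
    assert_eq!(size_of_val(&small), 1);

    // With no suffix and no other constraint, the literal defaults to i32.
    let n = 137;
    assert_eq!(size_of_val(&n), 4);

    // Here the annotation on the variable determines the literal's type.
    let index: usize = 137;
    assert_eq!(size_of_val(&index), size_of::<usize>());
}
```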

Although numeric types and the char type are distinct, Rust does provide “byte literals”, character-like literals for u8 values: b'X' represents the ASCII code for the character X, as a u8 value. For example, since the ASCII code for A is 65, the literals b'A' and 65u8 are exactly equivalent. Only ASCII characters may appear in byte literals.

There are a few characters that you cannot simply place after the single quote, because that would be either syntactically ambiguous or hard to read. The following characters require a backslash placed in front of them:

| character | byte literal | numeric equivalent |
|---|---|---|
| single quote, ' | b'\'' | 39u8 |
| backslash, \ | b'\\' | 92u8 |
| newline | b'\n' | 10u8 |
| carriage return | b'\r' | 13u8 |
| tab | b'\t' | 9u8 |

For characters that are hard to write or read, you can write their code in hexadecimal instead. A byte literal of the form b'\xHH', where HH is any two-digit hexadecimal number (00 through FF), represents the byte whose value is HH. For example, the ASCII “escape” control character has a code of 27 decimal, or 1B hexadecimal, so you can write a byte literal for “escape” as b'\x1b'. But since byte literals are just another notation for u8 values, b'\x1b' and 0x1b are equivalent (letting Rust infer the type). Since the simple numeric literal is more legible, it probably only makes sense to use hexadecimal byte literals when you want to emphasize that the value represents an ASCII code.
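The equivalences described above can be checked directly. In this sketch, is_ascii_upper is a hypothetical helper of our own, included to show the kind of code where byte literals read better than bare numbers.

```rust
// Hypothetical helper: true if `byte` is the ASCII code of an uppercase letter.
fn is_ascii_upper(byte: u8) -> bool {
    b'A' <= byte && byte <= b'Z'
}

fn main() {
    // A byte literal is just a u8 value.
    assert_eq!(b'A', 65u8);

    // Escaped forms.
    assert_eq!(b'\'', 39);
    assert_eq!(b'\\', 92);
    assert_eq!(b'\n', 10);
    assert_eq!(b'\t', 9);

    // Hexadecimal form: the ASCII "escape" character.
    assert_eq!(b'\x1b', 0x1b);

    assert!(is_ascii_upper(b'Q'));
    assert!(!is_ascii_upper(b'q'));
}
```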

Like any other sort of value, integers can have methods. The standard library provides some basic operations, which you can look up in the on-line documentation by searching for std::i32, std::u8, and so on.

assert_eq!(2u16.pow(4), 16); // exponentiation

assert_eq!((-4i32).abs(), 4); // absolute value

assert_eq!(0b101101u8.count_ones(), 4); // population count

The type suffixes on the literals are required here: Rust can’t look up a value’s methods until it knows its type. In real code, however, there’s usually additional context to disambiguate the type, so the suffixes aren’t needed.

Floating-point types

Rust provides IEEE single- and double-precision floating-point types. Following the IEEE 754-2008 specification, these types include positive and negative infinities, distinct positive and negative zero values, and a “not a number” value.

| type | precision | range |
|---|---|---|
| f32 | IEEE single precision (at least 6 decimal digits) | roughly -3.4 × 10^38 to +3.4 × 10^38 |
| f64 | IEEE double precision (at least 15 decimal digits) | roughly -1.8 × 10^308 to +1.8 × 10^308 |

Floating-point literals have the general form:

[Figure: a floating-point literal]

Every part of a floating-point number after the integer part is optional, but at least one of the fractional part, exponent, or type suffix must be present, to distinguish it from an integer literal. The fractional part may consist of a lone decimal point, so 5. is a valid floating-point constant.

If a floating-point literal lacks a type suffix, Rust will infer whether it is an f32 or f64 from the context, defaulting to the latter if both would be possible. For the purposes of type inference, Rust treats integer literals and floating-point literals as distinct classes: it will never infer a floating-point type for an integer literal, or vice versa.

Some examples of floating-point literals:

| literal | type | mathematical value |
|---|---|---|
| -1.125 | inferred | −(1 1/8) |
| 2. | inferred | 2 |
| 0.25 | inferred | 1/4 |
| 125e-3 | inferred | 1/8 |
| 1e4 | inferred | 10000 |
| 40f32 | f32 | 40 |
| 271.8281e-2f64 | f64 | 2.718281 |

The f32 and f64 types provide a full complement of methods for mathematical calculations; for example, 2f64.sqrt() is the double-precision square root of two. The standard library documentation describes these under the names “std::f32 (primitive type)” and “std::f64 (primitive type)”.

The standard library’s std::f32 and std::f64 modules define constants for the IEEE-required special values, as well as the largest and smallest finite values. The std::f32::consts and std::f64::consts modules provide various commonly used constants like E, PI, and the square root of two.
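Here is a small sketch that exercises a few of these constants. It spells the special values as associated constants (f64::INFINITY and so on), which is how recent Rust releases expose them; that spelling is an assumption about your toolchain, and older compilers need the module paths such as std::f64::INFINITY instead. The function circle_area is a hypothetical helper of our own.

```rust
use std::f64::consts::PI;

// Hypothetical helper, for illustration only.
fn circle_area(radius: f64) -> f64 {
    PI * radius * radius
}

fn main() {
    assert_eq!(circle_area(1.0), PI);

    // The infinities lie beyond the largest and smallest finite values.
    assert!(f64::INFINITY > f64::MAX);
    assert!(f64::NEG_INFINITY < f64::MIN);

    // "Not a number" is unequal to everything, including itself.
    assert!(f64::NAN.is_nan());
    assert!(f64::NAN != f64::NAN);

    // Positive and negative zero compare equal, yet remain distinguishable.
    assert_eq!(0.0_f64, -0.0_f64);
    assert!((-0.0_f64).is_sign_negative());
}
```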

Unlike C and C++, Rust performs almost no numeric conversions implicitly. If a function expects an f64 argument, it’s an error to pass an i32 value as the argument. In fact, Rust won’t even implicitly convert an i16 value to an i32 value, even though every i16 value is also an i32 value. But the key word here is “implicitly”: you can always write out explicit conversions using the as operator: i as f64, or x as i32. The lack of implicit conversions sometimes makes a Rust expression more verbose than the analogous C or C++ code would be. However, implicit integer conversions have a well-established record of causing bugs and security holes; in our experience, the act of writing out numeric conversions in Rust has alerted us to problems we would otherwise have missed.
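As a concrete illustration, consider the sketch below. The function half is a hypothetical stand-in for any function that expects an f64. The last two assertions show that as can also lose information when the target type is narrower, which is part of why Rust asks you to write such conversions out.

```rust
// Hypothetical function standing in for any API that expects an f64.
fn half(x: f64) -> f64 {
    x / 2.0
}

fn main() {
    let i: i32 = 7;
    // `half(i)` would be rejected at compile time: expected f64, found i32.
    assert_eq!(half(i as f64), 3.5);

    // Even a widening conversion must be written out.
    let small: i16 = -300;
    let wide: i32 = small as i32;
    assert_eq!(wide, -300);

    // Narrowing conversions discard information.
    assert_eq!(3.99_f64 as i32, 3);  // the fraction is dropped
    assert_eq!(300_i32 as u8, 44);   // only the low eight bits survive
}
```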

Like any other type, floating-point types can have methods. The standard library provides the usual selection of arithmetic operations, transcendental functions, IEEE-specific manipulations, and general utilities, which you can look up in the on-line documentation by searching for std::f32 and std::f64. Some examples:

assert_eq!(5f32.sqrt() * 5f32.sqrt(), 5.);

assert_eq!(1f64.asin(), std::f64::consts::PI/2.);

assert!((-1. / std::f32::INFINITY).is_sign_negative());

The type suffixes on the literals are required here: Rust can’t look up a value’s methods until it knows its type. In real code, however, there’s usually additional context to disambiguate the type, so the suffixes aren’t needed.

The bool type

Rust’s boolean type, bool, has the usual two values for such types, true and false. Comparison operators like == and < produce bool results: the value of 2 < 5 is true.

Many languages are lenient about using values of other types in contexts that require a boolean value: C and C++ implicitly convert characters, integers, floating-point numbers, and pointers to boolean values, so they can be used directly as the condition in an if or while statement. Python permits strings, lists, dictionaries, and even sets in boolean contexts, treating such values as true if they’re non-empty. Rust, however, is very strict: control structures like if and while require their conditions to be bool expressions, as do the short-circuiting logical operators && and ||. You must write if x != 0 { ... }, not simply if x { ... }.

Rust’s as operator can convert bool values to integer types:

assert_eq!(false as i32, 0);

assert_eq!(true as i32, 1);

However, as won’t convert in the other direction, from numeric types to bool. Instead, you must write out an explicit comparison like x != 0.

Although a bool only needs a single bit to represent it, Rust uses an entire byte for a bool value in memory, so you can create a pointer to it. But naturally, if the compiler can prove that a given bool never has its address taken, it can choose whatever representation it likes for it, since the programmer will never know the difference.

Characters

Rust’s character type char represents a single Unicode character, as a 32-bit value.

Rust uses the char type for single characters in isolation, but uses the UTF-8 encoding for strings and streams of text. So, a String represents its text as a sequence of UTF-8 bytes, not as an array of char values; to visit the text one character at a time, you iterate over the string’s chars method, which decodes the bytes and produces char values.
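The difference between a string's bytes and its characters is easy to observe. In this sketch, char_and_byte_counts is a hypothetical helper of our own; chars and len are standard methods on strings.

```rust
// Hypothetical helper: returns (number of chars, number of UTF-8 bytes).
fn char_and_byte_counts(text: &str) -> (usize, usize) {
    (text.chars().count(), text.len())
}

fn main() {
    // Each 'ಠ' occupies three bytes in UTF-8; '_' occupies one.
    assert_eq!(char_and_byte_counts("ಠ_ಠ"), (3, 7));

    // For pure ASCII text the two counts coincide.
    assert_eq!(char_and_byte_counts("ramen"), (5, 5));

    // Iterating with chars() yields char values, each 32 bits wide.
    let s = String::from("ಠ_ಠ");
    let decoded: Vec<char> = s.chars().collect();
    assert_eq!(decoded, ['ಠ', '_', 'ಠ']);
    assert_eq!(std::mem::size_of::<char>(), 4);
}
```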

Character literals are characters enclosed in single quotes, like '8' or '!'. You can use any Unicode character you like: '鉄' is a char literal representing the Japanese kanji for “tetsu” (iron).

As with byte literals, backslash escapes are required for a few characters:

| character | Rust character literal |
|---|---|
| single quote, ' | '\'' |
| backslash, \ | '\\' |
| newline | '\n' |
| carriage return | '\r' |
| tab | '\t' |

If you prefer, you can write out a character’s Unicode scalar value in hexadecimal:

§ If the character’s scalar value is in the range U+0000 to U+007F (that is, if it is drawn from the ASCII character set), then you can write the character as '\xHH', where HH is a two-digit hexadecimal number. For example, the character literals '*' and '\x2A' are equivalent, because the scalar value of the character * is 42, or 2A in hexadecimal.

§ You can write any Unicode character as '\u{HHHHHH}', where HHHHHH is a hexadecimal number between one and six digits long. For example, the character literal '\u{CA0}' represents the character “ಠ”, a Kannada character used in the Unicode Look of Disapproval, “ಠ_ಠ”. The same literal could also be simply written as 'ಠ'.

A char always holds a Unicode scalar value, in the range 0x0000 to 0xD7FF or 0xE000 to 0x10FFFF. A char is never a surrogate pair half (that is, a code point in the range 0xD800 to 0xDFFF), or a value outside the Unicode code space (that is, greater than 0x10FFFF). Rust uses the type system and dynamic checks to ensure char values are always in the permitted range.

Rust never implicitly converts between char and any other type. You can use the as conversion operator to convert a char to an integer type; for types smaller than 32 bits, the upper bits of the character’s value are truncated:

assert_eq!('*' as i32, 42);

assert_eq!('ಠ' as u16, 0xca0);

assert_eq!('ಠ' as i8, -0x60); // U+0CA0 truncated to eight bits, signed

Going in the other direction, u8 is the only type the as operator will convert to char: Rust intends the as operator to perform only cheap, infallible conversions, but every integer type other than u8 includes values that are not permitted Unicode scalar values, so those conversions would require run-time checks. Instead, the standard library function std::char::from_u32 takes any u32 value and returns an Option<char>: if the u32 is not a permitted Unicode scalar value, then from_u32 returns None; otherwise, it returns Some(c), where c is the char result.
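The following sketch shows both directions of conversion. The function describe is a hypothetical helper of our own; std::char::from_u32 and the as conversion from u8 are as described above.

```rust
// Hypothetical helper: describe what, if anything, a code point denotes.
fn describe(code: u32) -> String {
    match std::char::from_u32(code) {
        Some(c) => format!("U+{:04X} is {:?}", code, c),
        None => format!("U+{:04X} is not a Unicode scalar value", code),
    }
}

fn main() {
    // Every u8 value is a valid scalar value, so `as` suffices.
    assert_eq!(97u8 as char, 'a');

    // Wider integers go through from_u32, which can fail.
    assert_eq!(std::char::from_u32(0xCA0), Some('ಠ'));
    assert_eq!(std::char::from_u32(0xD800), None);   // surrogate half
    assert_eq!(std::char::from_u32(0x110000), None); // beyond the code space

    assert_eq!(describe(0x2A), "U+002A is '*'");
    assert_eq!(describe(0xD800), "U+D800 is not a Unicode scalar value");
}
```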

The standard library provides some useful methods on characters, which you can look up in the on-line documentation by searching for std::char. For example:

assert_eq!('*'.is_alphabetic(), false);

assert_eq!('8'.to_digit(10), Some(8));

assert_eq!('ಠ'.len_utf8(), 3);

Naturally, single characters in isolation are not as interesting as strings and streams of text. We’ll describe Rust’s standard String type and text handling in general below.

Tuples

A tuple is a pair, or triple, or quadruple, ... of values of assorted types. You can write a tuple as a sequence of elements, separated by commas and surrounded by parentheses. For example, ("Brazil", 1985) is a tuple whose first element is a statically allocated string, and whose second is an integer; its type is (&str, i32) (or whatever integer type Rust infers for 1985). Given a tuple value t, you can access its elements as t.0, t.1, and so on.

Tuples are distinct from arrays: for one thing, each element of a tuple can have a different type, whereas an array’s elements must be all the same type. Further, tuples allow only constants as indices: if t is a tuple, you can’t write t[i] to refer to the i’th element of a tuple. A tuple element expression always refers to some fixed element, like t.4.

Rust code often uses tuple types to return multiple values from a function. For example, the split_at method on string slices, which divides a string into two halves and returns them both, is declared like this:

fn split_at(&self, mid: usize) -> (&str, &str);

The return type (&str, &str) is a tuple of two string slices. You can use pattern matching syntax to assign each element of the return value to a different variable:

let text = "I see the eigenvalue in thine eye";

let (head, tail) = text.split_at(21);

assert_eq!(head, "I see the eigenvalue ");

assert_eq!(tail, "in thine eye");

This is more legible than the equivalent:

let text = "I see the eigenvalue in thine eye";

let temp = text.split_at(21);

let head = temp.0;

let tail = temp.1;

assert_eq!(head, "I see the eigenvalue ");

assert_eq!(tail, "in thine eye");
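Functions of your own can return tuples in the same way. As a minimal sketch, the hypothetical function div_rem below (our own name, not a standard library function) returns a quotient and a remainder together, and the caller takes the pair apart with a pattern.

```rust
// Hypothetical helper: returns (quotient, remainder) of an integer division.
// Like the `/` operator itself, it panics if `divisor` is zero.
fn div_rem(dividend: i32, divisor: i32) -> (i32, i32) {
    (dividend / divisor, dividend % divisor)
}

fn main() {
    // Destructure the result with a pattern...
    let (quotient, remainder) = div_rem(17, 5);
    assert_eq!(quotient, 3);
    assert_eq!(remainder, 2);

    // ...or keep the tuple whole and use constant indices.
    let result = div_rem(-7, 2);
    assert_eq!(result.0, -3); // integer division truncates toward zero
    assert_eq!(result.1, -1);
}
```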

You’ll also see tuples used as a sort of minimal-drama struct type. For example, in the Mandelbrot program in Chapter 2, we need to pass the width and height of the image to the functions that plot it and write it to disk. We could declare a struct with width and height members, but that’s pretty heavy notation for something so obvious, so we just used a tuple:

/// Write the buffer `pixels`, whose dimensions are given by `bounds`, to the

/// file named `filename`.

///

/// `bounds` is a pair giving the width and height of the bitmap. ...

fn write_bitmap(filename: &str, pixels: &[u8], bounds: (usize, usize))

-> Result<()>

{ ... }

The type of the bounds parameter is (usize, usize), a tuple of two usize values. Using a tuple lets us manage the width and height as a single parameter, making the code more legible.

The other commonly used tuple type, perhaps surprisingly, is the zero-tuple (). This is traditionally called “the unit type” because it has only one value, also written (). Rust uses the unit type where there’s no meaningful value to carry, but context requires some sort of type nonetheless.

For example, a function which returns no value has a return type of (). The standard library’s reverse method on array slices has no meaningful return value; it reverses the slice’s elements in place. The declaration for reverse reads:

fn reverse(&mut self);

This simply omits the function’s return type altogether, which is shorthand for returning the unit type:

fn reverse(&mut self) -> ();

Similarly, the write_bitmap example we mentioned above has a return type of std::io::Result<()>, meaning that the function provides a std::io::Error value if something goes wrong, but returns no value on success.

If you like, you may include a comma after a tuple’s last element: the types (&str, i32,) and (&str, i32) are equivalent, as are the expressions ("Brazil", 1985,) and ("Brazil", 1985). Human programmers will probably find trailing commas distracting, but tolerating them in the language’s syntax can simplify programs that generate Rust code. Rust consistently permits an extra trailing comma everywhere commas are used: function arguments, arrays, enum definitions, and so on.

For completeness’ sake, there are even tuples that contain a single value. The literal ("lonely hearts",) is a tuple containing a single string; its type is (&str,). Here, the comma after the value is necessary to distinguish the singleton tuple from a simple parenthetic expression. Like the trailing commas, singleton tuples probably don’t make much sense in code written by humans, but their admissibility can be useful to generated code.

Pointer types

Rust has several types that represent memory addresses.

This is a big difference between Rust and most languages with garbage collection. In Java, if class Tree contains a field Tree left;, then left is a reference to another separately-created Tree object. Objects never physically contain other objects in Java.

Rust is different. The language is designed to help keep allocations to a minimum. Values nest by default. The value ((0, 0), (1440, 900)) is stored as four adjacent integers. If you store it in a local variable, you’ve got a local variable four integers wide. Nothing is allocated in the heap.
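You can observe this nesting with std::mem::size_of. The sketch below relies on how current compilers lay out this particular tuple; Rust does not formally specify tuple layout in general, so treat the first assertion as an observation rather than a guarantee.

```rust
use std::mem::{size_of, size_of_val};

fn main() {
    // Four i32 values, stored inline with no pointers between them.
    let bounds = ((0, 0), (1440, 900));
    assert_eq!(size_of_val(&bounds), 4 * size_of::<i32>());

    // By contrast, a Box holding the same tuple is a single pointer;
    // the sixteen bytes of data live in the heap.
    assert_eq!(
        size_of::<Box<((i32, i32), (i32, i32))>>(),
        size_of::<usize>()
    );
}
```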

This is great for memory efficiency, but as a consequence, when a Rust program needs values to point to other values, it must use pointer types explicitly. The good news is that the pointer types used in safe Rust are constrained to eliminate undefined behavior, so pointers are much easier to use correctly in Rust than in C++.

We’ll discuss three pointer types here: references, boxes, and raw (unsafe) pointers.

References

A value of type &String is a reference to a String value, an &i32 is a reference to an i32, and so on.

It’s easiest to get started by thinking of references as Rust’s basic pointer type. A reference can point to any value anywhere, stack or heap. The address-of operator, &, and the deref operator, *, work on references in Rust, just as their counterparts in C work on pointers. And like a C pointer, a reference does not automatically free any resources when it goes out of scope.

One difference is that Rust references are immutable by default:

§ &T - immutable reference, like const T* in C

§ &mut T - mutable reference, like T* in C

Another major difference is that Rust tracks the ownership and lifetimes of values, so many common pointer-related mistakes are ruled out at compile time. Chapter 5 explains Rust’s rules for safe reference use.
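A short sketch of the two kinds of reference in use follows; the functions double and add_one are hypothetical helpers of our own.

```rust
// Takes a shared reference: it may read the referent but not change it.
fn double(n: &i32) -> i32 {
    *n * 2
}

// Takes a mutable reference: it may modify the referent in place.
fn add_one(n: &mut i32) {
    *n += 1;
}

fn main() {
    let mut x = 10;

    assert_eq!(double(&x), 20); // borrow x for reading
    add_one(&mut x);            // borrow x for writing
    assert_eq!(x, 11);

    let r = &x;                 // r points to x; nothing is copied or freed
    assert_eq!(*r, 11);
}
```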

Boxes

The simplest way to allocate a value in the heap is to use Box::new.

let t = (12, "eggs");

let b = Box::new(t); // allocate a tuple in the heap

The type of t is (i32, &str), so the type of b is Box<(i32, &str)>. Box::new() allocates enough memory to contain the tuple on the heap. When b goes out of scope, the memory is freed immediately, unless b has been moved—by returning it, for example.
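To illustrate the point about moves, here is a minimal sketch in which a box outlives the function that allocated it. The function make_boxed is a hypothetical helper of our own.

```rust
// Hypothetical helper: allocates the tuple in the heap and hands the
// box to the caller.
fn make_boxed() -> Box<(i32, &'static str)> {
    let t = (12, "eggs");
    let b = Box::new(t);
    b // moved to the caller, so the memory is not freed here
}

fn main() {
    let b = make_boxed();
    assert_eq!(b.0, 12);          // fields are reachable through the box
    assert_eq!(*b, (12, "eggs")); // or dereference it explicitly
} // b goes out of scope here, and the heap memory is freed
```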

Raw pointers

Rust also has the raw pointer types *mut T and *const T. Raw pointers really are just like pointers in C++. Using a raw pointer is unsafe, because Rust makes no effort to track what a raw pointer points to. For example, the pointer may be null; it may point to memory that has been freed or now contains a value of a different type. All the classic pointer mistakes of C++ are offered for your enjoyment in unsafe Rust. For details, see [Link to Come].
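For a first taste, here is a minimal sketch. Creating a raw pointer is permitted in ordinary code; it is dereferencing one that requires an unsafe block. The function increment_through_raw is a hypothetical helper of our own, and it is sound only because the pointer is derived from a live mutable reference.

```rust
// Hypothetical helper: increments an i32 by way of a raw pointer.
fn increment_through_raw(value: &mut i32) {
    let p: *mut i32 = value; // converting a reference to a raw pointer is safe
    unsafe {
        // We know p is non-null and points to a live i32, because it came
        // from `value`. Rust does not check this for us.
        *p += 1;
    }
}

fn main() {
    let mut x = 10;
    increment_through_raw(&mut x);
    assert_eq!(x, 11);

    // Unlike references, raw pointers may be null.
    let nothing: *const i32 = std::ptr::null();
    assert!(nothing.is_null());
}
```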

Arrays, Vectors, and Slices

Rust has three types for representing a sequence of values in memory:

§ The type [T; N] represents an array of N values, each of type T. An array’s size is a constant, and is part of the type; you can’t append new elements, or shrink an array.

§ The type Vec<T>, called a “vector of Ts”, is a dynamically allocated, growable sequence of values of type T. A vector’s elements live on the heap, so you can resize vectors at will: push new elements onto them, append other vectors to them, delete elements, and so on.

§ The types &[T] and &mut [T], called a “shared slice of Ts” and a “mutable slice of Ts”, are references to a series of elements that are a part of some other value, like an array or vector. You can think of a slice as a pointer to its first element, together with a count of the number of elements you can access starting at that point. A mutable slice &mut [T] lets you read and modify elements, but can’t be shared; a shared slice &[T] lets you share access amongst several readers, but doesn’t let you modify elements.

Given a value v of any of these three types, the expression v.len() gives the number of elements in v, and v[i] refers to the i’th element of v. The first element is v[0], and the last element is v[v.len() - 1]. Rust checks that i always falls within this range; if it doesn’t, the thread panics. Of course, v’s length may be zero, in which case any attempt to index it will panic. i must be a usize value; you can’t use any other integer type as an index.
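Because len and indexing work the same way on all three, a function that accepts a slice can serve arrays and vectors alike. In this sketch, total is a hypothetical helper of our own.

```rust
// Hypothetical helper: adds up the elements of any slice of i32 values,
// using only len and indexing.
fn total(values: &[i32]) -> i32 {
    let mut sum = 0;
    let mut i: usize = 0; // indices must be usize
    while i < values.len() {
        sum += values[i];
        i += 1;
    }
    sum
}

fn main() {
    let a: [i32; 4] = [1, 2, 3, 4];
    let v: Vec<i32> = vec![1, 2, 3, 4];

    assert_eq!(total(&a), 10);       // a reference to an array
    assert_eq!(total(&v), 10);       // a reference to a vector
    assert_eq!(total(&v[1..3]), 5);  // a slice of part of the vector
    assert_eq!(total(&[]), 0);       // an empty slice: nothing to index

    assert_eq!(v[v.len() - 1], 4);   // the last element
}
```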

Arrays

There are several ways to write array values. The simplest is to write a series of values within square brackets:

let lazy_caterer: [u32; 6] = [1, 2, 4, 7, 11, 16];

let taxonomy = ["Animalia", "Arthropoda", "Insecta"];

assert_eq!(lazy_caterer[3], 7);

assert_eq!(taxonomy.len(), 3);

For the common case of a long array filled with some value, you can write [V; N], where V is the value each element should have, and N is the length. For example, [true; 100000] is an array of 100000 bool elements, all set to true:

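let mut sieve = [true; 100000];
// Sieve with every i up to 316: 317 * 317 exceeds 100000, so larger
// values of i would have nothing left to cross off.
for i in 2..317 {
    if sieve[i] {
        let mut j = i * i;
        while j < 100000 {
            sieve[j] = false;
            j += i;
        }
    }
}

assert!(sieve[211]);
assert!(!sieve[30031]);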

You’ll see this syntax used for fixed-size buffers: [0u8; 1024] can be a one-kilobyte buffer, filled with zero bytes. Rust has no notation for an uninitialized array. (In general, Rust ensures that code can never access any sort of uninitialized value.)

The useful methods you’d like to see on arrays—iterating over elements, searching, sorting, filling, filtering, and so on—all appear as methods of slices, not arrays. But since those methods take their operands by reference, and taking a reference to an array produces a slice, you can actually call any slice method on an array directly:

let mut chaos = [3, 5, 4, 1, 2];

chaos.sort();

assert_eq!(chaos, [1, 2, 3, 4, 5]);

Here, the sort method is actually defined on slices, but since sort takes its operand by reference, we can use it directly on chaos: the call implicitly produces a &mut [i32] slice referring to the entire array. In fact, the len method we mentioned earlier is a slice method as well.

We cover slices in more detail in the section on slices below.

Vectors

There are several ways to create vectors. The simplest is probably to use the vec! macro, which gives us a syntax for vectors that looks very much like an array literal:

let mut v = vec![2, 3, 5, 7];

assert_eq!(v.iter().fold(1, |a, b| a * b), 210);

But of course, this is a vector, not an array, so we can add elements to it dynamically:

v.push(11);

v.push(13);

assert_eq!(v.iter().fold(1, |a, b| a * b), 30030);

The vec! macro is equivalent to calling Vec::new to create a new, empty vector, and then pushing the elements onto it, which is another idiom:

let mut v = Vec::new();

v.push("step");

v.push("on");

v.push("no");

v.push("pets");

assert_eq!(v, vec!["step", "on", "no", "pets"]);

Another possibility is to build a vector from the values produced by an iterator:

let v: Vec<i32> = (0..5).collect();

assert_eq!(v, [0, 1, 2, 3, 4]);

You’ll often need to supply the type when using collect, as we’ve done above, as collect can build many different sorts of collections, not just vectors. By making the type for v explicit, we’ve made it unambiguous which sort of collection we want.

Vec is a fairly fundamental type to Rust (it’s used almost anywhere one needs a list of dynamic size), so there are many other methods that construct new vectors or extend existing ones. To explore other options, consult the online documentation for std::vec::Vec.

A vector always stores its contents in the dynamically allocated heap. A Vec<T> consists of three values: a pointer to the block of memory allocated to hold the elements; the number of elements that block has the capacity to store; and the number it actually contains now (in other words, its length). When the block has reached its capacity, adding another element to the vector entails allocating a larger block, copying the present contents into it, updating the vector’s pointer and capacity to describe the new block, and finally freeing the old one.

If you know the number of elements a vector will need in advance, instead of Vec::new you can call Vec::with_capacity to create a vector with a block of memory large enough to hold them all, right from the start; then, you can add the elements to the vector one at a time without causing any reallocation. Note that this only establishes the initial capacity; if you exceed your estimate, the vector simply enlarges its storage as usual.
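One way to observe this, sketched here with a hypothetical helper named `fills_without_reallocating`, is to record the address of the vector's buffer before pushing and compare it afterward. If no reallocation took place, the address is unchanged:

```rust
/// Pushes `n` elements into a vector created with room for `n`, and
/// reports whether the buffer stayed at the same address throughout.
fn fills_without_reallocating(n: usize) -> bool {
    let mut v: Vec<usize> = Vec::with_capacity(n);
    let start = v.as_ptr();
    for i in 0..n {
        v.push(i);
    }
    start == v.as_ptr() && v.len() == n
}

fn main() {
    assert!(fills_without_reallocating(100));
}
```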

Many library functions look for the opportunity to use Vec::with_capacity instead of Vec::new. For example, in the collect example above, the iterator 0..5 knows in advance that it will yield five values, and the collect function takes advantage of this to pre-allocate the vector it returns with the correct capacity.
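The iterator's advance knowledge is visible through its size_hint method, which returns a lower bound and an optional upper bound on the number of items remaining. The sketch below (with a helper we have named `range_hint`) only inspects the hint; exactly how collect uses it is an implementation detail, so the capacity check is deliberately loose:

```rust
/// Reports the (lower, upper) bounds a range iterator gives for its length.
fn range_hint(n: i32) -> (usize, Option<usize>) {
    (0..n).size_hint()
}

fn main() {
    // The range 0..5 knows it will yield exactly five values.
    assert_eq!(range_hint(5), (5, Some(5)));

    let v: Vec<i32> = (0..5).collect();
    // The capacity is always at least the length.
    assert!(v.capacity() >= v.len());
    println!("len = {}, capacity = {}", v.len(), v.capacity());
}
```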

Just as a vector’s len method returns the number of elements it contains now, its capacity method returns the number of elements it could hold without reallocation:

let mut v = Vec::with_capacity(2);

assert_eq!(v.len(), 0);

assert_eq!(v.capacity(), 2);

v.push(1);

v.push(2);

assert_eq!(v.len(), 2);

assert_eq!(v.capacity(), 2);

v.push(3);

assert_eq!(v.len(), 3);

assert_eq!(v.capacity(), 4);

The capacities you’ll see in your code may differ from those shown here, depending on what sizes Vec and the system’s heap allocator decide would be best.

You can insert and remove elements wherever you like in a vector, although these operations copy all the elements after the insertion point:

let mut v = vec![10, 20, 30, 40, 50];

// Make the element at index 3 be 35.

v.insert(3, 35);

assert_eq!(v, [10, 20, 30, 35, 40, 50]);

// Remove the element at index 1.

v.remove(1);

assert_eq!(v, [10, 30, 35, 40, 50]);

You can use the pop method to remove the last element and return it. More precisely, popping a value from a Vec<T> returns an Option<T>: None if the vector was already empty, or Some(v) if its last element had been v.

let mut v = vec!["carmen", "miranda"];

assert_eq!(v.pop(), Some("miranda"));

assert_eq!(v.pop(), Some("carmen"));

assert_eq!(v.pop(), None);
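Because pop returns an Option, it combines naturally with a while let loop that runs until the vector is empty. The following is a minimal sketch; the function name `drain_from_back` is ours:

```rust
/// Pops every element off `v`, returning them in the order popped.
fn drain_from_back(mut v: Vec<&str>) -> Vec<&str> {
    let mut popped = Vec::new();
    // The loop ends when pop returns None, that is, when v is empty.
    while let Some(last) = v.pop() {
        popped.push(last);
    }
    popped
}

fn main() {
    let order = drain_from_back(vec!["carmen", "miranda"]);
    assert_eq!(order, ["miranda", "carmen"]);
}
```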

You can use a for loop to iterate over a vector:

// Get our command-line arguments as a vector of Strings.

let languages: Vec<String> = std::env::args().skip(1).collect();

for l in languages {

    println!("{}: {}", l,

             if l.len() % 2 == 0 {

                 "functional"

             } else {

                 "imperative"

             });

}

Running this program with a list of programming languages is illuminating:

$ cargo run Lisp Scheme C C++ Fortran

Compiling fragments v0.1.0 (file:///home/jimb/rust/book/fragments)

Running `.../target/debug/fragments Lisp Scheme C C++ Fortran`

Lisp: functional

Scheme: functional

C: imperative

C++: imperative

Fortran: imperative

$

Finally, a satisfying definition for the term “functional language”.
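Note that writing for l in languages hands the vector itself to the loop, so languages cannot be used afterward. A variation, sketched here with a fixed list in place of command-line arguments and a helper we have named `classify`, iterates over a reference instead and leaves the vector intact:

```rust
/// Labels a name by the parity of its length, as in the example above.
fn classify(name: &str) -> &'static str {
    if name.len() % 2 == 0 {
        "functional"
    } else {
        "imperative"
    }
}

fn main() {
    let languages = vec!["Lisp".to_string(), "Scheme".to_string(), "C".to_string()];

    // Borrowing with & leaves `languages` usable after the loop.
    for l in &languages {
        println!("{}: {}", l, classify(l));
    }

    assert_eq!(languages.len(), 3);
}
```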

As with arrays, many useful methods you’d like to see on vectors, like iterating over elements, searching, sorting, filling, and filtering, all appear as methods of slices, not of vectors themselves. But since those methods take their operands by reference, and taking a reference to a vector produces a slice, you can actually call any slice method on a vector directly:

let mut v = vec!["a man", "a plan", "a canal"];

v.reverse();

assert_eq!(v, ["a canal", "a plan", "a man"]); // disappointing

Here, the reverse method is actually defined on slices, but since reverse takes its operand by reference, we can use it directly on v: the call implicitly produces a &mut [&str] slice referring to the entire vector.

BUILDING VECTORS ELEMENT BY ELEMENT

Building a vector one element at a time isn’t as bad as it might sound. Whenever a vector outgrows its capacity by a single element, it chooses a new block twice as large as the old one. By the time it has reached its final size of 2^n for some n, the total number of elements copied in the course of reaching that size is the sum of all the powers of two smaller than 2^n (that is, the sizes of the blocks we left behind). But if you think about how powers of two work, that total is simply 2^n - 1, meaning that the number of elements copied is always within a factor of two of the final size. Since the number of copies is linear in the final size, the cost per element is constant, just as it would be if you had allocated the vector with the correct size to begin with!

What this means is that using Vec::with_capacity instead of Vec::new is a way to gain a constant factor improvement in speed, rather than an algorithmic improvement. For small vectors, avoiding a few calls to the heap allocator can make an observable difference in performance.
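The arithmetic can be checked with a short simulation. This sketch models only the doubling policy described in this sidebar, starting from a capacity of one; it does not inspect Vec itself, whose actual growth strategy is not something we rely on here. The function name `copies_with_doubling` is ours:

```rust
/// Simulates pushing `final_len` elements into a buffer that starts with
/// capacity 1 and doubles whenever it is full. Returns the total number
/// of element copies caused by moving to a larger block.
fn copies_with_doubling(final_len: usize) -> usize {
    let mut capacity = 1;
    let mut len = 0;
    let mut copies = 0;
    for _ in 0..final_len {
        if len == capacity {
            // Moving to a block twice as large copies every element.
            copies += len;
            capacity *= 2;
        }
        len += 1;
    }
    copies
}

fn main() {
    // Reaching size 2^3 = 8 copies 1 + 2 + 4 = 7 = 2^3 - 1 elements.
    assert_eq!(copies_with_doubling(8), 7);

    // For any final size, fewer than twice that many elements are copied.
    for n in 1..1000 {
        assert!(copies_with_doubling(n) < 2 * n);
    }
}
```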

Slices

A slice, written [T] without specifying the length, is a region of an array or vector. Since a slice can be any length, slices can’t be stored directly in variables or passed as function arguments. Slices are always passed by reference.

A reference to a slice is a fat pointer: a two-word value comprising a pointer to the slice’s first element, and the number of elements in the slice.
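The two-word layout can be confirmed with std::mem::size_of. In this sketch, `words_in` is a helper of our own that expresses a type's size in machine words:

```rust
use std::mem::size_of;

/// Returns the size of `T` measured in machine words (units of usize).
fn words_in<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // An ordinary reference is a single pointer.
    assert_eq!(words_in::<&f64>(), 1);
    // A slice reference carries a pointer and a length.
    assert_eq!(words_in::<&[f64]>(), 2);
}
```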

Suppose you run the following code:

let v: Vec<f64> = vec![0.0, 0.707, 1.0, 0.707];

let a: [f64; 4] = [0.0, -0.707, -1.0, -0.707];

let sv: &[f64] = &v;

let sa: &[f64] = &a;

On the last two lines, Rust automatically converts a &Vec<f64> reference and a &[f64; 4] reference to slice references that point directly to the data.

Whereas an ordinary reference is a non-owning pointer to a single value, a reference to a slice is a non-owning pointer to several values. This makes slice references a good choice when you want to write a function that operates on any homogeneous data series, regardless of whether it’s stored in an array or a vector, stack or heap. For example, here’s a function that prints a slice of numbers, one per line:

fn print(n: &[f64]) {

    for elt in n {

        println!("{}", elt);

    }

}

print(&v); // works on vectors

print(&a); // works on arrays

Because this function takes a slice reference as an argument, you can apply it to either a vector or an array, as shown. In fact, many methods you might think of as belonging to vectors or arrays are actually methods defined on slices: for example, the sort and reverse methods, which sort or reverse a sequence of elements in place, are actually methods on the slice type [T].

You can get a reference to a slice of an array or vector, or a slice of an existing slice, by indexing it with a range:

print(&v[0..2]); // print the first two elements of v

print(&a[2..]); // print elements of a starting with a[2]

print(&sv[1..3]); // print v[1] and v[2]

As with ordinary array accesses, Rust checks that the indices are valid. Trying to take a slice that extends past the end of the data results in a thread panic.
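When an out-of-range request is an expected possibility rather than a bug, the slice method get offers a non-panicking alternative: given a range, it returns an Option. A minimal sketch, with `first_n` as a hypothetical helper:

```rust
/// Returns the first `n` elements of `data`, or None if `data` holds
/// fewer than `n` elements.
fn first_n(data: &[f64], n: usize) -> Option<&[f64]> {
    data.get(..n)
}

fn main() {
    let v = vec![0.0, 0.707, 1.0, 0.707];

    assert_eq!(first_n(&v, 2), Some(&[0.0, 0.707][..]));
    // Asking for more than is there yields None instead of a panic.
    assert_eq!(first_n(&v, 10), None);
}
```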

String types

Programmers familiar with C++ will recall that there are two string types in the language. String literals have the pointer type const char *. The standard library also offers a class, std::string, for dynamically creating strings at run time.

Rust has a similar design. In this section, we’ll show all the ways to write string literals, then talk about Rust’s two string types and how to use them.

String literals

String literals are enclosed in double quotes. They use the same backslash escape sequences as char literals.

let speech = "\"Ouch!\" said the well.\n";

A string may span multiple lines:

println!("In the room the women come and go,

    Singing of Mount Abora");

The newline character in that string literal is included in the string, and therefore in the output. So are the spaces at the beginning of the second line.

If one line of a string ends with a backslash, then the newline character and the leading whitespace on the next line are dropped:

println!("It was a bright, cold day in April, and \

    there were four of us—\

    more or less.");

This prints a single line of text. The string contains a single space between “and” and “there”, because there is a space before the backslash in the program, and no space after the dash.
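Both behaviors can be checked by comparing such literals with equivalent single-line ones. This is a small sketch; the function names are ours, and the shortened text is only for illustration:

```rust
/// A literal that spans two source lines keeps both the newline and the
/// indentation of the second line.
fn two_lines() -> &'static str {
    "come and go,
  Singing"
}

/// A trailing backslash drops the newline and the next line's leading spaces.
fn one_line() -> &'static str {
    "and \
     there"
}

fn main() {
    assert_eq!(two_lines(), "come and go,\n  Singing");
    assert_eq!(one_line(), "and there");
}
```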

In a few cases, the need to double every backslash in a string is a nuisance. (The classic examples are regular expressions and Windows filenames.) For these cases, Rust offers raw strings. A raw string is tagged with the lowercase letter r. All backslashes and whitespace characters inside a raw string are included verbatim in the string. No escape sequences are recognized.

let default_win_install_path = r"C:\Program Files\Gorillas";

let pattern = Pcre::compile(r"\d+(\.\d+)*");

You can’t include a double-quote character in a raw string simply by putting a backslash in front of it—remember, we said no escape sequences are recognized. However, there is a cure for that too. The start and end of a raw string can be marked with pound signs:

println!(r###"

This raw string started with 'r###"'.

Therefore it does not end until we reach a quote mark ('"')

followed immediately by three pound signs ('###'):

"###);

You can add as few or as many pound signs as needed to make it clear where the raw string ends.
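For instance, a single pound sign on each side is enough to let a raw string contain double quotes. The sketch below compares a raw string with the equivalent ordinary literal; `quoted` is a name of our own:

```rust
/// A raw string delimited by r#" and "# may contain both double quotes
/// and backslashes without any escaping.
fn quoted() -> &'static str {
    r#"She said "hi" and wrote C:\temp"#
}

fn main() {
    // The same text as an ordinary literal needs four escapes.
    assert_eq!(quoted(), "She said \"hi\" and wrote C:\\temp");
}
```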

Byte strings

A string literal with the b prefix is a byte string. Such a string is a sequence of u8 values (that is, bytes) rather than Unicode text.

let method = b"GET";

assert_eq!(method, &[b'G', b'E', b'T']);

This combines with all the other string syntax we’ve shown above: byte strings can span multiple lines, use escape sequences, and use backslashes to join lines. Raw byte strings start with br".

Byte strings can’t contain arbitrary Unicode characters. They must make do with ASCII and escape sequences that denote values in the range 0-255.

The type of method above is &[u8; 3]: it’s a reference to an array of 3 bytes. It doesn’t have any of the string methods we’ll discuss in a minute. The most string-like thing about it is the syntax we used to write it.
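The following sketch gathers these points: the array-reference type, a byte escape that would not be valid in a text string, a raw byte string, and an explicit conversion to &str with std::str::from_utf8. The helper name `raw_backslash_n` is ours:

```rust
/// A raw byte string: the backslash and the letter n are two separate bytes.
fn raw_backslash_n() -> &'static [u8] {
    br"\n"
}

fn main() {
    // A byte string literal is a reference to a fixed-size array of bytes.
    let method: &[u8; 3] = b"GET";
    assert_eq!(method.len(), 3);

    // \xff denotes the byte 255, which is not valid UTF-8 on its own.
    let bytes = b"\xff\n";
    assert_eq!(bytes, &[0xff, b'\n']);

    assert_eq!(raw_backslash_n().len(), 2);

    // To use string methods, convert explicitly; this checks for valid UTF-8.
    assert_eq!(std::str::from_utf8(method), Ok("GET"));
    assert!(std::str::from_utf8(bytes).is_err());
}
```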

Strings in memory

Strings are sequences of Unicode characters, but they are not stored in memory as arrays of chars. Instead, they are stored using UTF-8, a variable-width encoding. Each ASCII character in a string is stored in one byte. Other characters take up multiple bytes.

assert_eq!("ಠ_ಠ".as_bytes(),

[0xe0, 0xb2, 0xa0, b'_', 0xe0, 0xb2, 0xa0]);

The type of a string literal is &str, meaning it is a reference to a str, a slice of memory that’s guaranteed to contain valid UTF-8 data.

Like other slice references, an &str is a fat pointer. It contains both the address of the actual data and a length field. The .len() method of an &str returns the length. Note that it’s measured in bytes, not characters:

assert_eq!("ಠ_ಠ".len(), 7);

assert_eq!("ಠ_ಠ".chars().count(), 3);
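To see where each character sits within those seven bytes, the char_indices method pairs every character with its starting byte offset. In this sketch, `char_starts` is a helper of our own:

```rust
/// Returns the byte offset at which each character of `s` begins.
fn char_starts(s: &str) -> Vec<usize> {
    s.char_indices().map(|(i, _)| i).collect()
}

fn main() {
    let face = "ಠ_ಠ";

    // Each ಠ occupies three bytes; the underscore occupies one.
    assert_eq!(char_starts(face), [0, 3, 4]);

    // Byte offset 1 falls in the middle of the first character.
    assert!(!face.is_char_boundary(1));

    // Range indexing uses byte offsets, which must lie on boundaries.
    assert_eq!(&face[3..4], "_");
}
```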

A string literal is a reference to an immutable string of text, typically stored in memory that is mapped as read-only. It is impossible to modify a str:

let mut s = "hello";

s[0] = 'c'; // error: the type `str` cannot be mutably indexed

s.push('\n'); // error: no method named `push` found for type `&str`

For creating new strings at run time, there is the standard String type.

String

&str is very much like &[T]: a fat pointer to some data. String is analogous to Vec<T>.

                                                Vec<T>               String
automatically frees buffers                     yes                  yes
growable                                        yes                  yes
::new() and ::with_capacity() static methods    yes                  yes
.reserve() and .capacity() methods              yes                  yes
.push() and .pop() methods                      yes                  yes
range syntax v[start..stop]                     yes, returns &[T]    yes, returns &str
automatic conversion                            &Vec<T> to &[T]      &String to &str
inherits methods                                from &[T]            from &str

Like a Vec, each String has its own heap-allocated buffer that isn’t shared with any other String. When a String variable goes out of scope, the buffer is automatically freed, unless the String was moved.
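Several of the parallels in the comparison above can be exercised in a few lines. This is only a sketch; `build_greeting` is a hypothetical helper, and push_str (which appends a whole &str) is a String method not listed in the comparison:

```rust
/// Builds a String in a pre-allocated buffer, one piece at a time.
fn build_greeting() -> String {
    let mut s = String::with_capacity(16);
    s.push_str("hello"); // append a whole &str
    s.push('!');         // append a single char
    s
}

fn main() {
    let mut s = build_greeting();
    assert_eq!(s, "hello!");
    assert!(s.capacity() >= 16);

    // pop returns an Option<char>, just as Vec's pop returns an Option<T>.
    assert_eq!(s.pop(), Some('!'));

    // Range syntax yields a &str borrowed from the String's buffer.
    let prefix: &str = &s[0..4];
    assert_eq!(prefix, "hell");

    // A &String converts automatically to a &str.
    let whole: &str = &s;
    assert_eq!(whole.len(), 5);
}
```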

There are several ways to create Strings.

§ The .to_string() method converts an &str to a String. This copies the string.

let error_message = "too many pets".to_string();

§ The format!() macro works just like println!(), except that it returns a new String instead of writing text to stdout, and it doesn’t automatically add a newline at the end.

assert_eq!(format!("{}°{:02}′{:02}″N", 24, 5, 20),

"24°05′20″N".to_string());

As it happens, this would work fine without the .to_string() call, because a String can be compared directly with an &str.

§ Arrays, slices, and vectors of strings have two methods, .concat() and .join(sep), that form a new String from many strings:

let bits = vec!["vini", "vidi", "vici"];

assert_eq!(bits.concat(), "vinividivici");

assert_eq!(bits.join(", "), "vini, vidi, vici");

The choice sometimes arises of which type to use: &str or String. Chapter 4 addresses this question in detail. For now it will do to point out that an &str can refer to any slice of any string, whether it is a string literal (stored in the executable) or a String (allocated and freed at run time). This means that &str is more appropriate for function arguments when the caller should be allowed to pass either kind of string.
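As a brief sketch of that last point, here is a function (named `shout` purely for illustration) that takes an &str and can therefore be called with a literal, a borrowed String, or a slice of either:

```rust
/// Accepts any string slice and returns a new, uppercased String.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    // A string literal is already an &str.
    assert_eq!(shout("too many pets"), "TOO MANY PETS");

    // A &String converts to &str automatically.
    let owned = String::from("step on no pets");
    assert_eq!(shout(&owned), "STEP ON NO PETS");

    // So does a slice of a String.
    assert_eq!(shout(&owned[0..4]), "STEP");
}
```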

Using strings

Strings support the == and != operators. Two strings are equal if they contain the same characters in the same order (regardless of whether they point to the same location in memory).

assert_eq!("ONE".to_lowercase() == "one", true);

Strings also support the comparison operators <, <=, >, and >=, as well as many useful methods, which you can find in the online documentation by searching for str. (Or just flip to [Link to Come].)

assert_eq!("peanut".contains("nut"), true);

assert_eq!("ಠ_ಠ".replace("ಠ", "■"), "■_■");

assert_eq!(" clean\n".trim(), "clean");

for word in "vini, vidi, vici".split(", ") {

assert!(word.starts_with("vi"));

}
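The comparison operators order strings lexicographically by their byte values, which is not the same as dictionary order: every uppercase ASCII letter sorts before every lowercase one. A short sketch, with `sorted` as a helper of our own:

```rust
/// Sorts a list of words using the standard ordering on strings.
fn sorted(mut words: Vec<&str>) -> Vec<&str> {
    words.sort();
    words
}

fn main() {
    assert!("apple" < "banana");
    // 'Z' has a smaller byte value than 'a', so "Zebra" comes first.
    assert!("Zebra" < "apple");
    // A prefix sorts before any longer string that starts with it.
    assert!("pea" < "peanut");

    assert_eq!(sorted(vec!["pear", "Zebra", "apple"]), ["Zebra", "apple", "pear"]);
}
```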

Other string-like types

Rust guarantees that strings are valid UTF-8. Sometimes a program really needs to be able to deal with strings that are not valid Unicode. This usually happens when a Rust program has to interoperate with some other system that doesn’t enforce any such rules. For example, in most operating systems it’s easy to create a file with a filename that isn’t valid Unicode. What should happen when a Rust program comes across this sort of filename?

Rust’s solution for these cases is to offer a few string-like types for these particular situations. Stick to String and &str for Unicode text; but

§ when working with filenames, use std::path::PathBuf and &Path instead;

§ when working with binary data that isn’t character data at all, use Vec<u8> and &[u8];

§ when interoperating with C libraries that use null-terminated strings, use std::ffi::CString and &CStr.
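The sketch below touches each of these types briefly. The path shown is made up, and `extension_of` is a hypothetical helper; to_string_lossy is used because a path component is not guaranteed to be valid Unicode:

```rust
use std::ffi::CString;
use std::path::Path;

/// Returns the extension of a path as a String, if it has one.
fn extension_of(p: &str) -> Option<String> {
    Path::new(p)
        .extension()
        .map(|e| e.to_string_lossy().into_owned())
}

fn main() {
    // Paths have their own methods for picking file names apart.
    assert_eq!(extension_of("notes/chapter3.txt"), Some("txt".to_string()));
    assert_eq!(extension_of("Makefile"), None);

    // Arbitrary bytes live happily in a Vec<u8>, but converting them
    // to a String fails if they are not valid UTF-8.
    let data: Vec<u8> = vec![0xff, 0xfe];
    assert!(String::from_utf8(data).is_err());

    // A CString owns its bytes plus the terminating nul that C expects.
    let c = CString::new("hello").unwrap();
    assert_eq!(c.as_bytes_with_nul(), b"hello\0");
}
```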

Beyond the basics

Types are a central part of Rust. We’ll continue talking about types and introducing new ones throughout the book.

In particular, Rust’s user-defined types give the language much of its flavor, because that’s where methods are defined. There are three kinds of user-defined type, and we’ll cover them in three successive chapters: structs in [Link to Come], enums in Chapter 7, and traits in [Link to Come].

Functions and closures have their own types, covered in [Link to Come]. And the types that make up the standard library are covered throughout the book. For example, [Link to Come] presents the standard collection types.

All of that will have to wait, though. Before we move on, it’s time to tackle the concepts that are at the heart of Rust’s safety rules.