Async - Hack and HHVM (2015)

Chapter 7. Async

Typical web apps will need to start time-consuming external operations and wait for them to finish. They make queries to databases, which can involve waiting for a server across a network to read from a spinning disk. They might use external APIs, which can involve making HTTP or HTTPS requests across the Internet. These can take a lot of time, and if the app can’t multitask, it will waste time waiting for those operations to finish, since it can’t do anything useful in the meantime.

It gets worse: if a non-multitasking app has multiple time-consuming operations that could be done simultaneously (e.g. two independent database queries), it can’t. It has to wait for one to finish, then start the next and wait for that to finish, and so on. This adds up quickly and is tremendously wasteful; for high-traffic web apps, some form of multitasking that gets around these problems is a necessity. Some PHP extensions, like cURL and MySQLi, have support for executing multiple operations at a time, but they don’t interoperate with each other.

In Figure 7-1, the two queries could run in parallel, but with no way to multitask, they must run one at a time.

Figure 7-1. Two database queries, without async

Like PHP, Hack doesn’t support multithreading, so web apps in Hack need some other form of multitasking.

That’s the purpose of async. It offers a way to implement cooperative multitasking, in which tasks voluntarily and explicitly cede the CPU to each other. The opposite is preemptive multitasking, in which tasks are forcibly interrupted by the task manager.

Cooperative multitasking has several advantages over preemptive multitasking. Preemptive multitasking requires significant care to use safely. In the preemptive model, concurrency safety has to be pervasive; you have to protect critical sections and synchronize access to shared memory. In cooperative multitasking, none of that applies. Because each task gets to control when it yields to other tasks, it doesn’t have to go out of its way to protect critical sections: all it has to do is avoid breaking them.

Async provides syntax for giving up the CPU to other async tasks, as well as infrastructure within HHVM (the scheduler) that manages the cooperative multitasking, deciding which async tasks to run and when. Figure 7-2 shows how cooperative multitasking can significantly reduce the end-to-end time of the two queries from Figure 7-1, by doing the second query while waiting for the first to complete.

Figure 7-2. The same two queries, with async

In this chapter, we’ll see what async functions look like, how to use them, how to structure your code around them, and how to use the async extensions that HHVM provides.

Introductory Examples

In this section, we’ll look at a few small examples of async functions, just to give you an idea of what they look like and how async code is structured. We’ll gloss over most of the details for now, and get into all the specifics in the rest of the chapter.

There are two syntactic differences between async functions and regular functions. Async functions have the async keyword immediately before the function keyword in their headers, and they can use the await keyword in their bodies. Here’s the simplest possible example of an async function:

async function hello(): Awaitable<string> {

return 'hello';

}

Methods, both static and non-static, can be async as well:

class C {

public static async function hello(): Awaitable<string> {

return 'hello';

}

public async function goodbye(): Awaitable<string> {

return 'goodbye';

}

}

Closures can also be async, whether they use PHP closure syntax or Hack lambda expression syntax (see Lambda Expressions):

$hello = async function(): Awaitable<string> { return 'hello'; };

$goodbye = async () ==> 'goodbye';

There are two important things to note about all these examples. First, async functions don’t necessarily need to be inherently asynchronous at all; the examples are all returning a constant result. Second, async functions have a special return type. The bodies of these functions look as if they’re returning strings, but at runtime, that’s not what happens. A call to an async function returns an object that represents a result that may or may not be ready—an object that implements the interface Awaitable, as the return type annotations say. From that object, you can retrieve the value that the async function gives to its return statement.

NOTE

The return types of async functions are unique in that the type argument to Awaitable is not erased at runtime. The runtime checks values passed to return statements in async functions against that type argument, and raises catchable fatal errors if the check fails, like it does with any other runtime type annotation failure.

<?hh // decl

// Decl mode to silence typechecker error

async function f(): Awaitable<string> {

// Catchable fatal error at runtime

return 100;

}

You get the value out of the Awaitable object by using the other part of the async function infrastructure: awaiting. An async function can use the keyword await to await the result of an asynchronous operation. The expression after the keyword must evaluate to an object that implements the interface Awaitable. An obvious example of such an object is the return value of another async function.

async function hello(): Awaitable<string> {

return 'hello';

}

async function hello_world(): Awaitable<string> {

$hello = await hello();

return $hello . ' world';

}

In the function hello_world(), the first expression to be evaluated is the function call hello()—an ordinary function call. As we saw above, this returns not the string 'hello', but an object that represents a result that may or may not be ready. Then, the await keyword declares to the runtime: wait for that result to be ready, and then return it.

The runtime handles checking to see if the result is ready, and waiting if it isn’t ready. If it’s not ready, the runtime can suspend the execution of hello_world(): it stops executing the function, saves its execution state, and picks up execution somewhere else—in another async function that’s waiting to run, if any.

Once the result is ready (that is, once the function hello() has executed a return statement), the scheduler can resume the execution of hello_world(). It restores the saved execution state of hello_world() and begins running the body of hello_world() at the point after the await expression. The await expression evaluates to whatever hello() passed to its return statement, in this case the string 'hello'. That value is assigned to the local variable $hello, and execution continues as normal from there.

This is a trivial example, though. Moving up from the syntax level, there are two things you have to do in order to reap benefits from async: use async extension functions, and await multiple asynchronous operations simultaneously.

HHVM provides async extension functions for four kinds of operations: queries to MySQL databases, queries to memcached, cURL requests, and reads and writes of stream resources. Here are examples of the async MySQL and cURL APIs:

async function fetch_from_web(): Awaitable<string> {

return await HH\Asio\curl_exec('https://www.example.com/');

}

async function fetch_from_db(int $id): Awaitable<string> {

$conn = await AsyncMysqlClient::connect(

'127.0.0.1', 3306, 'example', 'admin', 'hunter2'

);

$result = await $conn->queryf('SELECT name FROM user WHERE id = %d', $id);

return $result->mapRows()[0]['name'];

}

Note how similar this async code looks to equivalent non-async code: if you removed the async and await keywords and changed the class and function names to use the non-async extensions, it would be equivalent non-async code. There’s no need to think about threading or synchronization. The async and await keywords are the only substantial difference: instead of simply calling a function that performs a long-running operation, you await it.

The other key to benefiting from async is to await multiple asynchronous operations at the same time. Running the two async functions above at the same time looks like this:

async function fetch_all(int $id): Awaitable<string> {

list($web, $db) = await HH\Asio\v(array(fetch_from_web(), fetch_from_db($id)));

return $web . $db;

}

We’ll examine everything going on here in detail in the rest of this chapter, but now you have a high-level idea of how async code looks.

Async in Detail

Before getting started, if you’re going to use async extensively, we highly recommend that you install asio-utilities, a library of async helper functions. We’ll look at the contents of this library as we go. You can use async without it, but it makes code significantly more concise.

The recommended way to download and install the library is through Composer, a package manager for PHP and Hack. Add this to your composer.json file:

"require": {

"hhvm/asio-utilities": "~1.0"

}

Wait Handles

The concept of a wait handle is central to the way async code works. A wait handle is an object that represents a possibly-asynchronous operation that may or may not have completed. If it has completed, you can get a result from the wait handle. If not, you can await the wait handle.

Wait handles are represented by the generic interface Awaitable. There are several classes that implement this interface, but they’re implementation details, and you shouldn’t rely on their specifics.

The two most important kinds of wait handle are:

§ Ones representing async functions. To get one of these, simply call an async function.

async function f(): Awaitable<int> {

// ...

}

async function main(): Awaitable<void> {

$wait_handle = f();

// $wait_handle is a wait handle; a value of type Awaitable<int>

$result = await $wait_handle;

// $result is an int; the await "unwraps" the Awaitable

}

§ Ones representing multiple other wait handles. To get one of these, use the async helper functions[23] HH\Asio\v() when you have an indexed list of wait handles, like a Vector or an array with consecutive integer keys; and HH\Asio\m() when you have an associative mapping of wait handles, like a Map or an array with string keys.

async function triple(float $number): Awaitable<float> {

return $number * 3.0;

}

async function triple_v(): Awaitable<void> {

$handles = array(

triple(3.0),

triple(4.0),

);

$result = await HH\Asio\v($handles);

var_dump($result[0]); // Prints: float(9)

var_dump($result[1]); // Prints: float(12)

}

async function triple_m(): Awaitable<void> {

$handles = array(

'three' => triple(3.0),

'four' => triple(4.0),

);

$result = await HH\Asio\m($handles);

var_dump($result['three']); // Prints: float(9)

var_dump($result['four']); // Prints: float(12)

}

HH\Asio\v() turns a Vector or array of awaitables into an awaitable Vector. Likewise, HH\Asio\m() turns a Map or array of awaitables into an awaitable Map.

For a non-async function to get a result out of a wait handle, there’s a function in asio-utilities called HH\Asio\join() [24]. It takes one argument, an Awaitable. The function waits for the awaitable to complete, then returns its result.

async function f(): Awaitable<mixed> {

// ...

}

function main(): void {

$result = HH\Asio\join(f());

}

You shouldn’t call HH\Asio\join() inside an async function—if you do, that awaitable and its dependencies will run to completion synchronously, with none of your currently in-flight awaitables getting a chance to run. If you’re in an async function, and you have a wait handle whose result you want, just await it.
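As a minimal sketch of where join() belongs, the script below keeps every await inside async functions and calls HH\Asio\join() exactly once, from non-async code. The names greet() and greet_both() are invented for illustration and are not part of any library.

```hack
<?hh

// Hypothetical async function. It has nothing to wait for, but callers
// still receive an Awaitable<string>.
async function greet(string $name): Awaitable<string> {
  return 'hello ' . $name;
}

// Inside async code, combine wait handles and await them; never join.
async function greet_both(): Awaitable<Vector<string>> {
  return await HH\Asio\v(array(greet('alice'), greet('bob')));
}

// Non-async entry point: the only place join() is called.
function main(): void {
  $greetings = HH\Asio\join(greet_both());
  foreach ($greetings as $greeting) {
    echo $greeting, "\n";
  }
}

main();
```

If main() were itself async, the join() call would simply become an await.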

Async and Callable Types

In Hack’s Type System, we saw that Hack has syntax for annotating the types of callable values. In this example, you must pass f() a function that takes an integer and returns a string:

function f((function(int): string) $callback): void {

// ...

}

function main(): void {

$good = function (int $x): string { return (string)$x; };

f($good); // OK

$bad = function (array $x): int { return count($x); };

f($bad); // Error

}

You might now ask: how do you do this for async functions? How would f(), in the example above, specify that you must pass it an async function as a callback?

The answer is that you can’t, for good reason. The async-ness of a function is an implementation detail of that function. Putting the async keyword on a function does two things:

§ It allows the function to use the await keyword in its body—an implementation detail, and not something that should matter to any code outside the function.

§ It forces the function’s return type to be Awaitable. The return type does matter to code outside the function, but what matters is just the return type, not the function’s async-ness.

To return to the example above, f() can specify that the callback must return an Awaitable<string>. This will allow, but not require, async functions to be passed as the callback:

function f((function(int): Awaitable<string>) $callback): void {

// ...

}

To make the reason for this restriction clearer, consider another implementation detail of functions: whether they’re closures or not. Allowing f() in the example above to specify that you must pass it an async function would be just as silly as allowing it to specify that you must pass it a closure.

For the same reason, you can’t declare abstract methods, or methods in interfaces, to be async. You can, of course, declare them as non-async but with Awaitable as their return type.

interface I {

public async function bad(): Awaitable<void>; // Error

public function good(): Awaitable<void>; // OK

}

abstract class C {

abstract public async function bad(): Awaitable<void>; // Error

abstract public function good(): Awaitable<void>; // OK

}
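The following sketch shows how this plays out for implementers: the interface only fixes the return type, and each implementing class decides for itself whether to be async. The names Greeter, AsyncGreeter, and DelegatingGreeter are hypothetical.

```hack
<?hh

interface Greeter {
  // Declared without async; only the return type is part of the contract.
  public function greet(): Awaitable<string>;
}

class AsyncGreeter implements Greeter {
  // One implementation chooses to be async...
  public async function greet(): Awaitable<string> {
    return 'hello';
  }
}

class DelegatingGreeter implements Greeter {
  public function __construct(private Greeter $inner) {}

  // ...another is a regular method that passes along a wait handle
  // it obtained from somewhere else.
  public function greet(): Awaitable<string> {
    return $this->inner->greet();
  }
}
```

Code that holds a Greeter cannot tell the two apart, and has no need to: in both cases it gets an Awaitable<string> to await.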

await Is Not an Expression

Although await behaves like an expression in several ways, it’s not a general expression. There are only three syntactic positions where it can appear:

§ As an entire statement by itself:

async function f(): Awaitable<void> {

await other_func();

}

§ On the right-hand side of a normal assignment or list assignment statement:

async function f(): Awaitable<void> {

$result = await other_func();

list($one, $two) = await yet_another_func();

}

§ As the argument of a return statement:

async function f(): Awaitable<mixed> {

return await other_func();

}

If you use await anywhere else, it’s a syntax error. So, for example, you can’t do this:

async function f(): Awaitable<void> {

var_dump(await other_func()); // Syntax error

}

This restriction may be lifted in the future. It exists now because of implementation issues[25].
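In practice the restriction is easy to work around: await into a local variable first, then use that variable wherever an expression is needed. A minimal sketch, with a hypothetical other_func() standing in for any async function:

```hack
<?hh

// Hypothetical async function, defined here only so the sketch stands alone.
async function other_func(): Awaitable<int> {
  return 42;
}

async function f(): Awaitable<void> {
  // await on the right-hand side of an assignment is allowed...
  $value = await other_func();
  // ...and the unwrapped value can then appear in any expression.
  var_dump($value);
}
```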

Async Generators

Generators were introduced in PHP 5.5. On the surface, they look quite similar to async functions. Both features introduce a special kind of function that has the ability to stop executing partway through, in such a way that it can pick up where it left off later.

However, the two features are orthogonal: like any other function, generators can be async. Here’s an example that implements a countdown clock, yielding once per second. (We’ll see HH\Asio\usleep() in Sleeping.)

async function countdown(int $start): AsyncIterator<int> {

for ($i = $start; $i >= 0; --$i) {

await HH\Asio\usleep(1000000); // Sleep for 1 second

yield $i;

}

}

The most important thing to note here is the return type annotation: AsyncIterator<int>. This signifies that you can iterate over the value returned from countdown(), and the values you get out of the iteration are integers.

However, this is an async iterator, not a regular iterator. There’s some new syntax to iterate over an async iterator: await as.

async function use_countdown(): Awaitable<void> {

$async_gen = countdown(10);

foreach ($async_gen await as $value) {

// $value is of type int here

var_dump($value);

}

}

The await as syntax is shorthand for repeatedly doing await $async_gen->next(), just as the normal foreach syntax is shorthand for repeatedly calling next() on a normal iterator.
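To make the shorthand concrete, here is a rough hand-written equivalent of an await as loop. It is a sketch under one assumption: that awaiting next() produces a (key, value) tuple while the generator is running and null once it has finished, which matches how the result of send() is described later in this section. The generator omits the sleep so that the example stands alone.

```hack
<?hh

// Simplified countdown generator, without the one-second sleep.
async function countdown(int $start): AsyncIterator<int> {
  for ($i = $start; $i >= 0; --$i) {
    yield $i;
  }
}

async function use_countdown_by_hand(): Awaitable<void> {
  $async_gen = countdown(3);
  while (true) {
    // Assumption: a (key, value) tuple, or null when the generator is done
    $next = await $async_gen->next();
    if ($next === null) {
      break;
    }
    list($_, $value) = $next;
    var_dump($value);
  }
}
```

The await as form is shorter and avoids the tuple handling, so prefer it unless you need to drive the generator manually.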

If you want to yield a key from an async generator as well, use the interface AsyncKeyedIterator. It has two type arguments: the key type and the value type. To iterate over one of these, you also use await as.

async function countdown(int $start): AsyncKeyedIterator<int, string> {

for ($i = $start; $i >= 0; --$i) {

await HH\Asio\usleep(1000000);

yield $i => (string)$i;

}

}

async function use_countdown(): Awaitable<void> {

foreach (countdown(10) await as $num => $str) {

// $num is of type int

// $str is of type string

var_dump($num, $str);

}

}

Finally, if you want to call the send() or raise() methods on an async generator, you need to use the interface AsyncGenerator instead. It has three type arguments: the key type, the value type, and the type you want to pass to send().

async function namifier(): AsyncGenerator<int, string, int> {

// Get the first id

$id = yield 0 => '';

// $id is of type ?int

while ($id !== null) {

$name = await get_name($id);

$id = yield $id => $name;

}

}

async function use_namifier(array<int> $ids): Awaitable<void> {

$namifier = namifier();

await $namifier->next();

// Note: this is poorly structured async code!

// For demonstration only. Don't await in a loop.

foreach ($ids as $id) {

$result = await $namifier->send($id);

// $result is of type ?(int, string)

}

}

There are some important things to point out here. First, even though the third type argument to AsyncGenerator is int, the result of a yield in the async generator is of type ?int. This is because it’s always valid to pass null to send(). (Doing so is equivalent to calling next().)

Second, the result of await $namifier->send($id) is of type ?(int, string). The tuple contains the yielded key and value. The reason it’s a nullable type is that the generator can always implicitly yield null, by means of yield break.

Third, remember that when calling next(), send(), and raise() on an async generator, you have to await them, not just call them.

Fourth, AsyncIterator and friends return actual values from their next() methods, rather than returning void as the non-async Iterator and friends do. The same applies to the send() and raise() methods of AsyncGenerator.

Finally, this code is for demonstration purposes only. Don’t write async code like this. In particular, don’t await in a loop. See Awaiting in a Loop for details. Unfortunately, there are few compelling examples of async generator code now, because there aren’t any extensions that use them. When there are, though, async generators will be an extremely powerful tool. For example, they could be used to implement streaming results from network services.

Exceptions in Async Functions

What we’ve seen so far is fairly straightforward: when you call an async function, it returns a wait handle. When you await a wait handle, you get its result: the value that the async function passed to its return statement. But what if the async function throws an exception?

The answer is that the same exception object will be re-thrown when the wait handle is awaited:

async function thrower(): Awaitable<void> {

throw new Exception();

}

async function main(): Awaitable<void> {

// Does not throw

$handle = thrower();

// Throws an Exception, the same object thrower() threw

await $handle;

}

If you’re using HH\Asio\v() or HH\Asio\m() to await multiple wait handles simultaneously, and one of the component wait handles throws an exception, the combined wait handle will re-throw that exception. If multiple component wait handles throw exceptions, the combined wait handle will re-throw one of them. All of the component wait handles will complete, though (whether they finish normally or throw).

async function thrower(string $message): Awaitable<void> {

throw new Exception($message);

}

async function main(): Awaitable<void> {

// Does not throw

$handles = [thrower('one'), thrower('two')];

// Throws either of the two Exception objects

$results = await HH\Asio\v($handles);

}
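Because the exception surfaces at the await, an ordinary try/catch around the await is enough to handle it. The sketch below catches whichever exception the combined wait handle re-throws; the message printed may be either one.

```hack
<?hh

async function thrower(string $message): Awaitable<void> {
  throw new Exception($message);
}

async function main(): Awaitable<void> {
  try {
    await HH\Asio\v(array(thrower('one'), thrower('two')));
  } catch (Exception $e) {
    // Either 'one' or 'two'; which one is not specified
    echo 'caught: ', $e->getMessage(), "\n";
  }
}
```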

Often, this isn’t what you want. In cases like this, you usually want to get the results of the wait handles that succeeded, and just ignore the rest, or communicate failure in a different way.

asio-utilities provides an async function HH\Asio\wrap(), which takes a wait handle as an argument. It will await the wait handle you pass in, catch any exception that it throws, and return an object containing either the result of the passed-in wait handle if no exception was thrown, or the exception object if one was thrown. It does this in the form of an HH\Asio\ResultOrExceptionWrapper.

HH\Asio\ResultOrExceptionWrapper is an interface in asio-utilities, defined like this:

namespace HH\Asio {

interface ResultOrExceptionWrapper<T> {

public function isSucceeded(): bool;

public function isFailed(): bool;

public function getResult(): T;

public function getException(): \Exception;

}

}

§ isSucceeded() indicates whether the inner wait handle exited normally; i.e. by means of return.

§ isFailed() indicates whether the inner wait handle exited abnormally, by means of an exception.

§ getResult() returns the inner wait handle’s result if it exited normally, or re-throws the exception if not.

§ getException() returns the exception that the inner wait handle threw, or throws InvariantException if the inner wait handle didn’t throw.

Here’s an example:

async function thrower(): Awaitable<void> {

throw new Exception();

}

async function wrapped(): Awaitable<void> {

// Does not throw

$handle = HH\Asio\wrap(thrower());

// Does not throw

$wrapper = await $handle;

if ($wrapper->isFailed()) {

// Returns the same Exception object that thrower() threw

$exc = $wrapper->getException();

}

}
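The same idea extends to several wait handles at once. The sketch below uses HH\Asio\vw() from asio-utilities (listed in Table 7-1) to await two wait handles and keep only the results that succeeded. The functions thrower(), succeeder(), and collect_successes() are hypothetical names for illustration.

```hack
<?hh

async function thrower(): Awaitable<int> {
  throw new Exception('failed');
}

async function succeeder(): Awaitable<int> {
  return 1;
}

async function collect_successes(): Awaitable<Vector<int>> {
  // Each element of $wrappers is a ResultOrExceptionWrapper, so a failure
  // in one wait handle does not hide the results of the others
  $wrappers = await HH\Asio\vw(array(thrower(), succeeder()));
  $results = Vector {};
  foreach ($wrappers as $wrapper) {
    if ($wrapper->isSucceeded()) {
      $results[] = $wrapper->getResult();
    }
  }
  return $results;
}
```

Failed wrappers are silently skipped here; real code would more likely log getException() or report the failure some other way.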

CAUTION

The examples in this section have had code like this:

$handle = thrower();

await $handle;

This is only to make it clear that calling the async function doesn’t throw an exception, and awaiting the wait handle does. In general, you shouldn’t separate the call from the await like this. Dropping Wait Handles explains why in detail.

Mapping and Filtering Helpers

When creating multiple wait handles to await in parallel, you’ll often have some collection of values that each need to be converted into a wait handle. Or you may need to filter some of them out. You can use the usual PHP and Hack builtin functions array_map() and array_filter() (or methods on Hack’s collection classes) to do this, but this can make your code a bit verbose.

asio-utilities provides a whole slew of concisely-named functions for processing arrays and collections with async mapping and filtering callbacks. They have names like vm(), vfk(), and mmw(). The names are terse, but these functions are so commonly used in async code that the conciseness is worth the loss of easy readability.

Here’s how to decode the names:

§ The first character is always v or m. This indicates what the function returns: a Vector or a Map.

§ Next, you might see m, mk, f, or fk. These indicate whether the values in the collection will be passed through a mapping (m and mk) or filtering (f and fk) callback. The presence of the k indicates whether the key from the collection will be passed to the callback as well.

§ Finally, there might be a w. If so, the values from the collection are passed through HH\Asio\wrap() after any mapping and filtering has been done.

The first argument is always the input array or collection. (The helpers actually accept Traversable or KeyedTraversable, as appropriate, so you can pass in iterators too.) If the function requires a callback for mapping or filtering, it is the second argument. (None of the functions require more than one callback.)

The mapping and filtering callbacks are async functions. Mapping callbacks must have one parameter, of the collection’s value type; or two parameters, of the collection’s key and value type respectively. They can return any type. Filtering callbacks have the same convention for parameters, and they must return booleans.

Mapping, especially, is very common: you’ll have an async function that does an async operation on a single value, and you’ll map that over an array or collection of values. For this, you would use vm(), vmk(), mm(), mmk(), or any of the above with a w appended. The basic operation of each helper is: create a wait handle for each value by passing it to the async callback, then await all those wait handles in parallel, then put the results into a Vector or a Map, as the first letter of the function’s name indicates.

async function fourth_root(num $f): Awaitable<float> {

if ($f < 0) {

throw new Exception();

}

return sqrt(sqrt($f));

}

async function vector_with_mapping(): Awaitable<void> {

$nums = Vector {16, 81};

$roots = await HH\Asio\vm($nums, fun('fourth_root'));

// $roots is Vector {2.0, 3.0}

}

async function map_with_mapping_wrapped(): Awaitable<void> {

$nums = Map {

'minus eighty-one' => -81,

'sixteen' => 16,

};

$roots = await HH\Asio\mmw($nums, fun('fourth_root'));

// $roots['minus eighty-one'] is a failed ResultOrExceptionWrapper

// $roots['sixteen'] is a succeeded ResultOrExceptionWrapper with result 2

}
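The k variants work the same way, except that the callback receives the key as its first parameter. A minimal sketch using vmk(), with a hypothetical describe() callback:

```hack
<?hh

// Hypothetical mapping callback: key first, then value.
async function describe(string $name, int $value): Awaitable<string> {
  return $name . ' => ' . $value;
}

async function vector_with_keyed_mapping(): Awaitable<Vector<string>> {
  $nums = Map {'sixteen' => 16, 'eighty-one' => 81};
  // The result is a Vector, so the Map's keys survive only inside
  // the strings that describe() built from them
  return await HH\Asio\vmk($nums, fun('describe'));
}
```

Use mmk() instead if the result should stay keyed by the original Map keys.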

Filtering is less common. You’ll have an async function that results in a boolean, and apply it to all elements of a collection in parallel. For this, you would use vf(), vfk(), mf(), mfk(), or any of the above with a w appended. The basic operation of each helper is: create a wait handle for each value by passing it to the async callback, await all those wait handles in parallel, then filter the original array or collection with the resulting booleans.

async function is_user_admin(int $id): Awaitable<bool> {

// ...

}

async function admins_from_list(Traversable<int> $ids): Awaitable<Vector<int>> {

return await HH\Asio\vf($ids, fun('is_user_admin'));

}

Note that HH\Asio\v() and HH\Asio\m() are not part of asio-utilities—they are built into HHVM and always available for use in Hack code.

Table 7-1 shows the full range of helper functions and what they do.

Table 7-1. asio-utilities helper functions

Name   | Returns a… | Callback  | Passes key to callback? | Wraps exceptions?
-------|------------|-----------|-------------------------|------------------
v()    | Vector     | n/a       | n/a                     | No
vm()   | Vector     | Mapping   | No                      | No
vmk()  | Vector     | Mapping   | Yes                     | No
vf()   | Vector     | Filtering | No                      | No
vfk()  | Vector     | Filtering | Yes                     | No
vw()   | Vector     | n/a       | n/a                     | Yes
vmw()  | Vector     | Mapping   | No                      | Yes
vmkw() | Vector     | Mapping   | Yes                     | Yes
vfw()  | Vector     | Filtering | No                      | Yes
vfkw() | Vector     | Filtering | Yes                     | Yes
m()    | Map        | n/a       | n/a                     | No
mm()   | Map        | Mapping   | No                      | No
mmk()  | Map        | Mapping   | Yes                     | No
mf()   | Map        | Filtering | No                      | No
mfk()  | Map        | Filtering | Yes                     | No
mw()   | Map        | n/a       | n/a                     | Yes
mmw()  | Map        | Mapping   | No                      | Yes
mmkw() | Map        | Mapping   | Yes                     | Yes
mfw()  | Map        | Filtering | No                      | Yes
mfkw() | Map        | Filtering | Yes                     | Yes

Lambda expression syntax (see Lambda Expressions) is very convenient in conjunction with these async helpers; it cuts down on the boilerplate required by closure syntax. To rewrite one of the examples above:

async function fourth_root_strings(): Awaitable<void> {

$strs = array('16', '81');

$roots = await HH\Asio\vm($strs, async $str ==> sqrt(sqrt((float)$str)));

// $roots is Vector {2.0, 3.0}

}

Structuring Async Code

As we’ve seen, within a single function, async code looks very similar to naïve sequential code, and is just as easy to reason about. On that level, you don’t have to adapt to an unfamiliar new way of thinking.

To get the most benefit out of async, though, the higher-level organization of your code—what to put in which functions, and how to tie those functions together—requires some consideration of data dependencies. This is the idea that in order to generate one piece of data, you need some other piece of data.

In this section, we’ll look at how to break down a program’s logic in terms of data dependencies, and how to translate typical data dependency shapes into async code. We’ll also look at some common antipatterns, and why you should avoid them.

Data Dependencies

In a blogging application, generating a page of a single author’s posts might require a series of queries like this:

1. Fetch the IDs of the author’s posts—maybe all of them, maybe only the first 20 or so.

2. Fetch post data (title, excerpt, etc.) for each post ID.

3. Fetch comment count for each post ID.

The most intuitive way to understand a set of data dependencies is with a graph. Figure 7-3 shows the dependency graph for the scenario above. The arrows follow the direction of data flow; for example, each post ID flows into the fetch of that post’s data and into the fetch of its comment count.

Figure 7-3. Dependency graph for “all posts by author” page

Learning how to structure async code well involves learning to recognize patterns in dependency graphs and translate them into async functions. This scenario has examples of some very common patterns, which you can translate into code with the following procedure:

§ Put each “chain”—a sequence of dependencies with no branching—into its own async function.

§ Put each bundle of parallel chains into its own async function.

§ Now that each bundle of parallel chains has been reduced to a single function, go back to the first step—there may be a new chain to reduce.

Note that “its own async function” doesn’t have to mean a named function. It’s often the best option, in terms of code cleanliness and readability, to use a closure (remember, closures can be async).

Your goal should be to fit every asynchronous operation that must happen in the course of a page request into this scheme. You should only have to call HH\Asio\join() once, at the very top level of your code, and its result should be all of the output for the page request.

For the “one author’s posts” page, we’ll use this scheme to break down the asynchronous operations into these async functions:

§ One function for each underlying fetch operation: fetching all of the author’s post IDs, fetching individual post data, and fetching comment count.

§ One function that bundles together a post-data-and-comment-count pair of chains. This will be a closure in the top-level function.

§ One top-level function that coordinates all the data fetching.

So this is what the code for the “one author’s posts” page might look like:

async function fetch_all_post_ids_for_author(int $author_id)

: Awaitable<array<int>> {

// Query database, etc.

// ...

}

async function fetch_post_data(int $post_id): Awaitable<PostData> {

// Query database, etc.

// ...

}

async function fetch_comment_count(int $post_id): Awaitable<int> {

// Query database, etc.

// ...

}

async function fetch_page_data(int $author_id)

: Awaitable<Vector<(PostData, int)>> {

$all_post_ids = await fetch_all_post_ids_for_author($author_id);

// An async closure that will turn a post ID into a tuple of

// post data and comment count

$post_fetcher = async function(int $post_id): Awaitable<(PostData, int)> {

list($post_data, $comment_count) =

await HH\Asio\v(array(

fetch_post_data($post_id),

fetch_comment_count($post_id),

));

return tuple($post_data, $comment_count);

};

// Transform the array of post IDs into an array of results,

// using the vm() function from asio-utilities

return await HH\Asio\vm($all_post_ids, $post_fetcher);

}

async function generate_page(int $author_id): Awaitable<string> {

$tuples = await fetch_page_data($author_id);

foreach ($tuples as $tuple) {

list($post_data, $comment_count) = $tuple;

// Render the data into HTML

// ...

}

// ...

}

WHICH FUNCTIONS SHOULD BE ASYNC?

Don’t be afraid to make a function async, even if it usually doesn’t need to await anything, or even if it never awaits anything. There’s no performance penalty for doing so. If it helps the function fit better with your other code, or if it might ever need to be async in the future, make it async.

SMART DATA FETCHING

It’s important to note that this example is just meant to demonstrate how to structure async code, using an easy-to-grasp application. Depending on what your storage backends are and how you have them configured, it might be possible to do this in a single roundtrip to the database, using JOIN queries and such.

At the very least, this example should be establishing a database connection only once, and passing the connection object around[26], instead of having each fetching function, like fetch_post_data(), do so itself.

It’s quite possible to use async when communicating with your storage backends, and still be very inefficient. Async doesn’t give you license to stop thinking about things like caching intelligently, batching fetches, and constructing efficient SQL queries.

Antipatterns

There are a few ways to structure async code that seem very tempting at first, but actually hamper the async code’s ability to make efficient use of time.

These are antipatterns because they create false dependencies; i.e. they cause one wait handle to wait for another (usually indirectly) even though it doesn’t need to. Good async code faithfully translates the pure, ideal dependency graph into code.

Awaiting in a Loop

Suppose you have an array of numerical user IDs, and an async function that loads data about a user (from a database, say) given a user ID. You want to turn the array of user IDs into an array of User objects. It’s tempting to do something like this:

async function load_user(int $id): Awaitable<User> {

// Call to memcache, database, ...

}

async function load_users(array<int> $ids): Awaitable<Vector<User>> {

$result = Vector {};

foreach ($ids as $id) {

$result[] = await load_user($id);

}

return $result;

}

This is entirely defeating the purpose of async functions. All the users will be loaded in serial, one after the other, with no parallelism at all. This code is creating a dependency graph that is a single long chain:

Dependency graph

These are false dependencies, though: you don’t need to finish loading the first user before you can start loading the second user. The real dependency graph, in which none of the individual user loads depends on any others, looks like this:

Dependency graph

To express the real dependency graph in code, do this (the vm() function is explained in Mapping and Filtering Helpers):

async function load_users(array<int> $ids): Awaitable<Vector<User>> {

return await HH\Asio\vm($ids, fun('load_user'));

}

In general, if you’re tempted to await in a loop, that’s probably because you have some collection of things to await. In that case, you should use one of the await-a-collection helpers (supplemented with array_map(), array_filter(), etc.) instead of iterating over the collection and awaiting in a loop.
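
As a rough sketch of the array_map() variant mentioned above, the following builds one wait handle per ID and then awaits them all at once with HH\Asio\v(). The User class and the body of load_user() are hypothetical stand-ins so that the snippet is self-contained; only the shape of load_users_with_v() matters.

```hack
<?hh

// Hypothetical stand-in for the User class used in the text.
class User {
  public function __construct(public int $id) {}
}

// Hypothetical loader; a real one would query memcache or a database.
async function load_user(int $id): Awaitable<User> {
  return new User($id);
}

async function load_users_with_v(array<int> $ids): Awaitable<Vector<User>> {
  // array_map() calls load_user() once per ID and collects the wait
  // handles; nothing is awaited inside the callback.
  $handles = array_map($id ==> load_user($id), $ids);
  // A single await on the combined wait handle lets all the loads
  // proceed together.
  return await HH\Asio\v($handles);
}
```

The result should be the same as with vm(); which form to use is mostly a matter of taste.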

CAUTION

This bears repeating: when you have a collection of independent things to await, it’s never correct to await them one at a time in a loop. This is by far the easiest trap for async beginners to fall into, and it completely erases the benefits of async. Don’t await in a loop.

The Multi-ID Pattern

Let’s go back to the “all posts by one author” example. Suppose that instead of two parallel queries for each post, we need to do two dependent queries; that is, do one query, and use its result to construct another query.

Let’s say, for example, that we want to display the text of the first comment on each post, instead of just the count. First we need to fetch the ID of the first comment on each post, and then fetch the content of those comments[27].

It’s tempting to implement that logic as follows:

async function fetch_first_comment_ids(array<int> $post_ids)

: Awaitable<array<int>> {

// Send a single database query with all post IDs

// ...

}

async function fetch_comment_text(array<int> $comment_ids)

: Awaitable<array<string>> {

// Send a single database query with all comment IDs

// ...

}

async function fetch_all_first_comments(int $author_id)

: Awaitable<array<string>> {

$all_post_ids = await fetch_all_post_ids_for_author($author_id);

$all_comment_ids = await fetch_first_comment_ids($all_post_ids);

return await fetch_comment_text($all_comment_ids);

}

This has the apparent advantage of guaranteeing only two trips to the database, regardless of how many posts you need to fetch data for. But this is poorly structured async code, again because it’s creating false dependencies. Figure 7-4 shows the dependency graph created by this code. In particular, note that fetching the text for any comment indirectly depends on fetching every comment ID, which doesn’t make sense.

Dependency graph

Figure 7-4. Dependency graph for bad first-comments code

The telltale sign of this antipattern is async functions that take multiple IDs, or lookup keys of any form, as arguments. They serve to create these horizontal false dependencies, which act as bottlenecks.

The real dependency graph that we should be creating doesn’t have those horizontal dependencies: fetching each comment’s text depends on fetching that comment’s ID and nothing else. Figure 7-5 shows what the graph should look like.

Dependency graph

Figure 7-5. Correct dependency graph for first-comments page

Translate this into code by following the guidelines above: group chains of dependencies into their own functions. In this case, we group the chain for each post into a closure.

async function fetch_first_comment(int $post_id): Awaitable<int> {

// Send database query with a single post ID

// ...

}

async function fetch_comment_text(int $comment_id): Awaitable<string> {

// Send database query with a single comment ID

// ...

}

async function fetch_all_first_comments(int $author_id)

: Awaitable<Vector<string>> {

$all_post_ids = await fetch_all_post_ids_for_author($author_id);

$comment_fetcher = async function(int $post_id): Awaitable<string> {

$first_comment_id = await fetch_first_comment($post_id);

return await fetch_comment_text($first_comment_id);

};

return await HH\Asio\vm($all_post_ids, $comment_fetcher);

}

This code has the potential downside of incurring more roundtrips to the database, because it lacks the ability to send a query for more than one ID at a time. This problem can be solved fairly seamlessly with async; see Batching for details.

The takeaway from these antipatterns should be to always think about the structure of the data first. Let the data inform how you structure the code; don’t write code first and work out the dependencies it creates later.

Other Types of Waiting

Most of the wait handles you deal with will represent either async functions or combinations of other wait handles, but there are two other kinds that can be useful.

Sleeping

You can use a wait handle to wait for a length of time to pass, while doing nothing on the CPU. This is akin to calling the usleep() builtin function, except that it allows other wait handles to run during the sleep period.

asio-utilities provides a function for sleeping: HH\Asio\usleep(). It takes one argument: the length of time to sleep for, in microseconds[28]:

async function sleepForFiveSeconds(): Awaitable<void> {

echo "start\n";

await HH\Asio\usleep(5000000); // 5 million microseconds = 5 seconds

echo "finish, at least five seconds later\n";

}

Note that the second echo happens at least five seconds later, not exactly five seconds later. When this wait handle sleeps, another one might run that uses the CPU for more than five seconds without awaiting, and the async scheduler can’t interrupt it.
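
To see that sleeping doesn’t block other wait handles, here is a small sketch that starts two sleeps and awaits them together. The function names are made up for illustration. If the sleeps overlap as described, the elapsed time returned should be close to two seconds rather than four, though the exact figure will vary.

```hack
<?hh

async function sleep_then_return(
  string $label,
  int $usecs
): Awaitable<string> {
  await HH\Asio\usleep($usecs);
  return $label;
}

async function sleep_twice_in_parallel(): Awaitable<float> {
  $start = microtime(true);
  // Both sleeps are pending at the same time, so neither one has to
  // wait for the other to finish before it starts.
  await HH\Asio\v(array(
    sleep_then_return('first', 2000000),
    sleep_then_return('second', 2000000),
  ));
  // Elapsed wall-clock time in seconds
  return microtime(true) - $start;
}
```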

Rescheduling

To reschedule a wait handle means to send it to the back of the async scheduler’s queue—to voluntarily wait until other pending wait handles have run. There are a couple of reasons you might want to do this: to interleave polling loops with other async operations, and to do batching.

Polling

Ideally, your code would do all asynchronous work through async extensions. However, you may need to use some service that doesn’t have a corresponding async extension. You may be able to use rescheduling to make such services work harmoniously with your async code.

The key is that you must be able to make nonblocking calls to the service. If you can, you can use rescheduling in your polling loop to allow other wait handles to run after unsuccessful polls.

asio-utilities provides a function for rescheduling: HH\Asio\later(). It takes no arguments. All you have to do is call and await it:

async function poll_for_result(PollingService $svc): Awaitable<mixed> {

while (!$svc->isReady()) {

await HH\Asio\later();

}

return $svc->getResult();

}

If there are no other wait handles running, this amounts to a busy loop of polling. Depending on how expensive it is to poll, and the expected latency of the service, you may want to sleep in this situation instead, using HH\Asio\usleep().
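
A sleeping version of the polling loop might look like the sketch below. The PollingService interface is a hypothetical declaration matching the two methods used above, and the 10-millisecond default interval is an arbitrary assumption; pick a value that suits the latency of your service.

```hack
<?hh

// Hypothetical interface matching the methods used by poll_for_result().
interface PollingService {
  public function isReady(): bool;
  public function getResult(): mixed;
}

// $interval_usecs is an assumed polling interval (default: 10 ms).
async function poll_with_sleep(
  PollingService $svc,
  int $interval_usecs = 10000
): Awaitable<mixed> {
  while (!$svc->isReady()) {
    // Sleep instead of rescheduling, so that an otherwise idle
    // scheduler doesn't spin on isReady().
    await HH\Asio\usleep($interval_usecs);
  }
  return $svc->getResult();
}
```

The tradeoff is that the result may be picked up as much as one interval late.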

Batching

If you’re doing some high-latency operation that can benefit from batching—database queries are a good example—rescheduling can help you here too. The key is that you write an async function that does a batched operation after rescheduling, to give other wait handles a chance to add their items to the batch.

In this example, suppose that our underlying asynchronous operation is a key-value lookup that requires a roundtrip over a network to a storage server. Each roundtrip is high-latency, but you can send multiple keys in a single request without increasing the overall time taken. (memcached behaves somewhat like this, but we won’t use its specific API.)

The code that uses this operation will look like this:

async function one(string $key): Awaitable<string> {

$subkey = await Batcher::lookup($key);

return await Batcher::lookup($subkey);

}

async function two(string $key): Awaitable<string> {

return await Batcher::lookup($key);

}

async function main(): Awaitable<void> {

$results = await HH\Asio\v(array(one('hello'), two('world')));

echo $results[0];

echo $results[1];

}

If Batcher::lookup() simply did the lookup operation immediately, executing both one() and two() would result in a combined total of three roundtrips to the storage server. However, there’s an optimization opportunity: if we could perform the first lookup in one() and the lookup in two() in a single roundtrip, we could complete everything with only two roundtrips, total.

Here’s an implementation of the Batcher class that can do this:

class Batcher {

private static array<string> $pendingKeys = array();

private static ?Awaitable<array<string, string>> $waitHandle = null;

public static async function lookup(string $key): Awaitable<string> {

// Add this key to the pending batch

self::$pendingKeys[] = $key;

// If there's no wait handle about to start, create a new one

if (self::$waitHandle === null) {

self::$waitHandle = self::go();

}

// Wait for the batch to complete, and get our result from it

$results = await self::$waitHandle;

return $results[$key];

}

private static async function go(): Awaitable<array<string, string>> {

// Let other wait handles get into this batch

await HH\Asio\later();

// Now this batch has started; clear the shared state

$keys = self::$pendingKeys;

self::$pendingKeys = array();

self::$waitHandle = null;

// Do the multi-key roundtrip

return await multi_key_lookup($keys);

}

}

The private static property $waitHandle represents a batched roundtrip that is about to start. The public method, lookup(), checks to see if a batched roundtrip is about to start; if not, it creates a new one by calling go(). It awaits the batched roundtrip, then retrieves the result it’s interested in.

The await HH\Asio\later() in go() is the key to the batching. It functions as a “last call” for other wait handles that want to do lookups, causing go() to be deferred until other pending wait handles have run.

Consider the example of one() and two(). The proceedings start with this line:

$results = await HH\Asio\v(array(one('hello'), two('world')));

Both one() and two() are pending. Suppose one() gets to run first. It calls lookup(), which calls go(), which reschedules. The runtime looks for other wait handles it can run; two() is still pending, so that runs, calls lookup(), and gets suspended when it executes await self::$waitHandle (because that wait handle is already running).

After that, go() resumes, does its fetching, and returns its result. Both pending instances of lookup() receive their results, and pass them back to one() and two().
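
The Batcher class calls multi_key_lookup(), which isn’t defined in this chapter. If you want to experiment with the batching behavior, a hypothetical in-memory stand-in like the one below is enough. The stored keys and values are invented, and the short sleep merely simulates network latency.

```hack
<?hh

// Hypothetical stand-in for the storage server. A real implementation
// would make one network roundtrip covering all of the keys.
async function multi_key_lookup(
  array<string> $keys
): Awaitable<array<string, string>> {
  // Invented data covering the keys used by one() and two()
  $store = array(
    'hello' => 'hello-subkey',
    'hello-subkey' => 'value for one()',
    'world' => 'value for two()',
  );
  // Report each simulated roundtrip and the keys batched into it
  echo 'roundtrip: ' . implode(', ', $keys) . "\n";
  // Simulate network latency
  await HH\Asio\usleep(1000);
  $results = array();
  foreach ($keys as $key) {
    $results[$key] = $store[$key];
  }
  return $results;
}
```

Following the walkthrough above, you would expect two roundtrips to be reported when main() runs: one carrying the keys from both one() and two(), and a second carrying only the subkey.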

Common Mistakes

As we’ve seen, writing async code is broadly similar to writing normal sequential code. However, there are a few common traps you can fall into.

Dropping Wait Handles

When you call an async function, it returns a wait handle. When you await this wait handle, the async function’s body will execute to completion. But what happens if you don’t await the wait handle?

async function speak(): Awaitable<void> {

echo "one";

await HH\Asio\later();

echo "two";

echo "three";

}

function main(): void {

$handle = speak();

// Don't await it; just drop it

}

main();

How much of speak() will execute? In other words, what will be echoed?

The possible answers are nothing, one, and onetwothree. In addition, the answer you get is not guaranteed to be consistent between runs. It can also change based on the version of HHVM you’re running, the state of any other in-flight async functions, and the activities of butterflies flapping their wings on the other side of the world.

That is to say, the runtime has a lot of leeway to decide what to do. It is only allowed to suspend speak() when it encounters an await expression. Within that constraint, it may suspend and resume speak() as many times as it wants. This is to give the async scheduler the flexibility to arrange async execution as it sees fit, but it does mean that you have to be careful to await any wait handle that you create. Failing to await a wait handle will result in unpredictable behavior. Awaiting a wait handle guarantees that it will run to completion.

You may feel tempted to do something like this to implement detached tasks—that is, you want to start a task and let it run, but you don’t want to block anything else on waiting for it to finish. Non-essential logging in a web application is a common thing that tempts people to do this. Async doesn’t provide a way to detach tasks. The only way to force a wait handle to run is to await it, and there’s no way to await a wait handle without potentially blocking.

Even if you await all wait handles that you create, it’s still possible to see their effects in different orders. In this example, any side effects (writing to the output buffer, network or disk I/O, etc.) of some_unrelated_stuff() may happen before or after any side effects of some_async_function().

async function f(): Awaitable<void> {

$handle = some_async_function();

some_unrelated_stuff();

await $handle;

}

Generally, separating the creation of wait handles from awaiting them is discouraged; the creation and awaiting of a wait handle should happen as close together as possible. The example above would be better written as:

async function f(): Awaitable<void> {

some_unrelated_stuff();

await some_async_function();

}

Don’t assume, because you observe the “correct” ordering of effects once, that they will always happen in that order. The ordering can change between two executions of the same code. To avoid having to be concerned about this, you should generally try not to write async functions that have side effects whose order is important. If you want to enforce that two things happen in a specific order, you must create a dependency between them using await.
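
As a minimal sketch of enforcing an order with await, consider two hypothetical async functions with visible side effects. Because the second one isn’t called until the first has been awaited, its effects can’t come first.

```hack
<?hh

// Hypothetical async operations with visible side effects.
async function write_record(): Awaitable<void> {
  await HH\Asio\later();
  echo "record written\n";
}

async function log_write(): Awaitable<void> {
  await HH\Asio\later();
  echo "write logged\n";
}

async function write_then_log(): Awaitable<void> {
  // log_write() isn't even called until write_record() has completed,
  // so there is a real dependency between the two.
  await write_record();
  await log_write();
}
```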

ASYNC DOESN’T CREATE THREADS

From the perspective of Hack code, the world is single-threaded, just like in PHP. An async function is not a thread; multiple async functions will not run in parallel. A single PHP/Hack environment’s code will not run on multiple CPU cores. (HHVM does run multiple web requests in parallel using system-level threads, but the PHP/Hack environments in those threads can’t substantively interact with each other.)

The async extensions may be using threads behind the scenes, but that’s an implementation detail, not visible to Hack code.

Of course, there are times when you should use threads for parallelism—when you’re doing CPU-intensive work that can be broken down into several tasks that need to synchronize with each other occasionally. In those cases, async will not help you, and in fact Hack is probably not the right language for the job.

Memoizing Async Functions

Since async functions are designed to be used with time-consuming operations, they are a natural fit with memoization. Memoization is a common programming pattern where the result of an expensive operation is cached, so that it can be returned cheaply the next time it’s needed:

function time_consuming_op_impl(): string {

// ...

}

function time_consuming_op(): string {

static $result = null;

if ($result === null) {

$result = time_consuming_op_impl();

}

return $result;

}

The special attribute __Memoize (see Special Attributes) will behave correctly when applied to an async function. When you want memoization, you should generally use that attribute. If you have a good reason not to (needing fine control over the memoization cache, for example), read on.
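
For reference, using the attribute on an async function looks like this. The body of time_consuming_op_impl() is a hypothetical placeholder.

```hack
<?hh

// Hypothetical expensive operation.
async function time_consuming_op_impl(): Awaitable<string> {
  await HH\Asio\usleep(100000);
  return 'result';
}

// The attribute goes on the async function itself; callers simply
// await time_consuming_op() as usual.
<<__Memoize>>
async function time_consuming_op(): Awaitable<string> {
  return await time_consuming_op_impl();
}
```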

When manually memoizing async functions, there is a serious potential mistake to be aware of, which can result in a race condition. The key thing to remember is: memoize the wait handle, not the result.

Memoizing the result is the most intuitively obvious thing to do, like this:

async function time_consuming_op_impl(): Awaitable<string> {

// ...

}

async function time_consuming_op(): Awaitable<string> {

static $result = null;

if ($result === null) {

$result = await time_consuming_op_impl(); // Wrong! Bad!

}

return $result;

}

There’s a race condition here. Suppose there are two other async functions, one() and two(), that are both in the async scheduler queue, and they are both going to await time_consuming_op(). Then the following sequence of events can happen:

1. one() gets to run, and awaits time_consuming_op().

2. time_consuming_op() finds that the memoization cache is empty ($result is null), so it awaits time_consuming_op_impl(). It gets suspended.

3. two() gets to run, and awaits time_consuming_op(). Note that this is a new wait handle; it’s not the same wait handle as in step 1.

4. time_consuming_op() again finds that the memoization cache is empty, so it awaits time_consuming_op_impl() again. Now the time-consuming operation will be done twice.

If time_consuming_op_impl() has side effects—maybe it’s a database write—then this could end up being a serious bug. Even if there are no side effects, it’s still a bug; the time-consuming operation is being done multiple times when it only needs to be done once.

The root cause of the bug is that time_consuming_op() may get suspended between checking the cache and filling the cache. By checking the cache and finding it empty, it derives a fact about the state of the world: the operation has not yet completed. But after awaiting, and thus possibly getting suspended, that fact may no longer be true: the invariant that was supposed to hold inside the if block is violated.

As I said before, the correct solution is to memoize the wait handle, not the result:

async function time_consuming_op(): Awaitable<string> {

static $handle = null;

if ($handle === null) {

$handle = time_consuming_op_impl(); // Don't await here!

}

return await $handle; // Await here instead

}

This may seem unintuitive, because the function awaits every time it’s executed, even on the cache-hit path. But that’s fine: on every execution except the first, $handle is not null, so a new instance of time_consuming_op_impl() will not be started. The result of the one existing instance will be shared.

The race condition is gone. The sequence of events listed above is no longer possible: time_consuming_op() can’t be suspended between finding the cache empty and filling the cache. one() and two() will end up awaiting the same wait handle: the one that’s cached in time_consuming_op(). It’s not an error for this to happen; they will both wait for it to finish, and will both receive the result once it’s ready.
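
The same rule applies when the memoized function takes arguments: keep one wait handle per argument value. The sketch below assumes a hypothetical load_name_impl() and caches its wait handles in a static Map keyed by user ID; the class and function names are invented for illustration.

```hack
<?hh

// Hypothetical expensive per-user operation.
async function load_name_impl(int $user_id): Awaitable<string> {
  await HH\Asio\usleep(100000);
  return 'user ' . $user_id;
}

class NameCache {
  // One cached wait handle per user ID
  private static Map<int, Awaitable<string>> $handles = Map {};

  public static async function get(int $user_id): Awaitable<string> {
    if (!self::$handles->containsKey($user_id)) {
      // No await between checking the cache and filling it
      self::$handles[$user_id] = load_name_impl($user_id);
    }
    // Every caller for this ID awaits the same wait handle
    return await self::$handles[$user_id];
  }
}
```

This is the kind of fine control over the cache that might justify not using __Memoize; for instance, you could add a method that removes an entry from the Map.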

Async Extensions

In this section, we’ll look at each of the four async extensions included with HHVM 3.6: MySQL, MCRouter, cURL, and streams.

The language-level components of async have been around for several versions prior to 3.6, but these extensions are new in 3.6[29]. Some of them aren’t feature-complete yet, but they’ll improve in future versions.

MySQL

The async MySQL extension is an object-oriented MySQL API, reminiscent of the mysqli extension that comes with PHP and HHVM. We won’t cover it in full detail here; we’ll just look at the most important parts—establishing connections, using connection pools, making queries, and reading results.

Connecting and Querying

You start out with the class AsyncMysqlClient. It has a static async method connect() that creates a connection to a MySQL database. The signature looks like this:

class AsyncMysqlClient {

public static async function connect(

string $host,

int $port,

string $dbname,

string $user,

string $password,

int $timeout_micros = -1

): Awaitable<?AsyncMysqlConnection>;

}

The five required parameters are all the standard MySQL connection parameters: hostname, port, database name, username, and password. The last parameter is optional: the connection timeout in microseconds. A value of -1, the default, means to use the default timeout (which is 1 second in HHVM 3.6); a value of 0 means no timeout.

connect() results in an AsyncMysqlConnection (or null if there was an error establishing the connection). AsyncMysqlConnection has two async methods to query the database: query() and queryf(). query() just takes a string containing a query, and a timeout (following the same convention as connect()’s timeout, except that the default is 60 seconds).

queryf() is what you’ll be using most of the time, because it takes a query string with placeholders, and substitutes values for the placeholders, after appropriate escaping. It’s a variadic method: pass the query string as the first argument, and values for the placeholders as subsequent arguments.

async function fetch_user_name(int $user_id): Awaitable<string> {

$conn = await AsyncMysqlClient::connect(

'127.0.0.1',

3306,

'example',

'admin',

'hunter2',

);

if ($conn !== null) {

$result = await $conn->queryf('SELECT name FROM user WHERE id = %d', $user_id);

// ...

}

}

The full range of available placeholders is:

§ %T: a table name

§ %C: a column name

§ %s: a string

§ %d: an integer

§ %f: a float

§ %=s: nullable string comparison. If you pass a string, this will expand to = 'the string'; if you pass null, it will expand to IS NULL.

§ %=d: nullable integer comparison.

§ %=f: nullable float comparison.

§ %Q: raw SQL; the string you pass will be substituted in unescaped. This can be very dangerous, as it opens the possibility of SQL injection, which can be a serious security vulnerability. Avoid using it if at all possible.
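
Here is a short sketch that combines several of these placeholders in one call. The table name user comes from the earlier example, but the nickname column is an assumption made up for this illustration.

```hack
<?hh

async function find_ids_by_nickname(
  AsyncMysqlConnection $conn,
  ?string $nickname
): Awaitable<AsyncMysqlResult> {
  // %C and %T take a column name and a table name. %=s expands to
  // = 'the string' for a string argument, and to IS NULL for null.
  return await $conn->queryf(
    'SELECT %C FROM %T WHERE nickname %=s',
    'id',
    'user',
    $nickname,
  );
}
```

Passing null for $nickname therefore finds the rows where the column is NULL, which a plain %s comparison couldn’t express.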

The Hack typechecker understands queryf() query strings, and typechecks calls to queryf() to ensure that you’re passing the right number of arguments, and that the arguments have the right types:

async function do_something(AsyncMysqlConnection $conn): Awaitable<void> {

// Error: too few arguments

$result = await $conn->queryf('SELECT * FROM user WHERE id = %d');

}

The typechecker intentionally doesn’t recognize the placeholder %Q, to discourage its use. If you really need to use it, you can silence the error with an HH_FIXME comment (see Silencing Typechecker Errors).

queryf() will be getting support for more placeholder types in future, such as the %L family—%Ld for a list of integers, %Ls for a list of strings, etc.

Connection Pools

An important restriction of AsyncMysqlConnection is that you can’t make multiple queries over a single connection in parallel. That’s something you’ll often want to do when using async. The solution is to use AsyncMysqlConnectionPool. A connection pool is a collection of reusable connection objects; when a client requests a connection from the pool, it may get one that already exists, which avoids the overhead of establishing a new connection.

CAUTION

In HHVM earlier than 3.6.3, connection pools have a significant bug that can cause spurious timeouts. If you use connection pools, make sure you’re using HHVM 3.6.3 or later.

Create a connection pool like this:

$pool = new AsyncMysqlConnectionPool(array());

The constructor takes one argument, which is an array of configuration options. The possible options are:

§ per_key_connection_limit: the maximum number of connections allowed in the pool for a single combination of hostname, port, database, and username. Default: 50.

§ pool_connection_limit: the maximum number of connections allowed in the pool, total. Default: 5000.

§ idle_timeout_micros: the maximum amount of time, in microseconds, that a connection will be allowed to sit idle in the pool before being destroyed. Default: 4 seconds.

§ age_timeout_micros: the maximum age, in microseconds, that a connection in the pool will be allowed to reach before being destroyed. Default: 60 seconds.

§ expiration_policy: a string, either 'IdleTime' or 'Age', that specifies whether connections in the pool will be destroyed based on their idle time or age. Default: 'Age'.

For example, to create a pool that holds at most 100 connections and expires them by idle time:

$pool = new AsyncMysqlConnectionPool(

array(

'pool_connection_limit' => 100,

'expiration_policy' => 'IdleTime',

)

);

Once you have a pool created, get connections from it by calling and awaiting its async method connect(), with the same set of arguments as you would pass to AsyncMysqlClient::connect().

<<__Memoize>>

function get_pool(): AsyncMysqlConnectionPool {

return new AsyncMysqlConnectionPool([]);

}

async function get_connection(): Awaitable<?AsyncMysqlConnection> {

return await get_pool()->connect(

'127.0.0.1',

3306,

'example',

'admin',

'hunter2',

);

}
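
With a pool in hand, running queries in parallel is a matter of giving each query its own connection. The following is only a sketch under the assumptions of the earlier examples (same host, credentials, and user table); the function names are invented, and each call to query_user() asks the pool for a connection of its own.

```hack
<?hh

async function query_user(
  AsyncMysqlConnectionPool $pool,
  int $user_id
): Awaitable<?AsyncMysqlResult> {
  // Each call obtains its own connection from the pool
  $conn = await $pool->connect(
    '127.0.0.1',
    3306,
    'example',
    'admin',
    'hunter2',
  );
  if ($conn === null) {
    return null;
  }
  return await $conn->queryf(
    'SELECT name FROM user WHERE id = %d',
    $user_id,
  );
}

async function query_two_users(
  AsyncMysqlConnectionPool $pool
): Awaitable<Vector<?AsyncMysqlResult>> {
  // The two queries use separate connections, so neither has to wait
  // for the other.
  return await HH\Asio\v(array(
    query_user($pool, 1),
    query_user($pool, 2),
  ));
}
```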

Query Results

The results of query() and queryf() are instances of the class AsyncMysqlResult. This is an abstract class; its two most important concrete subclasses are AsyncMysqlQueryResult and AsyncMysqlErrorResult.

AsyncMysqlQueryResult has four (non-async) methods for getting results: mapRows(), vectorRows(), mapRowsTyped(), and vectorRowsTyped(). All four methods return a Vector of rows. The “map” or “vector” part refers to how each row is represented. mapRows() and mapRowsTyped() return rows as Maps, mapping column names to values. vectorRows() and vectorRowsTyped() return rows as Vectors, containing values in the order they were specified in the query.

async function fetch_user_name(AsyncMysqlConnection $conn, int $user_id)

: Awaitable<string> {

$result = await $conn->queryf('SELECT name FROM user WHERE id = %d', $user_id);

invariant($result->numRows() === 1, 'exactly one row in result');

$map = $result->mapRows();

// Each row is a Map, so the result you want is in $map[0]['name']

$vector = $result->vectorRows();

// Each row is a Vector, so the result you want is in $vector[0][0]

}

The “typed” in the method names refers to how you want values from non-string columns represented. For example, if you have a column defined as type INTEGER in SQL, mapRowsTyped() and vectorRowsTyped() will return values from that column as integers in Hack, whereas mapRows() and vectorRows() will return values from that column as string representations of integers.

If the query resulted in an error, the result of query() or queryf() will be an AsyncMysqlErrorResult object. This class has three important non-async methods for determining what happened:

§ failureType(): returns one of two strings, 'TimedOut' or 'Failed'. The latter signifies any failure other than a timeout.

§ mysql_errno(): the numerical MySQL error code for the problem.

§ mysql_error(): a human-readable string describing the problem.
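
Putting the two result classes together, a helper that summarizes any query result might be sketched as follows. It relies only on the methods described above; the function name and the message formats are invented.

```hack
<?hh

function describe_result(AsyncMysqlResult $result): string {
  if ($result instanceof AsyncMysqlErrorResult) {
    // failureType() is 'TimedOut' or 'Failed'
    return sprintf(
      '%s: [%d] %s',
      $result->failureType(),
      $result->mysql_errno(),
      $result->mysql_error(),
    );
  }
  if ($result instanceof AsyncMysqlQueryResult) {
    return $result->numRows() . ' row(s)';
  }
  // AsyncMysqlResult is abstract and may have other subclasses
  return 'unrecognized result type';
}
```

Checking the class of the result before calling mapRows() or numRows() keeps error results from being treated as successful ones.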

Updated documentation for the async MySQL extension is at http://docs.hhvm.com/manual/en/book.hack.async.mysql.php.

MCRouter and memcached

MCRouter is an open-source project developed by Facebook. It is a memcached protocol routing library, providing a wide variety of features that aid in scaling a memcached deployment: connection pooling, prefix-based routing, online configuration changes, and many more. It speaks the memcached ASCII protocol, and sits transparently between clients and memcached instances. It’s available at https://github.com/facebook/mcrouter.

A full exploration of how to use MCRouter is beyond the scope of this book. Here, we’ll simply be using the MCRouter library as a memcached client. The MCRouter extension mimics the Memcache and Memcached extensions that are part of PHP and Hack[30]. The MCRouter extension doesn’t support all operations that memcached and MCRouter itself support (cas, compare-and-swap, being one of the major omissions), but this support will improve in future versions.

The extension is centered around the class MCRouter, which represents a memcached client. There are two ways to get a MCRouter object: through the constructor (more flexible), or through the static method createSimple() (more convenient). These are the signatures:

class MCRouter {

public function __construct(array<string, mixed> $options, string $pid = '');

public static function createSimple(ConstVector<string> $servers): MCRouter;

}

The constructor behaves differently depending on whether $pid—for persistence ID—is empty. If $pid is empty, the constructor starts a transient client and returns an object representing it. If $pid is not empty, the extension looks for a client that already exists with that persistence ID, and returns one if it finds it; if not, it starts a new client with that persistence ID. Generally, transient clients should only be used for debugging and testing, not production.

The $options parameter is used to configure any new clients that are started. It must have one of the keys 'config_str' (mapping to a JSON configuration string) or 'config_file' (mapping to a string containing the path to a JSON configuration file). More information on how to configure MCRouter is in the MCRouter source repository and on its GitHub page.

MCRouter::createSimple() is a streamlined way to create a client; just pass it a Vector (see Chapter 6) of strings with server addresses where memcached is running. The strings are a hostname, followed by a colon, followed by a port number, such as '127.0.0.1:11211'.

MCRouter, the class, has async methods with names that mirror commands in the memcached ASCII protocol. They throw exceptions on failure (which includes things like getting a key that doesn’t exist), so the function HH\Asio\wrap() from asio-utilities comes in handy around this API. For example:

async function fetch_user_name(MCRouter $mcr, int $user_id): Awaitable<string> {

$key = 'name:' . $user_id;

$cached_result = await HH\Asio\wrap($mcr->get($key));

if ($cached_result->isSucceeded()) {

return $cached_result->getResult();

}

// Fall back to querying database

// ...

}

There are async methods for several core memcached protocol commands:

§ get() to read the value for a given key.

§ set() to write a value, overwriting if a value already exists for the given key.

§ add() to write a value, but fail if a value already exists for the given key.

§ replace() to write a value, but fail if the value doesn’t already exist for the given key.

§ append() and prepend() to append or prepend data to the value for a given key.

§ incr() to atomically increment a numeric value.

§ del() to delete a key.

§ version() to get the remote server’s version.
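
As a brief sketch of a write followed by a read, the function below stores a value and reads it back. It assumes a memcached server listening on 127.0.0.1:11211, and it assumes that set() takes the key followed by the value; check the extension’s documentation for the exact signatures. The function name and the key are invented.

```hack
<?hh

async function cache_and_read_back(): Awaitable<?string> {
  // Assumes a memcached server listening on this address
  $mcr = MCRouter::createSimple(Vector {'127.0.0.1:11211'});

  // Assumption: set() takes the key, then the value
  $write = await HH\Asio\wrap($mcr->set('name:42', 'Alice'));
  if ($write->isFailed()) {
    return null;
  }

  // get() throws if the key doesn't exist, hence the wrap()
  $read = await HH\Asio\wrap($mcr->get('name:42'));
  return $read->isSucceeded() ? $read->getResult() : null;
}
```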

Updated documentation for the async MCRouter extension is at http://docs.hhvm.com/manual/en/book.hack.mcrouter.php.

cURL

cURL is a library for transferring data to and from resources identified by URLs. In practice, it’s most often used to make HTTP and HTTPS requests.

The async cURL API in Hack consists of two functions:

async function curl_multi_await(resource $mh, float $timeout = 1.0)

: Awaitable<int>;

namespace HH\Asio {

async function curl_exec(mixed $urlOrHandle): Awaitable<string>;

}

HH\Asio\curl_exec() is a convenience wrapper around curl_multi_await(). You can pass it a cURL handle (i.e. something returned from curl_init()), or a string containing a URL (in which case it will create the cURL handle for you), and it will execute the cURL handle asynchronously and return its result.

curl_multi_await() is the async equivalent of curl_multi_select(). It waits until there is activity on any of the cURL handles that are part of $mh, which must be a cURL multi handle (i.e. something returned from curl_multi_init()). When it completes, indicating that there was activity on at least one of the cURL handles, you process it with curl_multi_exec(), just as you do in non-async code.
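For the common case of fetching several URLs at once, HH\Asio\curl_exec() can be combined with HH\Asio\v(). The following is a sketch rather than a definitive recipe: the function name fetch_all() and the example URLs are placeholders, and error handling is left out.

```hack
<?hh

// Fetch several URLs concurrently. Each call to HH\Asio\curl_exec() is given
// a URL string, so it creates its own cURL handle. HH\Asio\v() combines the
// resulting awaitables into one that completes when every fetch has finished.
async function fetch_all(Vector<string> $urls): Awaitable<Vector<string>> {
  $fetches = $urls->map($url ==> HH\Asio\curl_exec($url));
  return await HH\Asio\v($fetches);
}

// Placeholder URLs; the response bodies come back in the same order.
$bodies = HH\Asio\join(fetch_all(Vector {
  'http://example.com/',
  'http://example.org/',
}));
```

Because the requests overlap, the total wait is roughly that of the slowest request rather than the sum of all of them.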

Streams

This is the simplest of the async extensions. It consists of a single function, called stream_await(). Its job is to wait until a stream becomes readable or writable.

async function stream_await(resource $fp, int $events, float $timeout = 0.0)
  : Awaitable<int>;

§ $fp is the stream to watch for changes. It must be backed by a normal file, socket, tempfile, or pipe. Memory streams and user streams aren’t supported.

§ $events is one of the global constants STREAM_AWAIT_READ or STREAM_AWAIT_WRITE, or both of them bitwise-OR’ed together. It signifies what kind of change to watch for in the stream: watch for it to become readable (i.e. fread() on the stream will not block) or writable (i.e. fwrite() on the stream will not block). Note that a stream that is at end-of-file is considered readable, because fread() will not block.

§ $timeout is the maximum length of time, in seconds, to wait. If this is zero, the async function completes immediately; it’s really just a query for the status of the stream.

The result of the function is an integer indicating the current state of the stream, one of these four global constants:

§ STREAM_AWAIT_CLOSED: the stream is now closed.

§ STREAM_AWAIT_READY: the stream is now readable or writable (depending on what was passed as $events).

§ STREAM_AWAIT_TIMEOUT: the stream is in the same state as before, but the timeout triggered.

§ STREAM_AWAIT_ERROR: an error occurred.
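The four result codes can be handled with a switch. This is a minimal sketch: the function name read_when_ready(), the 8192-byte read size, and the choice to return null for every non-ready outcome are illustrative assumptions, not part of the extension.

```hack
<?hh

// Hypothetical helper: wait up to $timeout seconds for $fp to become
// readable, then read one chunk. Returns null if nothing could be read.
async function read_when_ready(
  resource $fp,
  float $timeout,
): Awaitable<?string> {
  $status = await stream_await($fp, STREAM_AWAIT_READ, $timeout);
  switch ($status) {
    case STREAM_AWAIT_READY:
      // fread() will not block now; at end-of-file it may return ''.
      $data = fread($fp, 8192);
      return $data === false ? null : $data;
    case STREAM_AWAIT_TIMEOUT:
      // No change in the stream before the timeout expired.
      return null;
    case STREAM_AWAIT_CLOSED:
    case STREAM_AWAIT_ERROR:
    default:
      return null;
  }
}
```

A real caller would probably want to distinguish a timeout (retry later) from a closed or failed stream (give up), for example by returning the status alongside the data.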

stream_await() is similar to stream_select() in functionality (both wait for a stream to enter an interesting state), but it doesn’t have the multiplexing functionality of stream_select(). You can use HH\Asio\v() to await multiple stream wait handles simultaneously, but the resulting combined wait handle won’t complete until all of its constituent stream wait handles have completed. You can work around this by putting the call to stream_await() inside another async function that uses the stream’s result as soon as it is available:

async function read_all(array<resource> $fps): Awaitable<void> {
  $read_single = async function(resource $fp) {
    $status = await stream_await($fp, STREAM_AWAIT_READ, 1.0);
    if ($status == STREAM_AWAIT_READY) {
      // Read from stream
      // ...
    }
  };
  await HH\Asio\v(array_map($read_single, $fps));
}


[23] These functions are built into HHVM; they’re not part of asio-utilities. You can use them without installing the library.

[24] HH\Asio\join() is part of asio-utilities, but in future, it will be built into HHVM. In general, asio-utilities is where the team tests new async APIs before building them into HHVM itself.

[25] As it is, with the restrictions on where await can appear, there’s no way for an async function to get suspended in the middle of evaluating an expression. If await could appear anywhere, we would confront the issue of how to efficiently store the intermediate evaluation state of the expression, which isn’t as straightforward as it may sound.

[26] See MySQL for details on the async MySQL API.

[27] This may seem odd, since a typical, normalized database schema wouldn’t require the intermediate step of fetching comment IDs. However, in denormalized schemas—which have their merits, and are used in practice—this might not be possible.

[28] Measuring time on computers is always a tricky business. The timespan for which HH\Asio\usleep() actually sleeps may not be accurate to the microsecond, for various reasons, not least of which is the fact that the “clock” that underlies it varies according to what is available in the operating system and hardware where HHVM is running.

[29] Async has been extensively used within Facebook for some time, but with internal-only async extensions.

[30] There are two extensions for talking to memcached. Memcached is newer and supports more memcached features, so it’s generally recommended for use over Memcache.