Other Features of Hack - Hack and HHVM (2015)

Chapter 4. Other Features of Hack

Hack has four major features that make the language different from PHP in fundamental ways: typechecking, collections, async, and XHP. Beyond those, though, there’s a wide range of smaller features that are designed to simplify certain common patterns or to address minor gaps.

Enums

An enum (short for enumeration) is a collection of related constants. Unlike simply creating several constants or class constants, creating an enum results in a new type: you can use the names of enums in type annotations. They also offer functionality like getting an array of all valid names or values, without resorting to heavyweight reflection APIs.

The syntax for an enum is: the keyword enum, followed by a name for the enum, then a colon, then either int or string (which will be the enum’s underlying type), then a brace-enclosed, semicolon-separated list of enum members. Each member is a name, followed by an equals sign, and then a value (which must match the enum’s underlying type).

enum CardSuit : int {

SPADES = 0;

HEARTS = 1;

CLUBS = 2;

DIAMONDS = 3;

}

Enum names have the same restrictions as class names (with regard to what characters they may contain, etc.), and it’s an error to have a class and an enum with the same name.

The names of enum members have the same restrictions as class constant names. The names must be unique within the enum; if there are two members with the same name, the typechecker will report an error, and HHVM will raise a fatal error.

The values of enum members must be scalars; that is, it must be possible to evaluate them statically. This is the same restriction that applies to class constants. The values don’t have to be unique within the enum. The only wrinkle if you have non-unique values is that calling getNames() on the enum (see Enum Functions) will throw an InvariantException.

You access the values with syntax similar to the syntax for class constants:

function suit_for_card_index(int $index): CardSuit {

if ($index < 13) {

return CardSuit::SPADES;

} else if ($index < 26) {

return CardSuit::HEARTS;

} else if ($index < 39) {

return CardSuit::CLUBS;

} else {

return CardSuit::DIAMONDS;

}

}

Enums are distinct types. For example, even though the underlying type of CardSuit is int, you can’t treat an int like a CardSuit, and vice versa.

function takes_int(int $x): void {

}

function takes_card_suit(CardSuit $suit): void {

}

function main(): void {

takes_int(CardSuit::SPADES); // Error

takes_card_suit(1); // Error

}

To convert a value of enum type to its underlying type, just use a regular PHP cast expression. To convert in the other direction, use the special enum functions assert() or coerce(), described below.

You can make it so that an enum type can be implicitly converted to its underlying type, by adding the keyword as and repeating the underlying type, just before the opening curly brace.

enum CardSuit : int as int {

SPADES = 0;

HEARTS = 1;

CLUBS = 2;

DIAMONDS = 3;

}

function takes_int(int $x): void {

}

function main(): void {

takes_int(CardSuit::HEARTS); // OK

}

One benefit of enums over class constants is that when a value of enum type is used as the controlling expression of a switch statement, the typechecker can ensure that all cases are handled. If some cases aren’t handled, the typechecker will report an error, telling you which cases are missing.

<?hh // strict

enum CardSuit : int {

SPADES = 0;

HEARTS = 1;

CLUBS = 2;

DIAMONDS = 3;

}

function suit_symbol(CardSuit $suit): string {

switch ($suit) {

case CardSuit::SPADES:

return "\xe2\x99\xa4";

case CardSuit::CLUBS:

return "\xe2\x99\xa7";

}

}

/home/oyamauchi/test.php:10:13,17: Switch statement nonexhaustive; the

following cases are missing: HEARTS, DIAMONDS (Typing[4019])

/home/oyamauchi/test.php:2:6,13: Enum declared here

Adding a default label will silence the error; you don’t have to explicitly handle all the enum members. Note that if you explicitly handle all cases and also have a default label, the typechecker will warn you that the default is redundant.
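As a minimal sketch of the default-label behavior just described, the following function handles two members explicitly and lets default cover the rest. The helper name is_red_suit() is hypothetical and is not part of the chapter's examples.

```hack
<?hh

enum CardSuit : int {
  SPADES = 0;
  HEARTS = 1;
  CLUBS = 2;
  DIAMONDS = 3;
}

// Hypothetical helper, introduced only for illustration.
function is_red_suit(CardSuit $suit): bool {
  switch ($suit) {
    case CardSuit::HEARTS:
    case CardSuit::DIAMONDS:
      return true;
    default:
      // Covers SPADES and CLUBS, so no exhaustiveness error is reported.
      return false;
  }
}
```

If a fifth suit were ever added to the enum, this function would silently treat it as non-red, which is the price of using default instead of listing every case.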

Enum Functions

As we’ve seen so far, enums act like pseudo-classes. They share classes’ name space, and their members are accessed with the same syntax. There’s one more similarity: every enum has six static methods that are used for getting information about the enum’s members, and converting arbitrary values to the enum type.

For example, if you’re passed an int and you want to use it as a CardSuit, you can do this:

function takes_card_suit(CardSuit $suit) {

// ...

}

function legacy_function(int $suit) {

$enum_suit = CardSuit::coerce($suit);

if ($enum_suit !== null) {

takes_card_suit($enum_suit);

}

}

These are all the methods. The signatures below assume an enum named ExampleEnum.

§ assert(mixed $value): ExampleEnum returns $value, cast to the enum type, if $value is of the enum’s underlying type and is a member of the enum. If it’s not, this throws an UnexpectedValueException.

§ assertAll(Traversable<mixed> $value): Container<ExampleEnum> calls assert() with every value in the given Traversable (see Core Interfaces) and returns a Container of the resulting correctly-typed values (or throws an UnexpectedValueException if any of the values aren’t members of the enum).

§ coerce(mixed $value): ?ExampleEnum is like assert(), but returns null if $value isn’t a member of the enum, instead of throwing an exception.

§ getNames(): array<ExampleEnum, string> returns an array mapping from the enum members’ values to their names. This will throw an InvariantException if the enum values are not unique within the enum.

§ getValues(): array<string, ExampleEnum> returns an array mapping from the enum members’ names to their values.

§ isValid(mixed $value): bool returns whether $value is a member of the enum.
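To make the list above more concrete, here is a short sketch that uses coerce(), isValid(), and getNames() on the CardSuit enum from earlier. The helpers parse_suit() and suit_name() are hypothetical names chosen for this illustration.

```hack
<?hh

enum CardSuit : int {
  SPADES = 0;
  HEARTS = 1;
  CLUBS = 2;
  DIAMONDS = 3;
}

// Hypothetical helper: turn an untrusted value into a CardSuit, or null.
function parse_suit(mixed $raw): ?CardSuit {
  // coerce() returns null instead of throwing when $raw isn't a member.
  return CardSuit::coerce($raw);
}

// Hypothetical helper: look up a member's name from its value.
function suit_name(CardSuit $suit): string {
  // getNames() maps values to names, e.g. 1 => 'HEARTS'.
  $names = CardSuit::getNames();
  return $names[$suit];
}
```

Calling parse_suit(1) should yield CardSuit::HEARTS, while parse_suit(7) should yield null because 7 isn't the value of any member.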

Type Aliases

Type aliases are a way to give a new name to an existing type. There are two kinds of type alias—transparent and opaque—corresponding to two different reasons why you might want to rename a type.

Transparent Type Aliases

If you’re frequently using a complex type, you can give it a simple alias, both to reduce visual complexity and character count, and to make its true meaning clearer. For example, if you use the type Map<int, Vector<int>>, it may be clearer to give it an alias like UserIDToFriendIDsMap. This is what transparent aliases are for.

The syntax is simple: the keyword type, followed by the new name for the type, an equals sign, and the type you’re renaming, which is called the underlying type.

type UserIDToFriendIDsMap = Map<int, Vector<int>>;

This declaration must be at the top level of a file, not inside any other statements. The type on the right of the equals sign can be any valid type annotation. Once the type alias is defined, the new name can be used in type annotations. Type aliases share name space with classes: it’s an error to have a type alias with the same name as a class.

Transparent alias types can be implicitly converted to their underlying types, and vice versa:

type transparent = int;

function make_transparent(int $x): transparent {

return $x; // OK: implicit conversion of int to transparent

}

function takes_int(int $x): void {

}

function main(): void {

$t = make_transparent(10);

takes_int($t); // OK: implicit conversion of transparent to int

}

Opaque Type Aliases

The other reason to create a type alias is that you may be using a primitive type with a special meaning. A very common example of this is using integers as user IDs. You can make a type alias called userid to distinguish integers being used as user IDs from other integers, which can help prevent mistakes where an integer representing something else, like a count or a timestamp, is used as a user ID.

Another example of this is with string types. You could define a type alias of string called sqlstring and use it in the interface to your SQL database, to prevent accidentally using a query string that hasn’t been properly escaped. (Another example of this kind of distinction is in Secure by Default.)

Opaque aliases are meant for this purpose. The difference between transparent and opaque aliases is that opaque alias types cannot be converted to their underlying type, and vice versa, except in the file where the alias is defined.

The syntax for opaque aliases is the same as for transparent aliases, except that the keyword type is replaced by newtype. The same restrictions apply: the alias type can’t have the same name as a class, and the declaration must be at the top level of a file.

newtype userid = int;

To demonstrate how to use an opaque alias, suppose we have one file that defines the alias, plus a conversion function:

newtype opaque = int;

function make_opaque(int $x): opaque {

return $x;

}

Note that the code in this file is allowed to implicitly convert the underlying type to the alias type—it returns a value of type int from a function whose return type is opaque, and the typechecker allows this.

In another file, we try to use it:

function takes_int(int $x): void {

}

function takes_opaque(opaque $x): void {

}

function main(): void {

takes_int(make_opaque(10)); // Error

takes_opaque(20); // Error

}

As this example shows, if you want an opaque alias to be useful outside its file, you have to define some way to convert between the alias type and the underlying type, in the same file. Otherwise, there will be no way for code in other files to convert between them, or to create values of the alias type.

Since opaque aliases are meant to be used for semantically significant aliases (like aliasing int as userid), forcing the use of an explicit conversion function is a feature, since it prevents accidental usage of a garden-variety integer as a user ID. The conversion function is also a good place to do verification: for example, you could check that the passed-in integer is a plausible user ID by making sure it’s not negative.

START YOUR USER IDS HIGH

If you’re starting a new web app from scratch (i.e. with a blank database), here’s a very simple thing you can do that will instantly eliminate a whole class of insidious bugs for the rest of the app’s life. If you’re allocating user IDs using an autoincrement column in a database table (which is a very typical, reasonable thing to do), set the autoincrement value to something astronomically high before adding any rows to it. By “astronomically high”, I mean 2^48 or something in that neighborhood. (You can express that in code as 1 << 48.)

This way, it’s very unlikely that you’ll have non-user-ID integers that look like user IDs floating around your code. Array indices, array counts, and string lengths cannot be that high in PHP and Hack. Unix timestamps probably won’t be that high either, unless you’re dealing with dates 8.9 million years in the future. And there’s no need to worry about wasting too much ID space: starting at 2^48 still leaves you with 9 billion billion possible IDs.

Having done that, you can define an opaque type alias newtype userid = int, and in the conversion function, verify that the supposed user ID is at least 2^48, and be almost certain that it’s valid.
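A conversion function along these lines might look like the sketch below. It assumes that the first allocated ID is exactly 1 << 48; the function names make_userid() and userid_to_int() are hypothetical. Both functions must live in the same file as the newtype declaration, since that is the only place the conversions are allowed.

```hack
<?hh

newtype userid = int;

// Hypothetical conversion function. Assumes IDs are allocated
// starting at 1 << 48, as suggested in the sidebar.
function make_userid(int $id): userid {
  // invariant() throws an InvariantException when the check fails.
  invariant($id >= (1 << 48), 'Implausible user ID: %d', $id);
  return $id;
}

// Hypothetical helper for code that needs the raw integer back,
// e.g. to build a database query.
function userid_to_int(userid $id): int {
  return $id;
}
```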

An opaque alias can have a constraint type added to it, which allows code outside the file where the alias is defined to implicitly convert the alias type to the constraint type, but not vice versa. Often, the constraint type is the same as the underlying type.

The syntax for this is to add, between the alias type name and the equals sign, the keyword as and a type annotation (the constraint type).

For example, in one file, we define aliases:

newtype totally_opaque = int;

newtype with_constraint as int = int;

function make_totally_opaque(int $x): totally_opaque {

return $x;

}

function make_with_constraint(int $x): with_constraint {

return $x;

}

In another file, we try to use them:

function takes_int(int $x): void {

}

function takes_totally_opaque(totally_opaque $x): void {

}

function takes_with_constraint(with_constraint $x): void {

}

function main(): void {

takes_int(make_totally_opaque(20)); // Error

takes_int(make_with_constraint(20)); // OK

takes_totally_opaque(20); // Error

takes_with_constraint(20); // Error

}

This feature is useful when bridging legacy code with new code that uses opaque aliases. You can make an opaque alias userid with underlying type int, but you may still have legacy code which passes around user IDs as integers. To make things easier, you can add a constraint to the type alias so you can seamlessly pass values of type userid to functions that expect int.

Autoloading Type Aliases

Type aliases can be autoloaded by HHVM’s enhanced autoloading system, which is described in Enhanced Autoloading.

Array Shapes

There’s a very common pattern in PHP codebases of using arrays as pseudo-objects. For example, instead of defining a User class with properties for the user’s ID and name, code will simply pass around arrays with keys 'id' and 'name' to represent users.

Array shapes are a way to tell the Hack typechecker about the structure of an array in cases like this. The typechecker can verify that the array has the right set of keys, and the keys map to values of the right types.

The syntax for an array shape declaration is the keyword shape, followed by a parenthesis-enclosed, comma-separated list of key-value pairs. Each pair is a key—either a string literal or a class constant whose value is an integer or a string—followed by the token =>, followed by a type annotation. The only place where a shape expression is legal is on the right-hand side of a type alias (see Type Aliases).

type user = shape('id' => int, 'name' => string);

A shape is really just an array with special tracking by the typechecker. To create a shape, use the same syntax as the array() syntax for creating arrays, but use the shape keyword instead. The resulting value is an array whose keys and value types are tracked. If you pass a shape to is_array(), it will return true.

function make_user_shape(int $id, string $name): user {

return shape('id' => $id, 'name' => $name);

}

// This works also

function make_user_shape(int $id, string $name): user {

$user = shape();

$user['id'] = $id;

$user['name'] = $name;

return $user;

}

Note that within the body of the second version of the function, the value $user doesn’t conform to the user shape declaration until after the third line. This is not a problem; the typechecker only enforces conformance with the shape declaration when it’s checked against a type annotation—in this case, at the point of the return statement.

In the user example, both fields are required; if either of them is absent from the shape when it’s checked against an annotation, the typechecker will report an error.

<?hh

type user = shape('id' => int, 'name' => string);

function make_user_shape(int $id, string $name): user {

$user = shape();

$user['id'] = $id;

return $user;

}

/home/oyamauchi/test.php:7:10,14: Invalid return type (Typing[4057])

/home/oyamauchi/test.php:6:3,19: The field 'name' is missing

/home/oyamauchi/test.php:4:24,27: The field 'name' is defined

There’s currently no way to make a field truly optional. The closest available option is to make a field’s type nullable. In that case, the typechecker won’t complain if the field is absent when the shape is checked, but then reading from the field at runtime will result in an E_NOTICE-level error (undefined index). The best option is to make the field’s type nullable, and explicitly store null to the field if there’s no real value to store.
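The recommended pattern can be sketched as follows. The alias user_with_email, its 'email' field, and both function names are hypothetical and exist only for this illustration.

```hack
<?hh

type user_with_email = shape(
  'id' => int,
  'name' => string,
  'email' => ?string,
);

function make_user_with_email(
  int $id,
  string $name,
  ?string $email = null
): user_with_email {
  // Always write the nullable field, storing null when there is no
  // value, so that later reads never hit an undefined index.
  return shape('id' => $id, 'name' => $name, 'email' => $email);
}

function email_or_placeholder(user_with_email $user): string {
  $email = $user['email'];
  return $email === null ? '(no email)' : $email;
}
```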

When reading fields of a shape, the typechecker will report an error if the field you’re accessing isn’t part of the shape’s declaration. (These examples assume the same definition of user as above.)

function log_user_data(user $user): void {

$id = $user['id'];

$name = $user['name'];

$is_admin = $user['is_admin']; // Error: the field 'is_admin' is missing

printf("%d(%s)(%d)", $id, $name, $is_admin);

}

When a shape is checked for conformance with a shape declaration, it will fail if it has any keys that aren’t part of the declaration.

$user = shape();

$user['id'] = 123;

$user['name'] = 'Your Benefactor';

$user['is_admin'] = true;

log_user_data($user); // Error: the field 'is_admin' is defined

If you use hh_client --type-at-pos on a shape, it will only say [shape]. To reiterate: a shape is just an array whose keys are tracked by the typechecker. No enforcement is done until a shape has to pass through a type annotation (i.e. when it’s passed to a function, returned from a function, or assigned to a property).

To facilitate tracking for shapes, the typechecker puts some restrictions on what you can do with them:

§ You can’t read or write with unknown keys. That is, you can’t do things like echo $shape[$key] or $shape[$key] = 10, even if $key is known statically. The expression between the square brackets must be either a string literal or a class constant whose value is an integer or a string—the same restriction as is placed on the keys in the shape description.

§ You can’t use the append operator, as in $shape[] = 10.

§ Shapes don’t implement Traversable or Container (see Core Interfaces). As such, you can’t iterate over a shape with foreach.

Lambda Expressions

Lambda expressions are a more concise alternative to PHP closure syntax. PHP closures have the downside that you have to name all the variables that the closure should capture from the enclosing scope. Lambda expressions create closures with all the necessary variables automatically captured.

For example, suppose you have an array of user IDs and you want to use array_map() to look up a User object for each ID.

$id_to_user_map = /* ... */;

$user_ids = /* ... */;

$users = array_map(

function ($id) use ($id_to_user_map) { return $id_to_user_map[$id]; },

$user_ids

);

We can rewrite the closure as a lambda expression, like this:

$users = array_map($id ==> $id_to_user_map[$id], $user_ids);

Notice that there’s no function keyword, no use list, no return keyword, and no curly braces. The variable $id_to_user_map is automatically captured from the enclosing scope, with no need to explicitly specify it. All captured variables are captured by value; it’s not possible to capture by reference using lambda expressions.
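The by-value capture can be seen in a small sketch like this one. The function name capture_demo() and its variables are hypothetical.

```hack
<?hh

function capture_demo(): int {
  $offset = 5;
  // $offset is captured automatically, by value, when the lambda is created.
  $add_offset = $y ==> $y + $offset;
  // Reassigning the outer variable afterwards does not affect the
  // copy held by the closure.
  $offset = 100;
  return $add_offset(1);
}
```

Here the closure keeps using the value 5 that $offset had when the lambda was created, so the later assignment of 100 is not visible inside it.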

The syntax is based around the new operator ==>. To its left is the list of arguments to the closure. If there’s only one argument without a type annotation, all you need is a variable name, as in the example above. If you have zero arguments, more than one argument, any argument with a type annotation, or a return type annotation, you have to put parentheses around the argument list. Here’s an example with two type-annotated arguments and a return type annotation:

usort(

$players,

(Player $one, Player $two): int ==> $one->getScore() - $two->getScore()

);

To the right of the ==>, you can have one of two things: either an expression, or a brace-enclosed list of statements. If it’s just an expression, the value of that expression is what gets returned from the closure, as in the above example. If it’s a list of statements, you can use a normal return statement to return a value.

Here’s an example of the list-of-statements syntax:

array_map($player ==> {

$total = 0;

foreach ($player->getScores() as $score) {

$total += $score;

}

return $total;

}, $players);

Other than the lack of capture by reference, there’s one more thing that you can do with regular closure syntax but not with lambda expressions: use variable variables. The language runtime has to inspect the closure’s body statically to determine which variables to capture, and in the presence of variable variables, it can’t do so.

$one = /* ... */;

$other = /* ... */;

$local_reader = function ($index) use ($one, $other) {

$name = ($index === 0 ? 'one' : 'other');

return $$name;

};

With a lambda expression, the language runtime would have no way of knowing it should capture $one and $other, and there would be no way to tell it. If you rewrote the closure in the example as a lambda expression, the variables $one and $other would be undefined in the lambda’s body and reading from them would result in “undefined variable” warnings.

Constructor Parameter Promotion

Constructor parameter promotion is a simple feature designed to reduce boilerplate code in constructors. If your codebase uses classes heavily, you probably have a lot of code like this:

class Employee {

private $id;

private $name;

private $department;

public function __construct($id, $name, $department) {

$this->id = $id;

$this->name = $name;

$this->department = $department;

}

}

This is bad because everything the class needs to store is repeated in four places: once as a property, once as a parameter of the constructor, and twice in the assignment expression in the body of the constructor. Constructor parameter promotion reduces the four down to one. The code above can be rewritten like this:

class Employee {

// Nothing needed here

public function __construct(private $id, private $name, private $department) {

// Nothing needed here

}

}

The syntax is very simple: all you have to do is put one of the access modifier keywords private, protected, or public before a parameter of the constructor. Promoted parameters can coexist with regular parameters, and they can be interleaved.

In addition to declaring a parameter of the constructor, the syntax declares a property of the same name with the given access modifier, and assigns the argument to the property. You can still put code in the body of the constructor, and it will run after the assignments are done.

This is compatible with type annotations: just put the type annotation between the access modifier keyword and the name. The type annotation applies to both the property and the parameter. You can also add default values for promoted parameters.

class User {

public function __construct(private int $id, private string $name = '') {

}

}
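The following sketch combines the points above: promoted and regular parameters interleaved, a default value, and a constructor body that relies on the promoted properties already being assigned. The Account class and its members are hypothetical.

```hack
<?hh

class Account {
  private string $label;

  public function __construct(
    private int $id,             // promoted: declares and assigns $this->id
    string $raw_label,           // regular parameter: not stored automatically
    protected bool $active = true
  ) {
    // The promoted properties are already assigned when the body runs.
    $this->label = sprintf('#%d %s', $this->id, trim($raw_label));
  }

  public function getLabel(): string {
    return $this->label;
  }

  public function isActive(): bool {
    return $this->active;
  }
}
```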

Attributes

Attributes are a syntactic extension that lets you add metadata to functions, methods, classes, interfaces, and traits. You can access this metadata through small additions to the normal PHP reflection APIs.

Attributes are a structured substitute for information that is often encoded in documentation comments. Instead of requiring a separate program to extract this information, it becomes available programmatically through reflection.

/**

* MyFeatureTestCase

*

* @owner oyamauchi

* @deprecated

*/

class MyFeatureTestCase extends TestCase {

// ...

}

/**

* MyFeatureTestCase with attributes

*/

<<Owner('oyamauchi'), Deprecated>>

class MyFeatureTestCaseWithAttributes extends TestCase {

// ...

}

Attribute Syntax

Each attribute is a key mapping to an array of values. The keys are strings, and values are scalars (null, boolean literals, numeric literals, string literals, or arrays of those).

The syntax is very simple. Immediately before a function, method, class, interface, or trait, put attributes inside two pairs of angle brackets. Each attribute, at its simplest, is just a key: an unquoted string.

<<DarkMagic>>

function summon_demon() {

// ...

}

NOTE

Attribute keys beginning with two underscores are reserved for special use by the runtime and typechecker. Three such attributes exist in Hack/HHVM 3.6 (described in the next section), and there may be more in future.

To access this attribute, use ReflectionFunction’s methods getAttributes() and getAttribute(). ReflectionClass and ReflectionMethod have the same methods.

$function = new ReflectionFunction('summon_demon');

echo "All attributes: \n";

var_dump($function->getAttributes());

echo "Just DarkMagic: \n";

var_dump($function->getAttribute('DarkMagic'));

All attributes:

array(1) {

["DarkMagic"]=>

array(0) {

}

}

Just DarkMagic:

array(0) {

}

If you call getAttribute() to read an attribute that isn’t there, it will return null with no error. Other than that, calling getAttribute($name) is equivalent to calling getAttributes() and indexing into the returned array with $name.

To add attribute values, add a parenthesis-enclosed, comma-separated list of scalars immediately after the attribute name:

<<Magic('dark')>>

function summon_demon() {

// ...

}

<<Magic('curse', 'dark')>>

function banish_to_eternal_void() {

// ...

}

$function = new ReflectionFunction('banish_to_eternal_void');

var_dump($function->getAttributes());

array(1) {

["Magic"]=>

array(2) {

[0]=>

string(5) "curse"

[1]=>

string(4) "dark"

}

}

You don’t have to declare attribute names anywhere before using them. They’re really little more than parseable comments.
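Because getAttribute() returns null for an attribute that isn't present, code that reads attributes usually needs a null check. A minimal sketch, assuming the Owner attribute from the earlier example and the hypothetical names tagged_function(), untagged_function(), and owner_of():

```hack
<?hh

<<Owner('oyamauchi')>>
function tagged_function(): void {}

function untagged_function(): void {}

// Hypothetical helper: returns the first value of the Owner attribute,
// or null if the attribute is absent or has no values.
function owner_of(string $function_name): ?string {
  $function = new ReflectionFunction($function_name);
  $values = $function->getAttribute('Owner');
  if ($values === null || count($values) === 0) {
    return null;
  }
  return (string)$values[0];
}
```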

Special Attributes

There are three attributes that are treated specially by the Hack typechecker and by HHVM. The two leading underscores in their names indicate that they’re special; this convention is reserved for use by special builtin attributes.

__Override

When this attribute is applied to a method, the Hack typechecker will check that the method is overriding a method from one of its ancestor classes. If it’s not overriding, the typechecker will report an error. Note that the method being overridden must be in a Hack file; if the method being overridden is in a PHP file, the typechecker can’t see it, and it will report an error.

__Override can be applied to methods defined in traits. The restriction won’t be enforced in the trait itself, but it will be enforced in any class that uses the trait—consistent with traits’ copy-and-paste semantics.

HHVM doesn’t treat this attribute specially; it won’t cause any runtime errors.

__ConsistentConstruct

In Hack, a child class’ __construct() method doesn’t have to have a signature that matches its parent class’ __construct() method. This is intentional; it’s perfectly reasonable for a child class to have different needs for its constructor. This can hide problems, though, in cases where constructors are being called polymorphically—as in new static().

A good example is in the factory pattern. There’s an abstract base class with several static factory methods, each of which calls new static(). Each child class is supposed to implement a constructor with the same signature.

<<__ConsistentConstruct>>

abstract class Reader {

protected function __construct(resource $file) { }

public static function fromFile(string $path): this {

return new static(fopen($path, 'r'));

}

public static function fromString(string $str): this {

$tmpfile = tmpfile();

fwrite($tmpfile, $str);

fseek($tmpfile, 0);

return new static($tmpfile);

}

abstract public function readItem(): mixed;

}

class BufferedReader extends Reader {

protected function __construct(resource $file) {

// Fill buffer ...

}

public function readItem(): mixed {

// ...

}

}

class TokenReader extends Reader {

// ...

}

Without __ConsistentConstruct, the child classes could have the wrong constructor signature and the Hack typechecker wouldn’t be able to report an error for it. Because the typechecker can’t tell which constructor will be invoked by the new static() call, it can’t fully typecheck the call. But with __ConsistentConstruct, the typechecker will report an error for constructors with non-matching signatures, so you know (indirectly) that the new static() call is typesafe.

This attribute is only significant to the typechecker; HHVM doesn’t treat it specially.

__Memoize

Unlike the other two special attributes, this one is treated specially by HHVM, and ignored by the Hack typechecker. It lets you use the common pattern of memoization, with assistance from the runtime that makes it more efficient than it can be with PHP or Hack code alone.

Memoization is a pattern of caching the result of a time-consuming computation. It’s often implemented like this:

function factorize_impl($num) {

// Some factorization algorithm

}

function factorize($num) {

static $cache = array();

if (!isset($cache[$num])) {

$cache[$num] = factorize_impl($num);

}

return $cache[$num];

}

Most of the code shown in the example is boilerplate, and the __Memoize attribute lets you remove all of it. Here’s the alternative, which lets the runtime manage the cache for you:

<<__Memoize>>

function factorize($num) {

// Some factorization algorithm

}

You can memoize functions or methods. There are a few restrictions:

§ You can’t memoize variadic functions (i.e. functions that take a variable number of arguments).

§ You can’t memoize functions that take any arguments by reference.

§ All arguments to the memoized function must be one of these types: bool, int, float, string, the nullable version of any of the previous types, an object of a class that implements the special interface IMemoizeParam, or an array or collection of any of the previous types.

IMemoizeParam declares a single non-static method: getInstanceKey(): string. The job of this method is to turn the object into a string that can be used as an array key in the memoization cache.
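As a sketch of what an implementation might look like, the hypothetical Point class below implements IMemoizeParam so that it can be passed to a memoized function. The class, its methods, and distance_from_origin() are illustrative names only.

```hack
<?hh

class Point implements IMemoizeParam {
  public function __construct(private int $x, private int $y) {}

  // Objects that should share a cache entry must produce the same key.
  public function getInstanceKey(): string {
    return sprintf('%d,%d', $this->x, $this->y);
  }

  public function getX(): int {
    return $this->x;
  }

  public function getY(): int {
    return $this->y;
  }
}

<<__Memoize>>
function distance_from_origin(Point $p): float {
  $x = $p->getX();
  $y = $p->getY();
  return sqrt((float)($x * $x + $y * $y));
}
```

Two distinct Point objects with the same coordinates produce the same key, so they would be expected to share a cache entry.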

There are some things to watch out for when using __Memoize. First, be aware that it’s a time-memory tradeoff. It can make code faster by reducing the amount of computation it does, but it will also increase memory usage. This is not always the right tradeoff.

Second, HHVM makes no guarantees about when it will actually execute a memoized function, as opposed to simply returning a value from the cache. Don’t assume that the body of the function will only execute once for a given argument. HHVM is allowed, for example, to delete entries from the cache to free up memory—in fact, this is an advantage of using __Memoize instead of implementing memoization yourself.

Finally, note that HHVM doesn’t try to make sure that the function you’re memoizing has no side effects, or that it returns the same result for the same arguments no matter how many times it’s called. Both of these properties are important for a function being memoized; if they don’t hold, memoization might visibly change the program’s behavior.

Enhanced Autoloading

PHP provides autoloading for classes, and HHVM supports this, through both __autoload() and spl_autoload_register(). HHVM provides an additional feature that allows autoloading for functions and constants, in both PHP and Hack, plus autoloading for type aliases (see Type Aliases) in Hack only.

This feature has another advantage over PHP’s autoloading mechanisms: it can do its job without running any PHP code, so its performance is generally better. A successful autoload can be done entirely within the runtime, using just two hashtable lookups. For that reason, if you’re using HHVM, using this feature instead of PHP autoloading is strongly recommended.

The interface to this enhanced autoloading is the function autoload_set_paths(), in the HH namespace. It takes two arguments: an autoload map (which is an array), and a root path (which is a string). When HHVM needs to autoload something, it will perform the lookups in the autoload map.

The autoload map is an array. There are five optional string keys that are significant:

§ The keys 'class', 'function', 'constant', and 'type' each map to arrays. Those inner arrays—sub-maps—have keys that are names of entities (classes, functions, constants, and types, respectively), and values that are file paths where the corresponding entities can be found.

§ The key 'failure' maps to a callable value—the failure callback—that will be called if lookup in the above keys fails.

Here’s an example that sets up the autoload map and calls a function that isn’t loaded:

function autoload_fail(string $kind, string $name): ?bool {

// ...

}

function setup_autoloading(): void {

$map = array(

'function' => array('extricate' => 'lib/extricate.php'),

'failure' => 'autoload_fail'

);

HH\autoload_set_paths($map, __DIR__ . '/');

}

setup_autoloading();

extricate();

When the function extricate() is called, the runtime looks in the 'function' sub-map of the autoload map for the 'extricate' entry. When it finds the entry, it appends the file path to the root path, loads the file at that combined path, and continues execution.

If anything about that procedure fails (if the 'function' sub-map isn’t present, or the 'extricate' entry isn’t present, or the file doesn’t exist, or the file doesn’t actually contain a definition of extricate()), the failure callback is called. If it returns true, the runtime tries to call extricate() again, assuming the failure callback loaded it. If the function is still undefined, or if the failure callback returns false or null, the runtime declares failure and raises a fatal error for an undefined function.

The failure callback gets passed two arguments: first, a string identifying the kind of entity being autoloaded ('class', 'function', 'constant', or 'type'), and second, a string with the entity’s name.

The most intuitive way to understand the whole algorithm is with a flowchart; see Figure 4-1.

Figure 4-1. Autoloading a function

There are two situations in which the algorithm is slightly different:

§ If the entity being autoloaded is a class, returning false and returning null from the failure callback cause different behaviors. If the callback returns false, the behavior is the same as in the function case: a fatal error is raised immediately. But if the callback returns null, HHVM falls back to the standard PHP autoload mechanisms: __autoload() and the SPL autoload queue.

§ If the entity being autoloaded might be a type alias, HHVM will first try the 'class' sub-map, then the 'type' sub-map, then the failure callback with first argument 'class', then the failure callback with first argument 'type'. This is because any entity that could be a type alias can also be a class.

The only time an entity to autoload might be a type alias is during enforcement of a parameter type annotation or a return type annotation.

As a final note, the failure callback shouldn’t be routinely used for actual loading; it should be used mostly for error logging. The whole autoloading process is slower if the runtime has to fall back to the failure callback.
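As a rough sketch of that advice, the failure callback below only logs the miss and never loads anything itself. The names log_autoload_failure and setup_autoloading_with_logging, the Mailer class, and both file paths are hypothetical. The lowercase map keys follow the 'extricate' example above, on the assumption that class and function names are looked up in lowercase form.

```hack
<?hh

// Hypothetical failure callback: records the miss, loads nothing.
function log_autoload_failure(string $kind, string $name): ?bool {
  error_log("Autoload map has no usable entry for $kind '$name'");
  if ($kind === 'class') {
    // null: let HHVM fall back to __autoload() and the SPL autoload queue.
    return null;
  }
  // false: give up, so the runtime raises its fatal error.
  return false;
}

function setup_autoloading_with_logging(): void {
  $map = array(
    // Illustrative paths, resolved against the root path passed below.
    'class' => array('mailer' => 'lib/Mailer.php'),
    'function' => array('extricate' => 'lib/extricate.php'),
    'failure' => 'log_autoload_failure',
  );
  HH\autoload_set_paths($map, __DIR__ . '/');
}
```

Because the callback never returns true, the runtime never retries the lookup after calling it. Every successful autoload therefore comes from the map itself, which keeps it on the fast path.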

Integer Arithmetic Overflow

In PHP, if integer arithmetic operations overflow, the result is a float:

var_dump(PHP_INT_MAX + 1); // float(9.2233720368548E+18)

This is bad for performance: it means that almost every arithmetic operation in a program has to be checked for overflow, even though overflow is extremely unlikely in practice. It’s also questionable from a program-logic standpoint: in the example above of PHP_INT_MAX + 1, the conversion to float causes an immediate loss of precision.

HHVM includes a mode to make integer arithmetic follow the rules of two’s-complement arithmetic at runtime. This means the result of adding, subtracting, or multiplying two integers is always an integer[15]. The configuration option to turn this mode on is hhvm.ints_overflow_to_ints.

The Hack typechecker, in fact, always treats integer arithmetic operations as if they followed the rules of two’s-complement arithmetic, and this is not configurable. The justification is that if the typechecker were to follow PHP’s behavior, it would become very difficult to meaningfully typecheck anything involving the results of arithmetic operations. Besides that, the overflow-to-float behavior isn’t used in any other mainstream programming language, and it often surprises newcomers to PHP.
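Here is a short sketch of what the runtime mode changes, assuming hhvm.ints_overflow_to_ints is turned on and integers are 64 bits wide. The commented results are what two’s-complement wraparound predicts, not output captured from a particular HHVM build.

```hack
<?hh

// Assumes hhvm.ints_overflow_to_ints is enabled and ints are 64-bit.

// Adding 1 to the largest int wraps around to the smallest int.
var_dump(PHP_INT_MAX + 1); // int(-9223372036854775808)

// The wrapped value is exactly the minimum integer.
var_dump(PHP_INT_MAX + 1 === -PHP_INT_MAX - 1); // bool(true)

// Multiplication wraps too: bits beyond the 64th are discarded.
var_dump(PHP_INT_MAX * 2); // int(-2)
```

With the mode off, the first and third calls would print floats instead, as in the PHP example at the start of this section.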

Nullsafe Method Call Operator

Hack adds a new operator for calling methods on an object that may be null. The operator is ?->, in contrast to the usual method call operator ->.

interface Reader {

public function readAll(): string;

}

function read_everything(?Reader $reader): ?string {

return $reader?->readAll();

}

If the value on the left-hand side of the operator is null, there is no warning or error, and the whole expression evaluates to null. Therefore, the type of this expression is the nullable version of the actual method’s return type.

This operator is very well-suited for chained method calls, since it allows any method in the chain to return null without requiring null checks everywhere, while still being safe from BadMethodCallException.

$name = $comment?->getPost()?->getAuthor()?->getName();
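For comparison, here is roughly what that chain stands for when written with explicit null checks. The Comment, Post, and Author interfaces and the author_name() function are hypothetical, introduced only to make the sketch self-contained.

```hack
<?hh

interface Author {
  public function getName(): string;
}

interface Post {
  public function getAuthor(): ?Author;
}

interface Comment {
  public function getPost(): ?Post;
}

// Longhand equivalent of:
//   $comment?->getPost()?->getAuthor()?->getName()
function author_name(?Comment $comment): ?string {
  if ($comment === null) {
    return null;
  }
  $post = $comment->getPost();
  if ($post === null) {
    return null;
  }
  $author = $post->getAuthor();
  if ($author === null) {
    return null;
  }
  return $author->getName();
}
```

The return type is ?string even though getName() itself returns string, because any link in the chain may turn out to be null.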

Trait and Interface Requirements

Traits are one of the trickiest areas for the typechecker to navigate. A trait is essentially a bundle of code taken out of context. To be useful, traits must be able to refer to properties and methods that they don’t define—the classes that use the traits will supply them.

To allow for stronger typechecking of traits, Hack provides a feature that allows you to restrict what classes may use a trait. Inside the definition of a trait, you can specify that any classes using it must extend a certain class, or implement a certain interface. This way, the typechecker can verify a property access or method call in a trait by checking it in the context of the classes and interfaces that are allowed to use the trait.

The syntax is require extends ClassName or require implements InterfaceName. These statements go at the top level of the trait definition.

class C {

public function methodFromClass(): void { }

}

interface I {

public function methodFromInterface(): void;

}

trait NoRequire {

public function f(): void {

$this->methodFromInterface(); // Error: could not find method

$this->methodFromClass(); // Error: could not find method

}

}

trait HasRequire {

require extends C;

require implements I;

public function f(): void {

$this->methodFromInterface(); // OK

$this->methodFromClass(); // OK

}

}

If a class uses a trait and doesn’t fulfill the trait’s requirements, the typechecker will report an error. Continuing from the previous example:

class Bad {

use HasRequire; // Error: failure to satisfy requirement

}
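By contrast, a class that fulfills both requirements can use the trait without complaint. The sketch below repeats the earlier definitions so that it stands alone; the class name Good is hypothetical.

```hack
<?hh

class C {
  public function methodFromClass(): void { }
}

interface I {
  public function methodFromInterface(): void;
}

trait HasRequire {
  require extends C;
  require implements I;

  public function f(): void {
    $this->methodFromInterface();
    $this->methodFromClass();
  }
}

// Satisfies both requirements: descends from C and implements I.
class Good extends C implements I {
  use HasRequire;

  public function methodFromInterface(): void { }
}
```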

Note that require extends really does mean extends; that is, it’s an error for the class named in a trait’s require extends declaration to use that trait. Any class using the trait must be a descendant.

trait T {

require extends C;

}

class C {

use T; // Error

}

In addition to these declarations, Hack also allows traits to implement interfaces. When a trait that implements an interface is used, it behaves as if the “implements” declaration were transferred onto the class using the trait, and all the attendant restrictions are enforced. This is very similar to using require implements in the trait.

interface I {

public function methodFromInterface(): void;

}

trait T implements I {

public function f(): void {

$this->methodFromInterface(); // OK

}

}

class C {

use T; // Error: must provide an implementation for methodFromInterface()

}

Finally, require extends works in interfaces as well. Only classes that descend from the named class are allowed to implement the interface. (Again, this excludes the named class itself.)

interface I {

require extends ParentClass;

}

class ParentClass {

// It would be an error for this class to implement I

}

class ChildClass extends ParentClass implements I { // OK

}

class OtherChild implements I { // Error

}

Silencing Typechecker Errors

Suppose you have a core function, without type annotations, used all over the codebase, and you want to add annotations to it. This might cause type errors at a lot of the function’s callsites. The typechecker gives you a way to add the annotations to the core function anyway, and silence the errors at each callsite that turns out to have an error. That way, you get the annotations in (so that new code using that function will be well-typed) but remain error-clean, while the places you need to fix are easily searchable. This is the HH_FIXME comment.

Every error message reported by the typechecker has a numerical error code, shown at the end of the message. For example, consider this code:

<?hh // strict

function core_function(): int {

return 123;

}

function some_other(): string {

return core_function();

}

This generates the following error from the typechecker:

/home/oyamauchi/hack/test.php:8:10,24: Invalid return type (Typing[4110])

/home/oyamauchi/hack/test.php:7:24,29: This is a string

/home/oyamauchi/hack/test.php:3:27,29: It is incompatible with an int

The error code is 4110, the number shown in square brackets. (The word “Typing” just denotes the general category of the error, and isn’t part of the error code – there’s only one error 4110 in any category.)

To silence this error, add the comment /* HH_FIXME[4110] */ either before the function signature or before the return statement. You should also add an explanation of why the error needs to be silenced, within the comment, after the closing square bracket.

Any of the following versions of some_other() will silence the error:

function some_other(): string {

/* HH_FIXME[4110] from core_function return type */

return core_function();

}

function some_other(): string {

/* HH_FIXME[4110] from core_function return type */ return core_function();

}

/* HH_FIXME[4110] from core_function return type */

function some_other(): string {

return core_function();

}

/* HH_FIXME[4110] from core_function return type */ function some_other(): string {

return core_function();

}

The syntax of the comment is precise. It must be a C-style /* */ comment; shell-style # comments and C++-style // comments won’t work. It has to start with HH_FIXME followed by the error code in square brackets.
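To make those rules concrete, the sketch below contrasts two comment forms that the typechecker should ignore with one that it accepts. The function names are hypothetical, and the behavior noted in the comments is inferred from the rules just stated, not taken from a recorded typechecker run.

```hack
<?hh

function core_function(): int {
  return 123;
}

function not_silenced_cpp_style(): string {
  // HH_FIXME[4110] not recognized: C++-style comment
  return core_function(); // error 4110 is still reported
}

function not_silenced_shell_style(): string {
  # HH_FIXME[4110] not recognized: shell-style comment
  return core_function(); // error 4110 is still reported
}

function silenced(): string {
  /* HH_FIXME[4110] from core_function return type */
  return core_function(); // error 4110 is silenced here
}
```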

An HH_FIXME will silence the given error on the line containing the first non-whitespace non-comment character after the end of the HH_FIXME comment. (In the example above, there are two pieces of code that work together to cause the error: the return statement and the return type annotation. Silencing either piece silences the whole error.)

You can apply multiple HH_FIXME comments to a single line by having multiple HH_FIXME comments before the line in question. For example:

function f(?void $nonsense): int {

return 'oops';

}

The error output for this is:

/home/oyamauchi/test.php:2:12,16: ?void is a nonsensical typehint (Typing[4066])

/home/oyamauchi/test.php:3:10,11: Invalid return type (Typing[4110])

/home/oyamauchi/test.php:2:26,28: This is an int

/home/oyamauchi/test.php:3:10,11: It is incompatible with a string

To silence both, do this:

/* HH_FIXME[4110] just an example */

/* HH_FIXME[4066] just an example */

function f(?void $nonsense): int {

return 'oops';

}

Only the specific error(s) identified in the comment(s) will be silenced, which is ideal – if some other error crops up on the silenced line, you’ll still hear about it. The error codes will remain stable across versions of the typechecker, so these comments won’t break with new versions.


[15] The mathematical phrase is “integers are closed under addition, subtraction, and multiplication”.