Generics - Hack and HHVM (2015)

Chapter 3. Generics

Generics are a powerful feature of Hack’s type system that allow you to write typesafe code without knowing what types will be flowing through it. A class or function can be generic, which means that it lets the caller specify what types flow through it.

The best examples of generic constructs are arrays and collection classes (see Chapter 6 for detail on collection classes). Without the ability to specify the type of an array’s contents, the typechecker couldn’t infer a type for any value that results from indexing into an array, and it couldn’t typecheck setting a value in an array. These operations are pervasive in PHP and Hack code, and generics let the typechecker understand and verify them.

In this chapter, we’ll look at all the features that generics offer, and how to use them.

Introductory Example

We’ll start with a very simple example: a class that just wraps an arbitrary value. You would probably never write such a thing in practice[11], but it’s a good gentle introduction to generics. We’ll use it as a running example throughout this chapter.

To make a class generic, put an angle bracket-enclosed, comma-separated list of type parameters immediately after the name of the class. A type parameter is simply an identifier whose name starts with an uppercase T. Inside the definition of a generic class, you can use the type parameters in type annotations, in any of the three positions (properties, method parameters, and method return types).

class Wrapper<Tval> {

private Tval $value;

public function __construct(Tval $value) {

$this->value = $value;

}

public function setValue(Tval $value): void {

$this->value = $value;

}

public function getValue(): Tval {

return $this->value;

}

}

// There can be multiple type parameters

class DualWrapper<Tone, Ttwo> {

// ...

}

To use a generic class, you simply instantiate it as normal, and use the resulting object like any other:

$wrapper = new Wrapper(20);

$x = $wrapper->getValue();

In this example, thanks to Wrapper being generic, the typechecker knows that $x is an integer. It sees that you’re passing an integer to the constructor of Wrapper, and infers that it should typecheck usages of that particular Wrapper instance as though the class definition said int instead of Tval everywhere.

The typechecking that you get in this situation is just as strong as it would be if you used this class instead of Wrapper:

class WrapperOfInt {

private int $value;

public function __construct(int $value) {

$this->value = $value;

}

public function setValue(int $value): void {

$this->value = $value;

}

public function getValue(): int {

return $this->value;

}

}

The generic version, though, has the significant benefit that you can use it with any type. If you pass a string to the constructor of Wrapper, the return type of getValue() on that instance is string. If you pass a value of type ?float to the constructor of Wrapper, the return type of getValue() on that instance is ?float. And so on, with any other type you can think of.
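As a rough sketch of the inference just described (the function name wrapper_examples is made up for illustration, and a trimmed copy of Wrapper is repeated so that the snippet stands alone):

```hack
<?hh // strict

// Same class as in the introductory example, trimmed and repeated here
// so that the sketch stands alone.
class Wrapper<Tval> {
  private Tval $value;

  public function __construct(Tval $value) {
    $this->value = $value;
  }

  public function getValue(): Tval {
    return $this->value;
  }
}

// Hypothetical function, for illustration only.
function wrapper_examples(string $str, ?float $maybe_float): void {
  $ws = new Wrapper($str);
  $s = $ws->getValue(); // The typechecker treats $s as a string

  $wf = new Wrapper($maybe_float);
  $f = $wf->getValue(); // The typechecker treats $f as a ?float
}
```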

This is the true power of generics: you can write a single implementation of Wrapper that wraps a value of any type, but that is still completely typesafe.

As the final piece of this introduction, here’s how to write a type annotation for an instance of a generic class. The syntax is: the name of the class, followed by an angle bracket-enclosed, comma-separated list of type annotations. Each annotation in the list is called a type argument.

function wrapped_input(): Wrapper<string> {

$input = readline("Enter text: ");

return new Wrapper($input);

}

The relationship between type parameters and type arguments is the same as the relationship between function parameters and function arguments: the type arguments are substituted for the uses of the type parameters in the generic class definition. In this case, the function is returning an instance of Wrapper, telling the typechecker that it should typecheck usages of this object as if the class definition said string instead of Tval everywhere.

Other Generic Entities

Classes aren’t the only kind of entity that can be made generic.

Functions and Methods

A generic function has a list of type parameters between its name and the opening parenthesis of its parameter list. It can be called like any other.

function wrap<T>(T $value): Wrapper<T> {

return new Wrapper($value);

}

function main(): void {

$w = wrap(20);

}

As this example shows, a generic function’s type parameters can be used in the function’s parameter types and return type.

Methods may also be generic. If a method is in a generic class or trait, it can use its enclosing class’ type parameters, as well as introducing its own.

class Logger {

public function logWrapped<Tval>(Wrapper<Tval> $value): void {

// ...

}

}

class Processor<Tconfig> {

public function checkValue<Tval>(Tconfig $config, Tval $value): bool {

// ...

}

}

Traits and Interfaces

Both traits and interfaces can be generic. The syntax is very similar to generic class syntax: type parameter list after the name.

trait DebugLogging<Tval> {

public static function debugLog(Tval $value): void {

// ...

}

}

interface WorkItem<Tresult> {

public function performWork(): Tresult;

}

Anything that uses a generic trait, or implements a generic interface, must specify type arguments:

class StringProducingWorkItem implements WorkItem<string> {

use DebugLogging<string>;

// ...

}

A generic class can pass along its type parameters to interfaces that it implements, or traits that it uses:

class ConcreteWorkItem<Tresult> implements WorkItem<Tresult> {

use DebugLogging<Tresult>;

// ...

}

Type Aliases

See Type Aliases for full detail on type aliases. They can be made generic by adding a list of type parameters immediately after the alias name.

type matrix<T> = array<array<T>>;

There is an interesting application of generics to type aliases in which you don’t use the type parameter on the right-hand side. A good example is serialization.

newtype serialized<T> = string;

function typed_serialize<T>(T $value): serialized<T> {

return serialize($value);

}

function typed_unserialize<T>(serialized<T> $value): T {

return unserialize($value);

}

This alias lets the typechecker distinguish between the serialized versions of various types, whereas the normal untyped serialize() API loses information about the type of the serialized value. It works without typechecker errors because it’s essentially unchecked: unserialize() has no return type annotation, so the typechecker simply trusts that whatever you do with its return value is correct (see Code Without Annotations).

Here, the typechecker knows that $unserialized is a string:

$serialized_str = typed_serialize("hi");

$unserialized = typed_unserialize($serialized_str);

You can also make guarantees about the type of a serialized value:

function process_names(serialized<array<string>> $arr): void {

foreach (typed_unserialize($arr) as $name) {

// $name is known to be a string here

// ...

}

}

Type Erasure

Generics are a purely typechecker-level construct—HHVM is almost completely unaware of their existence[12]. In effect, when HHVM runs generic code, it’s as if all type parameters and type arguments were stripped. This behavior is known as type erasure.

This has important consequences for what you can and can’t do with type parameters inside the definition of a generic entity. The only thing you can do with a type parameter is to use it in a type annotation. Here are things you can’t do with a type parameter that you can do with some other types:

§ Instantiate it, as in new T().

§ Use it as a scope, as in T::someStaticMethod() or T::$someStaticProperty or T::SOME_CONSTANT.

§ Pass it type arguments, as in function f<T>(T<mixed> $value).

§ Put it on the right-hand side of instanceof, as in $value instanceof T.

§ Cast to it, as in (T)$value.

§ Use it in place of a class name in a catch block, as in:

function f<Texc>(): void {

try {

something_that_throws();

} catch (Texc $exception) { // Error

// ...

}

}

§ Use it in the type of a static property, as in:

class SomeClass<T> {

// Also illegal because the property is uninitialized,

// but there would be no possible valid initial value anyway

public static T $property;

}

When type parameters are used as type annotations, they are not enforced at runtime. In this example, we use decl mode so that the typechecker doesn’t report errors on the method calls in f().

<?hh // decl

class GenericClass<T> {

public function takes_type_param(T $x): void {

}

public function takes_int(int $x): void {

}

}

function f(GenericClass<int> $gc): void {

// Both calls below would be typechecker errors,

// but this file is in decl mode

// No runtime error

$gc->takes_type_param('a string');

// Runtime error: catchable fatal

$gc->takes_int('a string');

}

Constraints

Within the definition of a generic entity, the typechecker knows nothing about the type parameters—that’s the whole point of generics. This means you can’t do much with a value whose type is a type parameter, other than pass it around. You can’t call it, call methods or access properties on it, index into it, do arithmetic operations on it, or anything like that. (There is a significant exception: any equality or identity comparison is allowed. That is ==, ===, !=, and !==.)
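A minimal sketch of what is and isn’t allowed on a value whose type is an unconstrained type parameter. All three function names are hypothetical, and the operations that would be rejected are left in comments so that the snippet itself typechecks:

```hack
<?hh // strict

// Allowed: identity comparison works on values of any type.
function same_value<T>(T $a, T $b): bool {
  return $a === $b;
}

// Allowed: passing values of type T around and returning them.
function first_or_second<T>(bool $pick_first, T $a, T $b): T {
  return $pick_first ? $a : $b;
}

function not_much_else<T>(T $a): void {
  // Each of these would be a typechecker error if uncommented,
  // because nothing is known about T:
  // $a->someMethod();
  // $a + 1;
  // $a[0];
}
```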

You can change that, though, by adding a constraint to the type parameter. A constraint restricts what the type parameter is allowed to be. The syntax is to add the keyword as and a type annotation after the identifier in the type parameter list. Let’s return to the introductory example of the Wrapper class, and add a constraint to its type parameter:

class Wrapper<Tval as num> {

private Tval $value;

public function __construct(Tval $value) {

$this->value = $value;

}

public function setValue(Tval $value): void {

$this->value = $value;

}

public function getValue(): Tval {

return $this->value;

}

}

With that, any code that uses the class can only do so with a value whose type is compatible with num.

function f(int $int, float $float, num $num,

?int $nullint, string $string, mixed $mixed): void {

$w = new Wrapper($int); // OK

$w = new Wrapper($float); // OK

$w = new Wrapper($num); // OK

$w = new Wrapper($nullint); // Error

$w = new Wrapper($string); // Error

$w = new Wrapper($mixed); // Error

}

This also means that within the definition of Wrapper, the allowable operations on values of type Tval are the same as the allowable operations on values of type num. So we can add a method like this:

class Wrapper<Tval as num> {

private Tval $value;

public function add(num $addend): void {

// $this->value is known to be a num, so we can use the += operator on it

$this->value += $addend;

}

// ...

}

You can use any valid type annotation as the constraint. The most common case is to use the name of a class or interface, which lets you call methods declared by the class or interface:

interface HasID {

public function getID(): int;

}

function write_to_database<Tval as HasID>(Tval $value): void {

$id = $value->getID();

// ...

}

Each type parameter can have at most one constraint. If you want to restrict a type parameter to only classes that implement multiple specific interfaces, you can create an interface that combines them by extending all of them, and use that as your constraint.

interface HasID {

public function getID(): int;

}

interface HasHashCode {

public function getHashCode(): string;

}

interface HasIDAndHashCode extends HasID, HasHashCode {

}

function write_to_cache<Tval as HasIDAndHashCode>(Tval $value): void {

$id = $value->getID();

$hash_code = $value->getHashCode();

// ...

}

There’s no way to express a constraint like “Tval must implement this interface or that interface”.

As we’ve seen, a constraint type can be any valid type annotation; this includes other type parameters, even type parameters from earlier in the same parameter list. For example, these usages of constraints are valid:

class GenericClass<Tclass> {

public function genericMethod<Tmethod as Tclass>(): Tmethod {

// ...

}

}

function lookup<Tvalue, Tdefault as Tvalue>(string $key,

?Tdefault $default = null): Tvalue {

// ...

}

Unresolved Types, Revisited

In the introductory example, we saw that the typechecker is able to infer type arguments for generic classes when you use them. Here, the typechecker knows that Wrapper is being instantiated with int substituted for the type parameter Tval.

$w = new Wrapper(20);

The exact details of the inference algorithm are beyond our scope here, but it has some consequences that you need to know about.

Should the typechecker accept this code?

function takes_wrapper_of_int(Wrapper<int> $w): void {

// ...

}

function main(int $n): void {

$wrapper = new Wrapper($n);

takes_wrapper_of_int($wrapper);

}

Intuitively, it seems like it should be allowed, and in fact it is. The typechecker knows, on the last line of main(), that $wrapper is a wrapper of an integer, and allows the call.

What about this?

function main(string $str): void {

$wrapper = new Wrapper($str);

takes_wrapper_of_int($wrapper);

}

It seems as if this shouldn’t be allowed, and indeed it isn’t.

What about this?

function main(int $n, string $str): void {

$w = new Wrapper($n);

$w->setValue($str);

}

As we saw in the first example, the typechecker seems to understand that a Wrapper constructed from an integer is a Wrapper<int>, so $w should be a Wrapper<int> after the first line of main(). So it seems like the typechecker should report an error: you shouldn’t be able to pass a string as an argument to setValue() on a Wrapper<int>. But in fact, this code is legal.

This is another place where the typechecker uses unresolved types. We first saw them in Unresolved Types, where they were used as a way for the typechecker to track a variable that could have multiple different types at a single point in a program, depending on the path taken to get there. With generics, the typechecker uses unresolved types to remember types that haven’t been explicitly specified, while retaining the freedom to adjust them as it sees more code.

After the first line, the typechecker is certain that $w is a Wrapper, but there has been no explicit indication of what its type argument is. It remembers that it has seen this object being used in a way that’s consistent with it having the type Wrapper<int>, but that type argument of int is an unresolved type. Then, upon seeing the call $w->setValue($str), the typechecker looks at the type of $w to see if the call is legal. When it sees the unresolved type argument, instead of raising an error, it adds string to the unresolved type. So, as far as the typechecker is concerned, $w could be either a Wrapper<int> or a Wrapper<string>.

To the human reader, this is unintuitive: obviously there’s a string inside $w. But the typechecker is unaware of the semantics of Wrapper: it doesn’t understand that Wrapper only holds a single value. All the typechecker knows is that it has seen $w being used as if it were a Wrapper<int>, and also as if it were a Wrapper<string>.

An unresolved type argument becomes resolved when it is checked against a type annotation. This example brings everything together:

function takes_wrapper_of_int(Wrapper<int> $w): void {

// ...

}

function main(): void {

$w = new Wrapper(20);

takes_wrapper_of_int($w);

$w->setValue('a string'); // Error!

}

This time, the typechecker reports an error on the last line. When $w is passed to takes_wrapper_of_int(), it has to be checked against the function’s parameter type annotation. At that point, the type of $w is resolved; the typechecker has seen concrete evidence that $w is supposed to be a Wrapper<int>. Now that the type is resolved, the typechecker will not be lenient in checking calls to setValue(). Calling setValue('a string') on a Wrapper instance with resolved type Wrapper<int> is invalid, so the typechecker reports an error.

Generics and Subtypes

Let’s return to the introductory example of the Wrapper class. Should the typechecker accept this code?

function takes_wrapper_of_num(Wrapper<num> $w): void {

// ...

}

function takes_wrapper_of_int(Wrapper<int> $w): void {

takes_wrapper_of_num($w);

}

The question is whether it’s valid to pass a wrapper of an integer to something that expects a wrapper of a num. It seems like it should be: int is a subtype of num (meaning any value that is an int is also a num), so it seems that Wrapper<int> should likewise be a subtype of Wrapper<num>.

In fact, the typechecker reports an error for this example. It would be incorrect for the typechecker to assume that the subtype relationship of int and num transfers over to the subtype relationship between Wrapper<int> and Wrapper<num>.

To illustrate why, consider that takes_wrapper_of_num() could do this:

function takes_wrapper_of_num(Wrapper<num> $w): void {

$w->setValue(3.14159);

}

That, by itself, is valid: setting the value inside a Wrapper<num> to a value of type float. But if you pass a Wrapper<int> to this version of takes_wrapper_of_num(), it will end up not being a wrapper of an integer anymore. So the typechecker can’t accept passing a Wrapper<int> to takes_wrapper_of_num(); it’s not typesafe. Note that this is a hard rule: the typechecker doesn’t consider what takes_wrapper_of_num() is actually doing. Even if takes_wrapper_of_num() were empty, the typechecker would still report an error.

Now for another example: should the typechecker accept this?

function returns_wrapper_of_int(): Wrapper<int> {

// ...

}

function returns_wrapper_of_num(): Wrapper<num> {

return returns_wrapper_of_int();

}

Again, although this intuitively seems fine, the typechecker reports an error. The reasoning is similar. Suppose we fill in the blanks like this:

function returns_wrapper_of_int(): Wrapper<int> {

static $w = new Wrapper(20);

return $w;

}

function returns_wrapper_of_num(): Wrapper<num> {

return returns_wrapper_of_int();

}

function main(): void {

$wrapper_of_num = returns_wrapper_of_num();

$wrapper_of_num->setValue(2.71828);

}

This is clearly invalid—after main() executes, any call to returns_wrapper_of_int() will return a wrapper of something that’s not an int. So, again, the typechecker has to report an error for the return statement in returns_wrapper_of_num().

Arrays and Collections

Arrays and immutable Hack collection classes—ImmVector, ImmMap, ImmSet, and Pair—behave differently. They follow the intuitive notion that, for example, array<int> is a subtype of array<num>. This usage of arrays, for example, is valid:

function takes_array_of_num(array<num> $arr): void {

// ...

}

function takes_array_of_int(array<int> $arr): void {

takes_array_of_num($arr); // OK

}

Similar behavior holds for the value types[13] of immutable collection classes, regardless of whether you annotate them with their own names, or (as is recommended) with interface names like ConstVector:

function takes_constvector_of_num(ConstVector<num> $cv): void {

// ...

}

function takes_constvector_of_int(ConstVector<int> $cv): void {

takes_constvector_of_num($cv); // OK

}

function takes_constmap_of_string_mixed(ConstMap<string, mixed> $cm): void {

// ...

}

function takes_constmap_of_string_int(ConstMap<string, int> $cm): void {

takes_constmap_of_string_mixed($cm); // OK

}

Why is this valid for arrays and immutable collections, but not for Wrapper?

In the case of immutable collections, it’s simply that they’re immutable. Even if you pass an ImmVector<int> to a function that takes an ImmVector<num>, that function has no way to get a non-integer value into the vector. There’s nothing it can do to violate the contract that the vector must only contain integers.

In the case of arrays, the reason is similar. For this purpose, arrays behave very much like immutable collections because of their pass-by-value semantics. In the example above, from the perspective of takes_array_of_num(), the array in the body of takes_array_of_int() actually is read-only. takes_array_of_num() can’t cause that array to have non-integers in it, because it doesn’t have access to the original array; it only has access to a copy.
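A small sketch of the pass-by-value behavior described above, using two hypothetical functions. The callee appends a float to its array<num>, but that only changes its own copy:

```hack
<?hh // strict

// Hypothetical function: appends a float to its own copy of the array.
function append_float(array<num> $arr): array<num> {
  $arr[] = 3.14; // Modifies the local copy only
  return $arr;
}

// Hypothetical caller: passes an array<int> where array<num> is expected.
function caller(array<int> $ints): void {
  $nums = append_float($ints); // OK: array<int> is accepted as array<num>

  // $ints is unchanged and still holds only integers here;
  // $nums is a separate array that also holds a float.
}
```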

Advanced: Covariance and Contravariance

Unless you’re writing some very general, collection-like library, it’s very unlikely that you need to read past here. For the vast majority of use cases, all you need is to know that the above rules exist, and to understand why. The rest of this section is about how to modify those rules when you need to.

The concept of how the subtype relationships of generic types are affected by the subtype relationships of their type arguments is called variance. There are three kinds of variance. Suppose we have a generic class called Thing, with a type parameter T. Then (using int and num as example type arguments):

§ If Thing<int> is a subtype of Thing<num>, we say that Thing is covariant on T. Arrays are covariant on both their type parameters, and immutable collection classes are covariant on their value type parameters.

§ If Thing<num> is a subtype of Thing<int>, we say that Thing is contravariant on T. Counterintuitive though it may be, there are real applications for contravariance.

§ If neither of the above is true, we say that Thing is invariant on T.

Syntax

The syntax to make a generic type covariant on a type parameter is to put a plus sign before the type parameter. You only do this in the parameter list; within the definition, just use the type parameter’s name as before. Similarly, to make a generic type contravariant on a type parameter, put a minus sign before the type parameter.

class CovariantOnT<+T> {

private T $value; // No + here

// ...

}

class ContravariantOnT<-T> {

private T $value; // No - here

// ...

}

class InvariantOnT<T> {

private T $value;

// ...

}

A class is allowed to have type parameters with different variances:

class DifferentVariances<Tinvariant, +Tcovariant, -Tcontravariant> {

// ...

}

Here are some memory aids you can use to remember the terms and the syntax:

§ Covariance: the prefix co- means “with”, and the subtype relationship of a generic type goes with—“in the same direction as”—the subtype relationship of arguments to a covariant type parameter. Since they go together, the symbol is a plus sign.

§ Contravariance: the prefix contra- means “against”, and the subtype relationship of a generic type goes against the subtype relationship of arguments to a contravariant type parameter. Since they go in opposite directions, the symbol is a minus sign.

When to Use Them

Most classes you write won’t use covariance or contravariance. These features are useful in a few specific situations:

§ Covariance is for read-only types. For example, if we remove the setValue() method from Wrapper, then it’s read-only with respect to its type parameter Tval. That is, it only outputs values of type Tval; it never takes them as input except in the constructor. So Wrapper can be covariant on Tval.[14]

§ Contravariance is for write-only types. For example, a generic class that serializes values of type T to a log file might be write-only with respect to values of type T. That is, it only takes values of type T as input; it never outputs them.

The typechecker enforces this by setting restrictions on how you can use covariant and contravariant type parameters. Specifically, each kind of type parameter is only allowed to appear in certain places in the code, called covariant positions and contravariant positions.

First, the simple part:

§ Public and protected property types: invariant type parameters only.

§ Return types: invariant or covariant type parameters. These are covariant positions.

§ Function and method parameter types, except constructors: invariant or contravariant type parameters. These are contravariant positions.

§ Private property types, and constructor parameter types: any type parameter.

Now, the slightly tricky part. It is possible to have a contravariant position inside another contravariant position, in which case the inner contravariant position is actually covariant. Here’s an example:

class WriteOnly<-T> {

private T $value;

public function __construct(T $value) {

$this->value = $value;

}

// Error!

public function passToCallback((function(T): void) $callback): void {

$callback($this->value);

}

}

The contravariant type parameter T appears as a parameter type of the callback’s function type, which is itself the type of a parameter of passToCallback(). This is a contravariant position inside another contravariant position, so it’s covariant, and thus invalid.

You can see why this is, intuitively: the way passToCallback() is written makes it possible for something outside of WriteOnly to get a value of type T out of a WriteOnly instance, which makes it not actually write-only.

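A covariant position inside a covariant position is still covariant, and a covariant position inside a contravariant position (or the other way around) is contravariant. Covariance and contravariance work somewhat like positive and negative numbers under multiplication: positive times positive is positive, positive times negative is negative, and negative times negative is positive again.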
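By the same reasoning, the callback pattern that is rejected for a contravariant type parameter should be accepted for a covariant one. A minimal sketch, assuming a made-up class named ReadOnly that mirrors WriteOnly with the variance flipped:

```hack
<?hh // strict

// Hypothetical read-only class, for illustration only.
class ReadOnly<+T> {
  private T $value;

  public function __construct(T $value) {
    $this->value = $value;
  }

  // T appears in a parameter type (of the callback) inside another
  // parameter type (of this method). That is a contravariant position
  // inside a contravariant position, which is a covariant position,
  // so a covariant T is allowed here.
  public function passToCallback((function(T): void) $callback): void {
    $callback($this->value);
  }
}
```

Handing the value to a callback is just another way of outputting it, which is consistent with the class being read-only with respect to T.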

Covariance

Let’s remove setValue() from Wrapper, and make its type parameter covariant.

class Wrapper<+Tval> {

private Tval $value;

public function __construct(Tval $val) {

$this->value = $val;

}

public function getValue(): Tval {

return $this->value;

}

}

The covariant type parameter Tval appears as the type of a private property, a parameter to the constructor, and a return type; all of these are positions where covariant type parameters are allowed. The typechecker will accept this without error.

The next example is also accepted now. The restrictions placed on the covariant type parameter ensure that there’s no way to break type safety while treating a Wrapper<int> as a Wrapper<num>.

function takes_wrapper_of_num(Wrapper<num> $w): void {

// ...

}

function takes_wrapper_of_int(Wrapper<int> $w): void {

takes_wrapper_of_num($w); // OK

}

If you add a method to modify the value, the typechecker will report an error, saying that a covariant type parameter is appearing in a non-covariant position:

class Wrapper<+Tval> {

public function setValue(Tval $value): void { // Error

$this->value = $value;

}

// ...

}

Similarly, if you change the $value property’s access modifier to public or protected, the typechecker will report an error, saying that a non-private property is always an invariant position—i.e. you can’t use covariant or contravariant type parameters there.

Contravariance

Contravariant types are less common, simply because write-only types are less common than read-only types. We’ll look at contravariance through a class that builds up a buffer of values and then writes them as JSON to a stream.

class JSONLogger<-Tval> {

private resource $stream;

private array<Tval> $buffer = array();

public function __construct(resource $stream) {

$this->stream = $stream;

}

public function log(Tval $value): void {

$this->buffer[] = $value;

}

public function flush(): void {

fwrite($this->stream, json_encode($this->buffer));

$this->buffer = array();

}

}

Note that the contravariant type parameter Tval only appears in a method parameter and a private property, so the typechecker accepts this code. If you were to make $buffer public or protected, or add a method with Tval in the return type, the typechecker would report an error.

The contravariant type parameter means that JSONLogger<num> is a subtype of JSONLogger<int>, which may seem counterintuitive. This code demonstrates:

function wants_to_log_ints(JSONLogger<int> $logger): void {

$logger->log(20);

}

function wants_to_log_nums(JSONLogger<num> $logger): void {

wants_to_log_ints($logger); // OK

$logger->log(3.14);

}

The code is passing a JSONLogger<num> to something that expects a JSONLogger<int>. This is fine, because a JSONLogger<num> can do anything that a JSONLogger<int> can (and more). Since there’s no way to get a value of type Tval back out of a JSONLogger, no code outside the class can get a value from it of a type that it doesn’t expect.


[11] It’s not as useless as it may seem, though—this is a good way to have something resembling reference semantics for primitive types. This is more useful in Hack than in PHP, since PHP-style references aren’t allowed in Hack.

[12] The lone exception is in the return types of async functions. See Chapter 7.

[13] It doesn’t hold for key types because of variance rules (see Advanced: Covariance and Contravariance). The key type parameter appears in contravariant positions, like the parameter of get(), so it can’t be covariant. This is likely to change in future, as a special case.

[14] Note that Wrapper could have read/write functionality that doesn’t involve Tval, and Tval could still be covariant. The read-only nature of Tval is what counts, not the read-only nature of Wrapper.