PHP Features Not Supported in Hack - Hack and HHVM (2015)

Chapter 5. PHP Features Not Supported in Hack

Hack has many features that PHP doesn’t, and it is also missing a few of PHP’s features. The choices to omit these features weren’t made lightly: there are deep technical reasons behind most of them, stemming from concerns about type safety and performance.

It’s important to note, though, that these restrictions only apply to Hack. When HHVM is running PHP code (that is, code in any file with <?php at the top), it produces the same results as the PHP interpreter from php.net. PHP code that uses a feature absent from Hack can still interoperate seamlessly with Hack code.

In this chapter, we’ll explore these unsupported features, analyzing why static type analysis is hard or impossible to implement for them, and why they’re hard to compile into efficient native code. If you’re simply looking to get started with Hack, it’s enough to skim the section headers of this chapter.

Once more, to be clear: HHVM supports all of these features when running regular PHP code.

References

Of the features that Hack doesn’t support, references are the most fundamental. They are a cross-cutting language feature: they have deep influence on how PHP engines represent program values, on how variables are handled, on function call and return mechanisms, and on memory management.

References make it very difficult to do sound static analysis. They allow the possibility of “action at a distance”, where innocuous-looking code can have unknowable effects. Passing a variable to a function by reference means that anything can happen to that variable, and the typechecker has no way of knowing what that is (since type inference is function-local, as described in Inference Is Function-Local). This makes it impossible to ensure type safety around references, and difficult to execute code around them efficiently.

Another example of “action at a distance” making type inference difficult is the problem of object properties. As we saw in Inference on Properties, the typechecker must be very conservative with its inference around object properties, because there are so many ways to act upon properties at a distance. This problem is even worse with references, which is why Hack simply ignores them.

There’s also a separate, smaller way in which references hurt performance: accessing a variable that is a reference requires an extra pointer dereference compared to accessing a regular variable. That extra memory access puts more pressure on the CPU caches and can cause more round trips to main memory.
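As a minimal sketch of the usual workaround (the names swap and swap_demo are illustrative, not from any library), a function that would take by-reference parameters in PHP can instead return its results. Every change to the caller's variables then happens in an assignment that the typechecker can see at the call site:

```hack
<?hh // strict

// Not valid Hack (PHP-style by-reference parameters):
//   function swap(&$a, &$b) { $tmp = $a; $a = $b; $b = $tmp; }
//
// Hack-friendly version: return both values as a tuple and let the
// caller reassign its own locals explicitly.
function swap<T>(T $a, T $b): (T, T) {
  return tuple($b, $a);
}

function swap_demo(): (int, int) {
  $x = 1;
  $y = 2;
  // The caller's locals change only through this visible assignment.
  list($x, $y) = swap($x, $y);
  return tuple($x, $y);
}
```

The cost is a slightly more verbose call site; the benefit is that no function can silently change a caller's local variables.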

GARBAGE COLLECTION

Another thing that makes references troublesome is that they allow PHP code to observe the runtime’s copy-on-write optimization of array copying, by taking a reference to an element in an array. Apart from any philosophical arguments about why this is bad, there’s a practical one too. The fact that copy-on-write is observable by PHP code makes it hard, if not impossible, to implement tracing garbage collection in a PHP engine.

As it is, if PHP engines don’t use naïve reference counting, they’ll cause observable behavior differences (also known as “bugs”). This means that there’s very little freedom to experiment with other memory management algorithms, which closes off a source of possibly significant performance gains.

The global Statement

The global statement is forbidden in Hack, because it’s implemented with references under the hood. The statement global $x is syntactic sugar for $x =& $GLOBALS['x'].

In partial mode, you can read from and write to the $GLOBALS array without references, so you can use that to work around the lack of a global statement. (In strict mode, Hack will report an error, saying $GLOBALS is an undefined variable.)
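A minimal partial-mode sketch of that workaround, assuming a hypothetical global named request_count; the function name is also illustrative:

```hack
<?hh
// Partial mode: reading and writing $GLOBALS directly is allowed.
// bump_request_count() and 'request_count' are hypothetical names.

function bump_request_count(): int {
  // Replaces: global $request_count; $request_count++;
  // Indexing $GLOBALS like an ordinary array creates no reference.
  $count = array_key_exists('request_count', $GLOBALS)
    ? (int) $GLOBALS['request_count']
    : 0;
  $GLOBALS['request_count'] = $count + 1;
  return $count + 1;
}
```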

Top-Level Code

As a corollary to the ban on the global statement, most top-level code is forbidden in strict mode. (It is allowed, but not typechecked, in partial mode.) You’re allowed to define named entities (functions, classes, etc.) and use the require/include family of statements at the top level in all modes.

This is simply because top-level code exists in global scope[16], so any read or write of a local variable is actually a read or write of a global variable.

You can get rid of top-level code that doesn’t rely on being in global scope simply by wrapping it in a function. If it does rely on being in global scope, it’ll need a more substantial rewrite to become valid Hack.

Every program, whether a script or a web app, starts execution in top-level code, so every program will need at least one partial-mode file to serve as an entry point. Ideally, that one partial-mode file will only have one top-level statement other than require and definitions: a function call that is the gateway to the bulk of the program’s logic.

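<?hh

require_once 'lib/autoload.php';

// All of the program's logic starts here, inside a function,
// so none of it runs in global scope.
function main() {
  setup_autoload();
  do_initialization();
  $controller = find_controller();
  $controller->execute();
}

// The only top-level statement besides require and definitions.
main();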

Old-Style Constructors

An old-style constructor is a method that has the same name as its enclosing class:

class Thing {
  public function Thing() {
    echo 'constructor!';
  }
}

$t = new Thing(); // prints 'constructor!'

This design was presumably inspired by C++ and Java, but was replaced in PHP 5 by the “unified” constructor __construct. Hack’s ban on old-style constructors is to avoid a potentially confusing, and in this case redundant, feature. The interactions of old-style and new-style constructors, especially in the presence of inheritance, are complex and inconsistent, and there’s no reason to have both.

This isn’t just a Hack change: PHP itself has followed suit, deprecating old-style constructors in PHP 7 and no longer treating such methods as constructors at all in PHP 8.
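For comparison, here is the Thing class rewritten with the unified constructor name, which is the only form Hack accepts. This is a minimal sketch whose behavior matches the PHP example above:

```hack
<?hh

class Thing {
  // Hack requires the unified name __construct for constructors.
  public function __construct() {
    echo 'constructor!';
  }
}
```

Usage is unchanged: $t = new Thing(); still prints 'constructor!'.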

Case-Insensitive Name Lookup

In PHP, function and class names are looked up case-insensitively. That is, if you define a function named compute, you can call it by writing CoMpUtE(). If you try to do that in Hack, however, the typechecker will report an error, saying the function CoMpUtE is undefined.

Note, however, that although Hack is case-sensitive, it’s not valid to define two functions (or two classes, etc.) that have names differing only in casing. That’s because Hack has to be able to interoperate with PHP, in which name lookups are still case-insensitive.

This restriction actually has nothing to do with either type safety or performance. It would have been very simple to implement case-insensitive name lookup in the Hack typechecker, and it wouldn’t affect the typechecker’s ability to do type inference.

Rather, it’s a philosophical decision. Most general-purpose programming languages are case-sensitive, including PHP’s spiritual ancestors: Perl, C, and Java. Case sensitivity makes code marginally easier to read, and makes it no harder to write.

On the performance side, case-insensitive lookup is slightly less efficient than case-sensitive lookup, because the target string must undergo case normalization before being used as the key in a hashtable lookup. In HHVM, it also incurs a small memory penalty, because each function and class must store both the original name from source code (for use in error messages and reflection) and a case-normalized version of the name (for use in hashtable lookups).
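To make that bookkeeping concrete, here is a hypothetical sketch of a case-insensitive name table written in Hack. It is not HHVM's actual implementation; it only illustrates the extra strtolower() call on every lookup and the need to store the original spelling alongside the normalized key. It also rejects a second name that differs only in case, mirroring the rule described earlier in this section.

```hack
<?hh
// Hypothetical sketch, not HHVM's real function table.
// NameTable, define() and lookup() are illustrative names.

class NameTable {
  // lowercased name => name as originally written in source
  private Map<string, string> $names;

  public function __construct() {
    $this->names = Map {};
  }

  public function define(string $name): void {
    $key = strtolower($name);
    if ($this->names->containsKey($key)) {
      // Two names may not differ only in case.
      throw new InvalidArgumentException('already defined: '.$name);
    }
    $this->names[$key] = $name;
  }

  // Every lookup pays for a strtolower() call before hashing.
  public function lookup(string $name): ?string {
    return $this->names->get(strtolower($name));
  }
}
```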

Variable Variables

Variable variables look like this:

$name = 'x';
$x = 'well this is silly';
echo $$name; // Prints 'well this is silly'

This isn’t allowed in Hack because, in general, it’s impossible to infer types around a construct like that. When the typechecker sees an expression like $$name, it has no idea what the type of that expression is, or even whether that’s a valid variable access.

And that’s just in the case of reading a variable variable. An expression like $$name = 10 could, in general, change the type of any local variable in scope, and the typechecker has no hope of understanding the possible effects.

This reasoning echoes the reasons why references aren’t allowed in Hack. Variable variables allow action at a distance. They allow code to read and write local variables through a layer of abstraction that is opaque to the typechecker.

There’s also a performance concern. While compiling PHP and Hack to bytecode, HHVM assigns each local variable a number, starting at 0. At runtime, it stores all of a function’s local variables consecutively in memory, in numerical order, and it can access each one with a single machine instruction. In a function that doesn’t use variable variables, HHVM doesn’t need to remember local variable names at runtime: each usage of a local variable is replaced with its number, and that number is all that’s needed to find the variable’s contents in memory. If a function does use variable variables, however, HHVM has to set up and tear down a mapping from local variable names to memory locations when entering and exiting the function, which costs extra time and memory.
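When code uses variable variables to choose a value by name at runtime, a Hack-friendly replacement is an explicit Map from names to values. A minimal sketch, with lookup_by_name as a hypothetical name:

```hack
<?hh // strict

// Instead of $$name, keep values that are chosen by name in a Map.
// The typechecker knows every value is a string, and a lookup can
// only read the Map, never an arbitrary local variable.
function lookup_by_name(string $name): ?string {
  $values = Map {'x' => 'well this is silly'};
  return $values->get($name); // null if no such name
}
```

Unlike $$name = 10, writing to the Map cannot change the type of any local variable, so the typechecker keeps full knowledge of the function's locals, and HHVM can keep addressing them by number.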

Dynamic Properties

In PHP, you can create an object property by assigning something to it, in much the same way you can create a local variable:

<?php

class C {
}

function f(): void {
  $c = new C();
  $c->prop = 'hi';
  echo $c->prop; // Prints 'hi'
}

In Hack, this isn’t valid; you have to declare all properties. If you know all the properties an object will have in advance, declare them; if you don’t, use a shape (see Array Shapes) or a Map (see Chapter 6) instead of an object.

This is for both type safety and performance. In general, it’s impossible for the typechecker to infer the types of dynamic properties, or even whether a given dynamic property exists when it’s read from.

The performance concern is that HHVM reserves slots within an object’s memory for declared properties, allowing it to access a declared property by looking at a known, constant offset from the beginning of an object’s memory. No hashtable lookups are involved. It can’t do this with dynamic properties; it has to store those in a hashtable, incurring hashtable lookups every time a dynamic property is read or written. (This is very similar to the performance concern around variable variables.)
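A minimal sketch of both suggested alternatives for the class C example above; the type alias Greeting and the function names are illustrative assumptions:

```hack
<?hh // strict

// Option 1: the property is known in advance, so declare it.
// HHVM can then find it at a fixed offset inside the object.
class C {
  public string $prop = '';
}

// Option 2: a loosely structured record becomes a shape.
type Greeting = shape('prop' => string);

function make_greeting(): Greeting {
  return shape('prop' => 'hi');
}

function greet(): string {
  $c = new C();
  $c->prop = 'hi'; // fine: $prop is declared with type string
  return $c->prop.' '.make_greeting()['prop'];
}
```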

Mixing Method Call Syntax

In PHP, you can call static methods with non-static method call syntax, and you can call non-static methods with static method call syntax:

class C {
  public function nonstaticMethod() { }
  public static function staticMethod() { }
}

C::nonstaticMethod(); // Allowed in PHP

$c = new C();
$c->staticMethod(); // Allowed in PHP

Both of these are invalid in Hack. Static methods can only be called with :: syntax, and non-static methods can only be called with -> syntax.

The main reason why Hack forbids this behavior is that if a non-static method is called with :: syntax, then $this is null inside the method. That’s problematic; it’s not reasonable to expect a non-static method to tolerate $this being null. There will be an error as soon as it tries to call a method or access a property on $this. If the method doesn’t use $this, then it probably shouldn’t be a non-static method in the first place.

The distinction between static and non-static methods exists for a reason—does the method need an object context to work in, or not?—and allowing this distinction to be erased at callsites makes the distinction useless, and gains nothing.
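In Hack, each call has to use the syntax that matches the method's declaration. A minimal sketch (the property and return values are illustrative):

```hack
<?hh // strict

class C {
  private string $name = 'instance';

  // Uses $this, so it must be called on an object with ->.
  public function nonstaticMethod(): string {
    return $this->name;
  }

  // Uses no object state, so it is static and called with ::.
  public static function staticMethod(): string {
    return 'static';
  }
}

function call_both(): string {
  $c = new C();
  // The only forms Hack accepts: -> for non-static, :: for static.
  return $c->nonstaticMethod().' '.C::staticMethod();
}
```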

isset, empty, and unset

These three expressions are allowed in partial mode, but not in strict mode.

All three of them are irregularities in PHP’s syntax and semantics. They look like normal functions, but they’re not. They’re special-cased in PHP’s grammar so that it’s possible to pass undefined variables and index expressions (like $nonexistent['nonexistent']) without incurring warnings. They are also unusual in that the arguments you pass to isset and unset cannot be arbitrary expressions[17]; you can only pass expressions that would be valid lvalues—expressions that could appear on the left-hand side of an assignment expression. This “looks-like-a-function-but-isn’t” phenomenon hurts language cleanliness, which is one argument against these features.

In Hack, there’s no reason to use isset or empty to test whether a variable is defined: it should be knowable, statically, whether a variable is defined at a given position.

For testing the existence of array elements, use the builtin function array_key_exists() instead of isset or empty. Don’t worry about performance: HHVM heavily optimizes calls to array_key_exists().
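A minimal strict-mode sketch, with get_port and the 'port' key as hypothetical names. One behavioral difference to keep in mind: isset() returns false when a key exists but maps to null, whereas array_key_exists() returns true in that case.

```hack
<?hh // strict

// Strict-mode replacement for isset($config['port']).
function get_port(array<string, int> $config): int {
  if (array_key_exists('port', $config)) {
    return $config['port'];
  }
  return 80; // default when the key is absent
}
```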

unset is a bit different. There’s simply no reason to use it on variables in Hack. If you want to get the same effect, just assign null to the variable. In PHP, there’s one other reason to use unset on a variable, which is to break a reference relationship, but in Hack this isn’t necessary, since references aren’t supported.

The one hole in functionality that this restriction creates is that you can’t remove elements from an array in strict mode. The preferred alternative is to use a collection (see Chapter 6) instead of an array. If that’s not feasible, you can work around this by defining, in a partial-mode file, a helper function that uses unset.
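A minimal sketch of both options; remove_key and remove_from_map are hypothetical names, and the file is in partial mode so that unset is permitted. Strict-mode code can still call these helpers.

```hack
<?hh
// Partial-mode file: unset() is allowed here.

// Workaround for arrays: arrays are passed by value, so this unsets
// an element of the local copy and returns it; the caller's array
// is left untouched.
function remove_key<Tk as arraykey, Tv>(array<Tk, Tv> $arr, Tk $key): array<Tk, Tv> {
  unset($arr[$key]);
  return $arr;
}

// Preferred alternative: collections support removal directly.
// Map is an object, so the caller sees the change.
function remove_from_map(Map<string, int> $m, string $key): void {
  $m->removeKey($key);
}
```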

Others

§ eval() and its close relative create_function(). Their effects are, of course, impossible to analyze statically. It’s also generally bad programming practice to use things like this. Usages of eval generally fall into two categories: simple enough that eval isn’t necessary, since the code can just be written normally instead; or complex enough that they pose a significant correctness or security risk.

§ The extract() function. Using it won’t result in an error from the typechecker, in any mode. However, the typechecker makes no attempt to track its effects on the local variable environment; it will assume that all local variables have the same value after a call to extract() as they did before.

§ The goto statement. It is the subject of a famous old debate among programmers, many of whom have strong opinions one way or the other. There’s no point in rehashing the whole debate here; the important thing is that the Hack team comes down on the “no” side, so Hack doesn’t allow goto.

§ In a similar vein, the break and continue statements are not allowed to take arguments in Hack. (In PHP, this is used to break or continue out of multiple nested loops.)

§ The increment and decrement operators (++ and --) can’t be applied to strings. In regular PHP, this has a variety of interesting behaviors: ++ applied to "9" yields the integer 10, applied to "a" it yields "b", and applied to "z" it yields "aa". There is little practical use for this behavior, but if you need it, your only option is to code it manually.

§ The and, or, and xor operators. Instead of the first two, use && and || respectively, but beware: they have different precedence. In particular, and and or bind more loosely than assignment, while && and || bind more tightly, so $x = true and false; assigns true to $x, whereas $x = true && false; assigns false. Use parentheses to make sure your expression is parsed the way you expect. There’s no exact alternative to the xor operator. The closest alternative is the ^ operator, which implements bitwise XOR as opposed to logical XOR; it also has different precedence.
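Code that used break 2 or continue 2 in PHP needs restructuring in Hack. One common approach, sketched here with a hypothetical find_pair function, is to move the nested loops into their own function and use return to leave both loops at once:

```hack
<?hh // strict

// PHP version would use "break 2" inside the inner loop.
// Here, return exits both loops in one step.
function find_pair(array<int> $xs, int $target): ?(int, int) {
  foreach ($xs as $i => $a) {
    foreach ($xs as $j => $b) {
      if ($i < $j && $a + $b === $target) {
        return tuple($a, $b);
      }
    }
  }
  return null; // no pair sums to $target
}
```

A boolean flag checked by the outer loop also works, but a separate function usually reads more clearly and gives the typechecker a precise return type.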


[16] Or not, depending on where the file is included from—another reason it’s hard to typecheck.

[17] This restriction applied to empty as well, until PHP 5.5.