Introduction - Hack and HHVM (2015)

Chapter 1. Introduction

For most of its history, Facebook has held internal hackathons every few months. During a hackathon, engineers are encouraged to come up with ideas that aren’t related to their day jobs, form teams, and try to make something cool in the span of a day or two.

In November 2007, one hackathon produced an interesting experiment: a tool that could convert PHP programs into equivalent C++ programs and then compile them with a C++ compiler. The idea was that the C++ program would run a lot faster than the PHP original, since it could take advantage of all the optimization work that has gone into C++ compilers over the years.

This possibility was of great interest to Facebook. It was gaining a lot of new users, and supporting more users requires more CPU cycles. Once you run out of available CPU cycles, you either buy more CPUs, which gets very expensive, or find a way to consume fewer CPU cycles per user. Facebook’s entire web front-end was written in PHP, so any way to get that PHP code to consume fewer CPU cycles was welcome.

Over the next seven years, the project grew far beyond its hackathon origins. As a PHP-to-C++ transformer called HPHPc, in 2009 it became the sole execution engine powering Facebook’s web servers. In early 2010, it was open-sourced under the name HipHop for PHP. And then, starting in 2010, an entirely new approach to execution—just-in-time compilation to machine code, with no C++ involved—grew out of HPHPc’s codebase, and eventually superseded it. This just-in-time compiler, called the HipHop Virtual Machine, or HHVM for short, took over Facebook’s entire web server fleet in early 2013. The original PHP-to-C++ transformer is gone; it is not deployed anywhere and its code has been deleted.

The origin of Hack is entirely separate. Its roots are in a project that attempted to use static analysis on PHP to automatically detect potential security bugs. It soon became clear that the nature of PHP makes it fundamentally difficult to get static analysis deep enough to be useful. Thus the idea of “strict mode” was born: a modification of PHP with some features removed (such as references) and a sophisticated type system added. Authors of PHP code could opt into strict mode, gaining stronger checking of their code while retaining full interoperability.

Hack’s direction since then belies its origin as a type system on top of PHP. It has gained new features with significant effects on the way Hack code is structured, like async. It has added new features specifically meant to make the type system more powerful, like collections. Philosophically, it’s a different language from PHP, carving out a new position in the space of programming languages.

This is how we got where we are today: a modern, dynamic programming language with robust static typechecking, executing with just-in-time compilation on an engine with full PHP compatibility and interoperability.

What are Hack and HHVM?

Hack and HHVM are closely related, and there has occasionally been some confusion as to what exactly the terms refer to.

Hack is a programming language. It’s based on PHP, shares much of PHP’s syntax, and is designed to be fully interoperable with PHP. However, it would be severely limiting to think of Hack as nothing more than some decoration on top of PHP. Hack’s main feature is robust static typechecking, which is enough of a difference from PHP to qualify Hack as a language in its own right. Hack is useful for developers working on an existing PHP codebase, and has many affordances for that situation, but it’s also an excellent choice for ground-up development of a new project.

Beyond static typechecking, Hack has several other features that PHP doesn’t have, and most of this book is about those features: async functions, XHP, and many more. It also intentionally lacks a handful of PHP’s features, to smooth some rough edges.

HHVM is an execution engine. It supports both PHP and Hack, and it lets the two languages interoperate: code written in PHP can call into Hack code, and vice versa. When executing PHP, it’s intended to be usable as a drop-in replacement for the standard PHP interpreter from php.net. This book has a few chapters that cover HHVM: how to configure and deploy it, and how to use it to debug and profile your code.

Finally, separate from HHVM, there is the Hack typechecker: a program that can analyze Hack code (but not PHP code) for type errors. The typechecker is somewhat stricter than HHVM about what it will accept, although HHVM will become stricter to match the typechecker in future versions. The typechecker doesn’t really have a name, other than the command you use to run it, hh_client. I’ll refer to it as “the Hack typechecker” or just “the typechecker”.

As of now, HHVM is the only execution engine that runs Hack, which is why the two may sometimes be conflated.

Who This Book Is For

This book is for readers who are comfortable with programming. It spends no time explaining concepts common to many programming languages, like control flow, data types, functions, and object-oriented programming.

Hack is a descendant of PHP. This book doesn’t specifically explain common PHP syntax, except in areas where Hack differs, so basic knowledge of PHP is helpful. If you’ve never used PHP, you’ll still be able to understand much of the code in this book if you have experience with other programming languages. The syntax is generally very straightforward to understand.

You don’t need to have worked on a large PHP codebase. Hack is useful for codebases of all sizes, from simple stand-alone scripts to multi-million-line web apps like Facebook. Nothing here assumes you have worked on a complex, high-traffic PHP website.

There is some material that assumes familiarity with typical web app tasks like querying relational databases and memcached (in Chapter 7) and generating HTML (in Chapter 8). You can skip these parts if they’re not relevant to you, but they require no knowledge that you wouldn’t get from experience with even a small, basic web app.

I hope this book explains not just how things are, but also how they came to be that way. Programming language design is a hard problem; it’s essentially the art of navigating hundreds of tradeoffs at once. It’s also subject to a surprising range of pragmatic concerns like backward compatibility, and Hack is no exception. If you’re at all interested in a case study of how one programming language made its way through an unusual set of constraints, this book should provide one.

Philosophy

There are a few principles that underlie the design of both Hack and HHVM, which can help you understand how things came to be the way they are.

Program Types

There is a single observation about programs that informs both HHVM’s approach to optimizing and executing code, and Hack’s approach to verifying it: behind most programs in dynamically typed languages, a statically typed program is hiding.

Consider this code, which works as both PHP and Hack:

for ($i = 0; $i < 10; $i++) {
  echo $i + 100;
}

Although it’s not explicitly stated anywhere, it’s obvious to any human reader that $i is always an integer. In computer science terms, $i is monomorphic: it only ever has one type. A typechecker could use this property to verify that the expression $i + 100 makes sense. An execution engine could use this property to compile $i + 100 into efficient machine code that does the addition.

A loop variable may seem like a trivial example, but it turns out that in real-world PHP codebases, most values are monomorphic. This makes intuitive sense, because you can’t do much with a value (do arithmetic on it, index into it, call methods on it, and so on) without knowing its type. Most code, even in dynamically typed languages, does not check the type of each value before doing anything with it, which means that there must be hidden assumptions about the types of values. If the code mostly runs without runtime type errors, then those hidden assumptions must be true most of the time.

HHVM’s approach is to assume that this observation usually holds, and to compile PHP and Hack to machine code accordingly. Since it compiles programs while they are running, it knows the types flowing through each piece of code it’s about to compile. It outputs machine code that assumes those types: in the code example above, when compiling the expression $i + 100, HHVM would see that $i is an integer and use a single hardware addition instruction to do the addition.

The purpose of Hack, meanwhile, is to bring the hidden statically typed program into the light. It makes some types explicit with annotations, and verifies the rest with type inference. The idea is that Hack doesn’t significantly constrain existing PHP programs; rather, it makes behavior that the programs already had explicit, and exposes it to robust static analysis.
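To make this concrete, here is a minimal sketch of the earlier loop written as strict-mode Hack. It assumes Hack 3.6 syntax (the <?hh // strict header); the function name print_offsets and its $base parameter are hypothetical, introduced only so there is a signature to annotate. The parameter and return types are written out explicitly, while the type of $i is left for the typechecker to infer.

```hack
<?hh // strict

// Hypothetical function wrapping the loop from earlier. Strict mode does not
// accept top-level statements, so the loop has to live inside a function.
function print_offsets(int $base): void {
  // $i carries no annotation; the typechecker infers that it is always an int.
  for ($i = 0; $i < 10; $i++) {
    // int + int yields int, so this expression typechecks.
    echo $i + $base;
  }
}
```

Nothing about the loop body changed; the annotations only state what the PHP version already assumed. A call such as print_offsets("100") elsewhere in Hack code would be reported by hh_client as a type error, because a string is passed where the signature declares an int.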

This point is worth repeating: Hack’s static typing is not supposed to require a different style of programming. The language is designed to give you a better way to express the programs you were already writing.

Gradual Migration

Hack originated in the shadow of a multi-million-line PHP codebase. There’s no way to convert a codebase of that size from one language to another in one fell swoop, no matter how similar the languages are, so Hack has evolved to offer very gradual migration paths from PHP. Hack code can use functions and classes written in PHP, and vice versa. For every feature of Hack, there is a seamless way for code that uses it to interact with code that doesn’t use it.

In addition, the standard Hack/HHVM distribution comes with tools to do automated migration of PHP to Hack. It also includes a tool that transpiles Hack into PHP, for use by library authors who want to migrate to Hack while preserving a way for non-HHVM users to use their code. These tools are described in detail in (to come).

HHVM, for its part, is intended to run PHP code identically to the standard PHP interpreter. The first step in migrating a PHP codebase to Hack is to switch to running that PHP code on HHVM. The only significant code changes that should be required in this step are around extensions: not all PHP and Zend extensions are compatible with HHVM. There should be no changes required because of differing behavior in the core language.

Make no mistake, though: despite its origins, Hack is an excellent choice if you’re starting a new project from scratch. In fact, you’ll get the most benefit out of Hack that way: the language is at its best when a codebase is 100 percent Hack.

How the Book Is Organized

The central feature of Hack is static typechecking. It cuts broadly across all of Hack’s other features, and is the most significant difference between Hack and PHP. The book starts by exploring that topic in detail in Chapter 2. Almost everything else in the book depends on an understanding of the content in that chapter, so if you haven’t seen Hack before, I very strongly recommend reading it thoroughly. That content is supplemented by Chapter 3, which discusses a particularly interesting part of Hack’s type system.

The rest of Hack’s features are mostly orthogonal to each other. Chapter 4 explains several of Hack’s smaller features. Chapter 5 shows the few PHP features that are gone from Hack, and explains why. Chapter 6 explains how and why to use Hack’s collection classes. Chapter 7 explains Hack’s support for multitasking, and Chapter 8 explains Hack’s syntax and library for generating HTML sanely and securely.

Finally, (to come) explores some of the tools for working with Hack code, including a PHP-to-Hack migration tool and an interactive debugger. Chapter 9 covers the process of setting up, configuring, deploying, and monitoring HHVM.

Versions

This book is about Hack and HHVM version 3.6, which was released on March 11, 2015. (HHVM and the Hack typechecker live in the same codebase, and are released as a single package.) By the time you read this, there will already be newer versions out. However, 3.6 is a Long Term Support release; it will be updated with security and bug fixes for 48 weeks after its release.

HHVM 3.6 implements PHP 5.6 semantics[1]. It supports all of the features new in PHP 5.6, such as constant scalar expressions, variadic functions, and the exponentiation operator. These features are present in Hack 3.6 as well. In general, as new versions of PHP come out, HHVM adds support for the new features and semantics, for Hack code as well as PHP code.

[1] The matching minor version numbers are a coincidence. There’s no relationship between HHVM/Hack and PHP version numbers, in general.