Java 8 in Action: Lambdas, streams, and functional-style programming (2015)

Appendix D. Lambdas and JVM bytecode

You may wonder how the Java compiler implements lambda expressions and how the Java virtual machine (JVM) deals with them. If you think lambda expressions can simply be translated to anonymous classes, you should read on. This appendix briefly discusses how lambda expressions are compiled, by examining the generated class files.

D.1. Anonymous classes

We showed in chapter 2 that anonymous classes can be used to declare and instantiate a class at the same time. As a result, just like lambda expressions, they can be used to provide the implementation for a functional interface.

Because a lambda expression provides the implementation for the abstract method of a functional interface, it would seem straightforward to ask the Java compiler to translate a lambda expression into an anonymous class during the compilation process. But anonymous classes have some undesirable characteristics that impact the performance of applications:

· The compiler generates a new class file for each anonymous class. The filename usually looks like ClassName$1, where ClassName is the name of the class in which the anonymous class appears, followed by a dollar sign and a number. The generation of many class files is undesirable, because each class file needs to be loaded and verified before being used, which impacts the startup performance of the application. If lambdas were translated to anonymous classes, you’d have one new class file for each lambda.

· Each new anonymous class introduces a new subtype for a class or interface. If you had a hundred different lambdas for expressing a Comparator, that would mean a hundred different subtypes of Comparator. In certain situations, this can make it harder for the JVM to improve runtime performance.
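The two points above can be observed directly. The following is a minimal sketch, not taken from the book: the class name AnonymousSubtypes and its two helper methods are hypothetical, chosen only for illustration. It creates two anonymous Comparator implementations and prints the names of the classes the compiler generated for them.

```java
import java.util.Comparator;

// Hypothetical demo class: each anonymous class below gets its own class file
// and is a distinct subtype of Comparator.
class AnonymousSubtypes {

    static Comparator<String> byLength() {
        return new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        };
    }

    static Comparator<String> byLengthReversed() {
        return new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(b.length(), a.length());
            }
        };
    }

    public static void main(String[] args) {
        // Each name is the enclosing class name, a dollar sign, and a number.
        System.out.println(byLength().getClass().getName());
        System.out.println(byLengthReversed().getClass().getName());
    }
}
```

After compiling this file you should also find one extra class file per anonymous class next to the class file of the enclosing class.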

D.2. Bytecode generation

A Java source file is compiled to Java bytecode by the Java compiler. The JVM can then execute the generated bytecode and run the application. Anonymous classes and lambda expressions use different bytecode instructions when they’re compiled. You can inspect the bytecode and constant pool of any class file using the command

javap -c -v ClassName

Let’s try to implement an instance of the Function interface using the old Java 7 syntax, as an anonymous inner class, as shown in the following listing.

Listing D.1. A Function implemented as an anonymous inner class

import java.util.function.Function;

public class InnerClass {
    Function<Object, String> f = new Function<Object, String>() {
        @Override
        public String apply(Object obj) {
            return obj.toString();
        }
    };
}

The bytecode generated for the constructor of InnerClass, which creates the Function as an anonymous inner class, will be something along the lines of this:

 0: aload_0
 1: invokespecial #1    // Method java/lang/Object."<init>":()V
 4: aload_0
 5: new           #2    // class InnerClass$1
 8: dup
 9: aload_0
10: invokespecial #3    // Method InnerClass$1."<init>":(LInnerClass;)V
13: putfield      #4    // Field f:Ljava/util/function/Function;
16: return

This code shows the following:

· An object of type InnerClass$1 is instantiated using the bytecode operation new. A reference to the newly created object is pushed on the stack at the same time.

· The operation dup duplicates that reference on the stack.

· This value then gets consumed by the instruction invokespecial, which initializes the object.

· The top of the stack still contains a reference to the object, which is stored in the f field of the InnerClass instance using the putfield instruction.

InnerClass$1 is the name generated by the compiler for the anonymous class. If you want to reassure yourself, you can inspect the InnerClass$1 class file as well, and you’ll find the code for the implementation of the Function interface:

class InnerClass$1 implements
        java.util.function.Function<java.lang.Object, java.lang.String> {
    final InnerClass this$0;

    public java.lang.String apply(java.lang.Object);
        Code:
            0: aload_1
            1: invokevirtual #3    // Method java/lang/Object.toString:()Ljava/lang/String;
            4: areturn
}

D.3. InvokeDynamic to the rescue

Now let’s try to do the same using the new Java 8 syntax as a lambda expression. Inspect the generated class file of the code in the following listing.

Listing D.2. A Function implemented with a lambda expression

import java.util.function.Function;

public class Lambda {
    Function<Object, String> f = obj -> obj.toString();
}

You’ll find the following bytecode instructions:

 0: aload_0
 1: invokespecial #1    // Method java/lang/Object."<init>":()V
 4: aload_0
 5: invokedynamic #2, 0 // InvokeDynamic #0:apply:()Ljava/util/function/Function;
10: putfield      #3    // Field f:Ljava/util/function/Function;
13: return

We explained the drawbacks of translating a lambda expression into an anonymous inner class, and indeed you can see that the result is very different. The creation of an extra class has been replaced with an invokedynamic instruction.

The invokedynamic instruction

The bytecode instruction invokedynamic was introduced in JDK 7 to support dynamically typed languages on the JVM. invokedynamic adds a further level of indirection when invoking a method, so that logic specific to the dynamic language can determine the call target. The typical use for this instruction is something like the following:

def add(a, b) { a + b }

Here the types of a and b aren’t known at compile time and can change from time to time. For this reason, when the JVM executes an invokedynamic for the first time, it consults a bootstrap method, implementing the language-dependent logic that determines the actual method to be called. The bootstrap method returns a linked call site. There’s a good chance that if the add method is called with two ints, the subsequent call will also be with two ints. As a result, it’s not necessary to rediscover the method to be called at each invocation. The call site itself can contain the logic defining under which conditions it needs to be relinked.
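The idea of a bootstrap method returning a linked call site can be imitated in plain Java with the java.lang.invoke API. The following is only a sketch under stated assumptions: the class BootstrapSketch and its methods addInts, bootstrap, and callThroughSite are hypothetical names, and the bootstrap method is called by hand here, whereas for a real invokedynamic instruction the JVM calls it the first time the instruction is executed.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical illustration of a bootstrap method and a linked call site.
class BootstrapSketch {

    // The method that the call site ends up being linked to.
    static int addInts(int a, int b) {
        return a + b;
    }

    // A bootstrap method receives a lookup object, the name used at the call
    // site, and the call-site type, and returns a CallSite holding the target.
    // A real language runtime would choose the target based on its own rules;
    // this sketch always picks addInts.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle target = lookup.findStatic(BootstrapSketch.class, "addInts", type);
        // A ConstantCallSite is never relinked once it has been created.
        return new ConstantCallSite(target);
    }

    // Simulates by hand what the JVM does for an invokedynamic instruction:
    // link the call site through the bootstrap method, then invoke its target.
    static int callThroughSite(int a, int b) {
        try {
            CallSite site = bootstrap(MethodHandles.lookup(), "add",
                    MethodType.methodType(int.class, int.class, int.class));
            return (int) site.dynamicInvoker().invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughSite(2, 3));
    }
}
```

A language that needs relinking, as described above, could return a MutableCallSite instead and replace its target when its guard conditions no longer hold.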

In listing D.2, the features of the invokedynamic instruction have been used for a slightly different purpose than the one for which they were originally introduced. Here it’s used to delay the choice of the strategy used to translate lambda expressions into bytecode until runtime. In other words, using invokedynamic in this way allows deferring code generation for implementing the lambda expression until runtime. This design choice has positive consequences:

· The strategy used to translate the lambda expression body to bytecode becomes a pure implementation detail. It could also be changed dynamically, or optimized and modified in future JVM implementations, preserving the bytecode’s backward compatibility.

· There’s no overhead, such as additional fields or static initializer, if the lambda is never used.

· For stateless (noncapturing) lambdas it’s possible to create one instance of the lambda object, cache it, and always return that same instance. This is a common use case, and people were used to doing this explicitly before Java 8; for example, declaring a specific Comparator instance in a static final variable.

· The additional performance cost is paid only once, because this translation has to be performed, and its result linked, only the first time the lambda expression is used. All subsequent uses can skip this slow path and call the previously linked implementation.
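The caching of stateless lambdas mentioned above can be probed with a reference comparison. This is a minimal sketch with hypothetical names (LambdaIdentityDemo, stateless, capturing). Whether the same instance is returned is an implementation detail of the JVM and isn't guaranteed by the language, so the program only prints what it observes.

```java
import java.util.function.Function;

// Hypothetical demo: compares the objects produced by evaluating the same
// lambda expression twice.
class LambdaIdentityDemo {

    // Noncapturing: uses nothing from the enclosing scope.
    static Function<Object, String> stateless() {
        return obj -> obj.toString();
    }

    // Capturing: uses the header parameter of the enclosing method.
    static Function<Object, String> capturing(String header) {
        return obj -> header + obj.toString();
    }

    public static void main(String[] args) {
        // An implementation is free to reuse one cached instance here...
        System.out.println("stateless reused: " + (stateless() == stateless()));
        // ...whereas a capturing lambda has to carry its captured value.
        System.out.println("capturing reused: "
                + (capturing("This is a ") == capturing("This is a ")));
    }
}
```

Because the result depends on the JVM, code should never rely on the identity of a lambda object.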

D.4. Code-generation strategies

A lambda expression is translated into bytecode by putting its body into a method generated by the compiler (a static method in the cases shown here), which is then linked to the functional interface at runtime. A stateless lambda, one that captures no state from its enclosing scope, like the one we defined in listing D.2, is the simplest type of lambda to be translated. In this case the compiler can generate a method having the same signature as the lambda expression, so the result of this translation process can be logically seen as follows:

public class Lambda {
    Function<Object, String> f = [dynamic invocation of lambda$1];

    static String lambda$1(Object obj) {
        return obj.toString();
    }
}
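The placeholder "dynamic invocation of lambda$1" in the listing above can be approximated by hand using java.lang.invoke.LambdaMetafactory, the JDK class that serves as the bootstrap for lambda call sites. The following is only a sketch: the class ManualLambdaLinkage, its link method, and the handwritten lambda$1 method are stand-ins for what the compiler and the JVM would normally produce and do automatically.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

// Hypothetical illustration: links a static method to Function by hand.
class ManualLambdaLinkage {

    // Stand-in for the compiler-generated method holding the lambda body.
    private static String lambda$1(Object obj) {
        return obj.toString();
    }

    @SuppressWarnings("unchecked")
    static Function<Object, String> link() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle body = lookup.findStatic(ManualLambdaLinkage.class, "lambda$1",
                    MethodType.methodType(String.class, Object.class));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "apply",                                           // interface method name
                    MethodType.methodType(Function.class),             // call site: () -> Function
                    MethodType.methodType(Object.class, Object.class), // erased apply signature
                    body,                                              // method with the lambda body
                    MethodType.methodType(String.class, Object.class)); // signature for this lambda
            // Invoking the call-site target yields the Function object.
            return (Function<Object, String>) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Function<Object, String> f = link();
        System.out.println(f.apply(42));
    }
}
```

In ordinary code you never write this yourself; it is shown only to make the linkage step visible.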

The case of a lambda expression capturing final (or effectively final) local variables or fields, as in the following example, is a bit more complex:

public class Lambda {
    String header = "This is a ";
    Function<Object, String> f = obj -> header + obj.toString();
}

In this case the signature of the generated method can’t be the same as that of the lambda expression, because it’s necessary to add extra arguments to carry the additional state of the enclosing context. The simplest solution to achieve this is to prepend the arguments of the lambda expression with an additional argument for each of the captured variables, so the method generated to implement the former lambda expression will be something like this:

public class Lambda {
    String header = "This is a ";
    Function<Object, String> f = [dynamic invocation of lambda$1];

    static String lambda$1(String header, Object obj) {
        return header + obj.toString();
    }
}
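You can look for these generated methods with reflection. The sketch below uses hypothetical names (DesugaredMethods, lambdaParameterLists) and assumes the class was compiled with javac, which gives the generated methods names starting with lambda$; other compilers may name them differently. It lists the parameter types of every such method, so that you can compare the stateless case with the capturing one, which here captures a local variable rather than a field.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical demo: inspects the methods generated for two lambda bodies.
class DesugaredMethods {

    // Stateless lambda: nothing captured.
    static Function<Object, String> stateless() {
        return obj -> obj.toString();
    }

    // Capturing lambda: the effectively final parameter header is captured.
    static Function<Object, String> capturing(String header) {
        return obj -> header + obj.toString();
    }

    // Returns the parameter list of each synthetic lambda$... method
    // declared by this class, sorted alphabetically.
    static List<String> lambdaParameterLists() {
        List<String> result = new ArrayList<>();
        for (Method m : DesugaredMethods.class.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("lambda$")) {
                StringBuilder params = new StringBuilder();
                for (Class<?> p : m.getParameterTypes()) {
                    if (params.length() > 0) {
                        params.append(',');
                    }
                    params.append(p.getSimpleName());
                }
                result.add(params.toString());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        for (String params : lambdaParameterLists()) {
            System.out.println("lambda body method takes (" + params + ")");
        }
    }
}
```

With javac, the method generated for the capturing lambda should show the captured String ahead of the Object argument, matching the prepending scheme described above.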

More information about the translation process for lambda expressions can be found here: http://cr.openjdk.java.net/~briangoetz/lambda/lambda-translation.html.