Implementations Galore - Python (2016)


CHAPTER 22: Implementations Galore

Near the beginning of this book, we talked about the different “compilers” or implementations available for Python. Just as there is a battle raging about which Python version to use, many also debate which implementation is best. Now that you have learned the basics of Python programming, you should be ready for this discussion as well. Our objective in this chapter is not to decide which implementation is best, but to describe what each one can do.

Notice that when people talk about the “Python” language, most of them mean not just the language but also CPython, the default Python implementation. Strictly speaking, the term “Python” refers to the language’s specification, which can then be implemented in different ways. Python is basically an interface. This also answers the common question of whether Python is compiled or interpreted: those are characteristics of an implementation, not of the language.
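Because the language and its implementation are separate things, a program can ask which implementation it is running on. The following is a minimal sketch, assuming Python 3; `describe_runtime` is a name made up for this illustration, while `platform.python_implementation()` and `sys.implementation` are part of the standard library.

```python
import platform
import sys


def describe_runtime():
    """Return a short description of the implementation running this code."""
    return "{} {} ({})".format(
        platform.python_implementation(),  # e.g. CPython, PyPy, Jython, IronPython
        platform.python_version(),         # version of the language it implements
        sys.implementation.name,           # lowercase identifier of the implementation
    )


print(describe_runtime())
```

The same script prints a different first word depending on the implementation that runs it, even though the source code never changes.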

As you go through the different Python implementations, you will inevitably come across the terms “machine code” and “bytecode”. There is an important difference between the two. Machine code is what C compiles to, and it runs directly on the processor: each instruction tells your machine’s CPU what to do. Bytecode, on the other hand, is what Java compiles to, and it is run by the Java Virtual Machine. A virtual machine is in essence an abstraction of a computer that has the same ability to execute programs. Each bytecode instruction is handled by the virtual machine, which in turn interacts with your computer.

When it comes to performance, we can shorten the comparison to this: bytecode is portable and easier to sandbox, while machine code usually beats it on speed. Machine code is specific to a processor architecture, so it looks different depending on the machine you are targeting, which also lets it be optimized for your setup. Bytecode, on the other hand, is the same on every machine that has the matching virtual machine.

Many beginners to Python are told that Python is a “compiled language” because of its .pyc files. There is only a half-truth to this: a .pyc file holds compiled bytecode, which is then interpreted. This means that once a module has been imported, its .pyc file lets it load faster the next time, because the source no longer has to be compiled to bytecode. The bytecode itself does not run any faster.
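To see a .pyc file being produced, the compilation step can be triggered by hand with the standard `py_compile` module. This is only a sketch, assuming CPython 3; the file names `hello.py` and `hello.pyc` and the helper `compile_to_pyc` are made up for the illustration.

```python
import os
import py_compile
import tempfile


def compile_to_pyc(source_text):
    """Write source_text to a temporary .py file and byte-compile it.

    Returns the size in bytes of the resulting .pyc file.
    """
    with tempfile.TemporaryDirectory() as workdir:
        source_path = os.path.join(workdir, "hello.py")
        pyc_path = os.path.join(workdir, "hello.pyc")
        with open(source_path, "w") as handle:
            handle.write(source_text)
        # cfile= chooses where the bytecode is written; doraise=True turns
        # compile errors into exceptions instead of printed messages.
        py_compile.compile(source_path, cfile=pyc_path, doraise=True)
        return os.path.getsize(pyc_path)


print(compile_to_pyc("print('hello')\n") > 0)
```

Normally you never call this yourself: the interpreter writes the .pyc file automatically the first time a module is imported.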

CPython. This is the de facto standard and the reference implementation, written in C. We discussed it near this book’s start, but it is worthwhile to note some of its other defining features. CPython compiles the code to intermediate bytecode, which is then interpreted by a virtual machine. This implementation provides the best compatibility between Python packages and C extension modules.
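The intermediate bytecode can be inspected with the standard `dis` module. The sketch below assumes CPython 3; the function `add` is just an example, and the exact instructions printed differ from one CPython version to the next.

```python
import dis


def add(a, b):
    return a + b


# Every function carries a code object; co_code holds the raw bytecode.
print(type(add.__code__.co_code))

# dis.dis prints a human-readable listing of those instructions.
dis.dis(add)

# The same instructions are also available as objects.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

Each line of the listing is one instruction for CPython’s virtual machine, not for your machine’s CPU.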

If you will be writing open source code and you wish to reach as wide an audience as you can, then it is best to target CPython. It is also the safest choice if your code relies on C extensions, since other implementations support them only partially or not at all. Note that every version of the Python language is implemented in C, because CPython is the reference implementation.

PyPy. This is an interpreter implemented in RPython, a restricted, statically typed subset of Python. As previously mentioned, it has a just-in-time compiler with support for multiple backends (C, CLI, and JVM).

PyPy has been hailed by some stalwarts as the “future of Python”. The title may be justly earned: it is built for maximum compatibility with the CPython reference while also improving performance. So if you wish to increase your code’s performance, PyPy should be on your radar. On some benchmarks, it can be more than 5 times faster than CPython, although the gain depends heavily on the workload.

As of this writing, PyPy has released a beta that targets Python 3. PyPy’s default release supports Python 2.7.

Jython. This is an implementation that compiles the code into Java bytecode, which is then executed by a Java Virtual Machine (JVM). It can also import and use any Java class as if it were a Python module.

Jython is best used if you anticipate interfacing with an existing Java codebase, or whenever you need to write Python for the JVM for any other reason. As of the time of this writing, it supports Python 2.7.

IronPython. This is an implementation meant for the .NET framework. It can use both .NET and Python libraries, and it can expose Python code to other languages in the .NET framework.

It is also supported by Python Tools for Visual Studio, which integrates directly into the Visual Studio development environment. This makes it a good fit for Windows-based developers.

PythonNet. This is a package (also called Python for .NET) that provides near-seamless integration of a natively installed Python with the .NET CLR (Common Language Runtime). This is the inverse of the approach taken by IronPython. Rather than competing, the two approaches come across as complementary.

Together with Mono, this package lets a native Python installation work with the .NET framework on non-Windows operating systems such as Linux and OS X. It can also be run alongside IronPython without issues.

Cython. This is a superset of Python that includes bindings for calling C functions. It allows the programmer to write C extensions for Python code. It also allows static typing to be added to existing Python code, which is then compiled and can reach performance akin to C.

In this respect, it can be seen as similar to PyPy. However, it is not the same. With Cython, the programmer adds type declarations to the code before it is passed to a compiler. With PyPy, the programmer writes plain Python and the compiler itself is responsible for the optimizations.

Numba. This is also a “just-in-time” compiler, but it adds JIT compilation to annotated Python code. In the simplest sense, the programmer just gives Numba hints, and it takes care of the optimization. It comes as part of Anaconda, a distribution that offers a set of packages for data analysis and management.
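The “hint” is usually a decorator placed on the function to be compiled. The following is a minimal sketch that assumes Numba may or may not be installed: if it is, the real `njit` decorator is used; if not, a do-nothing stand-in (written only for this illustration) keeps the example runnable as plain Python. The function `sum_of_squares` is a made-up example.

```python
try:
    from numba import njit  # real Numba decorator, used when available
except ImportError:
    def njit(function):
        """Fallback for this sketch only: leave the function uncompiled."""
        return function


@njit
def sum_of_squares(limit):
    """Add up the squares of 0, 1, ..., limit - 1 with a plain loop."""
    total = 0
    for number in range(limit):
        total += number * number
    return total


print(sum_of_squares(10))  # prints 285
```

The body of the function is ordinary Python. With Numba present, the first call triggers compilation, so any speed-up shows only on later calls and on much larger inputs than this one.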

IPython. This is different from everything else discussed here: it is an interactive computing environment, with added support for a browser-based experience, GUI toolkits, and more.

Psyco. This is an extension module, and is one of the first just-in-time efforts. It has since been marked as “dead” (unmaintained), and its lead developer now works on PyPy.

Note that these differences will not tell you which implementation to use unless you have a very specific need that one or two on the list can answer. The list is not even exhaustive. Python implementations may differ in their behavior, and they certainly differ in how they process Python source code. However, these differences rarely play a large part in what the code can ultimately do, and they tend to shrink over time.

Spotlight on PyPy

Earlier, we mentioned an implementation that some are calling the “future of Python”. We included it for a reason. Despite what we said in the preceding paragraph, we feel it is worth shedding more light on this one implementation, if only to give you a more detailed idea of how implementations work (and how complicated they are).

Remember how CPython is written in C, and some others in Java and .NET? Well, PyPy is a Python implementation written in Python. If you remember the previous definition of Python as a language, this is just a wee bit paradoxical. And here’s where things are going to get confusing.

We already mentioned that machine code is faster than bytecode. Someone then had the great idea to compile some of the bytecode and run it as native code. This inevitably costs something, namely the time needed to compile the bytecode. However, when the compiled code runs often enough, the end result is a much faster implementation. This sort of thinking is what lies behind just-in-time compilation, affectionately called JIT by its users.

The hybrid technique combines the benefits of both compilers and interpreters. In the most basic terms, a JIT system like PyPy uses compilation to speed up an interpreted system.

Here is a common approach taken by JITs:

1. Bytecode that is executed frequently is identified.

2. This is then compiled into native (machine) code.

3. The result is cached.

4. When the exact same bytecode is set to run again, the cached machine code is used instead, which provides the speed boost.
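The four steps above can be imitated in a few lines of Python. This is a toy sketch, not how PyPy is actually built: the “bytecode” is a made-up tuple of `(operation, argument)` pairs, the “native code” is just a generated Python function, and `HOT_THRESHOLD`, `interpret`, `translate`, and `ToyJit` are all names invented for the illustration.

```python
HOT_THRESHOLD = 3  # hypothetical cutoff for calling a program "hot"


def interpret(program, value):
    """Slow path: walk the instructions one at a time."""
    for op, arg in program:
        if op == "add":
            value += arg
        elif op == "mul":
            value *= arg
        else:
            raise ValueError("unknown op: " + op)
    return value


def translate(program):
    """Stand-in for step 2: turn the whole program into one Python function."""
    expression = "value"
    for op, arg in program:
        if op not in ("add", "mul"):
            raise ValueError("unknown op: " + op)
        symbol = "+" if op == "add" else "*"
        expression = "({} {} {!r})".format(expression, symbol, arg)
    namespace = {}
    exec("def compiled(value):\n    return " + expression, namespace)
    return namespace["compiled"]


class ToyJit:
    def __init__(self):
        self.run_counts = {}  # how often each program has been interpreted
        self.cache = {}       # program -> translated function

    def run(self, program, value):
        compiled = self.cache.get(program)  # step 4: reuse cached code
        if compiled is not None:
            return compiled(value)
        count = self.run_counts.get(program, 0) + 1  # step 1: find hot code
        self.run_counts[program] = count
        if count >= HOT_THRESHOLD:
            self.cache[program] = translate(program)  # steps 2 and 3
        return interpret(program, value)


jit = ToyJit()
program = (("add", 3), ("mul", 2))
results = [jit.run(program, 5) for _ in range(5)]
print(results)               # prints [16, 16, 16, 16, 16]
print(program in jit.cache)  # prints True
```

The first three calls go through the slow interpreter loop; the third one also translates and caches the program, so the fourth and fifth calls skip the loop entirely. The answers are identical either way, which is the whole point: a JIT changes how fast code runs, not what it computes.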

For many, this is what makes PyPy stand out, though it is not alone, since there have been other efforts before it. PyPy aims for much more than its predecessors: it also aims to be light on memory, to work across platforms, and to support stackless features. As mentioned earlier, it is also highly compatible with the de facto CPython implementation, meaning it can run Django, Flask, and similar frameworks.

Despite all the sunshine and rainbows, there is also a lot of confusion surrounding PyPy. For example, someone even submitted a proposal to create a “PyPyPy”, a nonsensical attempt to create an even faster implementation. Part of the reason for the confusion is that PyPy is two things:

1. As mentioned earlier, it is an interpreter written in Python (in RPython, to be exact, a subset of the language with static typing). In the Python language, it is mostly impossible to reason rigorously about types. Why? Here is an example:

x = random.choice([1, "spam"])

This is valid Python code, but what is the type of x? How can one reason about variable types when they are not even strictly enforced? RPython sacrifices some flexibility, but in exchange it becomes much easier to reason about things such as memory management. This allows for a good deal of optimization.

2. PyPy is a compiler that compiles RPython code for different targets and adds in a JIT. By default, the target is C (as in, RPython-to-C), though others such as the JVM can also be targeted.
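The typing question raised in the first point can be tried out directly. The sketch below is ordinary Python rather than RPython, and `describe` is a helper made up for the illustration. It shows that the type of `x` is only known once the program runs, which is exactly what makes static reasoning hard.

```python
import random

# The type of x depends on a random draw, so it cannot be known in advance.
x = random.choice([1, "spam"])
print(type(x).__name__)


def describe(value):
    """Code that handles x has to check its type at run time."""
    if isinstance(value, int):
        return "an integer"
    return "a string"


print(describe(x))
```

Run it a few times and both answers will eventually appear. RPython gives up this kind of freedom: a variable is expected to keep a single type, so the toolchain can work the types out before the program runs.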

Now, how do these two items add to the confusion? Let’s take this further. Think of the first definition: an interpreter written in RPython. It takes the user’s code and compiles it down to bytecode. However, since the interpreter is itself written in (a subset of) Python, something else is needed to run it.

The creators of PyPy could have just used CPython, but that is not very fast. Instead, the second definition of PyPy comes into play. This part is also called the “RPython Toolchain”. It compiles the entire PyPy interpreter to code for a different platform, such as C, CLI, or JVM. Again, a JIT is thrown in: the toolchain essentially generates a JIT compiler for the interpreter. If you think about it, it is kind of crazy. You are compiling an interpreter and, in the process, adding another standalone compiler. No matter how roundabout it may seem, it works.

The overall result is a standalone executable that interprets the source code while exploiting optimizations through the JIT. If you dare to think about the abstract nature of it all, you could write an interpreter for any language, feed it to PyPy’s toolchain, and get the JIT magic to work. This is because the JIT focuses on optimizing the interpreter itself, not on the details of the language being interpreted.

CONCLUSION

Python is a fun language. That alone makes it stand out as not every language can claim the simplicity and the readability that are inherent to Python. This also makes Python very useful, as its ease of learning and deployment can make it suitable for a number of tasks (though the same trait also arguably disqualifies it for others).

So what are you waiting for? This book has given you all that you need to jump in. The next step for you is to apply what you have learned, seek further study, and master the Python language! Now that Python programmers are gaining prominence, you have just gained a new qualification that can take you to even greater heights.