Michael Maclean

PHP at the speed of C

For common cases, most dynamic languages are “fast enough”. For most PHP applications, the most overhead is probably hitting network services like the database, and this accounts for far more of the run time than the speed of the language’s function calls. Also, PHP is a fairly thin glue layer on top of a lot of C libraries, which mean that for any significant algorithmic work, you’re probably already calling out to a C library underneath. If this wasn’t the case, basic operations like string searches and regular expressions would be extremely slow.

However, every now and again you do want to do something with PHP that does involve complex algorithms, for which you might not have a C library to hand. Even if you do have a C library or the ability to create one, writing extensions for PHP to expose this functionality has a bit of a learning curve. Wouldn’t it be nice if we could write code in PHP, and have it accelerated somehow to C-like speeds?

The Mandelbrot set

The Mandelbrot set is a set of complex numbers generated using an algorithm that involves a large number of iterations. When plotted with an appropriate colouring scheme, they make fractals which can look very nice. Plotting them is relatively simple, though it’s quite CPU intensive. I have written some fairly naïve code to generate these images, which you can find in this gist. The output looks like this:

Mandelbrot set

There’s a fair bit of code there, but the function that takes most of the time is the mandelbrot() one. It’s where all the iteration happens. On the VM I’m testing this in1 it takes around 80 seconds to run, under a release build of PHP 5.6.5:

vagrant@vagrant-ubuntu-trusty-64:~/src$ php colourmandel.php
Data took 78.744249
Render took 0.631219

That’s pretty slow, by any metric. It’s iterating up to 4096 times for every pixel in an 800x512 image, so unlike most PHP apps it’s going to be CPU-bound rather than dependent on the network or other services. I wonder if there’s a way to speed it up, without having to drop down to pure C?

HHVM

Well, one way is to install HHVM:

vagrant@vagrant-ubuntu-trusty-64:~/src$ hhvm mandelbrot.php
Data took 5.852351
Render took 0.236238

HHVM, in its default setup, features a Just-In-Time (JIT) compiler, that traces every code path that is executed. If it finds a path that is “hot” – executed a large number of times – it will jump in and compile that code branch down to straight machine code and execute that instead. As this code is essentially arithmetic taking place in a tight loop, it’s a perfect case for HHVM’s JIT compiler to jump in and optimize.

If you’ve got complete control over your infrastructure and don’t mind changing PHP implementations, HHVM could well be the way to go. It also has a number of other extras which may be of interest. But what if you don’t?

Recki Compiler Toolkit

Recently, Anthony Ferrara (known throughout the Internet and beyond as @ircmaxell) and Joe Watkins (similarly well-known as @krakjoe) have been working on a new set of toys for solving this problem while staying on the “standard” PHP runtime. Recki-CT is a set of tools that implement a PHP compiler, in PHP. While this might you think of things like PyPy, which implements a Python virtual machine in Python, this is not Recki’s goal – it doesn’t provide a VM, so it can’t run PHP by itself. However, it can parse PHP code and generate other code from it.

Recki uses the well-known PHP-Parser library by Nikita Popov to generate a graph-based representation of the code, and convert it to an intermediate representation. To get here involves a few steps, which are described in Recki’s documentation, but essentially:

  • It generates a tree-based representation of the code called an abstract syntax tree
  • From the AST, it generates a control flow graph
  • It then converts this graph into static single assignment form, where every variable is only assigned once, and all are defined before use. This makes it simpler to optimize.
  • Next, it repeatedly runs optimizations on this graph

This intermediate form is pretty low-level, and it is comparatively simple to generate code from it for a variety of targets. One of the targets Recki can use is a second component, JitFu, which is a PHP extension allowing us to generate machine code at run time.

First steps with Recki and JitFu

So, I’ve decided I’m not yet ready to throw out all my existing infrastructure, and that (perhaps) I’m tied to some extensions that currently only work on main-line PHP. Where do I start?

First off, I need to follow the instructions on Recki’s README, which include installing libjit and the JitFu extension. Then, we need to install Recki using Composer, and then add Composer’s autoloader to the top of the PHP file as normal. Once we have these in place, we can start to look at how to use JitFu to speed our code up.

composer require recki-ct/recki-ct

If you have a look at the mandelbrot() function in the file, we can see it does all the work itself. For each pixel in the image, it iterates over a mathematical function until either the absolute value of the complex number “escapes” (tends towards infinity), or it hits the maximum number of iterations that we’ve previously defined to be 4096.

For each row of pixels in the image it will create an array containing the number of iterations for each pixel. Then, it creates another array representing columns, which contains the list of rows. Finally, the render() function is used to convert the number of iterations into a colour, but we can ignore that for now as it’s fairly quick by comparison.

We’ll change the main() function so that it will attempt to precompile the mandelbrot() function, and then execute that. Also, we’ll need to tell Recki what the expected input and output types are for our function so that it knows what it’s dealing with. We do this using a docblock, in the same way that we would if we were using phpDocumentor or similar. You can see this version of the code here.

/**
 * @param double $xMin
 * @param double $xMax
 * @param double $yMin
 * @param double $yMax
 * @param int $xPx
 * @param int $yPx
 * @return array
 */
 function mandelbrot($xMin, $xMax, $yMin, $yMax, $xPx, $yPx)
 // ...

And now the changes to the main function – these replace lines 100 to 104:

    $start = microtime(true);
    $function = ReckiCT\Jit::jitfu('mandelbrot');
    printf("Compiling took %F\n", $mandelTime);

    $start = microtime(true);
    $data = $function($xMin, $xMax, $yMin, $yMax, $xPx, $yPx);
    $mandelTime = microtime(true) - $start;
    printf("Data took %F\n", $mandelTime);

Then we try and run it:

vagrant@vagrant-ubuntu-trusty-64:~/src/recki-ct$ php mandelbrotWithRecki.php

Fatal error: Uncaught exception 'LogicException' with message 'Found node without parser rule: Expr_Array' in /home/vagrant/src/recki-ct/lib/ReckiCT/Parser/Parser.php:112
Stack trace:
#0 /home/vagrant/src/recki-ct/lib/ReckiCT/Parser/Rules/Assign.php(41): ReckiCT\Parser\Parser->parseNode(Object(PhpParser\Node\Expr\Array_), Object(ReckiCT\Parser\State))
#1 /home/vagrant/src/recki-ct/lib/ReckiCT/Parser/Parser.php(109): ReckiCT\Parser\Rules\Assign->parse(Object(PhpParser\Node\Expr\Assign), Object(ReckiCT\Parser\State))
#2 /home/vagrant/src/recki-ct/lib/ReckiCT/Parser/Parser.php(96): ReckiCT\Parser\Parser->parseNode(Object(PhpParser\Node\Expr\Assign), Object(ReckiCT\Parser\State))
#3 /home/vagrant/src/recki-ct/lib/ReckiCT/Parser/Parser.php(86): ReckiCT\Parser\Parser->parseStmtList(Array, Object(ReckiCT\Parser\State))
#4 /home/vagrant/src/recki-ct/lib/ReckiCT/Jit.php(222): ReckiCT\Parser\Parser->parseFunction(Object(PhpParser\Node\Stmt\Function_))
#5 /home/vagrant/src/recki-ct/lib/ReckiCT/Jit.php(240): ReckiCT\Jit->getFunctionG in /home/vagrant/src/recki-ct/lib/ReckiCT/Parser/Parser.php on line 112

Oh. The problem here is that Recki is still pretty young (it’s only around 6 months since its first release), and it currently doesn’t handle arrays. That’s OK, we can refactor the code a little, and break out the part of the iteration that returns a number into a separate function, which looks like the one below. Note that Recki doesn’t currently do constant lookups either, so we pass the maximum number of iterations as a parameter.

/**
 * @param double $cReal
 * @param double $cImag
 * @param int $maxIterations
 * @return int
 */
function mandelbrotIteration($cReal, $cImag, $maxIterations)
{
    $originalReal = $cReal;
    $originalImag = $cImag;

    $ni = 0;
    while ($ni < $maxIterations && ($cReal * $cReal * $cImag * $cImag < 4.0)) {
        $tmp = $cReal * $cReal - $cImag * $cImag + $originalReal;
        $cImag = $cImag * $cReal * 2 + $originalImag;
        $cReal = $tmp;

        $ni++;
    }

    if ($ni == $maxIterations) {
        $ni = 0;
    }

    return $ni;
}

The new version of the file is here. Now, let’s try it:

vagrant@vagrant-ubuntu-trusty-64:~/src/recki-ct$ php mandelbrotAfterRefactor.php
Compiling took 1.537641
Data took 3.724597
Render took 0.595848

Not bad! It took about 1.5 seconds to compile, and then ran in 3.7 - somewhere around a 2100% improvement! We can do a little better too, as we can grab the intermediate representation that JitFu has made, and cache it.

    if (file_exists('mandelbrotIteration.ir')) {
        $start = microtime(true);
        $ir = file_get_contents('mandelbrotIteration.ir');
        $iterationFunc = ReckiCT\Jit::getInstance()->compileIrJitFu($ir, 'fibo');
        $loadTime = microtime(true) - $start;
        printf("Loading took %F\n", $loadTime);
    } else {
        $start = microtime(true);
        $iterationFunc = ReckiCT\Jit::jitfu('mandelbrotIteration');
        $compileTime = microtime(true) - $start;
        printf("Compiling took %F\n", $compileTime);
        file_put_contents('mandelbrotIteration.ir', 
            ReckiCT\Jit::getInstance()->getFunctionIr('mandelbrotIteration'));
    }

Follow along with the code here. Now let’s see how that looks:

vagrant@vagrant-ubuntu-trusty-64:~/src/recki-ct$ php mandelbrotWithCachedIR.php
Loading took 0.177504
Data took 3.636709
Render took 0.653733

Nice.

The next step with Recki

Recki has another trick up its sleeve that we’ve not seen yet. Earlier, I hinted that it can generate code for multiple targets. One of these targets happens to be C code, wrapped in a PHP extension. This might be an interesting option for people who are a little wary of running code like JitFu on their production servers. Also, it means we can throw all of the knowledge and tricks contained within our C compiler at the code as well, to get as much speed as possible.

To demonstrate this, let’s lift the mandelbrotIteration() function right out of the code, and put it in its own file. I’ll call it iteration.php, and you can see it here. I can then run this command, assuming I’m in the Recki-CT root directory:

php bin/compile_pecl compile -O mandelbrot iteration.php

I’m passing the -O option, which passes -O3 to the C compiler to dial up its own optimizations. You should see some familiar text scrolling past as Recki generates, configures, and compiles a PECL extension for us. Next, we can return to our Mandelbrot code rip out all the additions we put in for our JitFu experiments, and indeed remove the mandelbrotIteration() function entirely. I’ve chosen to wrap it inside a function_exists() check instead, so that I can test with or without the extension loaded to prove I get the same output. You can get that version here.

vagrant@vagrant-ubuntu-trusty-64:~/src/recki-ct$ php -dextension=./mandelbrot.so mandelbrotWithExtension.php
Data took 2.411557
Render took 0.637504

Now we’re down to 2.4 seconds. Not bad at all. Remember, we’re only optimizing the innermost loop here – once Recki is able to optimize functions using arrays, we could potentially speed it up even further.

Summary

Recki-CT is a very young project, so this is not the kind of thing I’d be looking to use in production quite yet, but it’s certainly something to keep an eye on. Particularly where we have certain functions within our applications that are very CPU-heavy, we might well be able to take a considerable amount of load off by optimizing just those functions and leaving the rest of our app stack as standard. As mentioned before, PHP has been talked about in the past as being a glue layer between C libraries and application code. If we can write PHP and generate optimized C code from that, then the barrier to entry for taking advantage of this is considerably lower.

Update

For comparison, I translated the code to straight C reasonably directly, which you can see here. It’s a standalone C program, not a PHP extension. It runs in around 2.1 seconds on the same VM:

vagrant@vagrant-ubuntu-trusty-64:~/src/Mandelbrot-browser$ ./basic-mandelbrot
Data took 2.107163
Render took 0.000840

1. I realise that VMs don't give a terribly accurate benchmark, but these tests deal with orders of magnitude, so they'll reflect the speed difference enough to be getting on with.