0xDEADBEEF

RSS odkazy
««« »»»

PHP refcount as optimization

9. 7. 2024 #PHP #optimalizace #en

The PHP refcount mechanism can be used for optimizations.

It's been used for that purpose for years. When the code needs to change one of the two by-value types – string or array – first it checks the number of references and when the value of the refcount equals one, meaning only the current code holds the reference and change will be seen by no one else. In this case, it can modify the data of the strings and arrays directly in-place without allocating a copy first.

But it can also be used directly.

For example the function unpack for parsing binary data must return a array (for some reason indexed from one) even if it parsed only one int. As a result, the simple unpack('v', $str)[1] takes 75 ns, which is shamefully slow. Essentially, it's just three instructions: cmp + jb to check if there's enough bytes in the string and one mov. This should not take 75 ns but one clock cycle.

However, I can optimize a case when I'm parsing just one integer a little bit. If I take advantage of the fact that the array is by-value type, I can implement it like this (pseudo-code, it would be a native function):

function unpack($str) {
  static $arr1 = new array [1 => null];

  if (refcount($arr) > 1) {
    $arr1 = new array [1 => null];
  }

  $arr1[1] = chomp2Bytes($str);
  return $arr1;
}

The changed `unpack' function would internally always hold one reference to the array and repeatedly use it as the return value.

One of two situations may happen:

  1. The caller code changes the returned array. We don't mind that because the array is a by-value and the program can change such objects only in-place if they have refcount = 1 (see macro SEPARARATE_ARRAY). And since unpack always holds one reference and the caller code the other, the array must be copied and only the copy will be changed. The version held by the unpack will be unmodified.
  2. The caller code saves the returned array in a variable. This situation we detect directly in the next call to unpack because suddenly the refcount is higher than 1 and we simply allocate a new array. But this happens very rarely, at least for simple cases, when I extract only one int and I don't use the syntax for named results à la unpack('Vid\Vtimestamp\vflags').

With this optimization, unpack('v', $str)[1] takes 50 ns (time measured in the interpreter, JIT turned off). One third better, but still shamefully slow.

The main problem with the unpack function is its complex semantic, where it has to parse a pattern string (in this case just 'v', but unpack has to be able to handle all cases with complicated patterns like unpack('Vid/Vdate/Ccount/V*children', $str) and then, in a gigantic switch, jump on those few instructions that do actual useful work interpreting some number of bytes as an int or float of a given length, a given signedness and a given endiannes. This logic ale will take good 25 ns.

That was, in the end, the reason I'm writing this as an article and not as a PHP patch. Even with an effort of all my strength, I can't get into numbers that I wouldn't have to be ashamed of.


To give you an idea of how the unpack function stands compared to the native alternative (test code here, PHP JIT disabled, CPU i5-4570):

*                     ns/iter
empty loop               7.87
noop()                  15.20
unpackInt32LE()         16.26
unpack('V', $str)[1]    72.18

The first line is an empty loop to estimate the necessary overhead of the interpreter.

noop() calls a native function that does nothing. This tests the overhead of the loop + the call of a native functions.

unpackInt32LE is a native function that takes 4 bytes from the string and interprets them as little endian integer.

It is the following function (in language D), which is exported to PHP via my (so far private) library for easy creation of PHP extensions.

uint unpackInt32LE(scope const(ubyte)[] str) {
  pragma(inline, true);
  if (str.length < 4) throw new Exception("not enough bytes");
  return *(cast(uint*) str.ptr);
}

The last line unpack('V', $str)[1]' includes time of iteration, native function calls, and the intrinsic unpack' machinations.

The int extraction itself (+ parsing function arguments) takes up one nanosecond in this case, loop overhead + the native function call is 15× larger and the unpack itself is still 5× slower than this. There's no point in trying to optimize the inefficiently designed function when the native option is so much faster and just using FFI, without any problems surpasses the speed of unpack.

píše k47 (@kaja47, k47)