Valhalla/MVT microbenchmarks (first benchmarks and first results)

Thu Jul 27 18:14:27 UTC 2017

Thank you for pointing that out.

I did some evaluations in that area and updated my report.

I've collected "time to performance" metrics for derived value types 
with CountedLoops.
arraysum...mhie_derivedLoop reaches peak performance after ~45 SECONDS 
of execution.
matrix.MHIE_Derived...TotalLoop reaches peak performance after more than 
220 SECONDS of execution.
Such a time to performance is too long and needs to be evaluated. That is
why I decided not to increase the warmup time of my benchmarks.
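
For reference, here is a minimal sketch of how a "time to performance" figure can be derived from per-iteration results. It is an illustration only, not the code used for the report: the class name, the method name, the 10% tolerance and the sample numbers are all assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical helper, not part of JMH or of the benchmarks discussed here.
final class TimeToPerformance {

    /**
     * @param scores    time per operation for each iteration (lower is better)
     * @param seconds   wall-clock duration of each iteration
     * @param tolerance relative slack around the best score, e.g. 0.10 for 10%
     * @return seconds elapsed before the first iteration that runs within
     *         the tolerance of the best observed score
     */
    static double secondsToPeak(double[] scores, double[] seconds, double tolerance) {
        if (scores.length == 0 || scores.length != seconds.length) {
            throw new IllegalArgumentException("need one duration per score");
        }
        double best = Arrays.stream(scores).min().getAsDouble();
        double limit = best * (1.0 + tolerance);
        double elapsed = 0.0;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] <= limit) {
                return elapsed; // iteration i is the first one at peak
            }
            elapsed += seconds[i];
        }
        throw new AssertionError("unreachable: the best score always meets the limit");
    }

    public static void main(String[] args) {
        double[] scores  = {100, 90, 80, 20, 10, 10, 11}; // made-up numbers
        double[] seconds = {1, 1, 1, 1, 1, 1, 1};
        System.out.println(secondsToPeak(scores, seconds, 0.10));
    }
}
```

This takes the first iteration at peak as the end of warmup; a stricter definition could also require every later iteration to stay within the tolerance.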

Also, such slow warmup can't be explained simply by reaching the
compilation thresholds more slowly.
Derived value types (with counted loops) are ~3x-4x slower than their
boxed-type analogues in the interpreter, but time to performance is
~20x-40x worse.
I can only guess that the LambdaForms machinery has its own counters and
thresholds before generating the final bytecode.
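
To make the gap concrete, here is a back-of-the-envelope sketch. It assumes that, if thresholds were the only factor, warmup time would scale linearly with interpreter speed; that linear model is an assumption for illustration, and the class and method names are hypothetical.

```java
// Hypothetical helper: bounds on the part of the warmup slowdown that
// slower interpretation alone does not account for.
final class WarmupGap {

    /**
     * @return {lowest, highest} ratio of the observed warmup slowdown to the
     *         interpreter slowdown, given a range for each
     */
    static double[] unexplainedFactor(double interpLo, double interpHi,
                                      double warmupLo, double warmupHi) {
        return new double[] { warmupLo / interpHi, warmupHi / interpLo };
    }

    public static void main(String[] args) {
        // Ranges taken from the numbers above: 3x-4x and 20x-40x.
        double[] gap = unexplainedFactor(3, 4, 20, 40);
        System.out.printf("unexplained factor: %.1fx to %.1fx%n", gap[0], gap[1]);
    }
}
```

Under that assumption, roughly a 5x to 13x slowdown in warmup is left unexplained by interpreter speed.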

On 07/27/2017 05:27 AM, Roland Westrelin wrote:
>> The only exception seems to be those multIJKTotalLoop/multIKJTotalLoop
>> benchmarks, in which, as you point out, something is preventing full
>> MH compilation.  I'm sure our C2 gurus will look into that soon.
> Those benchmarks apparently need a warmup that's a lot longer. I see
> performance improved dramatically at warmup iteration 83 with
> MH_Derived.multIJKTotalLoop.
>
> Compilation heuristics work by counting the number of invocations of a
> method and adding the number of times a backbranch is taken in a
> method. Once that value crosses a threshold compilation is triggered. It
> looks like it doesn't work well with the loop combinators. Maybe the
> actual loop is in some method that gets compiled early but for maximum
> performance we need some caller of that method to be compiled so method
> handles are known constant and that only happens much later.
>
> Roland.
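
The heuristic Roland describes can be sketched with a toy model. This is not HotSpot code: the real policy (tiered compilation, OSR, separate counters) is more involved, and the class names, the threshold of 10000 and the loop length of 1000 are all assumptions chosen for illustration. It shows how a method containing the loop can cross the threshold long before its caller does.

```java
// Toy model of "invocations + backbranches >= threshold"; hypothetical names.
final class HotnessCounter {
    private final long threshold;
    private long invocations;
    private long backBranches;

    HotnessCounter(long threshold) {
        this.threshold = threshold;
    }

    void onInvoke()     { invocations++; }
    void onBackBranch() { backBranches++; }

    boolean isCompiled() {
        return invocations + backBranches >= threshold;
    }
}

final class WarmupModel {

    /**
     * Simulates a caller that invokes a loop-carrying method once per call.
     * @return {call at which the loop method compiles,
     *          call at which the caller compiles}
     */
    static long[] callsUntilCompiled(long threshold, int loopIterations) {
        HotnessCounter caller = new HotnessCounter(threshold);
        HotnessCounter loopMethod = new HotnessCounter(threshold);
        long loopAt = -1;
        long callerAt = -1;
        for (long call = 1; loopAt < 0 || callerAt < 0; call++) {
            caller.onInvoke();          // the caller has no loop of its own
            loopMethod.onInvoke();
            for (int i = 0; i < loopIterations; i++) {
                loopMethod.onBackBranch();
            }
            if (loopAt < 0 && loopMethod.isCompiled()) loopAt = call;
            if (callerAt < 0 && caller.isCompiled()) callerAt = call;
        }
        return new long[] { loopAt, callerAt };
    }

    public static void main(String[] args) {
        long[] r = callsUntilCompiled(10_000, 1_000); // assumed values
        System.out.println("loop method compiled at call " + r[0]
                + ", caller compiled at call " + r[1]);
    }
}
```

In this model the caller needs about a thousand times as many calls as the loop method, which is consistent with the idea that the method handles only become constants much later.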

-- 
Best regards,
Sergey Kuksenko