Valhalla/MVT microbenchmarks (first benchmarks and first results)
Sergey Kuksenko
sergey.kuksenko at oracle.com
Thu Jul 27 18:14:27 UTC 2017
Thank you for pointing that out.
I did some evaluations in that area and updated my report.
I've collected "time to performance" metrics for derived value types
with CountedLoops.
arraysum...mhie_derivedLoop reaches peak performance after ~45 SECONDS
of execution.
matrix.MHIE_Derived...TotalLoop reaches peak performance after more than
220 SECONDS of execution.
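For readers who want to reproduce a "time to performance" number, here is a minimal sketch of one way to derive it from per-iteration scores. It is not necessarily how the figures above were collected. The class name, the 10% tolerance, and the sample scores in main are all hypothetical.

```java
import java.util.Arrays;

public class TimeToPeak {
    /**
     * Returns the elapsed seconds until the first iteration whose score is
     * within `tolerance` of the best score in the run (higher is better).
     * Assumes every iteration lasts `secondsPerIteration`.
     */
    public static double secondsToPeak(double[] opsPerSec,
                                       double secondsPerIteration,
                                       double tolerance) {
        double peak = Arrays.stream(opsPerSec)
                .max()
                .orElseThrow(IllegalArgumentException::new);
        for (int i = 0; i < opsPerSec.length; i++) {
            if (opsPerSec[i] >= peak * (1.0 - tolerance)) {
                return (i + 1) * secondsPerIteration;
            }
        }
        throw new AssertionError("unreachable: the peak sample always qualifies");
    }

    public static void main(String[] args) {
        // Made-up scores for illustration only, not measured data.
        double[] scores = {10, 12, 95, 100, 99};
        System.out.println(secondsToPeak(scores, 1.0, 0.10));
    }
}
```

The tolerance matters: a tight tolerance reports a later time on noisy runs, so it should be chosen relative to the run-to-run variance of the benchmark.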
Such a time to performance is too long and should be investigated. That is why
I decided not to increase the warmup time of my benchmarks.
Also, such a slow warmup can't be explained simply by compilation thresholds
being reached more slowly.
Derived value types (with counted loops) are ~3x-4x slower than their
boxed-type analogues in the interpreter, but time to performance is
~20x-40x worse.
I can only guess that the LambdaForms machinery has its own counters and
thresholds before generating the final bytecode.
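The arithmetic behind that argument can be sketched as follows. It assumes a deliberately simplified model in which compilation fires after a fixed number of counted events, so warmup time scales linearly with the interpreter cost per event. The class, the method names, and the model itself are illustrative assumptions, not a description of HotSpot's actual policy.

```java
public class WarmupScaling {
    /** Warmup time if compilation fires after a fixed number of counted events. */
    public static double predictedWarmupSeconds(long thresholdEvents,
                                                double secondsPerEvent) {
        return thresholdEvents * secondsPerEvent;
    }

    /** Part of the warmup degradation not explained by interpreter slowdown alone. */
    public static double unexplainedFactor(double warmupRatio,
                                           double interpreterSlowdown) {
        return warmupRatio / interpreterSlowdown;
    }

    public static void main(String[] args) {
        // Ratios quoted in the message: interpreter ~3x-4x slower,
        // time to performance ~20x-40x worse.
        double low = unexplainedFactor(20.0, 4.0);
        double high = unexplainedFactor(40.0, 3.0);
        System.out.printf("unexplained factor: %.1fx to %.1fx%n", low, high);
    }
}
```

Under this model a 3x-4x interpreter slowdown predicts only a 3x-4x longer warmup, so a remaining factor of roughly 5x or more has to come from somewhere else, such as additional counters or later compilation of a caller.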
On 07/27/2017 05:27 AM, Roland Westrelin wrote:
>> The only exception seems to be those multIJKTotalLoop/multIKJTotalLoop
>> benchmarks, in which, as you point out, something is preventing full
>> MH compilation. I'm sure our C2 gurus will look into that soon.
> Those benchmarks apparently need a warmup that's a lot longer. I see
> performance improved dramatically at warmup iteration 83 with
> MH_Derived.multIJKTotalLoop.
>
> Compilation heuristics work by counting the number of invocations of a
> method and adding the number of times a backbranch is taken in a
> method. Once that value crosses a threshold compilation is triggered. It
> looks like it doesn't work well with the loop combinators. Maybe the
> actual loop is in some method that gets compiled early but for maximum
> performance we need some caller of that method to be compiled so method
> handles are known constant and that only happens much later.
>
> Roland.
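To make the "loop combinator" case concrete, here is a minimal sketch of an array sum built with MethodHandles.countedLoop (Java 9 or later). It is not the benchmark code from the report, and the class and method names are hypothetical. The loop itself lives in code generated for the combinator, while the body is a separate method handle, which is the shape Roland's explanation is about.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class CountedLoopSketch {
    // Loop body: (accumulator, index, array) -> new accumulator.
    static int step(int acc, int i, int[] a) {
        return acc + a[i];
    }

    // Kept in a static final field so the handle is a known constant
    // at the call site in sum().
    static final MethodHandle SUM_LOOP = buildSumLoop();

    static MethodHandle buildSumLoop() {
        try {
            MethodHandle body = MethodHandles.lookup().findStatic(
                    CountedLoopSketch.class, "step",
                    MethodType.methodType(int.class, int.class, int.class, int[].class));
            // iterations: (int[]) -> int, the array length.
            MethodHandle iterations = MethodHandles.arrayLength(int[].class);
            // init: (int[]) -> int, always 0.
            MethodHandle init = MethodHandles.dropArguments(
                    MethodHandles.constant(int.class, 0), 0, int[].class);
            // Resulting handle has type (int[]) -> int.
            return MethodHandles.countedLoop(iterations, init, body);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static int sum(int[] a) {
        try {
            return (int) SUM_LOOP.invokeExact(a);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3, 4}));
    }
}
```

In this shape the backbranches are taken inside the combinator's generated loop, not in sum(), so counters attached to the caller see only invocations.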
--
Best regards,
Sergey Kuksenko