Performance impact of decommissioning arrayStorageProperties to legacy code.
sergey.kuksenko at oracle.com
Wed Jun 10 04:52:26 UTC 2020
I did a new analysis with a modified benchmark that covers polymorphic
array stores. The same store site sees a mix of arrays: array of Object,
array of an interface, array of an abstract class and array of a
concrete class.
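For readers who want to picture this, here is a minimal Java sketch of a polymorphic array store. It is not the benchmark that produced the numbers below; the class and type names (PolyStoreSketch, Shape, Base, Leaf) are made up for illustration, and a real measurement would need a harness such as JMH.

```java
final class PolyStoreSketch {
    // Hypothetical type hierarchy, for illustration only.
    interface Shape {}
    static abstract class Base implements Shape {}
    static final class Leaf extends Base {}

    // A single aastore site. Because callers pass arrays of different
    // runtime types, the array store check here cannot be specialized
    // to one array class.
    static void store(Object[] dst, Object value) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = value; // checked against dst's runtime element type
        }
    }

    // Stores one Leaf into arrays of four different runtime types and
    // returns the total number of elements written.
    static int run(int len) {
        Object[][] targets = {
            new Object[len], // array of Object
            new Shape[len],  // array of interface
            new Base[len],   // array of abstract class
            new Leaf[len]    // array of concrete class
        };
        Leaf value = new Leaf();
        int stored = 0;
        for (Object[] target : targets) {
            store(target, value);
            stored += target.length;
        }
        return stored;
    }

    public static void main(String[] args) {
        System.out.println(run(8));
    }
}
```

All four arrays accept a Leaf, so no ArrayStoreException is thrown; the point is only that the store site sees several array classes.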
Here are performance results for polymorphic array store:
GC configuration              | baseline (ns) | v-66 (ns) | v-72 (ns) | v-66/baseline | v-72/baseline | v-72/v-66
G1GC (compressedOops)         |           380 |       445 |       420 |        -17.1% |        -10.5% |      5.6%
G1GC (uncompressedOops)       |           300 |       400 |       390 |        -33.3% |        -30.0% |      2.5%
ParallelGC (compressedOops)   |           310 |       360 |       350 |        -16.1% |        -12.9% |      2.8%
ParallelGC (uncompressedOops) |           284 |       330 |       300 |        -16.2% |         -5.6% |      9.1%
ZGC (uncompressedOops)        |           285 |       314 |       310 |        -10.2% |         -8.8% |      1.3%
EpsilonGC (compressedOops)    |           284 |       340 |       320 |        -19.7% |        -12.7% |      5.9%
EpsilonGC (uncompressedOops)  |           277 |       294 |       300 |         -6.1% |         -8.3% |     -2.0%
A new column was added: the speedup of v-72 over v-66 (positive means
v-72 is faster).
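As a cross-check on the ratio columns, here is a small Java sketch. The formula is inferred from the table values rather than stated anywhere in the report, and the names SpeedupColumns and gainPercent are made up: each ratio is taken as (reference - candidate) / reference, so a negative value means the candidate takes more ns per operation.

```java
import java.util.Locale;

final class SpeedupColumns {
    // Relative gain of 'candidateNs' over 'referenceNs' in percent.
    // Positive: candidate is faster; negative: candidate is slower.
    static double gainPercent(double referenceNs, double candidateNs) {
        return (referenceNs - candidateNs) / referenceNs * 100.0;
    }

    static String format(double percent) {
        return String.format(Locale.ROOT, "%.1f%%", percent);
    }

    public static void main(String[] args) {
        // G1GC (compressedOops) row: baseline 380, v-66 445, v-72 420.
        System.out.println(format(gainPercent(380, 445))); // -17.1%
        System.out.println(format(gainPercent(380, 420))); // -10.5%
        System.out.println(format(gainPercent(445, 420))); // 5.6%
    }
}
```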
For polymorphic array store the picture is not so bright, but
decommissioning arrayStorageProperties still gives a performance speedup (except 1
For a polymorphic array store the Klass is always accessed, so
clearing the extra bits from the klass ptr has a negative effect. By
the way, which field of Klass is at offset 0xE8?
What is interesting is the rather large difference between the baseline
and both Valhalla versions in the case of G1GC.
Comparing the generated code of baseline and v-72, I found two
differences:
1. Different layout of basic blocks (some jumps are inverted, je ->
But it shouldn't be the source of the regression: profiling has shown
that the number of branches and branch misses is the same for baseline and
2. Access to the layout helper and a check whether it's an array of values.
Tobias, what do you think? Does it make sense to play with the layout
helper? Nothing prevents us from making 1-bit tags, using test & jump,
and checking what we get.
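To make the 1-bit tag idea concrete, here is a rough Java sketch that contrasts a multi-bit tag check (extract the field, then compare) with a one-bit check (a single test against a mask). Every constant below is hypothetical and chosen only for illustration; none of them is the encoding HotSpot actually uses for the layout helper.

```java
final class LayoutHelperTagSketch {
    // Hypothetical encodings for illustration only; these are NOT the
    // constants HotSpot uses for the Klass layout helper.
    static final int TAG_SHIFT = 29;            // assumed: 3-bit tag in the top bits
    static final int TAG_VALUE_ARRAY = 0b101;   // assumed multi-bit tag for an array of values
    static final int VALUE_ARRAY_BIT = 1 << 29; // assumed dedicated one-bit tag

    // Multi-bit tag: shift the field out, then compare and jump.
    static boolean isValueArrayByTag(int layoutHelper) {
        return (layoutHelper >>> TAG_SHIFT) == TAG_VALUE_ARRAY;
    }

    // One-bit tag: a single test against a mask, then jump.
    static boolean isValueArrayByBit(int layoutHelper) {
        return (layoutHelper & VALUE_ARRAY_BIT) != 0;
    }

    public static void main(String[] args) {
        // The low bits stand in for whatever else the word carries.
        int valueArray = (0b101 << TAG_SHIFT) | 16;
        int objectArray = (0b100 << TAG_SHIFT) | 16;
        System.out.println(isValueArrayByTag(valueArray) + " " + isValueArrayByBit(valueArray));
        System.out.println(isValueArrayByTag(objectArray) + " " + isValueArrayByBit(objectArray));
    }
}
```

In compiled code the one-bit form can map to a single test-and-branch, which is the experiment suggested above; whether it helps is exactly what would need measuring.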
On 6/9/20 8:13 AM, Tobias Hartmann wrote:
> Hi Sergey,
> thanks again for the nice report! Comments below.
> On 09.06.20 06:43, Sergey Kuksenko wrote:
>> Note: Unrolling and hoisting happened only for ZGC, ParallelGC and EpsilonGC. It was not
>> done for G1 for an unknown reason. Maybe this needs attention.
> That's unexpected. Is it the same with mainline?
>> Decommissioning arrayStorageProperties has a positive performance effect on the aastore operation
>> under all conditions. The really nice fact is that aastore has no negative performance effect at
>> all for legacy code in Valhalla. The klass ptr is loaded for every aastore operation and checked
>> to see whether the runtime type of the array is Object (for this benchmark it's the simplest form
>> of array store check). In v-66 the arrayStorageProperties bits must also be cleared.
>> In v-72 there are no Valhalla checks at all (we already checked that it's Object, so we don't
>> need to do anything else).
> Right. This is because C2 speculates on the array being monomorphic (MonomorphicArrayCheck
> optimization) and we can then omit all inline type specific checks. Have you checked with a
> polymorphic array store? In that case you should see flat/null-free checks and these will have an
> impact on performance.