RFR: 8344232: [PPC64] secondary_super_cache does not scale well: C1 and interpreter

Martin Doerr mdoerr at openjdk.org
Tue Jan 21 17:22:42 UTC 2025


On Tue, 21 Jan 2025 14:33:15 GMT, Richard Reingruber <rrich at openjdk.org> wrote:

>> PPC64 implementation of https://github.com/openjdk/jdk/commit/ead0116f2624e0e34529e47e4f509142d588b994. I have implemented a couple of rotate instructions.
>> The first commit only implements `lookup_secondary_supers_table_var` and uses it in C2. The second commit makes the changes to use it in the interpreter, runtime and C1.
>> The C1 part is refactored so that, when `UseSecondarySupersTable` is disabled, the same code is generated as before this patch. Some stubs are modified to provide one more temp register.
>> 
>> A performance difference can be observed when C2 is disabled (measured on Power10):
>> 
>> 
>> -XX:TieredStopAtLevel=1 -XX:-UseSecondarySupersTable:
>> SecondarySuperCacheHits.test  avgt   15  13.028 ± 0.005  ns/op
>> SecondarySuperCacheInterContention.test     avgt   15  417.746 ± 19.046  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  417.852 ± 17.814  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  417.641 ± 23.431  ns/op
>> SecondarySuperCacheIntraContention.test  avgt   15  340.995 ± 5.620  ns/op
>> 
>> 
>> 
>> -XX:TieredStopAtLevel=1 -XX:+UseSecondarySupersTable:
>> SecondarySuperCacheHits.test  avgt   15  14.539 ± 0.002  ns/op
>> SecondarySuperCacheInterContention.test     avgt   15  25.667 ± 0.576  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  25.709 ± 0.655  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  25.626 ± 0.820  ns/op
>> SecondarySuperCacheIntraContention.test  avgt   15  22.466 ± 1.554  ns/op
>> 
>> 
>> `SecondarySuperCacheHits` seems to be slightly slower (14.539 vs. 13.028 ns/op), but `SecondarySuperCacheInterContention` and `SecondarySuperCacheIntraContention` are much faster, by roughly 15x to 16x (when C2 is disabled).
>
> src/hotspot/cpu/ppc/c1_Runtime1_ppc.cpp line 607:
> 
>> 605:                        super_klass = R4,
>> 606:                        temp1_reg = R6;
>> 607:         __ check_klass_subtype_slow_path(sub_klass, super_klass, temp1_reg, noreg); // may return with CR0.eq if successful
> 
> The comment is unclear to me. Where is the result of the subtype check? Can it also return with CR0.ne if successful?
> I noticed you added the `crandc` to `check_klass_subtype_slow_path_linear()` but if we reach there calling from this location then the `crandc` is not emitted because `L_success == nullptr`. Is this ok?
> I'd appreciate comments on the masm methods explaining how the result of the subtype check is conveyed.

With this PR, the correct result is always in CR0.
"Return" here means "blr", which can optionally be used in the success case. When it is used, CR0 is always "eq".
I've moved the `crandc` instruction into `check_klass_subtype_slow_path_linear`, which contains such a "blr" for the success case. This way, the linear version works exactly as before.
The new code in `check_klass_subtype_slow_path_table` doesn't use "blr". That's why I added "may" to the comment.
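
To make the convention above concrete, here is a minimal toy model in C++. It is not HotSpot code: `Exit`, `Outcome`, `linear_model`, `table_model` and `caller_sees_success` are hypothetical names invented for illustration, and the linear model covers only the case where no success label is passed. It only sketches the contract as I understand it: the result is always in CR0, the linear variant may leave via an early "blr" on success, and the table variant always falls through.

```cpp
#include <cassert>

// Toy model only; every name here is hypothetical and not part of HotSpot.
enum class Exit { Returned, FellThrough };  // Returned models an early "blr"

struct Outcome {
  bool cr0_eq;  // models CR0.eq: true means the subtype check succeeded
  Exit exit;    // how control left the slow path
};

// Models the linear variant when no success label is given (assumption):
// on success it returns early via "blr" with CR0 "eq";
// on failure it falls through with CR0 "ne".
Outcome linear_model(bool is_subtype) {
  if (is_subtype) return {true, Exit::Returned};
  return {false, Exit::FellThrough};
}

// Models the table variant: it never uses "blr" and always falls
// through with the result in CR0.
Outcome table_model(bool is_subtype) {
  return {is_subtype, Exit::FellThrough};
}

// A caller only needs to inspect CR0, whichever variant ran
// and however control left it.
bool caller_sees_success(const Outcome& o) { return o.cr0_eq; }
```

In this model, both variants agree on `cr0_eq` for every input and differ only in `exit`, which is the sense in which the stub "may" return with CR0.eq.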

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22881#discussion_r1924113575

