RFR: 8344232: [PPC64] secondary_super_cache does not scale well: C1 and interpreter
Martin Doerr
mdoerr at openjdk.org
Tue Jan 21 17:22:42 UTC 2025
On Tue, 21 Jan 2025 14:33:15 GMT, Richard Reingruber <rrich at openjdk.org> wrote:
>> PPC64 implementation of https://github.com/openjdk/jdk/commit/ead0116f2624e0e34529e47e4f509142d588b994. I have implemented a couple of rotate instructions.
>> The first commit only implements `lookup_secondary_supers_table_var` and uses it in C2. The second commit makes the changes to use it in the interpreter, runtime and C1.
>> C1 part is refactored such that the same code as before this patch is generated when `UseSecondarySupersTable` is disabled. Some stubs are modified to provide one more temp register.
>>
>> Performance difference can be observed when C2 is disabled (measured on Power10):
>>
>> -XX:TieredStopAtLevel=1 -XX:-UseSecondarySupersTable:
>> Benchmark                                   Mode  Cnt    Score    Error  Units
>> SecondarySuperCacheHits.test                avgt   15   13.028 ±  0.005  ns/op
>> SecondarySuperCacheInterContention.test     avgt   15  417.746 ± 19.046  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  417.852 ± 17.814  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  417.641 ± 23.431  ns/op
>> SecondarySuperCacheIntraContention.test     avgt   15  340.995 ±  5.620  ns/op
>>
>> -XX:TieredStopAtLevel=1 -XX:+UseSecondarySupersTable:
>> Benchmark                                   Mode  Cnt    Score    Error  Units
>> SecondarySuperCacheHits.test                avgt   15   14.539 ±  0.002  ns/op
>> SecondarySuperCacheInterContention.test     avgt   15   25.667 ±  0.576  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15   25.709 ±  0.655  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15   25.626 ±  0.820  ns/op
>> SecondarySuperCacheIntraContention.test     avgt   15   22.466 ±  1.554  ns/op
>>
>> `SecondarySuperCacheHits` seems to be slightly slower, but `SecondarySuperCacheInterContention` and `SecondarySuperCacheIntraContention` are much faster (when C2 is disabled).
>
> src/hotspot/cpu/ppc/c1_Runtime1_ppc.cpp line 607:
>
>> 605: super_klass = R4,
>> 606: temp1_reg = R6;
>> 607: __ check_klass_subtype_slow_path(sub_klass, super_klass, temp1_reg, noreg); // may return with CR0.eq if successful
>
> The comment is unclear to me. Where is the result of the subtype check? Can it also return with CR0.ne if successful?
> I noticed you added the `crandc` to `check_klass_subtype_slow_path_linear()` but if we reach there calling from this location then the `crandc` is not emitted because `L_success == nullptr`. Is this ok?
> I'd appreciate comments on the masm methods explaining how the result of the subtype check is conveyed.
With this PR, the correct result is always in CR0.
"Return" here means `blr`, which can optionally be used in the success case. When it is used, CR0 is always "eq".
I've moved the `crandc` instruction into `check_klass_subtype_slow_path_linear`, which contains such a `blr` for the success case. This way, the linear version works exactly as before.
The new code in `check_klass_subtype_slow_path_table` doesn't use `blr`. That's why I added "may" to the comment.
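
To make the convention above concrete, here is a toy C++ model of it. This is only a sketch and not HotSpot code: `SlowPathResult`, `linear_check`, `table_check` and the integer "klass ids" are hypothetical names invented for illustration. It assumes the linear variant takes the early `blr` on every hit, and it models CR0.eq as a boolean.

```cpp
#include <algorithm>
#include <vector>

// Toy model only; none of these names exist in HotSpot.
struct SlowPathResult {
    bool cr0_eq;         // models CR0.eq: true means the subtype check succeeded
    bool returned_early; // models a "blr" taken inside the slow path on success
};

// Models the linear variant: a hit sets CR0.eq and (in this sketch) returns via blr.
// A miss falls through with CR0.ne.
SlowPathResult linear_check(const std::vector<int>& secondary_supers, int super_klass) {
    bool hit = std::find(secondary_supers.begin(), secondary_supers.end(),
                         super_klass) != secondary_supers.end();
    return SlowPathResult{hit, hit};
}

// Models the table variant: the result is only conveyed in CR0, no blr is used.
// The lookup itself is simplified to a linear search here; the real table lookup differs.
SlowPathResult table_check(const std::vector<int>& secondary_supers, int super_klass) {
    bool hit = std::find(secondary_supers.begin(), secondary_supers.end(),
                         super_klass) != secondary_supers.end();
    return SlowPathResult{hit, false};
}
```

In both variants a caller can rely on `cr0_eq` alone. `returned_early` is only ever true together with `cr0_eq`, which is the sense of "may return with CR0.eq if successful".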
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/22881#discussion_r1924113575