RFR: 8344232: [PPC64] secondary_super_cache does not scale well: C1 and interpreter

Richard Reingruber rrich at openjdk.org
Tue Jan 21 14:35:38 UTC 2025


On Wed, 25 Dec 2024 15:40:23 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

> PPC64 implementation of https://github.com/openjdk/jdk/commit/ead0116f2624e0e34529e47e4f509142d588b994. I have implemented a couple of rotate instructions.
> The first commit only implements `lookup_secondary_supers_table_var` and uses it in C2. The second commit makes the changes to use it in the interpreter, runtime and C1.
> The C1 part is refactored so that, when `UseSecondarySupersTable` is disabled, the same code is generated as before this patch. Some stubs are modified to provide one more temp register.
> 
> A performance difference can be observed when C2 is disabled (measured on Power10):
> 
> 
> -XX:TieredStopAtLevel=1 -XX:-UseSecondarySupersTable:
> SecondarySuperCacheHits.test                avgt   15   13.028 ±  0.005  ns/op
> SecondarySuperCacheInterContention.test     avgt   15  417.746 ± 19.046  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15  417.852 ± 17.814  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15  417.641 ± 23.431  ns/op
> SecondarySuperCacheIntraContention.test     avgt   15  340.995 ±  5.620  ns/op
> 
> 
> 
> -XX:TieredStopAtLevel=1 -XX:+UseSecondarySupersTable:
> SecondarySuperCacheHits.test                avgt   15  14.539 ± 0.002  ns/op
> SecondarySuperCacheInterContention.test     avgt   15  25.667 ± 0.576  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15  25.709 ± 0.655  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15  25.626 ± 0.820  ns/op
> SecondarySuperCacheIntraContention.test     avgt   15  22.466 ± 1.554  ns/op
> 
> 
> `SecondarySuperCacheHits` is slightly slower (14.539 vs. 13.028 ns/op), but `SecondarySuperCacheInterContention` and `SecondarySuperCacheIntraContention` are roughly 15 times faster (when C2 is disabled).

src/hotspot/cpu/ppc/c1_Runtime1_ppc.cpp line 607:

> 605:                        super_klass = R4,
> 606:                        temp1_reg = R6;
> 607:         __ check_klass_subtype_slow_path(sub_klass, super_klass, temp1_reg, noreg); // may return with CR0.eq if successful

The comment is unclear to me. Where does the result of the subtype check end up? Can the call also return with CR0.ne if the check is successful?
I noticed that you added the `crandc` to `check_klass_subtype_slow_path_linear()`, but when that method is reached from this call site the `crandc` is not emitted because `L_success == nullptr`. Is this ok?
I'd appreciate comments on the masm methods explaining how the result of the subtype check is conveyed.
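
To make the question concrete, here is a toy C++ model of a single CR0.eq bit. Only the `crandc` semantics (BT = BA AND NOT BB) come from the Power ISA. The helper names, the same-operand use of `crandc`, and the "emitted only when a success label exists" condition are assumptions for illustration, not a description of the actual patch.

```cpp
#include <cassert>

// Toy model of one condition-register bit; this is not HotSpot code.
// Power ISA semantics of crandc BT,BA,BB: BT = BA AND (NOT BB).
constexpr bool crandc(bool ba, bool bb) {
    return ba && !bb;
}

// Hypothetical helper: crandc with the EQ bit as both source operands.
// x AND (NOT x) is always false, so EQ is cleared (the bit reads as "ne").
constexpr bool clear_eq_via_crandc(bool cr0_eq) {
    return crandc(cr0_eq, cr0_eq);
}

// Hypothetical model of the call-site concern: if the crandc is emitted only
// when a success label was passed, a caller without a label keeps whatever
// EQ value the preceding instructions left behind.
constexpr bool eq_after_slow_path(bool eq_before, bool has_success_label) {
    return has_success_label ? clear_eq_via_crandc(eq_before) : eq_before;
}

// With a label, EQ is always cleared; without one, it passes through unchanged.
static_assert(!eq_after_slow_path(true, true), "label: EQ cleared");
static_assert(eq_after_slow_path(true, false), "no label: EQ unchanged");
```

Under these assumptions the caller without a label can only rely on CR0.eq if every path through the slow path sets it explicitly, which is what I would like the comments to spell out.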

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22881#discussion_r1923838785

