RFR: 8331658: secondary_super_cache does not scale well: C1 [v2]
Vladimir Ivanov
vlivanov at openjdk.org
Mon Jun 3 19:44:41 UTC 2024
On Wed, 29 May 2024 09:32:41 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> This is the C1 version of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450).
>>
>> The new logic in this PR is as simple as I can make it. It is a somewhat-simplified version of the C2 change in [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). In order to reduce risk I haven't touched the existing slow subtype stub.
>> The register allocation logic in the existing code is pretty gnarly, and I have no desire to break anything at this point in the release cycle, so I have allocated just one register more than the existing code does.
>>
>> Performance is pretty good. Before and after:
>>
>> x64, AMD 2950X, 8 cores:
>>
>>
>> Benchmark                                   Mode  Cnt    Score    Error  Units
>> SecondarySuperCacheHits.test                avgt    5    0.959 ±  0.091  ns/op
>> SecondarySuperCacheInterContention.test     avgt    5   42.931 ±  6.951  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt    5   42.397 ±  7.708  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt    5   43.466 ±  8.238  ns/op
>> SecondarySuperCacheIntraContention.test     avgt    5   74.660 ±  0.127  ns/op
>>
>> SecondarySuperCacheHits.test                avgt    5    1.480 ±  0.077  ns/op
>> SecondarySuperCacheInterContention.test     avgt    5    1.461 ±  0.063  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt    5    1.767 ±  0.078  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt    5    1.155 ±  0.052  ns/op
>> SecondarySuperCacheIntraContention.test     avgt    5    1.421 ±  0.002  ns/op
>>
>> AArch64, Mac M3, 8 cores:
>>
>>
>> Benchmark                                   Mode  Cnt    Score    Error  Units
>> SecondarySuperCacheHits.test                avgt    5    0.835 ±  0.021  ns/op
>> SecondarySuperCacheInterContention.test     avgt    5   74.078 ± 18.095  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt    5   81.863 ± 42.492  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt    5   66.293 ± 11.254  ns/op
>> SecondarySuperCacheIntraContention.test     avgt    5  335.563 ±  6.171  ns/op
>>
>> SecondarySuperCacheHits.test                avgt    5    1.212 ±  0.004  ns/op
>> SecondarySuperCacheInterContention.test     avgt    5    0.871 ±  0.002  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt    5    0.626 ±  0.003  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt    5    1.115 ±  0.006  ns/op
>> SecondarySuperCacheIntraContention.test     avgt    5    0.696 ±  0.001  ns/op
>>
>>
>>
>> The first test, `SecondarySuperCacheHits`, shows a small regression. It's...
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
> JDK-8331658: secondary_super_cache does not scale well: C1
It's unfortunate to see a C1-specific version of the secondary supers table lookup. Why don't you reuse `MacroAssembler::lookup_secondary_supers_table` instead?
Also, in the context of C1, do the performance benefits justify the additional implementation complexity? As an alternative, migrating `MacroAssembler::check_klass_subtype_slow_path` away from linear search to a table lookup would also do the job and cover all cases of subtype checks in the JVM.
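
For readers unfamiliar with the distinction, here is a minimal Java sketch contrasting a linear scan over a class's secondary supers with a hashed table lookup. It is only an illustration under stated assumptions, not the HotSpot code: the class `SecondarySupersSketch`, the `homeSlot` hash, and the 64-slot bitmap with linear probing are all hypothetical choices made for the example.

```java
import java.util.List;

// Illustrative sketch only; not HotSpot code. All names here are hypothetical.
final class SecondarySupersSketch {
    private final Class<?>[] packed; // occupied slots, stored in slot order
    private final long bitmap;       // bit i set => slot i is occupied

    SecondarySupersSketch(List<Class<?>> supers) {
        if (supers.size() > 64) {
            throw new IllegalArgumentException("sketch supports at most 64 supers");
        }
        Class<?>[] slots = new Class<?>[64];
        long bits = 0L;
        for (Class<?> s : supers) {
            int slot = homeSlot(s);
            // Linear probing: on collision, move to the next slot (wrapping).
            while (slots[slot] != null) {
                slot = (slot + 1) & 63;
            }
            slots[slot] = s;
            bits |= 1L << slot;
        }
        // Pack occupied slots densely so the array has no holes.
        Class<?>[] dense = new Class<?>[supers.size()];
        int n = 0;
        for (Class<?> s : slots) {
            if (s != null) {
                dense[n++] = s;
            }
        }
        this.packed = dense;
        this.bitmap = bits;
    }

    // Assumed hash for the example: low 6 bits of the class name's hash code.
    static int homeSlot(Class<?> c) {
        return c.getName().hashCode() & 63;
    }

    // Linear search: cost grows with the number of secondary supers.
    boolean linearLookup(Class<?> target) {
        for (Class<?> s : packed) {
            if (s == target) {
                return true;
            }
        }
        return false;
    }

    // Table lookup: a clear bit at the probed slot proves absence immediately.
    boolean tableLookup(Class<?> target) {
        int slot = homeSlot(target);
        for (int probes = 0; probes < 64; probes++) {
            if ((bitmap & (1L << slot)) == 0) {
                return false;
            }
            // Index into the packed array = number of occupied slots below this one.
            int index = Long.bitCount(bitmap & ((1L << slot) - 1));
            if (packed[index] == target) {
                return true;
            }
            slot = (slot + 1) & 63;
        }
        return false;
    }
}
```

Neither lookup in this sketch writes to shared state, whereas a one-element cache that is updated on every miss can be contended when several threads check against different supertypes.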
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19426#issuecomment-2145980997
More information about the hotspot-compiler-dev
mailing list