RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v23]
Gui Cao
gcao at openjdk.org
Wed Oct 9 12:46:06 UTC 2024
On Mon, 9 Sep 2024 13:32:24 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>>
>> HotSpot shared runtime
>> ----------------------
>>
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>>
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>>
>> Having said that, the software popcount is really not bad.
>>
>> x86
>> ---
>>
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>>
>> AArch64
>> -------
>>
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>>
>> Generally
>> ---------
>>
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>>
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>>
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>>
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits:
>
> - Merge from 4ff72dc57e65e99b129f0ba28196994edf402018
> - Fix s390
> - Use post-incrememnt RegSet operator.
> - Merge branch 'clean' into JDK-8331658-work
> - Fix merge
> - Merge branch 'clean' into JDK-8331658-work
> - Merge from JDK head.
> - Cleanup
> - Fix shared code
> - Fix shared code
> - ... and 51 more: https://git.openjdk.org/jdk/compare/4ff72dc5...a7612674
Hi, I ran some JMH tests on the arm64 platform, and the performance of `SecondarySupersLookup.testPositive` seems to have regressed. Is this expected?
Before this patch:
Benchmark Mode Cnt Score Error Units
SecondarySupersLookup.testNegative00 avgt 15 2.455 ± 0.215 ns/op
SecondarySupersLookup.testNegative01 avgt 15 2.481 ± 0.202 ns/op
SecondarySupersLookup.testNegative02 avgt 15 2.455 ± 0.216 ns/op
SecondarySupersLookup.testNegative03 avgt 15 2.457 ± 0.212 ns/op
SecondarySupersLookup.testNegative04 avgt 15 2.463 ± 0.209 ns/op
SecondarySupersLookup.testNegative05 avgt 15 2.462 ± 0.211 ns/op
SecondarySupersLookup.testNegative06 avgt 15 2.455 ± 0.216 ns/op
SecondarySupersLookup.testNegative07 avgt 15 2.455 ± 0.215 ns/op
SecondarySupersLookup.testNegative08 avgt 15 2.456 ± 0.215 ns/op
SecondarySupersLookup.testNegative09 avgt 15 2.499 ± 0.199 ns/op
SecondarySupersLookup.testNegative10 avgt 15 2.456 ± 0.214 ns/op
SecondarySupersLookup.testNegative16 avgt 15 2.459 ± 0.214 ns/op
SecondarySupersLookup.testNegative20 avgt 15 2.458 ± 0.216 ns/op
SecondarySupersLookup.testNegative30 avgt 15 2.458 ± 0.215 ns/op
SecondarySupersLookup.testNegative32 avgt 15 2.457 ± 0.217 ns/op
SecondarySupersLookup.testNegative40 avgt 15 2.456 ± 0.217 ns/op
SecondarySupersLookup.testNegative50 avgt 15 2.482 ± 0.209 ns/op
SecondarySupersLookup.testNegative55 avgt 15 12.217 ± 1.594 ns/op
SecondarySupersLookup.testNegative56 avgt 15 12.756 ± 1.523 ns/op
SecondarySupersLookup.testNegative57 avgt 15 11.641 ± 1.264 ns/op
SecondarySupersLookup.testNegative58 avgt 15 11.088 ± 0.066 ns/op
SecondarySupersLookup.testNegative59 avgt 15 11.668 ± 1.256 ns/op
SecondarySupersLookup.testNegative60 avgt 15 21.025 ± 0.146 ns/op
SecondarySupersLookup.testNegative61 avgt 15 20.944 ± 0.175 ns/op
SecondarySupersLookup.testNegative62 avgt 15 21.159 ± 0.297 ns/op
SecondarySupersLookup.testNegative63 avgt 15 49.390 ± 1.943 ns/op
SecondarySupersLookup.testNegative64 avgt 15 49.426 ± 0.989 ns/op
SecondarySupersLookup.testPositive01 avgt 15 1.710 ± 0.070 ns/op
SecondarySupersLookup.testPositive02 avgt 15 1.726 ± 0.071 ns/op
SecondarySupersLookup.testPositive03 avgt 15 1.565 ± 0.169 ns/op
SecondarySupersLookup.testPositive04 avgt 15 1.591 ± 0.064 ns/op
SecondarySupersLookup.testPositive05 avgt 15 1.684 ± 0.115 ns/op
SecondarySupersLookup.testPositive06 avgt 15 1.546 ± 0.156 ns/op
SecondarySupersLookup.testPositive07 avgt 15 1.522 ± 0.134 ns/op
SecondarySupersLookup.testPositive08 avgt 15 1.479 ± 0.114 ns/op
SecondarySupersLookup.testPositive09 avgt 15 1.742 ± 0.061 ns/op
SecondarySupersLookup.testPositive10 avgt 15 1.531 ± 0.123 ns/op
SecondarySupersLookup.testPositive16 avgt 15 1.540 ± 0.150 ns/op
SecondarySupersLookup.testPositive20 avgt 15 1.558 ± 0.169 ns/op
SecondarySupersLookup.testPositive30 avgt 15 1.531 ± 0.096 ns/op
SecondarySupersLookup.testPositive32 avgt 15 1.541 ± 0.139 ns/op
SecondarySupersLookup.testPositive40 avgt 15 1.487 ± 0.101 ns/op
SecondarySupersLookup.testPositive50 avgt 15 1.521 ± 0.144 ns/op
SecondarySupersLookup.testPositive60 avgt 15 1.512 ± 0.163 ns/op
SecondarySupersLookup.testPositive63 avgt 15 1.745 ± 0.100 ns/op
SecondarySupersLookup.testPositive64 avgt 15 1.557 ± 0.190 ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'
After applying this patch:
Benchmark Mode Cnt Score Error Units
SecondarySupersLookup.testNegative00 avgt 15 2.559 ± 0.141 ns/op
SecondarySupersLookup.testNegative01 avgt 15 2.603 ± 0.119 ns/op
SecondarySupersLookup.testNegative02 avgt 15 2.550 ± 0.145 ns/op
SecondarySupersLookup.testNegative03 avgt 15 2.588 ± 0.117 ns/op
SecondarySupersLookup.testNegative04 avgt 15 2.556 ± 0.144 ns/op
SecondarySupersLookup.testNegative05 avgt 15 2.623 ± 0.090 ns/op
SecondarySupersLookup.testNegative06 avgt 15 2.674 ± 0.074 ns/op
SecondarySupersLookup.testNegative07 avgt 15 2.615 ± 0.132 ns/op
SecondarySupersLookup.testNegative08 avgt 15 2.559 ± 0.136 ns/op
SecondarySupersLookup.testNegative09 avgt 15 2.581 ± 0.119 ns/op
SecondarySupersLookup.testNegative10 avgt 15 2.612 ± 0.111 ns/op
SecondarySupersLookup.testNegative16 avgt 15 2.571 ± 0.142 ns/op
SecondarySupersLookup.testNegative20 avgt 15 2.594 ± 0.119 ns/op
SecondarySupersLookup.testNegative30 avgt 15 2.560 ± 0.144 ns/op
SecondarySupersLookup.testNegative32 avgt 15 2.653 ± 0.129 ns/op
SecondarySupersLookup.testNegative40 avgt 15 2.594 ± 0.115 ns/op
SecondarySupersLookup.testNegative50 avgt 15 2.604 ± 0.125 ns/op
SecondarySupersLookup.testNegative55 avgt 15 12.003 ± 1.077 ns/op
SecondarySupersLookup.testNegative56 avgt 15 11.483 ± 0.053 ns/op
SecondarySupersLookup.testNegative57 avgt 15 12.506 ± 1.394 ns/op
SecondarySupersLookup.testNegative58 avgt 15 12.027 ± 1.157 ns/op
SecondarySupersLookup.testNegative59 avgt 15 13.481 ± 1.117 ns/op
SecondarySupersLookup.testNegative60 avgt 15 20.952 ± 0.080 ns/op
SecondarySupersLookup.testNegative61 avgt 15 21.006 ± 0.196 ns/op
SecondarySupersLookup.testNegative62 avgt 15 21.007 ± 0.098 ns/op
SecondarySupersLookup.testNegative63 avgt 15 48.050 ± 1.293 ns/op
SecondarySupersLookup.testNegative64 avgt 15 49.669 ± 0.730 ns/op
SecondarySupersLookup.testPositive01 avgt 15 4.235 ± 0.044 ns/op
SecondarySupersLookup.testPositive02 avgt 15 4.215 ± 0.032 ns/op
SecondarySupersLookup.testPositive03 avgt 15 4.211 ± 0.032 ns/op
SecondarySupersLookup.testPositive04 avgt 15 4.219 ± 0.022 ns/op
SecondarySupersLookup.testPositive05 avgt 15 4.244 ± 0.025 ns/op
SecondarySupersLookup.testPositive06 avgt 15 4.217 ± 0.038 ns/op
SecondarySupersLookup.testPositive07 avgt 15 4.221 ± 0.034 ns/op
SecondarySupersLookup.testPositive08 avgt 15 4.233 ± 0.030 ns/op
SecondarySupersLookup.testPositive09 avgt 15 4.266 ± 0.069 ns/op
SecondarySupersLookup.testPositive10 avgt 15 4.223 ± 0.039 ns/op
SecondarySupersLookup.testPositive16 avgt 15 4.234 ± 0.023 ns/op
SecondarySupersLookup.testPositive20 avgt 15 4.223 ± 0.038 ns/op
SecondarySupersLookup.testPositive30 avgt 15 4.219 ± 0.033 ns/op
SecondarySupersLookup.testPositive32 avgt 15 4.225 ± 0.052 ns/op
SecondarySupersLookup.testPositive40 avgt 15 7.201 ± 2.232 ns/op
SecondarySupersLookup.testPositive50 avgt 15 4.198 ± 0.022 ns/op
SecondarySupersLookup.testPositive60 avgt 15 6.369 ± 1.828 ns/op
SecondarySupersLookup.testPositive63 avgt 15 55.590 ± 0.208 ns/op
SecondarySupersLookup.testPositive64 avgt 15 58.098 ± 1.861 ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'
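To put the two runs side by side, here is a minimal sketch that computes the slowdown ratio (patched score divided by baseline score) for a few of the benchmarks listed above. The class name `SlowdownSketch` and the helper `slowdown` are hypothetical names used only for illustration; the scores are copied from the two tables. On these numbers, `testPositive01` comes out roughly 2.5x slower and `testPositive63` about 32x slower, while `testNegative64` is essentially unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the JDK or the benchmark suite.
public class SlowdownSketch {
    // Ratio of the patched score to the baseline score (ns/op, lower is better).
    static double slowdown(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // {before, after} scores in ns/op, copied from the two runs above.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("testPositive01", new double[] {1.710, 4.235});
        scores.put("testPositive32", new double[] {1.541, 4.225});
        scores.put("testPositive63", new double[] {1.745, 55.590});
        scores.put("testPositive64", new double[] {1.557, 58.098});
        scores.put("testNegative64", new double[] {49.426, 49.669});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-16s %.2fx%n", e.getKey(), slowdown(s[0], s[1]));
        }
    }
}
```

The ratios ignore the reported error bars, so they are only a rough guide for the noisier entries such as `testPositive40` and `testPositive60`.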
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2402219616