RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v23]

Gui Cao gcao at openjdk.org
Wed Oct 9 12:46:06 UTC 2024


On Mon, 9 Sep 2024 13:32:24 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits:
> 
>  - Merge from 4ff72dc57e65e99b129f0ba28196994edf402018
>  - Fix s390
>  - Use post-incrememnt RegSet operator.
>  - Merge branch 'clean' into JDK-8331658-work
>  - Fix merge
>  - Merge branch 'clean' into JDK-8331658-work
>  - Merge from JDK head.
>  - Cleanup
>  - Fix shared code
>  - Fix shared code
>  - ... and 51 more: https://git.openjdk.org/jdk/compare/4ff72dc5...a7612674
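
For context on the quoted description, here is a minimal sketch of how a bitmap plus popcount can index a packed table with linear probing. It is only an illustration of the general idea: the class name `PackedBitmapTable`, the hash function, and the use of strings as keys are all assumptions made for this example, and it is not HotSpot's actual layout or code.

```java
import java.util.Arrays;
import java.util.Objects;

// Illustrative only: a 64-slot bitmap plus a packed array, probed linearly.
// All names and the hash function are hypothetical, not taken from HotSpot.
class PackedBitmapTable {
    private final long bitmap;      // bit s set => logical slot s is occupied
    private final String[] packed;  // occupied slots only, in slot order

    PackedBitmapTable(String... names) {
        if (names.length > 64) {
            throw new IllegalArgumentException("at most 64 entries");
        }
        String[] slots = new String[64];
        long bits = 0;
        for (String n : names) {
            int s = homeSlot(n);
            // Linear probing: walk forward until a free slot (or a duplicate).
            while (slots[s] != null && !slots[s].equals(n)) {
                s = (s + 1) & 63;
            }
            slots[s] = n;
            bits |= 1L << s;
        }
        this.bitmap = bits;
        this.packed = Arrays.stream(slots).filter(Objects::nonNull).toArray(String[]::new);
    }

    // Hypothetical hash: top 6 bits of a multiplicative hash, giving 0..63.
    static int homeSlot(String name) {
        return (name.hashCode() * 0x9E3779B9) >>> 26;
    }

    boolean contains(String name) {
        int s = homeSlot(name);
        for (int probes = 0; probes < 64; probes++) {
            if ((bitmap & (1L << s)) == 0) {
                return false;               // empty slot ends the probe: a miss
            }
            // Popcount of the bits below s is the entry's index in the packed array.
            int index = Long.bitCount(bitmap & ((1L << s) - 1));
            if (packed[index].equals(name)) {
                return true;
            }
            s = (s + 1) & 63;               // collision: try the next slot
        }
        return false;                       // table full and name not present
    }

    public static void main(String[] args) {
        PackedBitmapTable t = new PackedBitmapTable("I01", "I02", "I03");
        System.out.println(t.contains("I02"));
        System.out.println(t.contains("I99"));
    }
}
```

In this sketch a negative lookup usually ends at the first clear bit, while a nearly full table forces long probe sequences.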

Hi, I ran some JMH tests on the arm64 platform, and the performance of the `SecondarySupersLookup.testPositive*` benchmarks seems to have regressed (higher ns/op). Is this expected?

Before this patch:

Benchmark                             Mode  Cnt   Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   2.455 ± 0.215  ns/op
SecondarySupersLookup.testNegative01  avgt   15   2.481 ± 0.202  ns/op
SecondarySupersLookup.testNegative02  avgt   15   2.455 ± 0.216  ns/op
SecondarySupersLookup.testNegative03  avgt   15   2.457 ± 0.212  ns/op
SecondarySupersLookup.testNegative04  avgt   15   2.463 ± 0.209  ns/op
SecondarySupersLookup.testNegative05  avgt   15   2.462 ± 0.211  ns/op
SecondarySupersLookup.testNegative06  avgt   15   2.455 ± 0.216  ns/op
SecondarySupersLookup.testNegative07  avgt   15   2.455 ± 0.215  ns/op
SecondarySupersLookup.testNegative08  avgt   15   2.456 ± 0.215  ns/op
SecondarySupersLookup.testNegative09  avgt   15   2.499 ± 0.199  ns/op
SecondarySupersLookup.testNegative10  avgt   15   2.456 ± 0.214  ns/op
SecondarySupersLookup.testNegative16  avgt   15   2.459 ± 0.214  ns/op
SecondarySupersLookup.testNegative20  avgt   15   2.458 ± 0.216  ns/op
SecondarySupersLookup.testNegative30  avgt   15   2.458 ± 0.215  ns/op
SecondarySupersLookup.testNegative32  avgt   15   2.457 ± 0.217  ns/op
SecondarySupersLookup.testNegative40  avgt   15   2.456 ± 0.217  ns/op
SecondarySupersLookup.testNegative50  avgt   15   2.482 ± 0.209  ns/op
SecondarySupersLookup.testNegative55  avgt   15  12.217 ± 1.594  ns/op
SecondarySupersLookup.testNegative56  avgt   15  12.756 ± 1.523  ns/op
SecondarySupersLookup.testNegative57  avgt   15  11.641 ± 1.264  ns/op
SecondarySupersLookup.testNegative58  avgt   15  11.088 ± 0.066  ns/op
SecondarySupersLookup.testNegative59  avgt   15  11.668 ± 1.256  ns/op
SecondarySupersLookup.testNegative60  avgt   15  21.025 ± 0.146  ns/op
SecondarySupersLookup.testNegative61  avgt   15  20.944 ± 0.175  ns/op
SecondarySupersLookup.testNegative62  avgt   15  21.159 ± 0.297  ns/op
SecondarySupersLookup.testNegative63  avgt   15  49.390 ± 1.943  ns/op
SecondarySupersLookup.testNegative64  avgt   15  49.426 ± 0.989  ns/op
SecondarySupersLookup.testPositive01  avgt   15   1.710 ± 0.070  ns/op
SecondarySupersLookup.testPositive02  avgt   15   1.726 ± 0.071  ns/op
SecondarySupersLookup.testPositive03  avgt   15   1.565 ± 0.169  ns/op
SecondarySupersLookup.testPositive04  avgt   15   1.591 ± 0.064  ns/op
SecondarySupersLookup.testPositive05  avgt   15   1.684 ± 0.115  ns/op
SecondarySupersLookup.testPositive06  avgt   15   1.546 ± 0.156  ns/op
SecondarySupersLookup.testPositive07  avgt   15   1.522 ± 0.134  ns/op
SecondarySupersLookup.testPositive08  avgt   15   1.479 ± 0.114  ns/op
SecondarySupersLookup.testPositive09  avgt   15   1.742 ± 0.061  ns/op
SecondarySupersLookup.testPositive10  avgt   15   1.531 ± 0.123  ns/op
SecondarySupersLookup.testPositive16  avgt   15   1.540 ± 0.150  ns/op
SecondarySupersLookup.testPositive20  avgt   15   1.558 ± 0.169  ns/op
SecondarySupersLookup.testPositive30  avgt   15   1.531 ± 0.096  ns/op
SecondarySupersLookup.testPositive32  avgt   15   1.541 ± 0.139  ns/op
SecondarySupersLookup.testPositive40  avgt   15   1.487 ± 0.101  ns/op
SecondarySupersLookup.testPositive50  avgt   15   1.521 ± 0.144  ns/op
SecondarySupersLookup.testPositive60  avgt   15   1.512 ± 0.163  ns/op
SecondarySupersLookup.testPositive63  avgt   15   1.745 ± 0.100  ns/op
SecondarySupersLookup.testPositive64  avgt   15   1.557 ± 0.190  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'


After applying this patch:

Benchmark                             Mode  Cnt   Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   2.559 ± 0.141  ns/op
SecondarySupersLookup.testNegative01  avgt   15   2.603 ± 0.119  ns/op
SecondarySupersLookup.testNegative02  avgt   15   2.550 ± 0.145  ns/op
SecondarySupersLookup.testNegative03  avgt   15   2.588 ± 0.117  ns/op
SecondarySupersLookup.testNegative04  avgt   15   2.556 ± 0.144  ns/op
SecondarySupersLookup.testNegative05  avgt   15   2.623 ± 0.090  ns/op
SecondarySupersLookup.testNegative06  avgt   15   2.674 ± 0.074  ns/op
SecondarySupersLookup.testNegative07  avgt   15   2.615 ± 0.132  ns/op
SecondarySupersLookup.testNegative08  avgt   15   2.559 ± 0.136  ns/op
SecondarySupersLookup.testNegative09  avgt   15   2.581 ± 0.119  ns/op
SecondarySupersLookup.testNegative10  avgt   15   2.612 ± 0.111  ns/op
SecondarySupersLookup.testNegative16  avgt   15   2.571 ± 0.142  ns/op
SecondarySupersLookup.testNegative20  avgt   15   2.594 ± 0.119  ns/op
SecondarySupersLookup.testNegative30  avgt   15   2.560 ± 0.144  ns/op
SecondarySupersLookup.testNegative32  avgt   15   2.653 ± 0.129  ns/op
SecondarySupersLookup.testNegative40  avgt   15   2.594 ± 0.115  ns/op
SecondarySupersLookup.testNegative50  avgt   15   2.604 ± 0.125  ns/op
SecondarySupersLookup.testNegative55  avgt   15  12.003 ± 1.077  ns/op
SecondarySupersLookup.testNegative56  avgt   15  11.483 ± 0.053  ns/op
SecondarySupersLookup.testNegative57  avgt   15  12.506 ± 1.394  ns/op
SecondarySupersLookup.testNegative58  avgt   15  12.027 ± 1.157  ns/op
SecondarySupersLookup.testNegative59  avgt   15  13.481 ± 1.117  ns/op
SecondarySupersLookup.testNegative60  avgt   15  20.952 ± 0.080  ns/op
SecondarySupersLookup.testNegative61  avgt   15  21.006 ± 0.196  ns/op
SecondarySupersLookup.testNegative62  avgt   15  21.007 ± 0.098  ns/op
SecondarySupersLookup.testNegative63  avgt   15  48.050 ± 1.293  ns/op
SecondarySupersLookup.testNegative64  avgt   15  49.669 ± 0.730  ns/op
SecondarySupersLookup.testPositive01  avgt   15   4.235 ± 0.044  ns/op
SecondarySupersLookup.testPositive02  avgt   15   4.215 ± 0.032  ns/op
SecondarySupersLookup.testPositive03  avgt   15   4.211 ± 0.032  ns/op
SecondarySupersLookup.testPositive04  avgt   15   4.219 ± 0.022  ns/op
SecondarySupersLookup.testPositive05  avgt   15   4.244 ± 0.025  ns/op
SecondarySupersLookup.testPositive06  avgt   15   4.217 ± 0.038  ns/op
SecondarySupersLookup.testPositive07  avgt   15   4.221 ± 0.034  ns/op
SecondarySupersLookup.testPositive08  avgt   15   4.233 ± 0.030  ns/op
SecondarySupersLookup.testPositive09  avgt   15   4.266 ± 0.069  ns/op
SecondarySupersLookup.testPositive10  avgt   15   4.223 ± 0.039  ns/op
SecondarySupersLookup.testPositive16  avgt   15   4.234 ± 0.023  ns/op
SecondarySupersLookup.testPositive20  avgt   15   4.223 ± 0.038  ns/op
SecondarySupersLookup.testPositive30  avgt   15   4.219 ± 0.033  ns/op
SecondarySupersLookup.testPositive32  avgt   15   4.225 ± 0.052  ns/op
SecondarySupersLookup.testPositive40  avgt   15   7.201 ± 2.232  ns/op
SecondarySupersLookup.testPositive50  avgt   15   4.198 ± 0.022  ns/op
SecondarySupersLookup.testPositive60  avgt   15   6.369 ± 1.828  ns/op
SecondarySupersLookup.testPositive63  avgt   15  55.590 ± 0.208  ns/op
SecondarySupersLookup.testPositive64  avgt   15  58.098 ± 1.861  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'
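
To put the difference in numbers, here is a small sketch that computes the after/before ratio for a few rows copied from the two tables above. The class name `RegressionRatio` and the choice of rows are mine, for illustration only. Note that the `testNegative00` difference is within the reported error, while the `testPositive` differences are not.

```java
// Illustrative helper: compares a few scores copied from the tables above.
class RegressionRatio {
    // Scores are average ns/op, so a ratio above 1 means the patched build is slower.
    static double ratio(double beforeNsPerOp, double afterNsPerOp) {
        return afterNsPerOp / beforeNsPerOp;
    }

    public static void main(String[] args) {
        String[] names = {
            "testNegative00", "testPositive01", "testPositive32",
            "testPositive63", "testPositive64"
        };
        // {before, after} in ns/op, taken from the two benchmark runs.
        double[][] scores = {
            {2.455, 2.559},
            {1.710, 4.235},
            {1.541, 4.225},
            {1.745, 55.590},
            {1.557, 58.098}
        };
        for (int i = 0; i < names.length; i++) {
            double before = scores[i][0];
            double after = scores[i][1];
            System.out.printf("%-16s %7.3f -> %7.3f ns/op  (%.1fx)%n",
                    names[i], before, after, ratio(before, after));
        }
    }
}
```

On these numbers, `testPositive01` is roughly 2.5 times slower with the patch, and `testPositive63` is more than 30 times slower.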

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2402219616

