RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v23]

Andrew Haley aph at openjdk.org
Mon Sep 9 13:32:24 UTC 2024


> This patch expands the use of a hash table for secondary superclasses
> to the interpreter, C1, and runtime. It also adds a C2 implementation
> of hashed lookup in cases where the superclass isn't known at compile
> time.
> 
> HotSpot shared runtime
> ----------------------
> 
> Building hashed secondary tables is now unconditional. It takes very
> little time, and now that the shared runtime always has the tables, it
> might as well take advantage of them. The shared code is easier to
> follow now, I think.
> 
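
For readers unfamiliar with the approach, here is a minimal sketch of how a bitmap-plus-popcount table of this general kind can work. It is an illustration under stated assumptions, not a transcription of the HotSpot code: the names `HashedSupers`, `home_slot`, `build_table`, and `contains` are hypothetical, the hash function is arbitrary, classes are stood in for by integer ids, and the collision handling (plain linear probing over 64 slots) is a simplification.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a hashed secondary supers table.
struct HashedSupers {
  uint64_t bitmap = 0;           // bit i set => slot i of a 64-slot table is occupied
  std::vector<uint32_t> packed;  // occupied slots only, stored in slot order
};

// Portable bit count (Kernighan's loop); a hardware popcount would do the same job.
inline unsigned count_bits(uint64_t x) {
  unsigned n = 0;
  while (x != 0) { x &= x - 1; ++n; }
  return n;
}

// Arbitrary illustrative hash: keep the top 6 bits of a multiplicative hash (0..63).
inline unsigned home_slot(uint32_t id) {
  return (id * 2654435769u) >> 26;
}

// Build the packed table. Assumes at most 64 entries (a limit of this sketch).
inline HashedSupers build_table(const std::vector<uint32_t>& supers) {
  if (supers.size() > 64) throw std::length_error("sketch handles at most 64 entries");
  uint32_t sparse[64] = {};
  HashedSupers t;
  for (uint32_t id : supers) {
    unsigned slot = home_slot(id);
    while (t.bitmap & (1ULL << slot)) slot = (slot + 1) & 63;  // linear probe
    sparse[slot] = id;
    t.bitmap |= 1ULL << slot;
  }
  for (unsigned i = 0; i < 64; ++i)
    if (t.bitmap & (1ULL << i)) t.packed.push_back(sparse[i]);
  return t;
}

// Look up an id: the bitmap says whether a slot is occupied, and the number of
// occupied slots below it gives the entry's index in the packed array.
inline bool contains(const HashedSupers& t, uint32_t id) {
  unsigned slot = home_slot(id);
  for (unsigned probes = 0; probes < 64; ++probes) {
    uint64_t bit = 1ULL << slot;
    if (!(t.bitmap & bit)) return false;  // empty slot ends the probe sequence
    unsigned index = count_bits(t.bitmap & (bit - 1));
    if (t.packed[index] == id) return true;
    slot = (slot + 1) & 63;
  }
  return false;
}
```

In this sketch a miss is usually decided by a single bitmap test, and a hit needs one bit count plus one array load, which is why the cost of popcount matters for the lookup path.
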
> There might be a performance issue with x86-64 in that we build
> HotSpot for a default x86-64 target that does not support popcount.
> This means that the HotSpot C++ runtime on x86 always uses a software
> emulation for popcount, even though the vast majority of machines made
> for the past 20 years can do popcount in a single instruction. It
> wouldn't be terribly hard to do something about that.
> 
> Having said that, the software popcount is really not bad.
> 
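
As an illustration of what a software popcount typically looks like, here is a minimal sketch of the well-known SWAR (SIMD-within-a-register) bit count for 64-bit values. The function name `popcount64` is hypothetical, and this is not necessarily the routine that a given compiler or HotSpot build emits.

```cpp
#include <cstdint>

// Branch-free population count: sum bits in pairs, then nibbles, then bytes,
// and finally add all byte counts together with one multiply.
inline unsigned popcount64(uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);                           // 2-bit sums
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); // 4-bit sums
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           // 8-bit sums
  return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);      // total in top byte
}
```

This is a dozen or so simple ALU operations with no branches or memory accesses, which is consistent with the observation that the software version is not bad even when the single-instruction form is unavailable.
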
> x86
> ---
> 
> x86 is rather tricky, because we still support
> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
> well as 32- and 64-bit ports. There's some further complication in
> that only `RCX` can be used as a shift count, so there's some register
> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
> 
> AArch64
> -------
> 
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
> 
> Generally
> ---------
> 
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
> 
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
> 
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
> 
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.

Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits:

 - Merge from 4ff72dc57e65e99b129f0ba28196994edf402018
 - Fix s390
 - Use post-increment RegSet operator.
 - Merge branch 'clean' into JDK-8331658-work
 - Fix merge
 - Merge branch 'clean' into JDK-8331658-work
 - Merge from JDK head.
 - Cleanup
 - Fix shared code
 - Fix shared code
 - ... and 51 more: https://git.openjdk.org/jdk/compare/4ff72dc5...a7612674

-------------

Changes: https://git.openjdk.org/jdk/pull/19989/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=22
  Stats: 1052 lines in 22 files changed: 778 ins; 140 del; 134 mod
  Patch: https://git.openjdk.org/jdk/pull/19989.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989

PR: https://git.openjdk.org/jdk/pull/19989

