RFR: JDK-8180450: secondary_super_cache does not scale well [v9]

Vladimir Ivanov vlivanov at openjdk.org
Wed Apr 10 19:07:15 UTC 2024


On Wed, 10 Apr 2024 15:41:41 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This PR is a redesign of subtype checking. 
>> 
>> The implementation of subtype checking in the HotSpot JVM is now twenty years old. There have been some performance-related bugs reported, and the only way to fix them is a redesign of the way it works.
>> 
>> So what's changed, so that the old design should be replaced?
>> 
>> Firstly, the computers of today aren't the computers of twenty years ago. It's not merely a matter of speed: the systems are much more parallel, both in the sense of having more cores and each core can run many instructions in parallel. Because of this, the speed ratio between memory accesses and the rate at which we can execute instructions has become wider and wider.
>> 
>> The most severe reported problem is to do with the "secondary supers cache". This is a 1-element per-class cache for interfaces (and arrays of interfaces). Unfortunately, if two threads repeatedly update this cache, the result is that a cache line ping-pongs between cores, causing a severe slowdown. 
>> 
>> Also, the linear search for an interface that is absent means that the entire list of interfaces has to be scanned. This plays badly with newer language features such as JEP 406, pattern matching for switch.
>> 
>> However, the computers of today can help us. The very high instruction-per-cycle rate of a Great Big Out-Of-Order (GBOOO) processor allows us to execute many of the instructions of a hash table lookup in parallel, as long as we avoid dependencies between instructions. 
>> 
>> The solution
>> ------------
>> 
>> We use a hashed lookup of secondary supers. This is a 64-way hash table, with linear probing for collisions. The table is compressed, in that null entries are removed, and the resulting hash table fits into the same secondary supers array as today's unsorted array of secondary supers. This means that existing code in HotSpot that simply does a linear scan of the secondary supers array does not need to be altered.
>> 
>> We add a bitmap field to each Klass object. This bitmap contains an occupancy bit corresponding to each element of the hash table, with a 1 indicating element presence. As well as allowing the hash table to be decompressed, this bimap is used as a simple kind of Bloom Filter. To determine whether a superclass is present, we simply have to check a single bit in the bitmap. If the bit is clear, we know that the superclass is not present. If the bit is set, we have to do a little arithmetic and then consult the hash table.
>> 
>> It works like th...
>
> Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 78 commits:
> 
>  - JDK-8180450: secondary_super_cache does not scale well
>  - JDK-8180450: secondary_super_cache does not scale well
>  - JDK-8180450: secondary_super_cache does not scale well
>  - Merge branch 'clean' into JDK-8180450
>  - InlineSecondarySupersTest is on by default.
>  - InlineSecondarySupersTest is on by default.
>  - JDK-8180450: secondary_super_cache does not scale well
>  - JDK-8180450: secondary_super_cache does not scale well
>  - JDK-8180450: secondary_super_cache does not scale well
>  - JDK-8180450: secondary_super_cache does not scale well
>  - ... and 68 more: https://git.openjdk.org/jdk/compare/b80ba085...8dc2ac13

I'm happy with the current state of the patch. Thanks a lot for incorporating the changes I proposed. I find it easier to reason about the implementation now and hope it'll help others navigating in the code.


> ... new lookup code is substantially larger than before, particularly for x86, and this might in some cases change inlining behaviour and cause regressions. To ameliorate that I've added another option, -XX:-InlineSecondarySupersTest, which generates stubs for the search code. While that does solve the code expansion problem, the additional call&return overhead doubles the time for each lookup, so I'm reluctant to recommend it for general use.

Alternatively, C2 inlining heuristics can be taught to discount inlined part of secondary supers lookup in generated code. It was the remediation chosen [1] for regressions introduced by post-call nops (part of Loom support).


> I would have liked to save the hashed interfaces list in the CDS archive file, but there are test cases for the Serviceability Agent that assume the interfaces are in the order in which they are declared. I think the tests are probably just wrong about this. For the same reason, I have to make copies of the interfaces array and sort the copies, rather than just using the same array at runtime.

I checked that CDS support works fine with the latest PR. I took a look at transitive interface sharing and it turned out to be a bit more complicated than a SA-specific test issue. 

Both `transitive_interfaces` and `local_interface` arrays can be used as `secondary_supers`. While the order of elements is non-significant for `transitive_interfaces`, `local_interfaces` determines initialization order of superinterfaces mandated by JVMS [2] (see `InstanceKlass::initialize_super_interfaces()`).

There's another minor issue with reusing arrays from CDS archive (see newly introduced comment in Klass::remove_unshareable_info() [3]), but with the following patch (on top up-to-date PR state) I see only a single test failure [4] which is indeed SA-specific and looks like a test bug to me:
  https://github.com/iwanowww/jdk/commit/f701e8bdb5269f144aa9c70e9dce9076394f09cf


[1] https://bugs.openjdk.org/browse/JDK-8300002

[2] JVMS-5.5
"7. Next, if C is a class rather than an interface, then let SC be its superclass and let SI1, ..., SIn be all superinterfaces of C (whether direct or indirect) that declare at least one non-abstract, non-static method. The order of superinterfaces is given by a recursive enumeration over the superinterface hierarchy of each interface directly implemented by C. For each interface I directly implemented by C (in the order of the interfaces array of C), the enumeration recurs on I's superinterfaces (in the order of the interfaces array of I) before returning I." 


[3]  Klass::remove_unshareable_info()

  // FIXME: validation in Klass::hash_secondary_supers() may fail for shared klasses.
  // Even though the bitmaps always match, the canonical order of elements in the table
  // is not guaranteed to stay the same (see tie breaker during Robin Hood hashing in Klass::hash_insert).
  //assert(compute_secondary_supers_bitmap(secondary_supers()) == _bitmap, "broken table");


[4] serviceability/dcmd/vm/ClassHierarchyTest.java

test ClassHierarchyTest.jmx(): failure
java.lang.AssertionError: Failed to match line #6: |  |  implements Intf2/0x00006000032b67d0 (inherited intf)

Running DCMD 'VM.class_hierarchy DcmdBaseClass -i -s' through 'JMXExecutor'
---------------- stdout ----------------
java.lang.Object/null
|--DcmdBaseClass/0x00006000032b67d0
|  implements Intf2/0x00006000032b67d0 (declared intf)
|  implements Intf1/0x00006000032b67d0 (inherited intf)
|  |--DcmdTestClass/0x00006000032b67d0
|  |  implements Intf2/0x00006000032b67d0 (inherited intf)
|  |  implements Intf1/0x00006000032b67d0 (inherited intf)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18309#issuecomment-2048252083


More information about the hotspot-dev mailing list