RFR: 8180450: secondary_super_cache does not scale well [v15]

Andrew Haley aph at openjdk.org
Mon Apr 15 12:39:47 UTC 2024


On Mon, 15 Apr 2024 12:33:05 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This PR is a redesign of subtype checking. 
>> 
>> The implementation of subtype checking in the HotSpot JVM is now twenty years old. There have been some performance-related bugs reported, and the only way to fix them is a redesign of the way it works.
>> 
>> So what's changed, so that the old design should be replaced?
>> 
>> Firstly, the computers of today aren't the computers of twenty years ago. It's not merely a matter of speed: the systems are much more parallel, both in the sense of having more cores and each core can run many instructions in parallel. Because of this, the speed ratio between memory accesses and the rate at which we can execute instructions has become wider and wider.
>> 
>> The most severe reported problem is to do with the "secondary supers cache". This is a 1-element per-class cache for interfaces (and arrays of interfaces). Unfortunately, if two threads repeatedly update this cache, the result is that a cache line ping-pongs between cores, causing a severe slowdown. 
>> 
>> Also, the linear search for an interface that is absent means that the entire list of interfaces has to be scanned. This plays badly with newer language features such as JEP 406, pattern matching for switch.
>> 
>> However, the computers of today can help us. The very high instruction-per-cycle rate of a Great Big Out-Of-Order (GBOOO) processor allows us to execute many of the instructions of a hash table lookup in parallel, as long as we avoid dependencies between instructions. 
>> 
>> The solution
>> ------------
>> 
>> We use a hashed lookup of secondary supers. This is a 64-way hash table, with linear probing for collisions. The table is compressed, in that null entries are removed, and the resulting hash table fits into the same secondary supers array as today's unsorted array of secondary supers. This means that existing code in HotSpot that simply does a linear scan of the secondary supers array does not need to be altered.
>> 
>> We add a bitmap field to each Klass object. This bitmap contains an occupancy bit corresponding to each element of the hash table, with a 1 indicating element presence. As well as allowing the hash table to be decompressed, this bimap is used as a simple kind of Bloom Filter. To determine whether a superclass is present, we simply have to check a single bit in the bitmap. If the bit is clear, we know that the superclass is not present. If the bit is set, we have to do a little arithmetic and then consult the hash table.
>> 
>> It works like th...
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Working secondary supers table with -XX:-UsePopCountInstruction

So it turns out that even without a POPCNT intstruction, this algorithm is still faster than the current linear search in all reasonable cases. I've pushed a change that uses a hand-coded population count.


                           soft popcount      current linear search
Lookup.testPositive01      1.537 ± 0.151      11.955 ±  0.011 ns/op
Lookup.testPositive02      1.535 ± 0.251      12.499 ±  0.029 ns/op
Lookup.testPositive03      2.279 ± 0.141      13.041 ±  0.010 ns/op
Lookup.testPositive04      1.819 ± 0.036      13.579 ±  0.051 ns/op
Lookup.testPositive05      1.537 ± 0.187      15.755 ±  0.079 ns/op
Lookup.testPositive06      3.121 ± 0.015      14.672 ±  0.016 ns/op
Lookup.testPositive07      2.270 ± 0.004      15.215 ±  0.006 ns/op
Lookup.testPositive08      4.445 ± 0.225      15.849 ±  2.889 ns/op
Lookup.testPositive09      3.122 ± 0.001      16.286 ±  8.472 ns/op
Lookup.testPositive10      1.820 ± 0.048      17.030 ± 14.481 ns/op
Lookup.testPositive16      7.228 ± 0.176      19.606 ±  1.130 ns/op
Lookup.testPositive20      3.503 ± 0.016      22.186 ±  4.257 ns/op
Lookup.testPositive30      6.672 ± 0.175      27.960 ±  0.381 ns/op
Lookup.testPositive32     12.031 ± 0.619      28.794 ±  0.107 ns/op
Lookup.testPositive40     12.502 ± 0.242      33.415 ±  0.024 ns/op
Lookup.testPositive50     24.355 ± 2.444      39.612 ± 15.678 ns/op
Lookup.testPositive60     41.386 ± 1.143      44.529 ±  1.298 ns/op
Lookup.testPositive63     68.823 ± 0.992      46.187 ±  0.042 ns/op
Lookup.testPositive64     48.325 ± 2.477      46.721 ±  0.198 ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18309#issuecomment-2056747930


More information about the hotspot-dev mailing list