RFR: 8332587: RISC-V: secondary_super_cache does not scale well

Hamlin Li mli at openjdk.org
Mon Jun 3 11:12:04 UTC 2024


On Tue, 21 May 2024 08:31:53 GMT, Gui Cao <gcao at openjdk.org> wrote:

> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
> This optimization depends on availability of the Zbb extension which has the cpop instruction.
> 
> ### Correctness testing:
> 
> - [x]  Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
> 
> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb:
> Original:
> 
> Benchmark                             Mode  Cnt    Score   Error  Units
> SecondarySuperCacheHits.test  avgt   15  11.375 ± 0.071  ns/op
> SecondarySuperCacheInterContention.test     avgt   15  646.087 ± 32.587  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15  600.090 ± 83.779  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15  692.084 ± 73.218  ns/op
> SecondarySupersLookup.testNegative00  avgt   15   16.420 ± 0.239  ns/op
> SecondarySupersLookup.testNegative01  avgt   15   18.307 ± 0.260  ns/op
> SecondarySupersLookup.testNegative02  avgt   15   21.695 ± 0.458  ns/op
> SecondarySupersLookup.testNegative03  avgt   15   24.855 ± 0.664  ns/op
> SecondarySupersLookup.testNegative04  avgt   15   27.305 ± 0.522  ns/op
> SecondarySupersLookup.testNegative05  avgt   15   29.719 ± 0.385  ns/op
> SecondarySupersLookup.testNegative06  avgt   15   32.231 ± 0.498  ns/op
> SecondarySupersLookup.testNegative07  avgt   15   33.747 ± 0.603  ns/op
> SecondarySupersLookup.testNegative08  avgt   15   35.856 ± 0.629  ns/op
> SecondarySupersLookup.testNegative09  avgt   15   37.077 ± 0.546  ns/op
> SecondarySupersLookup.testNegative10  avgt   15   39.408 ± 0.465  ns/op
> SecondarySupersLookup.testNegative16  avgt   15   51.041 ± 0.547  ns/op
> SecondarySupersLookup.testNegative20  avgt   15   58.722 ± 0.922  ns/op
> SecondarySupersLookup.testNegative30  avgt   15   77.310 ± 0.654  ns/op
> SecondarySupersLookup.testNegative32  avgt   15   81.116 ± 0.854  ns/op
> SecondarySupersLookup.testNegative40  avgt   15   96.311 ± 0.840  ns/op
> SecondarySupersLookup.testNegative50  avgt   15  115.427 ± 0.838  ns/op
> SecondarySupersLookup.testNegative55  avgt   15  124.371 ± 1.076  ns/op
> SecondarySupersLookup.testNegative56  avgt   15  126.796 ± 0.916  ns/op
> SecondarySupersLookup.testNegative57  avgt   15  127.952 ± 1.202  ns/op
> SecondarySupersLookup.testNegative58  avgt   15  131.956 ± 4.515  ns/op
> SecondarySupersLookup.testNegative59  avgt   15  131.858 ± 1.066  ns/op
> SecondarySupersLookup.testNegative60...

Thanks for explanation. Seems this patch will also use some instructions in zba/zbs extension. Will they also impact the final performance?

Also has some minor comments first.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3484:

> 3482: }
> 3483: 
> 3484: // Ensure that the inline code and the stub are using the same registers.

Can you add comment about why inline code and the stub should use the same registers?

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3485:

> 3483: 
> 3484: // Ensure that the inline code and the stub are using the same registers.
> 3485: #define LOOKUP_SECONDARY_SUPERS_TABLE_REGISTERS                     \

Can `r_super_klass ` and so on be passed as arguments of macro?

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3530:

> 3528:   // the bit is zero, we are certain that super_klass is not one of
> 3529:   // the secondary supers.
> 3530:   test_bit(t0, r_bitmap, bit);

it uses zbs if possible, I wonder what's the performance data when zbs is diabled or unsupported.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3549:

> 3547:   assert(Array<Klass*>::length_offset_in_bytes() == 0, "Adjust this code");
> 3548: 
> 3549:   shadd(result, r_array_index, r_array_base, result, LogBytesPerWord);

similar here, it uses zba if possible, so good to know what's the performance data when zba not supported.

src/hotspot/cpu/riscv/riscv.ad line 10127:

> 10125: %{
> 10126:   match(Set result (PartialSubtypeCheck sub (Binary super_reg super_con)));
> 10127:   predicate(UseSecondarySupersTable);

suggestion: move `predicate` before `match`.

src/hotspot/cpu/riscv/vm_version_riscv.hpp line 280:

> 278:   constexpr static bool supports_recursive_lightweight_locking() { return true; }
> 279: 
> 280:   constexpr static bool supports_secondary_supers_table() { return true; }

If it depends on zbb, should it return UseZbb here?

-------------

PR Review: https://git.openjdk.org/jdk/pull/19320#pullrequestreview-2093273846
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624154904
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624122884
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624212710
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624214685
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624119194
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624090583


More information about the hotspot-dev mailing list