RFR: 8332587: RISC-V: secondary_super_cache does not scale well
Gui Cao
gcao at openjdk.org
Mon Jun 3 11:00:00 UTC 2024
On Tue, 21 May 2024 08:31:53 GMT, Gui Cao <gcao at openjdk.org> wrote:
> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>
> ### Correctness testing:
>
> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>
> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) with UseZbb enabled:
> Original:
>
> Benchmark Mode Cnt Score Error Units
> SecondarySuperCacheHits.test avgt 15 11.375 ± 0.071 ns/op
> SecondarySuperCacheInterContention.test avgt 15 646.087 ± 32.587 ns/op
> SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ± 83.779 ns/op
> SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ± 73.218 ns/op
> SecondarySupersLookup.testNegative00 avgt 15 16.420 ± 0.239 ns/op
> SecondarySupersLookup.testNegative01 avgt 15 18.307 ± 0.260 ns/op
> SecondarySupersLookup.testNegative02 avgt 15 21.695 ± 0.458 ns/op
> SecondarySupersLookup.testNegative03 avgt 15 24.855 ± 0.664 ns/op
> SecondarySupersLookup.testNegative04 avgt 15 27.305 ± 0.522 ns/op
> SecondarySupersLookup.testNegative05 avgt 15 29.719 ± 0.385 ns/op
> SecondarySupersLookup.testNegative06 avgt 15 32.231 ± 0.498 ns/op
> SecondarySupersLookup.testNegative07 avgt 15 33.747 ± 0.603 ns/op
> SecondarySupersLookup.testNegative08 avgt 15 35.856 ± 0.629 ns/op
> SecondarySupersLookup.testNegative09 avgt 15 37.077 ± 0.546 ns/op
> SecondarySupersLookup.testNegative10 avgt 15 39.408 ± 0.465 ns/op
> SecondarySupersLookup.testNegative16 avgt 15 51.041 ± 0.547 ns/op
> SecondarySupersLookup.testNegative20 avgt 15 58.722 ± 0.922 ns/op
> SecondarySupersLookup.testNegative30 avgt 15 77.310 ± 0.654 ns/op
> SecondarySupersLookup.testNegative32 avgt 15 81.116 ± 0.854 ns/op
> SecondarySupersLookup.testNegative40 avgt 15 96.311 ± 0.840 ns/op
> SecondarySupersLookup.testNegative50 avgt 15 115.427 ± 0.838 ns/op
> SecondarySupersLookup.testNegative55 avgt 15 124.371 ± 1.076 ns/op
> SecondarySupersLookup.testNegative56 avgt 15 126.796 ± 0.916 ns/op
> SecondarySupersLookup.testNegative57 avgt 15 127.952 ± 1.202 ns/op
> SecondarySupersLookup.testNegative58 avgt 15 131.956 ± 4.515 ns/op
> SecondarySupersLookup.testNegative59 avgt 15 131.858 ± 1.066 ns/op
> SecondarySupersLookup.testNegative60...
Hi, the following is a fallback implementation in scalar assembly for when Zbb is not available.
```diff
diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index ce6696c18a8..93e1045d2d4 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -3481,6 +3481,36 @@ void MacroAssembler::check_klass_subtype_slow_path(Register sub_klass,
   bind(L_fallthrough);
 }
 
+void MacroAssembler::population_count(Register dst, Register src,
+                                      Register tmp1, Register tmp2) {
+
+  if (UsePopCountInstruction) {
+    cpop(dst, src);
+  } else {
+    assert_different_registers(src, tmp1, tmp2);
+    assert_different_registers(dst, tmp1, tmp2);
+    Label loop, done;
+
+    mv(tmp1, src);
+    // dst = 0;
+    // while(tmp1 != 0) {
+    //   dst++;
+    //   tmp1 &= (tmp1 - 1);
+    // }
+    mv(dst, zr);
+    beqz(tmp1, done);
+    {
+      bind(loop);
+      addi(dst, dst, 1);
+      mv(tmp2, tmp1);
+      addi(tmp2, tmp2, -1);
+      andr(tmp1, tmp1, tmp2);
+      bnez(tmp1, loop);
+    }
+    bind(done);
+  }
+}
+
 // Ensure that the inline code and the stub are using the same registers.
 #define LOOKUP_SECONDARY_SUPERS_TABLE_REGISTERS \
 do { \
@@ -3533,7 +3563,7 @@ bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass,
   // Get the first array index that can contain super_klass into r_array_index.
   if (bit != 0) {
     slli(r_array_index, r_bitmap, (Klass::SECONDARY_SUPERS_TABLE_MASK - bit));
-    cpop(r_array_index, r_array_index);
+    population_count(r_array_index, r_array_index, t0, tmp1);
   } else {
     mv(r_array_index, (u1)1);
   }
diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp
index 2575d5ea2ff..3e4930d5605 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp
@@ -322,6 +322,8 @@ class MacroAssembler: public Assembler {
                                      Label* L_success,
                                      Label* L_failure);
 
+  void population_count(Register dst, Register src, Register tmp1, Register tmp2);
+
   // As above, but with a constant super_klass.
   // The result is in Register result, not the condition codes.
   bool lookup_secondary_supers_table(Register r_sub_klass,
```
JMH tested on HiFive Unmatched (does not have Zbb):
Original:
Benchmark Mode Cnt Score Error Units
SecondarySupersLookup.testNegative00 avgt 15 28.625 ± 7.158 ns/op
SecondarySupersLookup.testNegative01 avgt 15 33.860 ± 6.312 ns/op
SecondarySupersLookup.testNegative02 avgt 15 30.887 ± 4.773 ns/op
SecondarySupersLookup.testNegative03 avgt 15 39.477 ± 6.945 ns/op
SecondarySupersLookup.testNegative04 avgt 15 34.976 ± 3.080 ns/op
SecondarySupersLookup.testNegative05 avgt 15 42.025 ± 8.324 ns/op
SecondarySupersLookup.testNegative06 avgt 15 49.359 ± 8.480 ns/op
SecondarySupersLookup.testNegative07 avgt 15 49.996 ± 11.841 ns/op
SecondarySupersLookup.testNegative08 avgt 15 58.468 ± 8.485 ns/op
SecondarySupersLookup.testNegative09 avgt 15 57.198 ± 10.803 ns/op
SecondarySupersLookup.testNegative10 avgt 15 63.531 ± 5.595 ns/op
SecondarySupersLookup.testNegative16 avgt 15 73.716 ± 9.231 ns/op
SecondarySupersLookup.testNegative20 avgt 15 88.823 ± 16.179 ns/op
SecondarySupersLookup.testNegative30 avgt 15 118.832 ± 18.866 ns/op
SecondarySupersLookup.testNegative32 avgt 15 126.538 ± 23.139 ns/op
SecondarySupersLookup.testNegative40 avgt 15 149.722 ± 31.675 ns/op
SecondarySupersLookup.testNegative50 avgt 15 186.958 ± 39.203 ns/op
SecondarySupersLookup.testNegative55 avgt 15 193.787 ± 29.629 ns/op
SecondarySupersLookup.testNegative56 avgt 15 204.451 ± 34.491 ns/op
SecondarySupersLookup.testNegative57 avgt 15 204.104 ± 27.130 ns/op
SecondarySupersLookup.testNegative58 avgt 15 207.017 ± 31.201 ns/op
SecondarySupersLookup.testNegative59 avgt 15 219.159 ± 33.664 ns/op
SecondarySupersLookup.testNegative60 avgt 15 208.726 ± 27.195 ns/op
SecondarySupersLookup.testNegative61 avgt 15 214.557 ± 30.992 ns/op
SecondarySupersLookup.testNegative62 avgt 15 212.104 ± 30.843 ns/op
SecondarySupersLookup.testNegative63 avgt 15 227.805 ± 39.706 ns/op
SecondarySupersLookup.testNegative64 avgt 15 229.951 ± 42.039 ns/op
SecondarySupersLookup.testPositive01 avgt 15 18.498 ± 4.687 ns/op
SecondarySupersLookup.testPositive02 avgt 15 20.130 ± 4.955 ns/op
SecondarySupersLookup.testPositive03 avgt 15 18.576 ± 4.383 ns/op
SecondarySupersLookup.testPositive04 avgt 15 19.202 ± 4.554 ns/op
SecondarySupersLookup.testPositive05 avgt 15 18.923 ± 4.730 ns/op
SecondarySupersLookup.testPositive06 avgt 15 20.494 ± 6.282 ns/op
SecondarySupersLookup.testPositive07 avgt 15 17.679 ± 2.386 ns/op
SecondarySupersLookup.testPositive08 avgt 15 19.396 ± 7.047 ns/op
SecondarySupersLookup.testPositive09 avgt 15 18.163 ± 2.950 ns/op
SecondarySupersLookup.testPositive10 avgt 15 21.135 ± 5.552 ns/op
SecondarySupersLookup.testPositive16 avgt 15 20.117 ± 4.606 ns/op
SecondarySupersLookup.testPositive20 avgt 15 21.209 ± 5.800 ns/op
SecondarySupersLookup.testPositive30 avgt 15 21.388 ± 6.792 ns/op
SecondarySupersLookup.testPositive32 avgt 15 19.720 ± 4.559 ns/op
SecondarySupersLookup.testPositive40 avgt 15 17.354 ± 2.707 ns/op
SecondarySupersLookup.testPositive50 avgt 15 20.825 ± 6.062 ns/op
SecondarySupersLookup.testPositive60 avgt 15 19.910 ± 5.621 ns/op
SecondarySupersLookup.testPositive63 avgt 15 18.989 ± 3.156 ns/op
SecondarySupersLookup.testPositive64 avgt 15 20.298 ± 5.357 ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'
With patch:
Benchmark Mode Cnt Score Error Units
SecondarySupersLookup.testNegative00 avgt 15 21.124 ± 1.601 ns/op
SecondarySupersLookup.testNegative01 avgt 15 25.788 ± 3.713 ns/op
SecondarySupersLookup.testNegative02 avgt 15 25.501 ± 5.616 ns/op
SecondarySupersLookup.testNegative03 avgt 15 22.800 ± 6.454 ns/op
SecondarySupersLookup.testNegative04 avgt 15 21.790 ± 2.629 ns/op
SecondarySupersLookup.testNegative05 avgt 15 25.485 ± 6.082 ns/op
SecondarySupersLookup.testNegative06 avgt 15 24.801 ± 5.387 ns/op
SecondarySupersLookup.testNegative07 avgt 15 24.425 ± 4.686 ns/op
SecondarySupersLookup.testNegative08 avgt 15 24.486 ± 4.044 ns/op
SecondarySupersLookup.testNegative09 avgt 15 23.810 ± 3.838 ns/op
SecondarySupersLookup.testNegative10 avgt 15 25.085 ± 3.756 ns/op
SecondarySupersLookup.testNegative16 avgt 15 22.018 ± 2.924 ns/op
SecondarySupersLookup.testNegative20 avgt 15 23.161 ± 3.271 ns/op
SecondarySupersLookup.testNegative30 avgt 15 23.705 ± 4.669 ns/op
SecondarySupersLookup.testNegative32 avgt 15 25.048 ± 7.125 ns/op
SecondarySupersLookup.testNegative40 avgt 15 24.661 ± 3.541 ns/op
SecondarySupersLookup.testNegative50 avgt 15 22.918 ± 2.879 ns/op
SecondarySupersLookup.testNegative55 avgt 15 250.982 ± 10.224 ns/op
SecondarySupersLookup.testNegative56 avgt 15 251.020 ± 8.432 ns/op
SecondarySupersLookup.testNegative57 avgt 15 255.998 ± 9.054 ns/op
SecondarySupersLookup.testNegative58 avgt 15 257.347 ± 11.340 ns/op
SecondarySupersLookup.testNegative59 avgt 15 277.727 ± 10.007 ns/op
SecondarySupersLookup.testNegative60 avgt 15 304.818 ± 12.092 ns/op
SecondarySupersLookup.testNegative61 avgt 15 308.956 ± 13.060 ns/op
SecondarySupersLookup.testNegative62 avgt 15 309.804 ± 14.715 ns/op
SecondarySupersLookup.testNegative63 avgt 15 416.021 ± 8.051 ns/op
SecondarySupersLookup.testNegative64 avgt 15 425.190 ± 10.966 ns/op
SecondarySupersLookup.testPositive01 avgt 15 18.369 ± 4.490 ns/op
SecondarySupersLookup.testPositive02 avgt 15 21.595 ± 6.626 ns/op
SecondarySupersLookup.testPositive03 avgt 15 19.327 ± 4.973 ns/op
SecondarySupersLookup.testPositive04 avgt 15 19.636 ± 4.759 ns/op
SecondarySupersLookup.testPositive05 avgt 15 17.055 ± 2.329 ns/op
SecondarySupersLookup.testPositive06 avgt 15 18.712 ± 3.333 ns/op
SecondarySupersLookup.testPositive07 avgt 15 20.508 ± 4.213 ns/op
SecondarySupersLookup.testPositive08 avgt 15 19.208 ± 3.761 ns/op
SecondarySupersLookup.testPositive09 avgt 15 18.061 ± 3.619 ns/op
SecondarySupersLookup.testPositive10 avgt 15 17.519 ± 3.322 ns/op
SecondarySupersLookup.testPositive16 avgt 15 19.099 ± 4.358 ns/op
SecondarySupersLookup.testPositive20 avgt 15 20.731 ± 5.230 ns/op
SecondarySupersLookup.testPositive30 avgt 15 18.048 ± 2.994 ns/op
SecondarySupersLookup.testPositive32 avgt 15 18.817 ± 3.856 ns/op
SecondarySupersLookup.testPositive40 avgt 15 17.165 ± 2.536 ns/op
SecondarySupersLookup.testPositive50 avgt 15 20.060 ± 4.473 ns/op
SecondarySupersLookup.testPositive60 avgt 15 17.296 ± 2.411 ns/op
SecondarySupersLookup.testPositive63 avgt 15 19.313 ± 4.133 ns/op
SecondarySupersLookup.testPositive64 avgt 15 21.258 ± 4.909 ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'
JMH tested on LicheePi 4A (does not have Zbb):
Original:
Benchmark Mode Cnt Score Error Units
SecondarySupersLookup.testNegative00 avgt 15 26.297 ± 0.941 ns/op
SecondarySupersLookup.testNegative01 avgt 15 26.345 ± 1.404 ns/op
SecondarySupersLookup.testNegative02 avgt 15 27.191 ± 1.421 ns/op
SecondarySupersLookup.testNegative03 avgt 15 28.546 ± 1.475 ns/op
SecondarySupersLookup.testNegative04 avgt 15 28.785 ± 1.329 ns/op
SecondarySupersLookup.testNegative05 avgt 15 30.524 ± 1.816 ns/op
SecondarySupersLookup.testNegative06 avgt 15 30.284 ± 0.984 ns/op
SecondarySupersLookup.testNegative07 avgt 15 31.594 ± 1.093 ns/op
SecondarySupersLookup.testNegative08 avgt 15 32.778 ± 1.269 ns/op
SecondarySupersLookup.testNegative09 avgt 15 33.549 ± 0.913 ns/op
SecondarySupersLookup.testNegative10 avgt 15 35.468 ± 1.643 ns/op
SecondarySupersLookup.testNegative16 avgt 15 68.065 ± 1.890 ns/op
SecondarySupersLookup.testNegative20 avgt 15 58.098 ± 2.704 ns/op
SecondarySupersLookup.testNegative30 avgt 15 70.944 ± 2.929 ns/op
SecondarySupersLookup.testNegative32 avgt 15 76.134 ± 3.719 ns/op
SecondarySupersLookup.testNegative40 avgt 15 89.092 ± 4.396 ns/op
SecondarySupersLookup.testNegative50 avgt 15 105.226 ± 4.877 ns/op
SecondarySupersLookup.testNegative55 avgt 15 115.744 ± 6.281 ns/op
SecondarySupersLookup.testNegative56 avgt 15 119.860 ± 5.618 ns/op
SecondarySupersLookup.testNegative57 avgt 15 117.818 ± 5.497 ns/op
SecondarySupersLookup.testNegative58 avgt 15 121.410 ± 6.781 ns/op
SecondarySupersLookup.testNegative59 avgt 15 124.500 ± 7.016 ns/op
SecondarySupersLookup.testNegative60 avgt 15 125.322 ± 5.241 ns/op
SecondarySupersLookup.testNegative61 avgt 15 129.009 ± 4.680 ns/op
SecondarySupersLookup.testNegative62 avgt 15 126.704 ± 5.917 ns/op
SecondarySupersLookup.testNegative63 avgt 15 131.529 ± 5.247 ns/op
SecondarySupersLookup.testNegative64 avgt 15 134.511 ± 4.925 ns/op
SecondarySupersLookup.testPositive01 avgt 15 22.386 ± 0.680 ns/op
SecondarySupersLookup.testPositive02 avgt 15 21.655 ± 0.492 ns/op
SecondarySupersLookup.testPositive03 avgt 15 22.123 ± 0.671 ns/op
SecondarySupersLookup.testPositive04 avgt 15 22.050 ± 0.610 ns/op
SecondarySupersLookup.testPositive05 avgt 15 22.048 ± 0.614 ns/op
SecondarySupersLookup.testPositive06 avgt 15 21.850 ± 0.597 ns/op
SecondarySupersLookup.testPositive07 avgt 15 21.844 ± 0.619 ns/op
SecondarySupersLookup.testPositive08 avgt 15 21.832 ± 0.601 ns/op
SecondarySupersLookup.testPositive09 avgt 15 21.743 ± 0.527 ns/op
SecondarySupersLookup.testPositive10 avgt 15 22.037 ± 0.609 ns/op
SecondarySupersLookup.testPositive16 avgt 15 22.300 ± 0.502 ns/op
SecondarySupersLookup.testPositive20 avgt 15 21.607 ± 0.498 ns/op
SecondarySupersLookup.testPositive30 avgt 15 21.836 ± 0.602 ns/op
SecondarySupersLookup.testPositive32 avgt 15 21.629 ± 0.484 ns/op
SecondarySupersLookup.testPositive40 avgt 15 21.850 ± 0.621 ns/op
SecondarySupersLookup.testPositive50 avgt 15 22.478 ± 0.130 ns/op
SecondarySupersLookup.testPositive60 avgt 15 22.058 ± 0.617 ns/op
SecondarySupersLookup.testPositive63 avgt 15 21.828 ± 0.596 ns/op
SecondarySupersLookup.testPositive64 avgt 15 22.077 ± 0.603 ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'
With patch:
Benchmark Mode Cnt Score Error Units
SecondarySupersLookup.testNegative00 avgt 15 24.305 ± 1.269 ns/op
SecondarySupersLookup.testNegative01 avgt 15 23.552 ± 1.214 ns/op
SecondarySupersLookup.testNegative02 avgt 15 22.720 ± 0.771 ns/op
SecondarySupersLookup.testNegative03 avgt 15 22.713 ± 0.834 ns/op
SecondarySupersLookup.testNegative04 avgt 15 22.924 ± 0.684 ns/op
SecondarySupersLookup.testNegative05 avgt 15 22.614 ± 0.604 ns/op
SecondarySupersLookup.testNegative06 avgt 15 22.387 ± 0.641 ns/op
SecondarySupersLookup.testNegative07 avgt 15 22.201 ± 0.502 ns/op
SecondarySupersLookup.testNegative08 avgt 15 22.391 ± 0.606 ns/op
SecondarySupersLookup.testNegative09 avgt 15 22.462 ± 0.617 ns/op
SecondarySupersLookup.testNegative10 avgt 15 22.525 ± 0.202 ns/op
SecondarySupersLookup.testNegative16 avgt 15 22.439 ± 0.616 ns/op
SecondarySupersLookup.testNegative20 avgt 15 22.963 ± 0.298 ns/op
SecondarySupersLookup.testNegative30 avgt 15 22.642 ± 0.621 ns/op
SecondarySupersLookup.testNegative32 avgt 15 22.306 ± 0.670 ns/op
SecondarySupersLookup.testNegative40 avgt 15 22.663 ± 0.644 ns/op
SecondarySupersLookup.testNegative50 avgt 15 22.001 ± 0.238 ns/op
SecondarySupersLookup.testNegative55 avgt 15 128.558 ± 5.735 ns/op
SecondarySupersLookup.testNegative56 avgt 15 128.633 ± 4.893 ns/op
SecondarySupersLookup.testNegative57 avgt 15 129.143 ± 5.955 ns/op
SecondarySupersLookup.testNegative58 avgt 15 132.434 ± 6.478 ns/op
SecondarySupersLookup.testNegative59 avgt 15 130.243 ± 5.901 ns/op
SecondarySupersLookup.testNegative60 avgt 15 163.505 ± 8.278 ns/op
SecondarySupersLookup.testNegative61 avgt 15 163.934 ± 9.008 ns/op
SecondarySupersLookup.testNegative62 avgt 15 162.247 ± 6.238 ns/op
SecondarySupersLookup.testNegative63 avgt 15 213.133 ± 9.582 ns/op
SecondarySupersLookup.testNegative64 avgt 15 214.724 ± 11.562 ns/op
SecondarySupersLookup.testPositive01 avgt 15 21.622 ± 0.482 ns/op
SecondarySupersLookup.testPositive02 avgt 15 21.842 ± 0.602 ns/op
SecondarySupersLookup.testPositive03 avgt 15 22.274 ± 0.516 ns/op
SecondarySupersLookup.testPositive04 avgt 15 21.833 ± 0.632 ns/op
SecondarySupersLookup.testPositive05 avgt 15 21.842 ± 0.603 ns/op
SecondarySupersLookup.testPositive06 avgt 15 21.630 ± 0.527 ns/op
SecondarySupersLookup.testPositive07 avgt 15 22.054 ± 0.581 ns/op
SecondarySupersLookup.testPositive08 avgt 15 21.872 ± 0.613 ns/op
SecondarySupersLookup.testPositive09 avgt 15 21.839 ± 0.604 ns/op
SecondarySupersLookup.testPositive10 avgt 15 21.619 ± 0.494 ns/op
SecondarySupersLookup.testPositive16 avgt 15 21.624 ± 0.509 ns/op
SecondarySupersLookup.testPositive20 avgt 15 21.828 ± 0.595 ns/op
SecondarySupersLookup.testPositive30 avgt 15 21.861 ± 0.617 ns/op
SecondarySupersLookup.testPositive32 avgt 15 22.141 ± 0.609 ns/op
SecondarySupersLookup.testPositive40 avgt 15 21.632 ± 0.485 ns/op
SecondarySupersLookup.testPositive50 avgt 15 21.856 ± 0.597 ns/op
SecondarySupersLookup.testPositive60 avgt 15 22.068 ± 0.610 ns/op
SecondarySupersLookup.testPositive63 avgt 15 21.647 ± 0.496 ns/op
SecondarySupersLookup.testPositive64 avgt 15 21.847 ± 0.595 ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'
The test data above show that, when Zbb is not available, testNegative00 through testNegative50 improve with the patch, but testNegative55 through testNegative64 regress. The regression occurs because, without the `cpop` instruction, a loop is needed to count the 1 bits. Therefore, the current patch enables the optimization only when Zbb is available.
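To illustrate why the fallback gets more expensive, here is a minimal C++ sketch of the same loop as the pseudocode comment in the patch. It is not the HotSpot code: `PopCountResult`, `population_count_loop` and `low_bits` are hypothetical names introduced only for this example. The sketch also records the number of loop trips, which equals the number of set bits in the input, whereas `cpop` is a single instruction regardless of the input.

```cpp
#include <cstdint>

// Hypothetical result type: the bit count plus the loop trips needed to get it.
struct PopCountResult {
  int count;       // number of set bits in the input
  int iterations;  // how many times the loop body ran
};

// Plain C++ rendering of the pseudocode in the patch comment:
//   dst = 0; while (tmp1 != 0) { dst++; tmp1 &= (tmp1 - 1); }
PopCountResult population_count_loop(uint64_t src) {
  PopCountResult r{0, 0};
  uint64_t tmp = src;
  while (tmp != 0) {
    r.count++;
    tmp &= (tmp - 1);  // clears the lowest set bit
    r.iterations++;
  }
  return r;
}

// Hypothetical helper: a value with the low n bits set, for n in [0, 64].
uint64_t low_bits(int n) {
  return n >= 64 ? ~uint64_t{0} : ((uint64_t{1} << n) - 1);
}
```

For example, `population_count_loop(low_bits(10))` runs the loop body 10 times and `population_count_loop(low_bits(60))` runs it 60 times, so the cost of the fallback grows with the number of 1 bits it has to count.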
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2144900070