RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v3]

Gui Cao gcao at openjdk.org
Fri Jun 7 09:49:18 UTC 2024


On Tue, 4 Jun 2024 08:55:24 GMT, Hamlin Li <mli at openjdk.org> wrote:

> There are a bit regression in cases of testNegative63/64, although these might be rare cases or not very common cases, but it's worth to have a try to improve it if possible. I guess it's related to the implementation for the cases when bitmap is full. When it's full, before go to `repne_scan`, there're some instructions to execute. I wonder if it will help to have another "bitmap full test" just after "bitmap false test" (which is `test_bit(t0, r_bitmap, bit);`). But I'm not sure if it's feasible, maybe worth a try.

Hi, Sorry for being late. Thanks for the suggestion, I gave it a try.
``` diff
diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index 61e8016db4a..af1649c061f 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -3664,6 +3664,32 @@ bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass,
   test_bit(t0, r_bitmap, bit);
   beqz(t0, L_fallthrough);
 
+  // We will consult the secondary-super array.
+  ld(r_array_base, Address(r_sub_klass, in_bytes(Klass::secondary_supers_offset())));
+
+  mv(tmp3, r_bitmap);
+  if (bit != 0) {
+    ror_imm(tmp3, tmp3, bit);
+  }
+
+  Label skip;
+  addi(t0, tmp3, (u1)1);
+  bnez(t0, skip);
+
+    // Load the array length.
+  lwu(r_array_length, Address(r_array_base, Array<Klass*>::length_offset_in_bytes()));
+  // And adjust the array base to point to the data.
+  // NB! Effectively increments current slot index by 1.
+  assert(Array<Klass*>::base_offset_in_bytes() == wordSize, "");
+  addi(r_array_base, r_array_base, Array<Klass*>::base_offset_in_bytes());
+  
+  repne_scan(r_array_base, r_super_klass, r_array_length, t0);
+  bne(r_super_klass, t0, L_fallthrough);
+  mv(result, zr);
+  beqz(result, L_fallthrough);
+  
+  bind(skip);
+
   // Get the first array index that can contain super_klass into r_array_index.
   if (bit != 0) {
     slli(r_array_index, r_bitmap, (Klass::SECONDARY_SUPERS_TABLE_MASK - bit));
@@ -3672,9 +3698,6 @@ bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass,
     mv(r_array_index, (u1)1);
   }
 
-  // We will consult the secondary-super array.
-  ld(r_array_base, Address(r_sub_klass, in_bytes(Klass::secondary_supers_offset())));
-
   // The value i in r_array_index is >= 1, so even though r_array_base
   // points to the length, we don't need to adjust it to point to the data.
   assert(Array<Klass*>::base_offset_in_bytes() == wordSize, "Adjust this code");


#### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   13.262 ± 0.197  ns/op
SecondarySupersLookup.testNegative01  avgt   15   13.273 ± 0.222  ns/op
SecondarySupersLookup.testNegative02  avgt   15   13.264 ± 0.199  ns/op
SecondarySupersLookup.testNegative03  avgt   15   13.275 ± 0.222  ns/op
SecondarySupersLookup.testNegative04  avgt   15   13.264 ± 0.198  ns/op
SecondarySupersLookup.testNegative05  avgt   15   13.259 ± 0.192  ns/op
SecondarySupersLookup.testNegative06  avgt   15   13.260 ± 0.195  ns/op
SecondarySupersLookup.testNegative07  avgt   15   13.275 ± 0.221  ns/op
SecondarySupersLookup.testNegative08  avgt   15   13.261 ± 0.196  ns/op
SecondarySupersLookup.testNegative09  avgt   15   13.267 ± 0.201  ns/op
SecondarySupersLookup.testNegative10  avgt   15   13.272 ± 0.211  ns/op
SecondarySupersLookup.testNegative16  avgt   15   13.271 ± 0.200  ns/op
SecondarySupersLookup.testNegative20  avgt   15   13.271 ± 0.210  ns/op
SecondarySupersLookup.testNegative30  avgt   15   13.277 ± 0.219  ns/op
SecondarySupersLookup.testNegative32  avgt   15   13.280 ± 0.224  ns/op
SecondarySupersLookup.testNegative40  avgt   15   13.285 ± 0.232  ns/op
SecondarySupersLookup.testNegative50  avgt   15   13.288 ± 0.237  ns/op
SecondarySupersLookup.testNegative55  avgt   15   54.940 ± 0.771  ns/op
SecondarySupersLookup.testNegative56  avgt   15   54.934 ± 0.798  ns/op
SecondarySupersLookup.testNegative57  avgt   15   54.909 ± 0.766  ns/op
SecondarySupersLookup.testNegative58  avgt   15   54.679 ± 0.830  ns/op
SecondarySupersLookup.testNegative59  avgt   15   54.941 ± 0.819  ns/op
SecondarySupersLookup.testNegative60  avgt   15   76.957 ± 0.945  ns/op
SecondarySupersLookup.testNegative61  avgt   15   76.956 ± 1.007  ns/op
SecondarySupersLookup.testNegative62  avgt   15   76.938 ± 0.976  ns/op
SecondarySupersLookup.testNegative63  avgt   15  138.371 ± 1.192  ns/op
SecondarySupersLookup.testNegative64  avgt   15  140.137 ± 1.077  ns/op
SecondarySupersLookup.testPositive01  avgt   15   10.734 ± 0.149  ns/op
SecondarySupersLookup.testPositive02  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive03  avgt   15   10.727 ± 0.146  ns/op
SecondarySupersLookup.testPositive04  avgt   15   10.735 ± 0.157  ns/op
SecondarySupersLookup.testPositive05  avgt   15   10.730 ± 0.149  ns/op
SecondarySupersLookup.testPositive06  avgt   15   10.735 ± 0.155  ns/op
SecondarySupersLookup.testPositive07  avgt   15   10.730 ± 0.149  ns/op
SecondarySupersLookup.testPositive08  avgt   15   10.730 ± 0.149  ns/op
SecondarySupersLookup.testPositive09  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive10  avgt   15   10.728 ± 0.148  ns/op
SecondarySupersLookup.testPositive16  avgt   15   10.733 ± 0.151  ns/op
SecondarySupersLookup.testPositive20  avgt   15   10.728 ± 0.146  ns/op
SecondarySupersLookup.testPositive30  avgt   15   10.737 ± 0.162  ns/op
SecondarySupersLookup.testPositive32  avgt   15   10.733 ± 0.154  ns/op
SecondarySupersLookup.testPositive40  avgt   15   10.729 ± 0.148  ns/op
SecondarySupersLookup.testPositive50  avgt   15   10.732 ± 0.151  ns/op
SecondarySupersLookup.testPositive60  avgt   15   10.729 ± 0.148  ns/op
SecondarySupersLookup.testPositive63  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive64  avgt   15   10.735 ± 0.158  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

After the fix it got better with the cases of testNegative63/64. As said above, this case is very rare, Is there a need for this additional complexity?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2154485874


More information about the hotspot-dev mailing list