RFR: 8373595: A new ObjectMonitorTable implementation

Martin Doerr mdoerr at openjdk.org
Mon Jan 26 15:27:35 UTC 2026


On Fri, 23 Jan 2026 10:36:26 GMT, Fredrik Bredberg <fbredberg at openjdk.org> wrote:

> Anton Artemov (@toxaart) and I investigated a fairly large regression in Pet Clinic that occurred when Compact Object Headers were turned on. We found that a large portion of that regression could be attributed to not finding the monitor in the Object Monitor Cache, and because we couldn't access the Object Monitor Table from C2 generated code, we often had to take the slow path.
> 
> By making the object monitor cache larger and making it use the object's hash value as a key, we managed to mitigate the regression.
> 
> Erik Österlund (@fisk) took that idea to the next level: he rewrote the object monitor table code so that we can now search for an object in the global object monitor table from C2 generated code, i.e. from `C2_MacroAssembler::fast_lock()`. As in Anton's and my version, the key is the hash value from the object.
> 
> Erik also provided new barrier code needed for ZGC for x86 and aarch64. Roman Kennke (@rkennke) provided the same for Shenandoah, also for x86 and aarch64.
> 
> We decided to keep the Object Monitor Cache, since we found that for most programs (but not Pet Clinic) the monitor you are looking for is likely found in the first positions of the cache (it's sorted in most recently used order). However, we decreased the size from 8 to 2 elements.
> 
> After running extensive performance tests, we can say that this has improved performance in many of them, not just mitigated the regression in Pet Clinic.
> 
> Tier1-7 tests pass on supported platforms.
> 
> The rest of the platforms (`ppc`, `riscv` and `s390`) have been smoke tested using QEMU.
> I mainly used this test for smoke testing with QEMU: `-XX:+UnlockDiagnosticVMOptions -XX:+UseObjectMonitorTable ./test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java`

Nice work! And thanks for writing the platform code as well. That's awesome! The PPC64 tests have passed (I ran a large number of them), but the comparison with `below_is_special` should be unsigned. In addition, the code can be optimized:

diff --git a/src/hotspot/cpu/ppc/gc/z/zBarrierSetAssembler_ppc.cpp b/src/hotspot/cpu/ppc/gc/z/zBarrierSetAssembler_ppc.cpp
index c1c818684cd..9012a3767b9 100644
--- a/src/hotspot/cpu/ppc/gc/z/zBarrierSetAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/gc/z/zBarrierSetAssembler_ppc.cpp
@@ -955,8 +955,8 @@ void ZBarrierSetAssembler::try_resolve_weak_handle_in_c2(MacroAssembler* masm, R
   BarrierSetAssembler::try_resolve_weak_handle_in_c2(masm, obj, tmp, slow_path);
 
   // Check if the oop is bad, in which case we need to take the slow path.
-  __ ld(tmp, in_bytes(ZThreadLocalData::mark_bad_mask_offset()), R16_thread);
-  __ and_(tmp, obj, tmp);
+  __ relocate(barrier_Relocation::spec(), ZBarrierRelocationFormatMarkBadMask);
+  __ andi_(tmp, obj, barrier_Relocation::unpatched);
   __ bne(CR0, slow_path);
 
   // Oop is okay, so we uncolor it.
diff --git a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
index f7a59983854..7ff8c08e1bf 100644
--- a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
@@ -2756,47 +2756,41 @@ void MacroAssembler::compiler_fast_lock_object(ConditionRegister flag, Register
       addi(owner_addr, mark, in_bytes(ObjectMonitor::owner_offset()) - monitor_tag);
       mark = noreg;
     } else {
-      const Register cache_addr = tmp3;
-      const Register tmp3_bucket = tmp3;
-      const Register tmp2_hash = tmp2;
       Label monitor_found;
 
-      // Save the mark, we might need it to extract the hash.
-      mr(tmp2_hash, mark);
-
       // Look for the monitor in the om_cache.
-
-      // Load cache address
-      addi(cache_addr, R16_thread, in_bytes(JavaThread::om_cache_oops_offset()));
-
+      const int first_obj_offset = in_bytes(JavaThread::om_cache_oops_offset());
+      const int first_monitor_offset = first_obj_offset + in_bytes(OMCache::oop_to_monitor_difference());
+      const int entry_size = in_bytes(OMCache::oop_to_oop_difference());
       const int num_unrolled = OMCache::CAPACITY;
+
       for (int i = 0; i < num_unrolled; i++) {
-        ld(R0, 0, cache_addr);
-        ld(monitor, in_bytes(OMCache::oop_to_monitor_difference()), cache_addr);
+        ld(R0, first_obj_offset + i * entry_size, R16_thread);
+        ld(monitor, first_monitor_offset + i * entry_size, R16_thread);
         cmpd(CR0, R0, obj);
         beq(CR0, monitor_found);
-        addi(cache_addr, cache_addr, in_bytes(OMCache::oop_to_oop_difference()));
       }
 
       // Look for the monitor in the table.
+      const Register tmp2_hash = tmp2;
+      const Register tmp3_bucket = tmp3;
 
       // Get the hash code.
-      srdi(tmp2_hash, tmp2_hash, markWord::hash_shift);
+      srdi(tmp2_hash, mark, markWord::hash_shift);
 
-      // Get the table and calculate the bucket's address
-      load_const_optimized(tmp3, ObjectMonitorTable::current_table_address(), R0);
-      ld_ptr(tmp3, 0, tmp3);
+      // Get the table and calculate the bucket's address (base and index)
+      int simm16_rest = load_const_optimized(tmp3, ObjectMonitorTable::current_table_address(), R0, true);
+      ld_ptr(tmp3, simm16_rest, tmp3);
       ld(tmp1, in_bytes(ObjectMonitorTable::table_capacity_mask_offset()), tmp3);
       andr(tmp2_hash, tmp2_hash, tmp1);
-      ld(tmp3, in_bytes(ObjectMonitorTable::table_buckets_offset()), tmp3);
-      sldi(tmp2_hash, tmp2_hash, LogBytesPerWord);
-      add(tmp3_bucket, tmp3, tmp2_hash);
+      ld(tmp3_bucket, in_bytes(ObjectMonitorTable::table_buckets_offset()), tmp3);
 
       // Read the monitor from the bucket.
-      ld_ptr(monitor, 0, tmp3_bucket);
+      sldi(tmp2_hash, tmp2_hash, LogBytesPerWord);
+      ldx(monitor, tmp3_bucket, tmp2_hash);
 
       // Check if the monitor in the bucket is special (empty, tombstone or removed).
-      cmpdi(CR0, monitor, ObjectMonitorTable::SpecialPointerValues::below_is_special);
+      cmpldi(CR0, monitor, ObjectMonitorTable::SpecialPointerValues::below_is_special);
       blt(CR0, slow_path);
 
       // Check if object matches.

Please take a look! Some parts may be interesting for other platforms, too.
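
To illustrate why the comparison with `below_is_special` should be unsigned, here is a minimal C++ sketch of the table lookup steps from the diff (mask the hash, index the bucket array, check for a special value). It is not the HotSpot code: the `Table` struct, the function names and the value 3 for `kBelowIsSpecial` are assumptions made for illustration only. The point is that a pointer-sized value with the top bit set looks negative to a signed compare (as with `cmpdi`), so it would wrongly be classified as special, while an unsigned compare (as with `cmpldi`) classifies it correctly.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ObjectMonitorTable::SpecialPointerValues::below_is_special.
// The value 3 is an assumption: 0, 1 and 2 play the role of empty, tombstone, removed.
constexpr std::uintptr_t kBelowIsSpecial = 3;

// Hypothetical, simplified table layout (not the HotSpot data structure).
struct Table {
  std::size_t capacity_mask;            // capacity - 1, with capacity a power of two
  std::vector<std::uintptr_t> buckets;  // monitor addresses or special marker values
};

// Signed comparison, analogous to cmpdi followed by blt.
// Values with the top bit set are seen as negative and therefore as "special".
bool is_special_signed(std::uintptr_t monitor) {
  return static_cast<std::intptr_t>(monitor) <
         static_cast<std::intptr_t>(kBelowIsSpecial);
}

// Unsigned comparison, analogous to cmpldi followed by blt.
// Only the small marker values below kBelowIsSpecial are seen as "special".
bool is_special_unsigned(std::uintptr_t monitor) {
  return monitor < kBelowIsSpecial;
}

// Mirrors the lookup steps: mask the hash, read the bucket, check for special.
// Returns 0 where the generated code would branch to the slow path.
std::uintptr_t lookup(const Table& table, std::uintptr_t hash) {
  const std::size_t index = hash & table.capacity_mask;
  const std::uintptr_t monitor = table.buckets[index];
  return is_special_unsigned(monitor) ? 0 : monitor;
}
```

For example, with a four-entry table holding `{0, 1, 2, 0x1000}` and a capacity mask of 3, `lookup` with hash 7 returns `0x1000`, and hash 5 hits the marker value 1 and returns 0. A value such as `0x8000000000001000` on a 64-bit target is special according to `is_special_signed` but not according to `is_special_unsigned`.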

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29383#issuecomment-3800172269


More information about the hotspot-runtime-dev mailing list