RFR: 8373595: A new ObjectMonitorTable implementation
Fei Yang
fyang at openjdk.org
Fri Jan 30 14:47:35 UTC 2026
On Fri, 23 Jan 2026 13:51:42 GMT, Fredrik Bredberg <fbredberg at openjdk.org> wrote:
>> Anton Artemov (@toxaart) and I investigated quite a large regression in Pet Clinic that occurred when Compact Object Headers were turned on. We found that a large portion of that regression could be attributed to not finding the monitor in the Object Monitor Cache: because we couldn't access the Object Monitor Table from C2-generated code, we often had to take the slow path.
>>
>> By making the Object Monitor Cache larger and making it use the object's hash value as a key, we managed to mitigate the regression.
>>
>> Erik Österlund (@fisk) took that idea to the next level: he rewrote the Object Monitor Table code so that we can now search for an object in the global Object Monitor Table from C2-generated code, i.e. from `C2_MacroAssembler::fast_lock()`. As in Anton's and my version, the key is the object's hash value.
>>
>> Erik also provided the new barrier code needed by ZGC on x86 and aarch64. Roman Kennke (@rkennke) provided the same for Shenandoah, also on x86 and aarch64.
>>
>> We decided to keep the Object Monitor Cache, since we found that for most programs (but not Pet Clinic) the monitor you are looking for is likely to be found in the first positions of the cache (it is kept in most-recently-used order). However, we decreased its size from 8 to 2 elements.
>>
>> After running extensive performance tests, we can say that this has improved performance in many of them, not just mitigated the regression in Pet Clinic.
>>
>> Tests pass in tier1-7 on the supported platforms.
>>
>> The rest of the platforms (`ppc`, `riscv` and `s390`) have been smoke tested using QEMU.
>> I mainly used this test for smoke testing with QEMU: `-XX:+UnlockDiagnosticVMOptions -XX:+UseObjectMonitorTable ./test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java `
>
> @TheRealMDoerr, @RealFYang, @offamitkumar
> Hi guys!
> So here we have a new Object Monitor Table implementation. As stated above, I've smoke tested `ppc`, `riscv` and `s390` using QEMU. However, I haven't been able to run proper performance tests on "your" platforms, but all the assembler code follows the same scheme as on x86 and aarch64, which I have run performance tests on, so hopefully it performs well on all platforms. Anyhow, please grab a copy, run it on your favorite platform, and tell me if there's anything wrong with it. Thanks in advance.
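The lookup order described above (a small most-recently-used cache first, then the global table indexed by the object's hash, otherwise the slow path) can be modeled in plain C++. This is a minimal sketch under stated assumptions, not HotSpot code: `Monitor`, `MonitorCache`, `MonitorTable`, `fast_find` and the threshold value 3 for special bucket markers are hypothetical. What the real fast path does when the bucket holds a different object's monitor is not shown in this thread, so the sketch simply falls back to the slow path in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an inflated monitor; not HotSpot's ObjectMonitor.
struct Monitor { const void* object; };

constexpr size_t kCacheCapacity = 2;      // the PR shrinks the cache from 8 to 2
constexpr uintptr_t kBelowIsSpecial = 3;  // assumed: bucket values 0..2 are empty/tombstone/removed

// Hypothetical per-thread cache kept in most-recently-used order.
struct MonitorCache {
  const void* oops[kCacheCapacity] = {};
  Monitor* monitors[kCacheCapacity] = {};

  // Move (obj, m) to slot 0; older entries shift down, the last one is evicted.
  void record(const void* obj, Monitor* m) {
    size_t pos = kCacheCapacity - 1;  // default: overwrite the least recently used slot
    for (size_t i = 0; i < kCacheCapacity; i++) {
      if (oops[i] == obj) { pos = i; break; }
    }
    for (size_t i = pos; i > 0; i--) {
      oops[i] = oops[i - 1];
      monitors[i] = monitors[i - 1];
    }
    oops[0] = obj;
    monitors[0] = m;
  }
};

// Hypothetical global table: a power-of-two array of monitor pointers.
struct MonitorTable {
  const uintptr_t* buckets;
  uintptr_t capacity_mask;  // capacity - 1
};

// Returns the monitor for obj, or nullptr meaning "take the slow path".
Monitor* fast_find(const MonitorCache& cache, const MonitorTable& table,
                   const void* obj, uintptr_t hash) {
  // 1. Scan every cache slot (the generated code unrolls this loop).
  for (size_t i = 0; i < kCacheCapacity; i++) {
    if (cache.oops[i] == obj) return cache.monitors[i];
  }
  // 2. Read the bucket selected by the masked hash.
  uintptr_t entry = table.buckets[hash & table.capacity_mask];
  if (entry < kBelowIsSpecial) return nullptr;  // special marker, not a monitor
  Monitor* m = reinterpret_cast<Monitor*>(entry);
  // 3. Assumption: on a mismatch this sketch just gives up.
  return m->object == obj ? m : nullptr;
}
```

Keeping the cache tiny makes the unrolled scan cheap, while the hash-indexed table gives the fast path a second chance before calling into the runtime.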
@fbredber: Hi, thanks for the ping! I checked the RISC-V part and it seems that several minor improvements could be made by making use of the extra temp register `tmp4`. We can also use `shadd`, which makes use of the bit-manipulation extension when available, to calculate the address of the bucket. Please consider the following add-on change for this platform.
diff --git a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
index 4a7669eee1d..0c437b45852 100644
--- a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
@@ -138,9 +138,9 @@ void C2_MacroAssembler::fast_lock(Register obj, Register box,
const int num_unrolled = OMCache::CAPACITY;
for (int i = 0; i < num_unrolled; i++) {
- ld(t0, Address(tmp3_t));
+ ld(tmp4, Address(tmp3_t));
ld(tmp1_monitor, Address(tmp3_t, OMCache::oop_to_monitor_difference()));
- beq(obj, t0, monitor_found);
+ beq(obj, tmp4, monitor_found);
add(tmp3_t, tmp3_t, in_bytes(OMCache::oop_to_oop_difference()));
}
@@ -155,14 +155,13 @@ void C2_MacroAssembler::fast_lock(Register obj, Register box,
ld(tmp1, Address(tmp3_t, ObjectMonitorTable::table_capacity_mask_offset()));
andr(tmp2_hash, tmp2_hash, tmp1);
ld(tmp3_t, Address(tmp3_t, ObjectMonitorTable::table_buckets_offset()));
- slli(tmp2_hash, tmp2_hash, LogBytesPerWord);
- add(tmp3_bucket, tmp3_t, tmp2_hash);
// Read the monitor from the bucket.
+ shadd(tmp3_bucket, tmp2_hash, tmp3_t, tmp4, LogBytesPerWord);
ld(tmp1_monitor, Address(tmp3_bucket));
// Check if the monitor in the bucket is special (empty, tombstone or removed).
- li(tmp2, ObjectMonitorTable::SpecialPointerValues::below_is_special);
+ mv(tmp2, ObjectMonitorTable::SpecialPointerValues::below_is_special);
bltu(tmp1_monitor, tmp2, slow_path);
// Check if object matches.
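To see why the bucket-address part of this add-on preserves behavior, the old and new instruction sequences can be modeled on plain integers. This is a hedged sketch: `Regs`, `old_sequence`, `new_sequence` and `is_special` are hypothetical names, and the non-Zba branch assumes (based on `shadd` taking the scratch register `tmp4` as an argument) that without the bit-manipulation extension the shifted index is written to the scratch register rather than back into `tmp2_hash`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the RISC-V registers used in the diff.
struct Regs {
  uint64_t hash;    // tmp2_hash: masked hash, i.e. the bucket index
  uint64_t table;   // tmp3_t: base address of the bucket array
  uint64_t tmp4;    // scratch register made available to shadd
  uint64_t bucket;  // tmp3_bucket: computed bucket address
};

constexpr int kLogBytesPerWord = 3;  // 8-byte words on RV64

// Old code: slli scales the index in place, clobbering tmp2_hash.
void old_sequence(Regs& r) {
  r.hash = r.hash << kLogBytesPerWord;  // slli tmp2_hash, tmp2_hash, 3
  r.bucket = r.table + r.hash;          // add tmp3_bucket, tmp3_t, tmp2_hash
}

// New code: shadd(tmp3_bucket, tmp2_hash, tmp3_t, tmp4, 3).
void new_sequence(Regs& r, bool has_zba) {
  if (has_zba) {
    // sh3add rd, rs1, rs2 computes rs2 + (rs1 << 3) in one instruction.
    r.bucket = r.table + (r.hash << kLogBytesPerWord);
  } else {
    // Assumed fallback: shift into the scratch register, then add.
    r.tmp4 = r.hash << kLogBytesPerWord;
    r.bucket = r.table + r.tmp4;
  }
}

// bltu compares unsigned: small special markers fall below the threshold,
// while any real monitor address lies above it.
bool is_special(uint64_t bucket_value, uint64_t below_is_special) {
  return bucket_value < below_is_special;
}
```

In this model both sequences yield the same bucket address; the new one leaves `tmp2_hash` intact and, with Zba, needs one instruction instead of two.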
-------------
PR Comment: https://git.openjdk.org/jdk/pull/29383#issuecomment-3824124034