RFR: 8373595: A new ObjectMonitorTable implementation
Lutz Schmidt
lucy at openjdk.org
Wed Jan 28 09:23:03 UTC 2026
On Fri, 23 Jan 2026 10:36:26 GMT, Fredrik Bredberg <fbredberg at openjdk.org> wrote:
> Anton Artemov (@toxaart) and I investigated a fairly large regression in Pet Clinic that occurred when Compact Object Headers were turned on. We found that a large portion of that regression could be attributed to not finding the monitor in the Object Monitor Cache, and because we couldn't access the Object Monitor Table from C2 generated code, we often had to take the slow path.
>
> By making the object monitor cache larger and making it use the object's hash value as a key, we managed to mitigate the regression.
>
> Erik Österlund (@fisk) took that idea to the next level: he rewrote the object monitor table code so that we can now search for an object in the global object monitor table from C2 generated code, i.e. from `C2_MacroAssembler::fast_lock()`. As in my and Anton's version, the key is the hash value from the object.
>
> Erik also provided new barrier code needed for ZGC for x86 and aarch64. Roman Kennke (@rkennke) provided the same for Shenandoah, also for x86 and aarch64.
>
> We decided to keep the Object Monitor Cache, since we found that for most programs (but not Pet Clinic) the monitor you are looking for is likely found in the first positions of the cache (it's sorted in most recently used order). However, we decreased the size from 8 to 2 elements.
>
> After running extensive performance tests we can say that this has improved performance in many of them, rather than only mitigating the regression in Pet Clinic.
>
> Tests are running okay tier1-7 on supported platforms.
>
> The rest of the platforms (`ppc`, `riscv` and `s390`) have been smoke tested using QEMU.
> I mainly used this test for smoke testing with QEMU: `-XX:+UnlockDiagnosticVMOptions -XX:+UseObjectMonitorTable ./test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java`
I checked the s390 implementation only: nice work!
I have added some hints to make the generated code more s390-style. Thanks for considering.
Please note: the suggested changes are just dry-coded, not tested!
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6394:
> 6392: z_bre(monitor_found);
> 6393: add2reg(cache_addr, in_bytes(OMCache::oop_to_oop_difference()));
> 6394: }
This would be an alternative for the above loop:
ByteSize cache_offset = JavaThread::om_cache_oops_offset();
ByteSize monitor_offset = OMCache::oop_to_monitor_difference();
const unsigned int num_unrolled = 2;
for (unsigned int i = 0; i < num_unrolled; i++) {
  z_lg(tmp1_monitor, Address(Z_thread, cache_offset + monitor_offset));
  z_cg(obj, Address(Z_thread, cache_offset));
  z_bre(monitor_found);
  cache_offset += in_bytes(OMCache::oop_to_oop_difference());
}
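As a rough illustration of what the unrolled sequence computes, here is a plain C++ model. `CacheEntry`, `ThreadCache` and `lookup_cache` are hypothetical names made up for this sketch, not HotSpot types; only the two-entry count and the "load monitor, then compare oop" order are taken from the code above. In the assembler version the loop runs at code generation time, so each iteration emits instructions with a fixed displacement off `Z_thread` and no address register has to be incremented.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins for illustration only; not the HotSpot types.
struct ObjectMonitor { int id; };

struct CacheEntry {
  const void*    oop;      // cached object
  ObjectMonitor* monitor;  // monitor belonging to that object
};

constexpr std::size_t kCacheEntries = 2;  // matches num_unrolled above

struct ThreadCache {
  std::array<CacheEntry, kCacheEntries> entries{};
};

// Returns the cached monitor for obj, or nullptr on a cache miss.
// The trip count is a compile-time constant, so a compiler is free to
// unroll the loop; each entry is then addressed at a fixed offset.
ObjectMonitor* lookup_cache(const ThreadCache& cache, const void* obj) {
  for (std::size_t i = 0; i < kCacheEntries; i++) {
    const CacheEntry& e = cache.entries[i];
    ObjectMonitor* m = e.monitor;  // load the monitor first ...
    if (e.oop == obj) {            // ... then compare the cached oop
      return m;                    // corresponds to the branch to monitor_found
    }
  }
  return nullptr;                  // fall through: not in the cache
}
```

A miss in this model corresponds to falling out of the emitted sequence and continuing with the table lookup.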
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6403:
> 6401: z_lg(tmp2, Address(tmp2));
> 6402: z_lg(tmp1, Address(tmp2, ObjectMonitorTable::table_capacity_mask_offset()));
> 6403: z_ngr(hash, tmp1);
The above two instructions could be combined:
`z_ng(hash, Address(tmp2, ObjectMonitorTable::table_capacity_mask_offset()));`
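To spell out why the two forms are interchangeable, here is a small C++ model. `Table`, `bucket_index_two_step` and `bucket_index_combined` are hypothetical names for this sketch only, and it assumes the capacity mask is "capacity - 1" for a power-of-two capacity, which the name `table_capacity_mask` suggests but which I have not checked. The first function mirrors the load followed by the register-register AND, the second one the AND with a memory operand.

```cpp
#include <cstdint>

// Hypothetical stand-in for illustration only; not ObjectMonitorTable.
struct Table {
  std::uint64_t capacity_mask;  // assumed: capacity - 1, capacity a power of two
};

// Two-step form: load the mask into a temporary, then AND register with register.
std::uint64_t bucket_index_two_step(std::uint64_t hash, const Table* table) {
  std::uint64_t tmp = table->capacity_mask;  // models z_lg(tmp1, ...)
  hash &= tmp;                               // models z_ngr(hash, tmp1)
  return hash;
}

// Combined form: AND the hash directly with the mask in memory.
std::uint64_t bucket_index_combined(std::uint64_t hash, const Table* table) {
  return hash & table->capacity_mask;        // models z_ng(hash, ...)
}
```

The only observable difference is that the combined form no longer leaves the mask in the temporary, which should not matter here because `tmp1` is overwritten by the next load.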
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6404:
> 6402: z_lg(tmp1, Address(tmp2, ObjectMonitorTable::table_capacity_mask_offset()));
> 6403: z_ngr(hash, tmp1);
> 6404: z_lg(tmp1, Address(tmp2, ObjectMonitorTable::table_buckets_offset()));
Wouldn't it be clearer to use `tmp1_bucket` here?
-------------
Changes requested by lucy (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/29383#pullrequestreview-3715414005
PR Review Comment: https://git.openjdk.org/jdk/pull/29383#discussion_r2735608767
PR Review Comment: https://git.openjdk.org/jdk/pull/29383#discussion_r2735567037
PR Review Comment: https://git.openjdk.org/jdk/pull/29383#discussion_r2735574647