Split Lock Warning with ZGC and -XX:-ClassUnloading on Linux x86_64, JDK 17.0.2
Hi all,

When running JDK 17.0.2 on Linux x86_64 with ZGC and the JVM option -XX:-ClassUnloading, I encounter split lock warnings from the Linux kernel. The warnings appear consistently during garbage collection.

This is the kernel warning:

```
x86/split lock detection: #AC: ZWorker#0/2154775 took a split_lock trap at address: 0x7f50c6e0433c
```

Disassembling the code at this address shows that the trapping instruction is in ZMarkOopClosure::do_oop(oopDesc**):

```
Dump of assembler code for function _ZN15ZMarkOopClosure6do_oopEPP7oopDesc:
   0x00007f50c6e0433c <+76>:    lock cmpxchg %rcx,(%rbx)
```

The warning occurs in a ZWorker thread, which performs concurrent marking in ZGC, and it seems to be triggered only when class unloading is disabled with -XX:-ClassUnloading.

Environment:
- JDK version: OpenJDK 17.0.2
- GC: ZGC with -XX:-ClassUnloading
- OS: Linux x86_64

I would like to understand whether this behavior is expected when class unloading is disabled, and whether there are recommended fixes or workarounds to avoid the split lock during concurrent garbage collection.
Hi,

Thanks for reporting this. This most likely triggers in the GC's handling of object pointers (oops) embedded in JIT-compiled methods (nmethods). These are the only pointers that can be misaligned. Typically, we write to these locations without a CAS (code from JDK 21 and later):

```
// Non-atomic healing helps speed up root scanning. This is safe to do
// since we are always healing roots in a safepoint, or under a lock,
// which ensures we are never racing with mutators modifying roots while
// we are healing them. It's also safe in case multiple GC threads try
// to heal the same root if it is aligned, since they would always heal
// the root in the same way and it does not matter in which order it
// happens. For misaligned oops, there needs to be mutual exclusion.
*p = XOop::from_address(good_addr);
```

We do use locks to ensure that only one thread writes to these pointers in the JIT-compiled methods. However, when class unloading is turned off, we consider the object pointers in the JIT-compiled methods to be roots into the object graph. When we walk these specific roots, we take a different path than the one pasted above, and that path performs a CAS to fix the pointers. It is still guarded by a lock, so there is still mutual exclusion on this field.

So, if I understand things correctly, there is no known problem with this, because of the mutual exclusion, except that the split lock detector reports an issue.

FWIW, the ZGC code in JDK 17 is the non-generational version of ZGC. In JDK 21 we introduced Generational ZGC, and in that version we have removed the CASes in the root processing of JIT-compiled methods.

Cheers,
StefanK

On 2024-09-10 14:20, jianping Lu wrote:
I would like to understand if this behavior is expected when class unloading is disabled or if there are any recommended fixes or workarounds for avoiding the split lock issue during concurrent garbage collection.
* Stefan Karlsson:
However, when class unloading is turned off, we consider the object pointers in the JIT-compiled methods to be roots into the object graph. When we walk these specific roots, we take a different path than the one pasted above, and that path performs a CAS to fix the pointers. It is still guarded by a lock, so there is still mutual exclusion on this field. So, if I understand things correctly, there is no known problem with this, because of the mutual exclusion, except that the split lock detector reports an issue.

Wouldn't the nmethod experience tearing during execution, because it doesn't take the lock?

Thanks,
Florian
On 2024-09-11 12:34, Florian Weimer wrote:
Wouldn't the nmethod experience tearing during execution, because it doesn't take the lock?
The executing thread uses the same lock to stabilize the oops before execution can continue. This is referred to as the "nmethod entry barrier".

If you want to follow the code more closely, you can see the GC root processing path here:
https://github.com/openjdk/jdk17u/blob/f95f7f4a1163a5f62bdcac16199207ed736fd...

and the nmethod entry barrier path that the Java thread executes here:
https://github.com/openjdk/jdk17u/blob/f95f7f4a1163a5f62bdcac16199207ed736fd...

Note that Java threads run the nmethod entry barrier for two reasons:

1) When they call into an nmethod.
2) When they were blocked in a safepoint and ZGC switched phase to either start marking or start relocation. When they wake up again, all the oops need to be stabilized, so the nmethod entry barrier code is rerun. (See https://github.com/openjdk/jdk17u/blob/f95f7f4a1163a5f62bdcac16199207ed736fd...)

Cheers,
StefanK
participants (3)
- Florian Weimer
- jianping Lu
- Stefan Karlsson