RFR(S): 8235390: JfrEmergencyDump::on_vm_shutdown crashes

Wed Dec 18 15:13:25 UTC 2019

Looks good.

Erik

On 2019-12-06 15:07, Markus Gronlund wrote:
> Greetings,
>
> Please help review the following changeset:
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8235390
> Webrev: http://cr.openjdk.java.net/~mgronlun/8235390/webrev/
> Summary:
> With JFR Event Streaming [1], a change was made to the underlying lock mechanism used to protect JFR rotations. Previously, there was an internal RotationLock in JfrRecorderService.cpp using a combination of CAS retries and thread-relative sleep back-offs [2]. With JFR Event Streaming, a need surfaced to coordinate the flush() operation with the EventEmitter::emit_events() operation so that they do not run concurrently, because events were being prematurely serialized to the stream before the associated constant pools were ready. With this new requirement, the RotationLock could no longer be internal. The thinking then was to repurpose the existing global JfrStream_lock instead. However, a mistake was made in having this lock taken with MutexLocker lock(JfrStream_lock), which by default checks for safepoints when the lock is contended. Since the emergency code path is very sensitive, we can't make any assumptions about a safe state, so a JavaThread coming in might not be able to move to a safepoint (which is the situation reported in the bug [3]), for example if it runs into a problem at a leaf call site. This would also have been a problem with the previous RotationLock.
>
> The JfrStream_lock should instead be taken with the Mutex::_no_safepoint_check property. Doing so, however, leads to a potential, although very unlikely, deadlock: if a JFR rotation is in progress, a JavaThread attempting to acquire JfrStream_lock without a safepoint check will not let the pending safepoint associated with the rotation complete, because the JavaThread will not report to the safepoint coordinator. Some additional logic has been added to prevent this very rare deadlock-prone situation from occurring. There is still a small window in which it could happen, but there is a safeguard against the deadlock: the WatcherThread will take down the VM after a timeout associated with error reporting.
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8226511
> [2] https://hg.openjdk.java.net/jdk/jdk/diff/c16ac7a2eba4/src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp#l1.63
> [3] https://bugs.openjdk.java.net/browse/JDK-8235390
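
For readers unfamiliar with the pattern, the old RotationLock approach mentioned in the summary (CAS retries combined with sleep back-offs) can be sketched roughly as below. This is a standalone illustration using only the C++ standard library, not the HotSpot code: CasBackoffLock, try_acquire, is_held, and the retry and back-off parameters are all hypothetical names chosen for the example.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a CAS-plus-back-off lock; NOT the actual
// RotationLock from jfrRecorderService.cpp.
class CasBackoffLock {
 public:
  // Attempt to take the lock with at most max_retries CAS attempts,
  // sleeping for 'backoff' after each failed attempt.
  // Returns true if the lock was acquired, false if all attempts failed.
  bool try_acquire(int max_retries, std::chrono::milliseconds backoff) {
    for (int attempt = 0; attempt < max_retries; ++attempt) {
      bool expected = false;
      // The CAS succeeds only if the lock is currently free.
      if (_held.compare_exchange_strong(expected, true,
                                        std::memory_order_acquire)) {
        return true;
      }
      // Contended: back off before retrying instead of blocking.
      std::this_thread::sleep_for(backoff);
    }
    return false;
  }

  // Release the lock; the caller is assumed to be the current owner.
  void release() { _held.store(false, std::memory_order_release); }

  bool is_held() const { return _held.load(std::memory_order_acquire); }

 private:
  std::atomic<bool> _held{false};
};
```

A caller would use something like lock.try_acquire(10, std::chrono::milliseconds(10)) and give up or take an alternative path if it returns false. Note that a lock of this kind never blocks on a VM mutex, but a thread sleeping in the retry loop still does not report to a safepoint on its own, which is consistent with the remark above that the previous RotationLock would have had the same problem.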