RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented
Alternative for solving [JDK-8371014](https://bugs.openjdk.org/browse/JDK-8371014)
Also includes a fix for [JDK-8373257](https://bugs.openjdk.org/browse/JDK-8373257)

-------------

Commit messages:
 - reordering
 - update comment
 - is_recording() conditional
 - remove tautology
 - 8371014

Changes: https://git.openjdk.org/jdk/pull/29094/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29094&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8371014
  Stats: 292 lines in 15 files changed: 219 ins; 18 del; 55 mod
  Patch: https://git.openjdk.org/jdk/pull/29094.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29094/head:pull/29094

PR: https://git.openjdk.org/jdk/pull/29094
On Wed, 7 Jan 2026 14:14:19 GMT, Markus Grönlund <mgronlun@openjdk.org> wrote:
Alternative for solving [JDK-8371014](https://bugs.openjdk.org/browse/JDK-8371014)
Also includes a fix for [JDK-8373257](https://bugs.openjdk.org/browse/JDK-8373257)
TestEmergencyDumpAtOOM.java has passed on both AIX and Linux on PPC64. Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29094#issuecomment-3719926832
On Wed, 7 Jan 2026 17:23:25 GMT, Martin Doerr <mdoerr@openjdk.org> wrote:
TestEmergencyDumpAtOOM.java has passed on both AIX and Linux on PPC64. Thanks!
Thanks Martin.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29094#issuecomment-3720022380
On Wed, 7 Jan 2026 17:45:55 GMT, Markus Grönlund <mgronlun@openjdk.org> wrote:
TestEmergencyDumpAtOOM.java has passed on both AIX and Linux on PPC64. Thanks!
Thanks Martin.
Thanks a lot @mgronlun! Looks good in general.

Can we wait for `service.emit_leakprofiler_events()` to finish in the JFR recorder thread before the crash at `report_java_out_of_memory()` in debug.cpp? I wonder whether `abort()` could be called before the recorder thread has finished dumping events.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29094#issuecomment-3721524743
On Thu, 8 Jan 2026 01:32:44 GMT, Yasumasa Suenaga <ysuenaga@openjdk.org> wrote:
TestEmergencyDumpAtOOM.java has passed on both AIX and Linux on PPC64. Thanks!
Thanks Martin.
Thanks a lot @mgronlun! Looks good in general.
Can we wait for `service.emit_leakprofiler_events()` to finish in the JFR recorder thread before the crash at `report_java_out_of_memory()` in debug.cpp? I wonder whether `abort()` could be called before the recorder thread has finished dumping events.
Thanks for your review @YaSuenag.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29094#issuecomment-3745680784
On Wed, 7 Jan 2026 17:45:55 GMT, Markus Grönlund <mgronlun@openjdk.org> wrote:
TestEmergencyDumpAtOOM.java has passed on both AIX and Linux on PPC64. Thanks!
Thanks Martin.
Thanks a lot @mgronlun! Looks good in general.
Can we wait for `service.emit_leakprofiler_events()` to finish in the JFR recorder thread before the crash at `report_java_out_of_memory()` in debug.cpp? I wonder whether `abort()` could be called before the recorder thread has finished dumping events.
The solution is to prevent any thread from calling abort() concurrently until at least one service.emit_leakprofiler_events() has completed. That's why the invocation is done by all threads coming into report_java_out_of_memory(), not only a CAS-selected one. Why? Because it is only by taking the threads from thread state _thread_in_vm to state _thread_blocked (which we manage as part of posting the JFR message) that a VM operation in service.emit_leakprofiler_events() can proceed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29094#issuecomment-3723611490
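The ordering argument in the reply above can be illustrated with a toy model in standard C++. This is a sketch only, not HotSpot code: `OomModel`, `fake_abort()` and `count_early_aborts()` are hypothetical names, the emission is reduced to a counter increment, and the thread-state transition and the VM operation are not modeled at all. It shows just one property: when every thread entering the OOM path performs the emission itself before reaching the abort step, no abort can run before at least one emission has completed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Toy model; none of these names are HotSpot APIs.
struct OomModel {
  std::atomic<int> emissions_completed{0};
  std::atomic<int> early_aborts{0};  // aborts observed before any emission finished

  // Stand-in for service.emit_leakprofiler_events(): only records completion.
  void emit_leakprofiler_events() {
    emissions_completed.fetch_add(1);
  }

  // Stand-in for abort(): records the attempt instead of terminating.
  void fake_abort() {
    if (emissions_completed.load() == 0) {
      early_aborts.fetch_add(1);
    }
  }

  // Every thread entering the OOM path emits first, then aborts.
  void report_out_of_memory() {
    emit_leakprofiler_events();
    fake_abort();
  }
};

// Runs n_threads through the modeled OOM path and returns how many
// aborts ran before any emission had completed.
int count_early_aborts(int n_threads) {
  assert(n_threads > 0);
  OomModel model;
  std::vector<std::thread> threads;
  for (int i = 0; i < n_threads; ++i) {
    threads.emplace_back([&model] { model.report_out_of_memory(); });
  }
  for (auto& t : threads) {
    t.join();
  }
  return model.early_aborts.load();
}
```

In this model the result is zero by construction, because each thread completes its own emission before its own abort check. A variant in which only one selected thread emits while the others go straight to `fake_abort()` would not have that guarantee.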
On Wed, 7 Jan 2026 14:14:19 GMT, Markus Grönlund <mgronlun@openjdk.org> wrote:
Alternative for solving [JDK-8371014](https://bugs.openjdk.org/browse/JDK-8371014)
Also includes a fix for [JDK-8373257](https://bugs.openjdk.org/browse/JDK-8373257)
Testing: jdk_jfr, stress testing, manual testing with CrashOnOutOfMemoryError, tier1-6
src/hotspot/share/jfr/recorder/repository/jfrEmergencyDump.cpp line 611:
609:   if (thread->is_VM_thread()) {
610:     const VM_Operation* const operation = VMThread::vm_operation();
611:     if (operation != nullptr && operation->type() == VM_Operation::VMOp_JFROldObject) {
Is it better/possible to directly check the rotation lock instead? Maybe it's possible the thread crashed before starting the vm operation, or the lock is held by something else.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29094#discussion_r2674002541
On Thu, 8 Jan 2026 21:42:16 GMT, Robert Toyonaga <duke@openjdk.org> wrote:
Is it better/possible to directly check the rotation lock instead? Maybe it's possible the thread crashed before starting the vm operation, or the lock is held by something else.
Lock testing is inherently racy, and would also include false negatives (i.e., say the rotation lock is currently held during a normal flush / rotation by the JFR Recorder Thread; then it's perfectly fine even for the VM Thread to block waiting for it to be released). It is only the above implication that makes it impossible for the VM Thread to wait for the rotation lock to be released.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29094#discussion_r2675762122
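The difference between the two checks discussed above can be sketched as a small decision model. Everything here is hypothetical (`CrashContext`, `VmOp`, both predicate names), and the reading that the quoted condition identifies the one case where the VM Thread must not wait is an assumption drawn from the reply, not from the patch itself. The lock flag is a plain snapshot, which is exactly why a real lock test would be racy.

```cpp
#include <cassert>

// Hypothetical model types; not HotSpot declarations.
enum class VmOp { None, JfrOldObject, Other };

struct CrashContext {
  bool is_vm_thread;        // crashing thread is the VM Thread
  VmOp current_op;          // operation the VM Thread is executing, if any
  bool rotation_lock_held;  // snapshot only; may be stale once it is read
};

// Operation-based check, shaped like the quoted condition.
bool cannot_wait_for_rotation_lock(const CrashContext& c) {
  return c.is_vm_thread && c.current_op == VmOp::JfrOldObject;
}

// Lock-based check, as suggested in the review question.
bool lock_test_says_cannot_wait(const CrashContext& c) {
  return c.rotation_lock_held;
}

void demo() {
  // Lock held by a normal flush or rotation: waiting would be fine,
  // but the lock test alone cannot tell this apart.
  const CrashContext normal_flush{true, VmOp::None, true};
  assert(!cannot_wait_for_rotation_lock(normal_flush));
  assert(lock_test_says_cannot_wait(normal_flush));

  // VM Thread is inside the old-object operation: it must not wait.
  const CrashContext old_object{true, VmOp::JfrOldObject, true};
  assert(cannot_wait_for_rotation_lock(old_object));
}
```

The operation-based predicate depends only on what the crashing thread itself is doing, so it does not change underneath the check the way a lock state observed from outside can.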
Thanks a lot for working on this!

-------------

Marked as reviewed by ysuenaga (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/29094#pullrequestreview-3653157591
This pull request has now been integrated.

Changeset: f23752a7
Author:    Markus Grönlund <mgronlun@openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/f23752a75ee3d3af0853eff9c678d2496bb1cf58
Stats:     292 lines in 15 files changed: 219 ins; 18 del; 55 mod

8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented

Reviewed-by: ysuenaga

-------------

PR: https://git.openjdk.org/jdk/pull/29094
Thanks for fixing and backporting it! I had taken a quick look and think it is good, but not a full review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29094#issuecomment-3749247796
participants (4)
- Markus Grönlund
- Martin Doerr
- Robert Toyonaga
- Yasumasa Suenaga