From ysuenaga at openjdk.org Thu Jan 1 09:24:59 2026 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Thu, 1 Jan 2026 09:24:59 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Sat, 29 Nov 2025 06:06:16 GMT, Yasumasa Suenaga wrote: > The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms. > > JFR emergency dump would be kicked at `VMError::report_and_die()`, then Java thread for JFR would not work due to secondary signal handler for error reporting. > > Passed all of jdk_jfr tests on Linux AMD64. I'm still waiting for second reviewer. @mgronlun Can you take a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28563#issuecomment-3703460604 From mbaesken at openjdk.org Fri Jan 2 08:48:02 2026 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 2 Jan 2026 08:48:02 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: <-Vb3RjzTqS7Lo9tMqVYqEbRQ1nYLg20mfcehWEe6Vm4=.e4dba3af-59af-42c9-b0bc-0d6e57122121@github.com> On Sat, 29 Nov 2025 06:06:16 GMT, Yasumasa Suenaga wrote: > The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms. > > JFR emergency dump would be kicked at `VMError::report_and_die()`, then Java thread for JFR would not work due to secondary signal handler for error reporting. > > Passed all of jdk_jfr tests on Linux AMD64. Marked as reviewed by mbaesken (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28563#pullrequestreview-3622449994 From haosun at openjdk.org Fri Jan 2 09:35:58 2026 From: haosun at openjdk.org (Hao Sun) Date: Fri, 2 Jan 2026 09:35:58 GMT Subject: [jdk26] RFR: 8373122: JFR build failure with CDS disabled due to -Werror=unused-function after JDK-8365400 In-Reply-To: References: Message-ID: <2VwhhoYJeHgX4snjrhAmQsQD9kLrIIBhobUh4-M7kNo=.43508487-ee4c-44cd-b31e-fe4624b7da66@github.com> On Wed, 24 Dec 2025 03:46:49 GMT, Hao Sun wrote: > Hi all, > > This pull request contains a backport of commit [e1d81c09](https://github.com/openjdk/jdk/commit/e1d81c0946364a266a006481a8fbbac24c7e6c6a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Hao Sun on 23 Dec 2025 and was reviewed by Markus Grönlund, Jie Fu and Francesco Andreuzzi. > > Thanks! Hi, I would appreciate it if you could help review this backport patch again. Thanks. @fandreuz @DamonFool and @mgronlun ------------- PR Comment: https://git.openjdk.org/jdk/pull/28976#issuecomment-3704861285 From fandreuzzi at openjdk.org Fri Jan 2 11:43:55 2026 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Fri, 2 Jan 2026 11:43:55 GMT Subject: [jdk26] RFR: 8373122: JFR build failure with CDS disabled due to -Werror=unused-function after JDK-8365400 In-Reply-To: References: Message-ID: On Wed, 24 Dec 2025 03:46:49 GMT, Hao Sun wrote: > Hi all, > > This pull request contains a backport of commit [e1d81c09](https://github.com/openjdk/jdk/commit/e1d81c0946364a266a006481a8fbbac24c7e6c6a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Hao Sun on 23 Dec 2025 and was reviewed by Markus Grönlund, Jie Fu and Francesco Andreuzzi. > > Thanks! Marked as reviewed by fandreuzzi (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28976#pullrequestreview-3622980963 From kbarrett at openjdk.org Sat Jan 3 08:29:29 2026 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 3 Jan 2026 08:29:29 GMT Subject: RFR: 8374445: Fix -Wzero-as-null-pointer-constant warnings in JfrSet Message-ID: Please review this change to fix JfrSet to avoid triggering -Wzero-as-null-pointer-constant warnings when that warning is enabled. The old code uses an entry value with representation 0 to indicate the entry doesn't have a value. It compares an entry value against literal 0 to check for that. If the key type is a pointer type, this involves an implicit 0 => null pointer constant conversion, so we get a warning when that warning is enabled. Instead we initialize entry values to a value-initialized key, and compare against a value-initialized key. This changes the (currently undocumented) requirements on the key type. The key type is no longer required to be trivially constructible (to permit memset-based initialization), but is now required to be value-initializable. That's currently a wash, since all of the in-use key types are fundamental types (traceid (u8) and Klass*). 
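The pattern described in the message above can be sketched in a few lines. This is a minimal, hypothetical miniature (`TinySet`, `is_free`, and `put` are invented names for illustration, not the actual JfrSet API): comparing an entry against literal 0 is an implicit null-pointer-constant conversion when the key is a pointer type, while comparing against a value-initialized `Key()` behaves identically for both `traceid` (u8) and `Klass*` keys.

```cpp
#include <cassert>

// Minimal sketch of the value-initialized sentinel pattern; the names
// below are invented for illustration and differ from the real JfrSet.
template <typename Key>
class TinySet {
  static const int N = 8;
  Key _table[N];
public:
  TinySet() {
    for (int i = 0; i < N; i++) {
      // Old style was: _table[i] = 0; when Key is a pointer type, the
      // literal 0 converts to a null pointer constant and warns under
      // -Wzero-as-null-pointer-constant.
      _table[i] = Key();  // value-initialization: 0 for u8, nullptr for Klass*
    }
  }
  bool is_free(int i) const {
    // Compare against a value-initialized Key instead of literal 0.
    return _table[i] == Key();
  }
  void put(int i, Key k) { _table[i] = k; }
};
```

As the message notes, this shifts the requirement on the key type from trivially constructible (memset-friendly) to value-initializable, which is a wash for the key types currently in use.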
Testing: mach5 tier1-3 (tier3 is where most jfr tests are run) ------------- Commit messages: - fix -Wzero-as-null-poniter-constant warnings in jfrSet.hpp Changes: https://git.openjdk.org/jdk/pull/29022/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29022&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374445 Stats: 10 lines in 1 file changed: 3 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/29022.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29022/head:pull/29022 PR: https://git.openjdk.org/jdk/pull/29022 From mgronlun at openjdk.org Mon Jan 5 09:05:27 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 5 Jan 2026 09:05:27 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Sat, 29 Nov 2025 06:06:16 GMT, Yasumasa Suenaga wrote: > The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms. > > JFR emergency dump would be kicked at `VMError::report_and_die()`, then Java thread for JFR would not work due to secondary signal handler for error reporting. > > Passed all of jdk_jfr tests on Linux AMD64. This will not work because there is still a race against the JFR Recorder Thread flushing concurrently with LeakProfiler::emit_events(). This can place the checkpoints and events in a segment before the corresponding classes and methods that were tagged as part of emit_events(). This will break the parser, since constant artifacts will not be resolvable (an invariant is that a flushed segment is self-contained). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28563#issuecomment-3709513877 From mgronlun at openjdk.org Mon Jan 5 09:09:03 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 5 Jan 2026 09:09:03 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Sat, 29 Nov 2025 06:06:16 GMT, Yasumasa Suenaga wrote: > The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms. > > JFR emergency dump would be kicked at `VMError::report_and_die()`, then Java thread for JFR would not work due to secondary signal handler for error reporting. > > Passed all of jdk_jfr tests on Linux AMD64. This is a very tricky problem to solve correctly, because a VM operation has been introduced as part of error reporting and the VM shutdown sequence. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28563#issuecomment-3709521319 From mdoerr at openjdk.org Mon Jan 5 10:41:16 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 5 Jan 2026 10:41:16 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Sat, 29 Nov 2025 06:06:16 GMT, Yasumasa Suenaga wrote: > The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms. > > JFR emergency dump would be kicked at `VMError::report_and_die()`, then Java thread for JFR would not work due to secondary signal handler for error reporting. > > Passed all of jdk_jfr tests on Linux AMD64. Would it be a better solution to avoid replacing the signal handler? We could keep the Java compatible handler and change it such that it calls `crash_handler` only for the thread which is reporting the error. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28563#issuecomment-3709875107 From mgronlun at openjdk.org Mon Jan 5 11:04:54 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 5 Jan 2026 11:04:54 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: <429t5MAqdyAQ7wFsxgYdUa2YPZm3GCI7SU0KwGDzcCQ=.46240518-0447-4e08-97d2-7522ebc1aacf@github.com> On Mon, 5 Jan 2026 10:37:51 GMT, Martin Doerr wrote: > Would it be a better solution to avoid replacing the signal handler? We could keep the Java compatible handler and change it such that it calls `crash_handler` only for the thread which is reporting the error. I am thinking about some alternatives. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28563#issuecomment-3709957825 From krk at openjdk.org Mon Jan 5 14:55:52 2026 From: krk at openjdk.org (Kerem Kat) Date: Mon, 5 Jan 2026 14:55:52 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v7] In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 10:11:20 GMT, Thomas Stuefe wrote: >> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. >> >> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. >> >> This RFE changes the algorithm to be non-recursive. >> >> Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. 
The search is still depth-first. >> >> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. >> >> Testing: >> >> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out >> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical >> - Ran locally all jtreg tests in jdk/jfr >> - GHAs > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > do strides for arrays This could fix https://bugs.openjdk.org/browse/JDK-8371630, which I couldn't reproduce outside the Oracle environments detailed in the ticket. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3710759355 From duke at openjdk.org Tue Jan 6 23:49:26 2026 From: duke at openjdk.org (Robert Toyonaga) Date: Tue, 6 Jan 2026 23:49:26 GMT Subject: RFR: 8367949: JFR: MethodTrace double-counts methods that catch their own exceptions In-Reply-To: References: Message-ID: On Sun, 21 Dec 2025 16:22:25 GMT, Erik Gahlin wrote: > Could I have a review of a PR that changes how the instrumentation of the MethodTrace and MethodTiming events is implemented, so they handle exceptions in a better way? > > For constructors, the current implementation is still used in certain corner cases. A proper implementation would require data-flow analysis, but for all practical purposes this code should work fine. > > Testing: jdk/jdk/jfr > > Thanks > Erik src/jdk.jfr/share/classes/jdk/jfr/internal/tracing/Transform.java line 176: > 174: } > 175: TryBlock last = tryBlocks.getLast(); > 176: if (tryBlocks.getLast().end == null) { Suggestion: if (last.end == null) { Is it important to read `tryBlocks.getLast()` again here? 
test/jdk/jdk/jfr/event/tracing/TestConstructors.java line 116: > 114: } > 115: try { > 116: new Zebra(true); This results in `Zebra(int)` getting traced but not `Zebra(boolean)` because an exception is thrown and the `Zebra(boolean)` constructor call [is outside the `try` block](https://github.com/openjdk/jdk/pull/28947/files#diff-68a37600bc91d54808ea1ca427ade6af8a600889877f262e20782c550eded410R160). Is this intended? Shouldn't a method be traced every time it is called? In contrast, `new Zebra(false);` causes both `Zebra(int)` and `Zebra(boolean)` to be traced. Additionally, with the old approach, `new Cat();` would not cause `Cat()` to be traced at all, since its callee, `methodThatThrows()`, prevents execution ever reaching `Cat()`'s `return` statement. I did a quick check on this by hardcoding `simplifiedInstrumentation = true`. Now, with the new approach in this PR, `new Cat();` causes `Cat()` to be traced exactly once. This makes sense to me, but is different than before. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28947#discussion_r2666613894 PR Review Comment: https://git.openjdk.org/jdk/pull/28947#discussion_r2666618915 From egahlin at openjdk.org Wed Jan 7 00:21:33 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 7 Jan 2026 00:21:33 GMT Subject: RFR: 8367949: JFR: MethodTrace double-counts methods that catch their own exceptions In-Reply-To: References: Message-ID: <3GdoIv47UZL2mViNWedMrfbXGorNe_mDLJEVg7lJ0VQ=.cbb57a8b-eed8-411d-b83f-1f52c9f3f84c@github.com> On Tue, 6 Jan 2026 23:46:40 GMT, Robert Toyonaga wrote: >> Could I have a review of a PR that changes how the instrumentation of the MethodTrace and MethodTiming events is implemented, so they handle exceptions in a better way? >> >> For constructors, the current implementation is still used in certain corner cases. A proper implementation would require data-flow analysis, but for all practical purposes this code should work fine. 
>> >> Testing: jdk/jdk/jfr >> >> Thanks >> Erik > > test/jdk/jdk/jfr/event/tracing/TestConstructors.java line 116: > >> 114: } >> 115: try { >> 116: new Zebra(true); > > This results in `Zebra(int)` getting traced but not `Zebra(boolean)` because the `Zebra(int)` constructor call throws but [is outside the `try` block](https://github.com/openjdk/jdk/pull/28947/files#diff-68a37600bc91d54808ea1ca427ade6af8a600889877f262e20782c550eded410R160) so execution never reaches the `catch` block that applies tracing. Is this intended? Shouldn't a method be traced every time it is called? In contrast, `new Zebra(false);` causes both `Zebra(int)` and `Zebra(boolean)` to be traced. > > Additionally, with the old approach, `new Cat();` would not cause `Cat()` to be traced at all, since its callee, `methodThatThrows()`, prevents execution ever reaching `Cat()`'s `return` statement. I did a quick check on this by hardcoding `simplifiedInstrumentation = true`. Now, with the new approach in this PR, `new Cat();` causes `Cat()` to be traced exactly once. This makes sense to me, but is different than before. We can't place a try block around a call to super(...) or this(...). This is why two try blocks are used, one before and one after the call to this(...) or super(...). With try blocks, we can now also track when an exception occurs in a callee. This is a behavioral change, but I believe it is for the better. I was aware of this limitation when I did the initial implementation, but I didn't think it was worth the added complexity that try blocks bring. What I didn't realize at the time was the double-count issue, so now that we have the mechanics for try blocks, I decided to fix the exception-in-a-callee case as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28947#discussion_r2666661004 From haosun at openjdk.org Wed Jan 7 00:49:00 2026 From: haosun at openjdk.org (Hao Sun) Date: Wed, 7 Jan 2026 00:49:00 GMT Subject: [jdk26] RFR: 8373122: JFR build failure with CDS disabled due to -Werror=unused-function after JDK-8365400 In-Reply-To: References: Message-ID: On Wed, 24 Dec 2025 03:46:49 GMT, Hao Sun wrote: > Hi all, > > This pull request contains a backport of commit [e1d81c09](https://github.com/openjdk/jdk/commit/e1d81c0946364a266a006481a8fbbac24c7e6c6a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Hao Sun on 23 Dec 2025 and was reviewed by Markus Grönlund, Jie Fu and Francesco Andreuzzi. > > Thanks! I would appreciate it if someone could help review this backport patch. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28976#issuecomment-3716887426 From jiefu at openjdk.org Wed Jan 7 00:55:02 2026 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 7 Jan 2026 00:55:02 GMT Subject: [jdk26] RFR: 8373122: JFR build failure with CDS disabled due to -Werror=unused-function after JDK-8365400 In-Reply-To: References: Message-ID: On Wed, 24 Dec 2025 03:46:49 GMT, Hao Sun wrote: > Hi all, > > This pull request contains a backport of commit [e1d81c09](https://github.com/openjdk/jdk/commit/e1d81c0946364a266a006481a8fbbac24c7e6c6a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Hao Sun on 23 Dec 2025 and was reviewed by Markus Grönlund, Jie Fu and Francesco Andreuzzi. > > Thanks! LGTM ------------- Marked as reviewed by jiefu (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28976#pullrequestreview-3632992405 From haosun at openjdk.org Wed Jan 7 01:08:48 2026 From: haosun at openjdk.org (Hao Sun) Date: Wed, 7 Jan 2026 01:08:48 GMT Subject: [jdk26] RFR: 8373122: JFR build failure with CDS disabled due to -Werror=unused-function after JDK-8365400 In-Reply-To: References: Message-ID: <58efiiX1cRmDco5VZyabSIFcgxVYoA30NAUNkfqtmi8=.928a57dd-f690-4b90-85a2-a9e18e459767@github.com> On Fri, 2 Jan 2026 11:40:12 GMT, Francesco Andreuzzi wrote: >> Hi all, >> >> This pull request contains a backport of commit [e1d81c09](https://github.com/openjdk/jdk/commit/e1d81c0946364a266a006481a8fbbac24c7e6c6a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Hao Sun on 23 Dec 2025 and was reviewed by Markus Grönlund, Jie Fu and Francesco Andreuzzi. >> >> Thanks! > > Marked as reviewed by fandreuzzi (Committer). Thanks a lot for your reviews. @fandreuz @DamonFool ------------- PR Comment: https://git.openjdk.org/jdk/pull/28976#issuecomment-3716921157 From haosun at openjdk.org Wed Jan 7 01:08:50 2026 From: haosun at openjdk.org (Hao Sun) Date: Wed, 7 Jan 2026 01:08:50 GMT Subject: [jdk26] Integrated: 8373122: JFR build failure with CDS disabled due to -Werror=unused-function after JDK-8365400 In-Reply-To: References: Message-ID: On Wed, 24 Dec 2025 03:46:49 GMT, Hao Sun wrote: > Hi all, > > This pull request contains a backport of commit [e1d81c09](https://github.com/openjdk/jdk/commit/e1d81c0946364a266a006481a8fbbac24c7e6c6a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Hao Sun on 23 Dec 2025 and was reviewed by Markus Grönlund, Jie Fu and Francesco Andreuzzi. > > Thanks! This pull request has now been integrated. 
Changeset: 3103fa08
Author: Hao Sun
URL: https://git.openjdk.org/jdk/commit/3103fa08bba95ec2c60458d1c5f128243e5ff5bc
Stats: 28 lines in 1 file changed: 14 ins; 14 del; 0 mod

8373122: JFR build failure with CDS disabled due to -Werror=unused-function after JDK-8365400

Reviewed-by: fandreuzzi, jiefu
Backport-of: e1d81c0946364a266a006481a8fbbac24c7e6c6a

-------------

PR: https://git.openjdk.org/jdk/pull/28976

From ozanctn at amazon.com Wed Jan 7 09:37:43 2026
From: ozanctn at amazon.com (Cetin, Ozan)
Date: Wed, 7 Jan 2026 09:37:43 +0000
Subject: [jdk21] JDK-8337994 REDO backport failure analysis - Missing prerequisite changes from JDK-8316241
Message-ID: 

Hi,

I've been investigating the test failures that caused JDK-8346108 (the revert of JDK-8337994 REDO in JDK21). This is related to the native memory leak when not recording any JFR events (JDK-8335121).

Summary

Based on our investigation, we believe the JDK-8337994 (REDO) backport to JDK21 failed because it appears to depend on API changes introduced in the original JDK-8316241 fix that were never backported to JDK21. Our theory is that the REDO fix assumes the existence of infrastructure that only exists in later mainline releases.

Root Cause Analysis

The Missing Prerequisite

The original JDK-8316241 fix (commit b2a39c576706622b624314c89fa6d10d0b422f86) introduced several key changes to jfrTypeSetUtils.hpp/.cpp:

1. API Change: should_do_loader_klass(const Klass* k) -> should_do_cld_klass(const Klass* k, bool leakp)
2. New Data Structure: Added _klass_loader_leakp_set for separate tracking of leakp (leak profiler) path klasses
3. New Function: get_cld_klass(CldPtr cld, bool leakp) in jfrTypeSet.cpp that properly enqueues CLD klasses via JfrTraceId::load()

What Happens Without These Changes

The REDO fix attempts to use get_cld_klass(), which calls should_do_cld_klass(klass, leakp), but in the JDK21 backport:

* JDK21 still has the old API: should_do_loader_klass(const Klass* k) (no leakp parameter)
* JDK21 lacks _klass_loader_leakp_set for separate tracking
* The get_cld_klass() function doesn't exist in the JDK21 codebase

This causes the assert(IS_SERIALIZED(class_loader_klass)) to fail in write_cld() because the CLD's class_loader_klass is never properly enqueued for serialization during the leakp path.

Test Failure Mechanism (TestChunkIntegrity.java)

1. TestClassLoader loads MyClass
2. Event commits with clazz = MyClass
3. JFR rotation writes MyClass to chunk
4. MyClass's CLD references TestClassLoader Klass
5. BUG: TestClassLoader Klass not serialized (leakp path broken)
6. Chunk written with broken reference
7. In slowdebug: assert(IS_SERIALIZED(class_loader_klass)) fails
8. In release: "Events don't match" when comparing chunks

The Fix

I've been able to get a local jdk21 build passing all tests (including slowdebug) by backporting JDK-8316241 and resolving the resulting conflicts. The key changes are:

1. jfrTypeSetUtils.hpp

    // OLD (JDK21 current)
    bool should_do_loader_klass(const Klass* k);

    // NEW (with leakp support)
    bool should_do_cld_klass(const Klass* k, bool leakp);

2. jfrTypeSetUtils.cpp

    // Added _klass_loader_leakp_set member
    GrowableArray* _klass_loader_leakp_set;

    // Updated implementation
    bool JfrArtifactSet::should_do_cld_klass(const Klass* k, bool leakp) {
      assert(k != nullptr, "invariant");
      assert(_klass_loader_set != nullptr, "invariant");
      assert(_klass_loader_leakp_set != nullptr, "invariant");
      return not_in_set(leakp ? _klass_loader_leakp_set : _klass_loader_set, k);
    }

3. jfrTypeSet.cpp - Added get_cld_klass()

    static inline KlassPtr get_cld_klass(CldPtr cld, bool leakp) {
      if (cld == nullptr) {
        return nullptr;
      }
      assert(leakp ? IS_LEAKP(cld) : used(cld), "invariant");
      KlassPtr cld_klass = cld->class_loader_klass();
      if (cld_klass == nullptr) {
        return nullptr;
      }
      if (should_do_cld_klass(cld_klass, leakp)) {
        if (current_epoch()) {
          // KEY FIX: Enqueue the klass for serialization
          JfrTraceId::load(cld_klass);
        } else {
          artifact_tag(cld_klass, leakp);
        }
        return cld_klass;
      }
      return nullptr;
    }

Proposed Action

Based on this, it appears that backporting JDK-8337994 (REDO) alone may not be sufficient, and that some or all of the prerequisite infrastructure changes from JDK-8316241 may also need to be backported. Additionally, there may be other upstream commits (such as 8323631) in JDK24 that were made on top of JDK-8316241 and that could also be required for the fix not to cause other possible errors.

We would appreciate guidance on identifying any additional changes that might need to be included in the backport. If this direction makes sense, I'm happy to prepare a proper patch for review.

References

* JDK-8335121: Native memory leak when JFR is enabled but no events are emitted
* JDK-8316241: Test jdk/jdk/jfr/jvm/TestChunkIntegrity.java failed (original fix)
* JDK-8337994: [REDO] Native memory leak when not recording any events
* JDK-8346108: Revert of REDO in JDK21u due to test failures

Best Regards,
Ozan

From egahlin at openjdk.org Wed Jan 7 10:08:11 2026
From: egahlin at openjdk.org (Erik Gahlin)
Date: Wed, 7 Jan 2026 10:08:11 GMT
Subject: RFR: 8367949: JFR: MethodTrace double-counts methods that catch their own exceptions [v2]
In-Reply-To: 
References: 
Message-ID: 

> Could I have a review of a PR that changes how the instrumentation of the MethodTrace and MethodTiming events is implemented, so they handle exceptions in a better way?
> > For constructors, the current implementation is still used in certain corner cases. A proper implementation would require data-flow analysis, but for all practical purposes this code should work fine. > > Testing: jdk/jdk/jfr > > Thanks > Erik Erik Gahlin has updated the pull request incrementally with one additional commit since the last revision: Formatting + reuse of local variable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28947/files - new: https://git.openjdk.org/jdk/pull/28947/files/6b2473dc..f97e2ad3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28947&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28947&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28947/head:pull/28947 PR: https://git.openjdk.org/jdk/pull/28947 From fandreuzzi at openjdk.org Wed Jan 7 12:42:14 2026 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Wed, 7 Jan 2026 12:42:14 GMT Subject: RFR: 8374713: PredicatedConcurrentWriteOp is unused Message-ID: Trivial removal of `PredicatedConcurrentWriteOp`, which is not used anymore after [8284161](https://bugs.openjdk.org/browse/JDK-8284161). 
------------- Commit messages: - nn Changes: https://git.openjdk.org/jdk/pull/29088/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29088&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374713 Stats: 12 lines in 1 file changed: 0 ins; 12 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29088.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29088/head:pull/29088 PR: https://git.openjdk.org/jdk/pull/29088 From duke at openjdk.org Wed Jan 7 14:08:30 2026 From: duke at openjdk.org (Robert Toyonaga) Date: Wed, 7 Jan 2026 14:08:30 GMT Subject: RFR: 8367949: JFR: MethodTrace double-counts methods that catch their own exceptions [v2] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 10:08:11 GMT, Erik Gahlin wrote: >> Could I have a review of a PR that changes how the instrumentation of the MethodTrace and MethodTiming events is implemented, so they handle exceptions in a better way? >> >> For constructors, the current implementation is still used in certain corner cases. A proper implementation would require data-flow analysis, but for all practical purposes this code should work fine. >> >> Testing: jdk/jdk/jfr >> >> Thanks >> Erik > > Erik Gahlin has updated the pull request incrementally with one additional commit since the last revision: > > Formatting + reuse of local variable Marked as reviewed by roberttoyonaga at github.com (no known OpenJDK username). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28947#pullrequestreview-3635059497 From duke at openjdk.org Wed Jan 7 14:08:34 2026 From: duke at openjdk.org (Robert Toyonaga) Date: Wed, 7 Jan 2026 14:08:34 GMT Subject: RFR: 8367949: JFR: MethodTrace double-counts methods that catch their own exceptions [v2] In-Reply-To: <3GdoIv47UZL2mViNWedMrfbXGorNe_mDLJEVg7lJ0VQ=.cbb57a8b-eed8-411d-b83f-1f52c9f3f84c@github.com> References: <3GdoIv47UZL2mViNWedMrfbXGorNe_mDLJEVg7lJ0VQ=.cbb57a8b-eed8-411d-b83f-1f52c9f3f84c@github.com> Message-ID: On Wed, 7 Jan 2026 00:15:10 GMT, Erik Gahlin wrote: >> test/jdk/jdk/jfr/event/tracing/TestConstructors.java line 116: >> >>> 114: } >>> 115: try { >>> 116: new Zebra(true); >> >> This results in `Zebra(int)` getting traced but not `Zebra(boolean)` because the `Zebra(int)` constructor call throws but [is outside the `try` block](https://github.com/openjdk/jdk/pull/28947/files#diff-68a37600bc91d54808ea1ca427ade6af8a600889877f262e20782c550eded410R160) so execution never reaches the `catch` block that applies tracing. Is this intended? Shouldn't a method be traced every time it is called? In contrast, `new Zebra(false);` causes both `Zebra(int)` and `Zebra(boolean)` to be traced. >> >> Additionally, with the old approach, `new Cat();` would not cause `Cat()` to be traced at all, since its callee, `methodThatThrows()`, prevents execution ever reaching `Cat()`'s `return` statement. I did a quick check on this by hardcoding `simplifiedInstrumentation = true`. Now, with the new approach in this PR, `new Cat();` causes `Cat()` to be traced exactly once. This makes sense to me, but is different than before. > > We can't place a try block around a call to super(...) or this(...). This is why two try blocks are used, one before and one after the call to this(...) or super(...). > > With try blocks, we can now also track when an exception occurs in a callee. This is a behavioral change, but I believe it is for the better. 
I was aware of this limitation when I did the initial implementation, but I didn't think it was worth the added complexity that try blocks bring. What I didn't realize at the time was the double-count issue, so now that we have the mechanics for try blocks, I fixed the exception-in-a-callee case as well. Okay, this makes sense to me. Thank you for your explanation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28947#discussion_r2668595080 From mgronlun at openjdk.org Wed Jan 7 14:23:52 2026 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 7 Jan 2026 14:23:52 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Sat, 29 Nov 2025 06:06:16 GMT, Yasumasa Suenaga wrote: > The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms. > > JFR emergency dump would be kicked at `VMError::report_and_die()`, then Java thread for JFR would not work due to secondary signal handler for error reporting. > > Passed all of jdk_jfr tests on Linux AMD64. Alternative implementation suggestion PR (in draft state) https://github.com/openjdk/jdk/pull/29094 Includes also a solution to [JDK-8373257](https://bugs.openjdk.org/browse/JDK-8373257) @tstuefe Please take a look, and also if you can, submit for testing on your platforms. Markus ------------- PR Comment: https://git.openjdk.org/jdk/pull/28563#issuecomment-3719105625 From mgronlun at openjdk.org Wed Jan 7 17:31:12 2026 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 7 Jan 2026 17:31:12 GMT Subject: RFR: 8367949: JFR: MethodTrace double-counts methods that catch their own exceptions [v2] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 10:08:11 GMT, Erik Gahlin wrote: >> Could I have a review of a PR that changes how the instrumentation of the MethodTrace and MethodTiming events is implemented, so they handle exceptions in a better way? 
>> >> For constructors, the current implementation is still used in certain corner cases. A proper implementation would require data-flow analysis, but for all practical purposes this code should work fine. >> >> Testing: jdk/jdk/jfr >> >> Thanks >> Erik > > Erik Gahlin has updated the pull request incrementally with one additional commit since the last revision: > > Formatting + reuse of local variable Marked as reviewed by mgronlun (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28947#pullrequestreview-3635976864 From mgronlun at openjdk.org Wed Jan 7 17:38:46 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 7 Jan 2026 17:38:46 GMT Subject: RFR: 8374445: Fix -Wzero-as-null-pointer-constant warnings in JfrSet In-Reply-To: References: Message-ID: On Sat, 3 Jan 2026 08:21:15 GMT, Kim Barrett wrote: > Please review this change to fix JfrSet to avoid triggering > -Wzero-as-null-pointer-constant warnings when that warning is enabled. > > The old code uses an entry value with representation 0 to indicate the entry > doesn't have a value. It compares an entry value against literal 0 to check > for that. If the key type is a pointer type, this involves an implicit 0 => > null pointer constant conversion, so we get a warning when that warning is > enabled. > > Instead we initialize entry values to a value-initialized key, and compare > against a value-initialized key. This changes the (currently undocumented) > requirements on the key type. The key type is no longer required to be > trivially constructible (to permit memset-based initialization), but is now > required to be value-initializable. That's currently a wash, since all of the > in-use key types are fundamental types (traceid (u8) and Klass*). > > Testing: mach5 tier1-3 (tier3 is where most jfr tests are run) Will review this later Kim, sorry for the delay (26 stuff). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29022#issuecomment-3719971111 From mgronlun at openjdk.org Wed Jan 7 17:45:22 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 7 Jan 2026 17:45:22 GMT Subject: RFR: 8374713: PredicatedConcurrentWriteOp is unused In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 12:34:33 GMT, Francesco Andreuzzi wrote: > Trivial removal of `PredicatedConcurrentWriteOp`, which is not used anymore after [8284161](https://bugs.openjdk.org/browse/JDK-8284161). I would prefer to keep this building block so I don't have to devise it again, when / if needed next time. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29088#issuecomment-3720004639 From ysuenaga at openjdk.org Thu Jan 8 01:38:00 2026 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Thu, 8 Jan 2026 01:38:00 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 14:20:34 GMT, Markus Gr?nlund wrote: >> The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms. >> >> JFR emergency dump would be kicked at `VMError::report_and_die()`, then Java thread for JFR would not work due to secondary signal handler for error reporting. >> >> Passed all of jdk_jfr tests on Linux AMD64. > > Alternative implementation suggestion PR (in draft state) https://github.com/openjdk/jdk/pull/29094 > > Includes also a solution to [JDK-8373257](https://bugs.openjdk.org/browse/JDK-8373257) > @tstuefe > > Please take a look, and also if you can, submit for testing on your platforms. > > Markus Thanks a lot @mgronlun ! I think JDK-8371014 (and JDK-8373257) should be tackled in #29094 . So should I close this PR? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28563#issuecomment-3721527637 From fandreuzzi at openjdk.org Thu Jan 8 08:34:05 2026 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Thu, 8 Jan 2026 08:34:05 GMT Subject: RFR: 8374713: PredicatedConcurrentWriteOp is unused In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 17:41:51 GMT, Markus Gr?nlund wrote: > I would prefer to keep this building block so I don't have to devise it again, when / if needed next time. Sure, I'll close this PR then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29088#issuecomment-3722778900 From fandreuzzi at openjdk.org Thu Jan 8 08:34:07 2026 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Thu, 8 Jan 2026 08:34:07 GMT Subject: Withdrawn: 8374713: PredicatedConcurrentWriteOp is unused In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 12:34:33 GMT, Francesco Andreuzzi wrote: > Trivial removal of `PredicatedConcurrentWriteOp`, which is not used anymore after [8284161](https://bugs.openjdk.org/browse/JDK-8284161). This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/29088 From mgronlun at openjdk.org Thu Jan 8 09:47:20 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 8 Jan 2026 09:47:20 GMT Subject: RFR: 8374713: PredicatedConcurrentWriteOp is unused In-Reply-To: References: Message-ID: <2RGUAJjq1XAngtw2vGN9xRo6XDW8DbNq5xmr6KGij3A=.538c42fc-f2b4-4c52-963e-aea497163cb5@github.com> On Thu, 8 Jan 2026 08:30:37 GMT, Francesco Andreuzzi wrote: > > I would prefer to keep this building block so I don't have to devise it again, when / if needed next time. > > Sure, I'll close this PR then. Thanks @fandreuz ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29088#issuecomment-3723052062 From egahlin at openjdk.org Thu Jan 8 11:26:54 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Thu, 8 Jan 2026 11:26:54 GMT Subject: RFR: 8367949: JFR: MethodTrace double-counts methods that catch their own exceptions [v3] In-Reply-To: References: Message-ID: > Could I have a review of a PR that changes how the instrumentation of the MethodTrace and MethodTiming events is implemented, so they handle exceptions in a better way? > > For constructors, the current implementation is still used in certain corner cases. A proper implementation would require data-flow analysis, but for all practical purposes this code should work fine. > > Testing: jdk/jdk/jfr > > Thanks > Erik Erik Gahlin has updated the pull request incrementally with one additional commit since the last revision: Use simplified instrumentation for java.lang.Object:: ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28947/files - new: https://git.openjdk.org/jdk/pull/28947/files/f97e2ad3..4897a25e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28947&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28947&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28947/head:pull/28947 PR: https://git.openjdk.org/jdk/pull/28947 From mgronlun at openjdk.org Thu Jan 8 12:24:06 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 8 Jan 2026 12:24:06 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 14:20:34 GMT, Markus Gr?nlund wrote: >> The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms. 
>> >> JFR emergency dump would be kicked at `VMError::report_and_die()`, then Java thread for JFR would not work due to secondary signal handler for error reporting. >> >> Passed all of jdk_jfr tests on Linux AMD64. > > Alternative implementation suggestion PR (in draft state) https://github.com/openjdk/jdk/pull/29094 > > Includes also a solution to [JDK-8373257](https://bugs.openjdk.org/browse/JDK-8373257) > @tstuefe > > Please take a look, and also if you can, submit for testing on your platforms. > > Markus > Thanks a lot @mgronlun ! I think JDK-8371014 (and JDK-8373257) should be tackled in #29094 . So should I close this PR? Yes, I think we should do it in https://github.com/openjdk/jdk/pull/29094. You can close this one, and I will officially publish https://github.com/openjdk/jdk/pull/29094. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28563#issuecomment-3723622934 From mgronlun at openjdk.org Thu Jan 8 12:29:48 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 8 Jan 2026 12:29:48 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented Message-ID: Alternative for solving [JDK-8371014](https://bugs.openjdk.org/browse/JDK-8371014) Also includes a fix for [JDK-8373257](https://bugs.openjdk.org/browse/JDK-8373257) ------------- Commit messages: - reordering - update comment - is_recording() conditional - remove tautology - 8371014 Changes: https://git.openjdk.org/jdk/pull/29094/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29094&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371014 Stats: 292 lines in 15 files changed: 219 ins; 18 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/29094.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29094/head:pull/29094 PR: https://git.openjdk.org/jdk/pull/29094 From mdoerr at openjdk.org Thu Jan 8 12:29:49 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 8 Jan 2026 12:29:49 GMT Subject: RFR: 8371014: 
Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: <8LD4JmIZnVSwmhLeVZROok-0h-nCD1TxlaSRHe586-E=.99a49bfa-444f-4ddf-b206-0a75fe1dad23@github.com> On Wed, 7 Jan 2026 14:14:19 GMT, Markus Gr?nlund wrote: > Alternative for solving [JDK-8371014](https://bugs.openjdk.org/browse/JDK-8371014) > > Also includes a fix for [JDK-8373257](https://bugs.openjdk.org/browse/JDK-8373257) TestEmergencyDumpAtOOM.java has passed on both, AIX and linux on PPC64. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29094#issuecomment-3719926832 From mgronlun at openjdk.org Thu Jan 8 12:29:51 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 8 Jan 2026 12:29:51 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: <8LD4JmIZnVSwmhLeVZROok-0h-nCD1TxlaSRHe586-E=.99a49bfa-444f-4ddf-b206-0a75fe1dad23@github.com> References: <8LD4JmIZnVSwmhLeVZROok-0h-nCD1TxlaSRHe586-E=.99a49bfa-444f-4ddf-b206-0a75fe1dad23@github.com> Message-ID: On Wed, 7 Jan 2026 17:23:25 GMT, Martin Doerr wrote: > TestEmergencyDumpAtOOM.java has passed on both, AIX and linux on PPC64. Thanks! Thanks Martin. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29094#issuecomment-3720022380 From ysuenaga at openjdk.org Thu Jan 8 12:29:52 2026 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Thu, 8 Jan 2026 12:29:52 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: <8LD4JmIZnVSwmhLeVZROok-0h-nCD1TxlaSRHe586-E=.99a49bfa-444f-4ddf-b206-0a75fe1dad23@github.com> Message-ID: On Wed, 7 Jan 2026 17:45:55 GMT, Markus Gr?nlund wrote: >> TestEmergencyDumpAtOOM.java has passed on both, AIX and linux on PPC64. Thanks! > >> TestEmergencyDumpAtOOM.java has passed on both, AIX and linux on PPC64. Thanks! > > Thanks Martin. Thanks a lot @mgronlun ! Looks good in general. 
Can we wait for `service.emit_leakprofiler_events()` to finish in the JFR recorder thread before the crash at `report_java_out_of_memory()` in debug.cpp? I am concerned that `abort()` may be called before the recorder thread finishes dumping events. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29094#issuecomment-3721524743 From mgronlun at openjdk.org Thu Jan 8 12:29:53 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 8 Jan 2026 12:29:53 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: <8LD4JmIZnVSwmhLeVZROok-0h-nCD1TxlaSRHe586-E=.99a49bfa-444f-4ddf-b206-0a75fe1dad23@github.com> Message-ID: On Wed, 7 Jan 2026 17:45:55 GMT, Markus Grönlund wrote: >> TestEmergencyDumpAtOOM.java has passed on both, AIX and linux on PPC64. Thanks! > >> TestEmergencyDumpAtOOM.java has passed on both, AIX and linux on PPC64. Thanks! > > Thanks Martin. > Thanks a lot @mgronlun ! Looks good in general. > > Can we wait for `service.emit_leakprofiler_events()` to finish in the JFR recorder thread before the crash at `report_java_out_of_memory()` in debug.cpp? I am concerned that `abort()` may be called before the recorder thread finishes dumping events. The solution is to avoid anyone calling abort() concurrently until at least one service.emit_leakprofiler_events() has completed. That's why the invocation is done by all threads coming into report_java_out_of_memory(), not only a cas-selected one. Why? Because it is only by taking the threads from thread state _thread_in_vm to state _thread_blocked (which we manage as part of posting the JFR msg), that a VM operation in service.emit_leakprofiler_events() can proceed.
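The coordination described here — every thread entering `report_java_out_of_memory()` blocks until the dump has completed, rather than a single cas-selected thread proceeding straight to `abort()` — can be sketched in miniature. This is a hypothetical, simplified illustration of the idea using Java latches, not the actual HotSpot code (which works through thread-state transitions and JFR messaging, not latches):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OomDumpCoordination {
    // Released once the "recorder" has finished emitting events,
    // standing in for the completion of emit_leakprofiler_events().
    static final CountDownLatch dumpDone = new CountDownLatch(1);

    // Every thread hitting the simulated OOM path parks here instead of
    // aborting immediately, analogous to _thread_in_vm -> _thread_blocked.
    static void reportOutOfMemory() throws InterruptedException {
        dumpDone.await();
        // In the real VM, abort() would only be reached after this point.
    }

    // Returns true when all "OOM" threads were released after the dump.
    static boolean demo() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> { reportOutOfMemory(); return null; });
        }
        Thread.sleep(100);   // stand-in for the recorder emitting events
        dumpDone.countDown();
        pool.shutdown();
        return pool.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("all OOM threads released after dump: " + demo());
    }
}
```

The point of the sketch is only the ordering: no simulated OOM thread can terminate the process while the dump is still in flight.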
------------- PR Comment: https://git.openjdk.org/jdk/pull/29094#issuecomment-3723611490 From mgronlun at openjdk.org Thu Jan 8 13:47:06 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 8 Jan 2026 13:47:06 GMT Subject: RFR: 8367949: JFR: MethodTrace double-counts methods that catch their own exceptions [v3] In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 11:26:54 GMT, Erik Gahlin wrote: >> Could I have a review of a PR that changes how the instrumentation of the MethodTrace and MethodTiming events is implemented, so they handle exceptions in a better way? >> >> For constructors, the current implementation is still used in certain corner cases. A proper implementation would require data-flow analysis, but for all practical purposes this code should work fine. >> >> Testing: jdk/jdk/jfr >> >> Thanks >> Erik > > Erik Gahlin has updated the pull request incrementally with one additional commit since the last revision: > > Use simplified instrumentation for java.lang.Object:: Marked as reviewed by mgronlun (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28947#pullrequestreview-3639493888 From egahlin at openjdk.org Thu Jan 8 16:38:54 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Thu, 8 Jan 2026 16:38:54 GMT Subject: Integrated: 8367949: JFR: MethodTrace double-counts methods that catch their own exceptions In-Reply-To: References: Message-ID: On Sun, 21 Dec 2025 16:22:25 GMT, Erik Gahlin wrote: > Could I have a review of a PR that changes how the instrumentation of the MethodTrace and MethodTiming events is implemented, so they handle exceptions in a better way? > > For constructors, the current implementation is still used in certain corner cases. A proper implementation would require data-flow analysis, but for all practical purposes this code should work fine. > > Testing: jdk/jdk/jfr > > Thanks > Erik This pull request has now been integrated. 
Changeset: fa2eb626 Author: Erik Gahlin URL: https://git.openjdk.org/jdk/commit/fa2eb626478806dc64fe03d8729f53f7ed26a172 Stats: 363 lines in 5 files changed: 333 ins; 1 del; 29 mod 8367949: JFR: MethodTrace double-counts methods that catch their own exceptions Reviewed-by: mgronlun ------------- PR: https://git.openjdk.org/jdk/pull/28947 From egahlin at openjdk.org Thu Jan 8 17:20:55 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Thu, 8 Jan 2026 17:20:55 GMT Subject: RFR: 8372321: TestBackToBackSensitive fails intermittently after JDK-8365972 Message-ID: Could I have a review of a PR that attempts to harden a test? Sometimes, ClassLoaderStatistics events are dropped, probably due to a bug in the RecordingStream class when starting multiple recordings simultaneously. This is not a bug related to back-to-back chunks, so I decided to use an EventFileStream instead. I also use TestClassLoader for verification purposes. Using PlatformClassLoader shouldn't be a problem, but it seems more prudent to have an actual object/class on the heap for the class loader that needs to be checked. 
Testing: jdk/jdk/jfr Thanks Erik ------------- Commit messages: - Use EventStream - Remove empty line - Initial Changes: https://git.openjdk.org/jdk/pull/29117/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29117&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372321 Stats: 47 lines in 1 file changed: 21 ins; 14 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/29117.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29117/head:pull/29117 PR: https://git.openjdk.org/jdk/pull/29117 From mgronlun at openjdk.org Thu Jan 8 19:01:30 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 8 Jan 2026 19:01:30 GMT Subject: RFR: 8372321: TestBackToBackSensitive fails intermittently after JDK-8365972 In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 14:28:54 GMT, Erik Gahlin wrote: > Could I have a review of a PR that attempts to harden a test? > > Sometimes, ClassLoaderStatistics events are dropped, probably due to a bug in the RecordingStream class when starting multiple recordings simultaneously. This is not a bug related to back-to-back chunks, so I decided to use an EventFileStream instead. > > I also use TestClassLoader for verification purposes. Using PlatformClassLoader shouldn't be a problem, but it seems more prudent to have an actual object/class on the heap for the class loader that needs to be checked. > > Testing: jdk/jdk/jfr > > Thanks > Erik Lets try this. ------------- Marked as reviewed by mgronlun (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29117#pullrequestreview-3640803287 From duke at openjdk.org Thu Jan 8 21:48:11 2026 From: duke at openjdk.org (Robert Toyonaga) Date: Thu, 8 Jan 2026 21:48:11 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 14:14:19 GMT, Markus Gr?nlund wrote: > Alternative for solving [JDK-8371014](https://bugs.openjdk.org/browse/JDK-8371014) > > Also includes a fix for [JDK-8373257](https://bugs.openjdk.org/browse/JDK-8373257) > > Testing: jdk_jfr, stress testing, manual testing with CrashOnOutOfMemoryError, tier1-6 src/hotspot/share/jfr/recorder/repository/jfrEmergencyDump.cpp line 611: > 609: if (thread->is_VM_thread()) { > 610: const VM_Operation* const operation = VMThread::vm_operation(); > 611: if (operation != nullptr && operation->type() == VM_Operation::VMOp_JFROldObject) { Is it better/possible to directly check the rotation lock instead? Maybe it's possible the thread crashed before starting the vm operation, or the lock is held by something else. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29094#discussion_r2674002541 From ysuenaga at openjdk.org Fri Jan 9 01:19:25 2026 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Fri, 9 Jan 2026 01:19:25 GMT Subject: Withdrawn: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Sat, 29 Nov 2025 06:06:16 GMT, Yasumasa Suenaga wrote: > The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms. > > JFR emergency dump would be kicked at `VMError::report_and_die()`, then Java thread for JFR would not work due to secondary signal handler for error reporting. > > Passed all of jdk_jfr tests on Linux AMD64. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/28563 From mbaesken at openjdk.org Fri Jan 9 08:09:05 2026 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 9 Jan 2026 08:09:05 GMT Subject: RFR: 8372321: TestBackToBackSensitive fails intermittently after JDK-8365972 In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 14:28:54 GMT, Erik Gahlin wrote: > Could I have a review of a PR that attempts to harden a test? > > Sometimes, ClassLoaderStatistics events are dropped, probably due to a bug in the RecordingStream class when starting multiple recordings simultaneously. This is not a bug related to back-to-back chunks, so I decided to use an EventFileStream instead. > > I also use TestClassLoader for verification purposes. Using PlatformClassLoader shouldn't be a problem, but it seems more prudent to have an actual object/class on the heap for the class loader that needs to be checked. > > Testing: jdk/jdk/jfr > > Thanks > Erik If you want, I can put the change into our CI and leave it there for a few days, to check if the errors we faced are gone with the change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29117#issuecomment-3727642942 From mgronlun at openjdk.org Fri Jan 9 11:07:17 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Fri, 9 Jan 2026 11:07:17 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 21:42:16 GMT, Robert Toyonaga wrote: > Is it better/possible to directly check the rotation lock instead? Maybe it's possible the thread crashed before starting the vm operation, or the lock is held by something else. Lock testing is inherently racy, and would also include false negatives (i.e., say the rotation lock is currently held during a normal flush / rotation by the JFR Recorder Thread, then it's perfectly fine even for the VM Thread to block waiting for it to be released).
It is only the above implication that makes it impossible for the VM Thread to wait on rotation lock release. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29094#discussion_r2675762122 From mgronlun at openjdk.org Sun Jan 11 12:45:02 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Sun, 11 Jan 2026 12:45:02 GMT Subject: RFR: 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant Message-ID: Greetings, When sampling threads in state _thread_in_native, there is a missing memory barrier when UseSystemMemoryBarrier is used, because it must be emitted manually. Testing: jdk_jfr Thanks Markus PS "threads_lock" local variable was renamed to "lock" not to confuse with the global Threads_lock. ------------- Commit messages: - 8373485 Changes: https://git.openjdk.org/jdk/pull/29155/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29155&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373485 Stats: 11 lines in 1 file changed: 7 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/29155.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29155/head:pull/29155 PR: https://git.openjdk.org/jdk/pull/29155 From egahlin at openjdk.org Sun Jan 11 16:18:52 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Sun, 11 Jan 2026 16:18:52 GMT Subject: RFR: 8372321: TestBackToBackSensitive fails intermittently after JDK-8365972 In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 08:06:47 GMT, Matthias Baesken wrote: > If you want , I can put the change into our CI and let it there for a few days, to check if the errors we faced are gone with the change. We have been able to reproduce this in our CI as well. Before the fix, I got about 10 failures in 1000 runs. After the fix, there were zero failures. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29117#issuecomment-3734920165 From fabrice.bibonne at courriel.eco Sun Jan 11 18:23:04 2026 From: fabrice.bibonne at courriel.eco (Fabrice Bibonne) Date: Sun, 11 Jan 2026 19:23:04 +0100 Subject: Using JFR both with ZGC degrades application throughput Message-ID: Hi all, I would like to report a case where starting JFR for an application running with ZGC causes a significant throughput degradation (compared to when JFR is not started). My context: I was writing a little web app to illustrate a case where the use of ZGC gives better throughput than G1. I benchmarked my application with Grafana k6, running with G1 and then with ZGC: the runs with ZGC gave better throughput. I wanted to go a bit further in the explanation, so I reran my benchmarks with JFR to be able to illustrate the GC gains in JMC. When I ran my web app with ZGC+JFR, I noticed a significant throughput degradation in my benchmark (which was not the case with G1+JFR). Although I did not measure an increase in overhead as such, I still wanted to report this issue because the degradation in throughput with JFR is such that it would not be usable as is on a production service. I wrote a little application (not a web one) to reproduce the problem: the application calls a little conversion service 200 times with random numbers in parallel (to behave like a web app under load and to put pressure on the GC). The conversion service (a method named `convertNumberToWords`) converts the number to a String by looking the String up in a Map with the number as the key. In order to instantiate and destroy many objects at each call, the Map is rebuilt by parsing a huge String on every call. The application ends after 200 calls. Here are the steps to reproduce: 1. Clone https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact (make sure you are on branch jfr+zgc_impact) 2.
Compile it (you must include numbers200k.zip in the resources: it contains a 36 MB text file whose contents are used to create the huge String variable)
3. In the root of the repository:
3a. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -classpath target/classes poc.java.perf.write.TestPerf #ZGC without JFR`
3b. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -XX:StartFlightRecording -classpath target/classes poc.java.perf.write.TestPerf #ZGC with JFR`
4. The real time of the second run (with JFR) will be considerably higher than that of the first.

I ran these tests on my laptop:
- Dell Inc. Latitude 5591
- openSUSE Tumbleweed 20260108
- Kernel: 6.18.3-1-default (64-bit)
- 12 × Intel(R) Core(tm) i7-8850H CPU @ 2.60GHz
- RAM 16 GiB
- openjdk version "25.0.1" 2025-10-21
- OpenJDK Runtime Environment (build 25.0.1+8-27)
- OpenJDK 64-Bit Server VM (build 25.0.1+8-27, mixed mode, sharing)
- many tabs opened in Firefox!

I also ran it in a container (eclipse-temurin:25) on my laptop and on a Windows laptop, and came to the same conclusions; here are the measurements from the container:

| Run with  | Real time (s) |
|-----------|---------------|
| ZGC alone | 7.473 |
| ZGC + JFR | 25.075 |
| G1 alone  | 10.195 |
| G1 + JFR  | 10.450 |

After all these tests I tried to run the app with another profiler tool in order to understand where the issue is. I attach the flamegraph from running JFR+ZGC: for the worker threads of the ForkJoinPool used by Stream, the stack traces of a majority of samples have the same top lines:
- PosixSemaphore::wait
- ZPageAllocator::alloc_page_stall
- ZPageAllocator::alloc_page_inner
- ZPageAllocator::alloc_page

So many threads seem to spend their time waiting in the method ZPageAllocator::alloc_page_stall when JFR is on. The JFR periodic tasks thread also has a few samples where it waits at ZPageAllocator::alloc_page_stall. I hope this will help you find the issue. Thank you very much for reading this email until the end.
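For readers without access to the repository, the allocation pattern described above can be approximated with a small self-contained sketch. This is not the actual code from the linked repository (only `convertNumberToWords` is a name taken from the email); the real reproducer parses a ~36 MB resource file and issues 200 parallel calls, while this scaled-down stand-in just demonstrates the same rebuild-a-Map-per-call garbage pressure:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.stream.IntStream;

// Minimal stand-in for the reproducer: each call rebuilds a large Map by
// parsing a big String, creating heavy short-lived allocation pressure.
public class AllocPressure {
    // The real reproducer loads ~36 MB from numbers200k.zip; a repeated
    // pattern keeps this sketch self-contained (and much smaller).
    static final String HUGE = "1=one\n2=two\n3=three\n".repeat(20_000);

    static String convertNumberToWords(int n) {
        Map<String, String> map = new HashMap<>();   // fresh garbage per call
        for (String line : HUGE.split("\n")) {
            int eq = line.indexOf('=');
            map.put(line.substring(0, eq), line.substring(eq + 1));
        }
        return map.getOrDefault(Integer.toString(n), "unknown");
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        long resolved = IntStream.range(0, 100).parallel() // parallel "requests"
                .mapToObj(i -> convertNumberToWords(1 + rnd.nextInt(3)))
                .filter(s -> !s.equals("unknown"))
                .count();
        System.out.println("resolved: " + resolved);       // prints "resolved: 100"
    }
}
```

Running this sketch with and without `-XX:StartFlightRecording` under ZGC should exercise the same allocation-stall path the flamegraph points at, albeit at a smaller scale than the original.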
I hope this is the right place for such feedback. Let me know if I should report my problem elsewhere. Feel free to ask me more questions if you need. Thank you all for this amazing tool! -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: withJfr.html Type: application/octet-stream Size: 76123 bytes Desc: not available URL: From shade at openjdk.org Mon Jan 12 09:46:40 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Jan 2026 09:46:40 GMT Subject: RFR: 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant In-Reply-To: References: Message-ID: On Sun, 11 Jan 2026 12:39:06 GMT, Markus Grönlund wrote: > Greetings, > > When sampling threads in state _thread_in_native, there is a missing memory barrier when UseSystemMemoryBarrier is used, because it must be emitted manually. > > Testing: jdk_jfr > > Thanks > Markus > > PS "threads_lock" local variable was renamed to "lock" not to confuse with the global Threads_lock. Man, this is confusing. Looks to me like thread states are guarded specially. Looking at `Handshake::execute`, I see the pattern is:

```
// Separate the arming of the poll in add_operation() above from
// the read of JavaThread state in the try_process() call below.
if (UseSystemMemoryBarrier) {
  SystemMemoryBarrier::emit();
} else {
  OrderAccess::fence();
}
```

This follows `HandshakeState::add_operation` -> `SafepointMechanism::arm_local_poll_release`. `arm_local_poll_release` is what `JfrSampleThread::sample_native_thread` also does. So, the fix should follow what `Handshake` does. I think you are trying to do the same, but piggy-back on `OA::fence()` already done in `JfrMutexTryLock` when `-UseSystemMemoryBarrier`?
------------- PR Review: https://git.openjdk.org/jdk/pull/29155#pullrequestreview-3649901629 From erik.gahlin at oracle.com Mon Jan 12 09:56:06 2026 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Mon, 12 Jan 2026 09:56:06 +0000 Subject: Using JFR both with ZGC degrades application throughput In-Reply-To: References: Message-ID: Hi Fabrice, Thanks for reporting! Could you post the source code for the reproducer here? The 36 MB file could probably be replaced with a String::repeat expression. JFR does use some memory, which could impact available heap and performance, although the degradation you're seeing seems awfully high. Thanks Erik
From mgronlun at openjdk.org Mon Jan 12 10:48:37 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 12 Jan 2026 10:48:37 GMT Subject: RFR: 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 09:43:14 GMT, Aleksey Shipilev wrote: > Man, this is confusing. Looks to me like thread states are guarded specially. Looking at `Handshake::execute`, I see the pattern is:
>
> ```
> // Separate the arming of the poll in add_operation() above from
> // the read of JavaThread state in the try_process() call below.
> if (UseSystemMemoryBarrier) { > SystemMemoryBarrier::emit(); > } else { > OrderAccess::fence(); > } > ``` > > This follows `HandshakeState::add_operation` -> `SafepointMechanism::arm_local_poll_release`. `arm_local_poll_release` is what `JfrSampleThread::sample_native_thread` also does. So, the fix should follow what `Handshake` does. I think you are trying to do the same, but piggy-back on `OA::fence()` already done in `JfrMutexTryLock` when `-UseSystemMemoryBarrier`? Exactly right. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29155#issuecomment-3737916202 From egahlin at openjdk.org Mon Jan 12 11:35:06 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Mon, 12 Jan 2026 11:35:06 GMT Subject: Integrated: 8372321: TestBackToBackSensitive fails intermittently after JDK-8365972 In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 14:28:54 GMT, Erik Gahlin wrote: > Could I have a review of a PR that attempts to harden a test? > > Sometimes, ClassLoaderStatistics events are dropped, probably due to a bug in the RecordingStream class when starting multiple recordings simultaneously. This is not a bug related to back-to-back chunks, so I decided to use an EventFileStream instead. > > I also use TestClassLoader for verification purposes. Using PlatformClassLoader shouldn't be a problem, but it seems more prudent to have an actual object/class on the heap for the class loader that needs to be checked. > > Testing: jdk/jdk/jfr > > Thanks > Erik This pull request has now been integrated. 
Changeset: 556bddfd
Author: Erik Gahlin
URL: https://git.openjdk.org/jdk/commit/556bddfd9439d1bad698ab5134317ce263a36b04
Stats: 47 lines in 1 file changed: 21 ins; 14 del; 12 mod

8372321: TestBackToBackSensitive fails intermittently after JDK-8365972

Reviewed-by: mgronlun

-------------

PR: https://git.openjdk.org/jdk/pull/29117

From thomas.schatzl at oracle.com Mon Jan 12 13:18:47 2026
From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 12 Jan 2026 14:18:47 +0100 Subject: Using JFR both with ZGC degrades application throughput In-Reply-To: References: Message-ID: Hi,

while not being able to answer the question about why using JFR takes so much additional time, when reading about your benchmark setup the following things came to my mind:

* -XX:+UseCompressedOops for ZGC does nothing (ZGC does not support compressed oops at all), and G1 will automatically use it. You can leave it off.

* G1 having a significantly worse throughput than ZGC is very rare; even so, the extent you show is quite large. Taking some of the content together (4g heap, Maps, huge String variables) indicates that you might have run into a well-known pathology of G1 with large objects: the application might waste up to 50% of the heap due to these humongous objects [0]. G1 might work better in JDK 26 too, as an enhancement for one particular case has been added. More is being worked on.

TL;DR: Your application might run much better with a large(r) G1HeapRegionSize setting. Or just upgrading to JDK 26.

* While ZGC does not have that, in some cases extreme, memory wastage for large allocations, there is still some. Adding JFR might just push it over the edge (the stacks you showed are about finding a new empty page/region for allocation, failing to do so, doing a GC, stalling and waiting).
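For example, the region-size suggestion above could be tried by reusing the run commands from the report (the 8m value is only an initial guess for this workload, not a general recommendation):

```shell
# Baseline from the report (ZGC; -XX:+UseCompressedOops dropped, it is a no-op for ZGC)
time java -Xmx4g -XX:+UseZGC -classpath target/classes poc.java.perf.write.TestPerf

# G1 with a fixed heap and larger regions to reduce humongous-object waste;
# pick a power of two comfortably above the large allocations
time java -Xms4g -Xmx4g -XX:+UseG1GC -XX:G1HeapRegionSize=8m \
    -classpath target/classes poc.java.perf.write.TestPerf
```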
Hth, Thomas [0] https://tschatzl.github.io/2021/11/15/heap-regions-x-large.html On 11.01.26 19:23, Fabrice Bibonne wrote: > Hi all, > > ?I would like to report a case where starting jfr for an application > running with zgc causes a significant throughput degradation (compared > to when JFR is not started). > > ?My context : I was writing a little web app to illustrate a case where > the use of ZGC gives a better throughput than with G1. I benchmarked > with grafana k6 my application running with G1 and my application > running with ZGC ?: the runs with ZGC gave better throughputs. I wanted > to go a bit further in explanation so I began again my benchmarks with > JFR to be able to illustrate GC gains in JMC. When I ran my web app with > ZGC+JFR, I noticed a significant throughput degradation in my benchmark > (which was not the case with G1+JFR). > > ?Although I did not measure an increase in overhead as such, I still > wanted to report this issue because the degradation in throughput with > JFR is such that it would not be usable as is on a production service. > > I wrote a little application (not a web one) to reproduce the problem : > the application calls a little conversion service 200 times with random > numbers in parallel (to be like a web app in charge and to pressure GC). > The conversion service (a method named `convertNumberToWords`) convert > the number in a String looking for the String in a Map with the number > as th key. In order to instantiate and destroy many objects at each > call, the map is built parsing a huge String at each call. Application > ends after 200 calls. > > Here are the step to reproduce : > 1. Clone https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact > (be aware to be on branch jfr+zgc_impact) > 2. Compile it (you must include numbers200k.zip in resources : it > contains a 36 Mo text files whose contents are used to create the huge > String variable) > 3. in the root of repository : > 3a. 
Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -classpath > target/classes poc.java.perf.write.TestPerf #ZGC without JFR` > 3b. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops - > XX:StartFlightRecording -classpath target/classes > poc.java.perf.write.TestPerf #ZGC with JFR` > 4. The real time of the second run (with JFR) will be considerably > higher than that of the first > > I ran these tests on my laptop : > - Dell Inc. Latitude 5591 > - openSUSE Tumbleweed 20260108 > - Kernel : 6.18.3-1-default (64-bit) > - 12 ? Intel? Core? i7-8850H CPU @ 2.60GHz > - RAM 16 Gio > - openjdk version "25.0.1" 2025-10-21 > - OpenJDK Runtime Environment (build 25.0.1+8-27) > - OpenJDK 64-Bit Server VM (build 25.0.1+8-27, mixed mode, sharing) > - many tabs opened in firefox ! > > I also ran it in a container (eclipse-temurin:25) on my laptop and with > a windows laptop and came to the same conclusions : here are the > measurements from the container : > > | Run with ?| Real time (s) | > |-----------|---------------| > | ZGC alone | 7.473 ? ? ? ? | > | ZGC + jfr | 25.075 ? ? ? ?| > | G1 alone ?| 10.195 ? ? ? ?| > | G1 + jfr ?| 10.450 ? ? ? ?| > > > After all these tests I tried to run the app with an other profiler tool > in order to understand where is the issue. I join the flamegraph when > running jfr+zgc : for the worker threads of the ForkJoinPool of Stream, > stack traces of a majority of samples have the same top lines : > - PosixSemaphore::wait > - ZPageAllocator::alloc_page_stall > - ZPageAllocator::alloc_page_inner > - ZPageAllocator::alloc_page > > So many thread seem to spent their time waiting in the method > ZPageAllocator::alloc_page_stall when the JFR is on. The JFR periodic > tasks threads has also a few samples where it waits at > ZPageAllocator::alloc_page_stall. I hope this will help you to find the > issue. > > Thank you very much for reading this email until the end. I hope this is > the good place for such a feedback. 
Let me know if I must report my
> problem elsewhere. Be free to ask me more questions if you need.
>
> Thank you all for this amazing tool !
>

From mgronlun at openjdk.org Mon Jan 12 13:52:58 2026
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 12 Jan 2026 13:52:58 GMT Subject: RFR: 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant In-Reply-To: References: Message-ID: On Sun, 11 Jan 2026 12:39:06 GMT, Markus Grönlund wrote:

> Greetings,
>
> When sampling threads in state _thread_in_native, there is a missing memory barrier when UseSystemMemoryBarrier is used, because it must be emitted manually.
>
> Testing: jdk_jfr
>
> Thanks
> Markus
>
> PS "threads_lock" local variable was renamed to "lock" not to confuse with the global Threads_lock.

I am going to have to move (back) the Threads_lock acquisition (as part of https://bugs.openjdk.org/browse/JDK-8373106) to where it was placed originally, before JFR Cooperative Sampling. Hence, I will update this to exactly mirror the Handshake pattern.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29155#issuecomment-3738645122

From mgronlun at openjdk.org Mon Jan 12 14:00:23 2026
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 12 Jan 2026 14:00:23 GMT Subject: RFR: 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant [v2] In-Reply-To: References: Message-ID: > Greetings,
>
> When sampling threads in state _thread_in_native, there is a missing memory barrier when UseSystemMemoryBarrier is used, because it must be emitted manually.
>
> Testing: jdk_jfr
>
> Thanks
> Markus
>
> PS "threads_lock" local variable was renamed to "lock" not to confuse with the global Threads_lock.
Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  explicit fences

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/29155/files
  - new: https://git.openjdk.org/jdk/pull/29155/files/9b8ed440..8e951ac8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=29155&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29155&range=00-01

Stats: 5 lines in 1 file changed: 2 ins; 2 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/29155.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/29155/head:pull/29155

PR: https://git.openjdk.org/jdk/pull/29155

From shade at openjdk.org Mon Jan 12 14:19:50 2026
From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Jan 2026 14:19:50 GMT Subject: RFR: 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant [v2] In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 14:00:23 GMT, Markus Grönlund wrote:

>> Greetings,
>>
>> When sampling threads in state _thread_in_native, there is a missing memory barrier when UseSystemMemoryBarrier is used, because it must be emitted manually.
>>
>> Testing: jdk_jfr
>>
>> Thanks
>> Markus
>>
>> PS "threads_lock" local variable was renamed to "lock" not to confuse with the global Threads_lock.
>
> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
>
>   explicit fences

All right, that reads better, thanks.

-------------

Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/29155#pullrequestreview-3650994267

From fabrice.bibonne at courriel.eco Mon Jan 12 15:59:09 2026
From: fabrice.bibonne at courriel.eco (Fabrice Bibonne) Date: Mon, 12 Jan 2026 16:59:09 +0100 Subject: Using JFR both with ZGC degrades application throughput In-Reply-To: References: Message-ID: <80f97dba0b628057de3b7cd2ef4c3bea@courriel.eco> Here is a single source code file for the reproducer (the big String is generated at startup, as you suggested). It changes the results a little, but the run with ZGC + JFR is still taking a lot of time.

Thank you for having a look.

Fabrice

Le 2026-01-12 10:56, Erik Gahlin a écrit :

> Hi Fabrice,
>
> Thanks for reporting!
>
> Could you post the source code for the reproducer here? The 36 MB file
> could probably be replaced with a String::repeat expression.
>
> JFR does use some memory, which could impact available heap and
> performance, although the degradation you're seeing seems awfully high.
>
> Thanks
> Erik
>
> ________________________________________
> From: hotspot-jfr-dev on behalf of
> Fabrice Bibonne
> Sent: Sunday, January 11, 2026 7:23 PM
> To: hotspot-jfr-dev at openjdk.org
> Subject: Using JFR both with ZGC degrades application throughput
>
> Hi all,
>
> I would like to report a case where starting jfr for an application
> running with zgc causes a significant throughput degradation (compared
> to when JFR is not started).
>
> My context : I was writing a little web app to illustrate a case where
> the use of ZGC gives a better throughput than with G1. I benchmarked
> with grafana k6 my application running with G1 and my application
> running with ZGC : the runs with ZGC gave better throughputs. I wanted
> to go a bit further in explanation so I began again my benchmarks with
> JFR to be able to illustrate GC gains in JMC. When I ran my web app
> with ZGC+JFR, I noticed a significant throughput degradation in my
> benchmark (which was not the case with G1+JFR).
> > Although I did not measure an increase in overhead as such, I still > wanted to report this issue because the degradation in throughput with > JFR is such that it would not be usable as is on a production service. > > I wrote a little application (not a web one) to reproduce the problem : > the application calls a little conversion service 200 times with random > numbers in parallel (to be like a web app in charge and to pressure > GC). The conversion service (a method named `convertNumberToWords`) > convert the number in a String looking for the String in a Map with the > number as th key. In order to instantiate and destroy many objects at > each call, the map is built parsing a huge String at each call. > Application ends after 200 calls. > > Here are the step to reproduce : > 1. Clone https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact > (be aware to be on branch jfr+zgc_impact) > 2. Compile it (you must include numbers200k.zip in resources : it > contains a 36 Mo text files whose contents are used to create the huge > String variable) > 3. in the root of repository : > 3a. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -classpath > target/classes poc.java.perf.write.TestPerf #ZGC without JFR` > 3b. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops > -XX:StartFlightRecording -classpath target/classes > poc.java.perf.write.TestPerf #ZGC with JFR` > 4. The real time of the second run (with JFR) will be considerably > higher than that of the first > > I ran these tests on my laptop : > - Dell Inc. Latitude 5591 > - openSUSE Tumbleweed 20260108 > - Kernel : 6.18.3-1-default (64-bit) > - 12 ? Intel(R) Core(tm) i7-8850H CPU @ 2.60GHz > - RAM 16 Gio > - openjdk version "25.0.1" 2025-10-21 > - OpenJDK Runtime Environment (build 25.0.1+8-27) > - OpenJDK 64-Bit Server VM (build 25.0.1+8-27, mixed mode, sharing) > - many tabs opened in firefox ! 
> > I also ran it in a container (eclipse-temurin:25) on my laptop and with > a windows laptop and came to the same conclusions : here are the > measurements from the container : > > | Run with | Real time (s) | > |-----------|---------------| > | ZGC alone | 7.473 | > | ZGC + jfr | 25.075 | > | G1 alone | 10.195 | > | G1 + jfr | 10.450 | > > After all these tests I tried to run the app with an other profiler > tool in order to understand where is the issue. I join the flamegraph > when running jfr+zgc : for the worker threads of the ForkJoinPool of > Stream, stack traces of a majority of samples have the same top lines : > - PosixSemaphore::wait > - ZPageAllocator::alloc_page_stall > - ZPageAllocator::alloc_page_inner > - ZPageAllocator::alloc_page > > So many thread seem to spent their time waiting in the method > ZPageAllocator::alloc_page_stall when the JFR is on. The JFR periodic > tasks threads has also a few samples where it waits at > ZPageAllocator::alloc_page_stall. I hope this will help you to find the > issue. > > Thank you very much for reading this email until the end. I hope this > is the good place for such a feedback. Let me know if I must report my > problem elsewhere. Be free to ask me more questions if you need. > > Thank you all for this amazing tool ! -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: TestPerf.java Type: text/x-c Size: 2744 bytes Desc: not available URL: From mgronlun at openjdk.org Mon Jan 12 21:38:18 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 12 Jan 2026 21:38:18 GMT Subject: RFR: 8373106: JFR suspend/resume deadlock on macOS in pthreads library Message-ID: Greetings, this change effectively reverts [JDK-8358429](https://bugs.openjdk.org/browse/JDK-8358429), which was an attempt to minimize the time the Threads_lock is held during JFR sampling. 
That change was premised on the two reasons, known at the time, for why we held the Threads_lock during the entire sampling interval. After this change, subtle deadlocks happened on macOS, very intermittently, in the pthreads library, in that a suspended thread could be the owner of an internal process lock, a process lock that was then needed when sending the pthread_kill signal to resume it.

By rolling back to holding the Threads_lock for the entire duration of the sampling interval (as we have done for many, many years in the era before JFR Cooperative Sampling), we prevent JavaThreads from calling os::create_thread().

I have decided to roll back to the version we know works, instead of attempting a more granular solution, perhaps using sigprocmask() to create a critical section around pthread_create in os_bsd.cpp. This is something we might want to do later, but more time is then needed for falsifying/verifying the correct fix.

Testing: jdk_jfr, stress testing

Thanks
Markus

PS Indirect barriers removed are explicitly re-inserted as per [JDK-8373485](https://bugs.openjdk.org/browse/JDK-8373485)

-------------

Commit messages:
 - 8373106

Changes: https://git.openjdk.org/jdk/pull/29178/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29178&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8373106
Stats: 62 lines in 1 file changed: 12 ins; 18 del; 32 mod
Patch: https://git.openjdk.org/jdk/pull/29178.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/29178/head:pull/29178

PR: https://git.openjdk.org/jdk/pull/29178

From ysuenaga at openjdk.org Mon Jan 12 23:44:59 2026
From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Mon, 12 Jan 2026 23:44:59 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 14:14:19 GMT, Markus Grönlund wrote:

> Alternative for solving [JDK-8371014](https://bugs.openjdk.org/browse/JDK-8371014)
>
> Also includes a fix
for [JDK-8373257](https://bugs.openjdk.org/browse/JDK-8373257)
>
> Testing: jdk_jfr, stress testing, manual testing with CrashOnOutOfMemoryError, tier1-6

Thanks a lot for working on this!

-------------

Marked as reviewed by ysuenaga (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/29094#pullrequestreview-3653157591

From fabrice.bibonne at courriel.eco Tue Jan 13 04:36:58 2026
From: fabrice.bibonne at courriel.eco (Fabrice Bibonne) Date: Tue, 13 Jan 2026 05:36:58 +0100 Subject: Using JFR both with ZGC degrades application throughput In-Reply-To: References: Message-ID: <504c670a6250f5c2ee5e27a8bed97980@courriel.eco> Thank you for your advice; let me just add a few clarifications:

* for `-XX:+UseCompressedOops`, I must admit I did not know this option: I added it because JDK Mission Control warned me about it in the "Automated analysis result" after a first try (<>)

* it is true that the application wastes time in GC pauses (46.6% of the time with G1): I wanted an example app which uses the GC a lot. Maybe this is a little too much compared to real apps (even if, for some of them, we may wonder...).

* the stack I showed about finding a new empty page/region for allocation is present in both cases (with JFR and without JFR). But in the case with JFR, it is much wider: it takes many more samples.

Best regards,

Fabrice

Le 2026-01-12 14:18, Thomas Schatzl a écrit :

> Hi,
>
> while not being able to answer the question about why using JFR takes
> so much additional time, when reading about your benchmark setup the
> following things came to my mind:
>
> * -XX:+UseCompressedOops for ZGC does nothing (ZGC does not support
> compressed oops at all), and G1 will automatically use it. You can
> leave it off.
>
> * G1 having a significantly worse throughput than ZGC is very rare:
> even then the extent you show is quite large.
Taking some of content > together (4g heap, Maps, huge string variables) indicates that you > might have run into a well-known pathology of G1 with large objects: > the application might waste up to 50% of your application due to these > humongous objects [0 [1]]. > G1 might work better in JDK 26 too as some enhancement to some > particular case has been added. More is being worked on. > > TL;DR: Your application might run much better with a large(r) > G1HeapRegionSize setting. Or just upgrading to JDK 26. > > * While ZGC does not have that in some cases extreme memory wastage for > large allocations, there is still some. Adding JFR might just push it > over the edge (the stack you showed are about finding a new empty > page/region for allocation, failing to do so, doing a GC, stalling and > waiting). > > Hth, > Thomas > > [0] https://tschatzl.github.io/2021/11/15/heap-regions-x-large.html > > On 11.01.26 19:23, Fabrice Bibonne wrote: > >> Hi all, >> >> I would like to report a case where starting jfr for an application >> running with zgc causes a significant throughput degradation (compared >> to when JFR is not started). >> >> My context : I was writing a little web app to illustrate a case where >> the use of ZGC gives a better throughput than with G1. I benchmarked >> with grafana k6 my application running with G1 and my application >> running with ZGC : the runs with ZGC gave better throughputs. I >> wanted to go a bit further in explanation so I began again my >> benchmarks with JFR to be able to illustrate GC gains in JMC. When I >> ran my web app with ZGC+JFR, I noticed a significant throughput >> degradation in my benchmark (which was not the case with G1+JFR). >> >> Although I did not measure an increase in overhead as such, I still >> wanted to report this issue because the degradation in throughput with >> JFR is such that it would not be usable as is on a production service. 
>> >> I wrote a little application (not a web one) to reproduce the problem >> : the application calls a little conversion service 200 times with >> random numbers in parallel (to be like a web app in charge and to >> pressure GC). The conversion service (a method named >> `convertNumberToWords`) convert the number in a String looking for the >> String in a Map with the number as th key. In order to instantiate and >> destroy many objects at each call, the map is built parsing a huge >> String at each call. Application ends after 200 calls. >> >> Here are the step to reproduce : >> 1. Clone https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact >> (be aware to be on branch jfr+zgc_impact) >> 2. Compile it (you must include numbers200k.zip in resources : it >> contains a 36 Mo text files whose contents are used to create the huge >> String variable) >> 3. in the root of repository : >> 3a. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops >> -classpath target/classes poc.java.perf.write.TestPerf #ZGC without >> JFR` >> 3b. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops - >> XX:StartFlightRecording -classpath target/classes >> poc.java.perf.write.TestPerf #ZGC with JFR` >> 4. The real time of the second run (with JFR) will be considerably >> higher than that of the first >> >> I ran these tests on my laptop : >> - Dell Inc. Latitude 5591 >> - openSUSE Tumbleweed 20260108 >> - Kernel : 6.18.3-1-default (64-bit) >> - 12 ? Intel(R) Core(tm) i7-8850H CPU @ 2.60GHz >> - RAM 16 Gio >> - openjdk version "25.0.1" 2025-10-21 >> - OpenJDK Runtime Environment (build 25.0.1+8-27) >> - OpenJDK 64-Bit Server VM (build 25.0.1+8-27, mixed mode, sharing) >> - many tabs opened in firefox ! 
>> >> I also ran it in a container (eclipse-temurin:25) on my laptop and >> with a windows laptop and came to the same conclusions : here are the >> measurements from the container : >> >> | Run with | Real time (s) | >> |-----------|---------------| >> | ZGC alone | 7.473 | >> | ZGC + jfr | 25.075 | >> | G1 alone | 10.195 | >> | G1 + jfr | 10.450 | >> >> After all these tests I tried to run the app with an other profiler >> tool in order to understand where is the issue. I join the flamegraph >> when running jfr+zgc : for the worker threads of the ForkJoinPool of >> Stream, stack traces of a majority of samples have the same top lines >> : >> - PosixSemaphore::wait >> - ZPageAllocator::alloc_page_stall >> - ZPageAllocator::alloc_page_inner >> - ZPageAllocator::alloc_page >> >> So many thread seem to spent their time waiting in the method >> ZPageAllocator::alloc_page_stall when the JFR is on. The JFR periodic >> tasks threads has also a few samples where it waits at >> ZPageAllocator::alloc_page_stall. I hope this will help you to find >> the issue. >> >> Thank you very much for reading this email until the end. I hope this >> is the good place for such a feedback. Let me know if I must report my >> problem elsewhere. Be free to ask me more questions if you need. >> >> Thank you all for this amazing tool ! Links: ------ [1] https://tschatzl.github.io/2021/11/15/heap-regions-x-large.html -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From thomas.schatzl at oracle.com Tue Jan 13 10:06:07 2026 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 13 Jan 2026 11:06:07 +0100 Subject: Using JFR both with ZGC degrades application throughput In-Reply-To: <504c670a6250f5c2ee5e27a8bed97980@courriel.eco> References: <504c670a6250f5c2ee5e27a8bed97980@courriel.eco> Message-ID: <948f3b50-80a2-4d89-a341-d9908a07f862@oracle.com> Hi, On 13.01.26 05:36, Fabrice Bibonne wrote: > Thank you for your advise, I just give a few precisions in a few lines? : > > * for `-XX:+UseCompressedOops`, I must admit I do not know this option : > I add it because JDK Mission control warned me about it in "Automated > analysis result" after a fisrt try (< [...].Use the JVM argument '-XX:+UseCompressedOops' to enable this > feature>>) Maybe JMC should not provide this hint for ZGC then (not directed towards you). > > * it is true that application waste time in GC pauses (46,6% of time > with G1) : I wanted an example app which uses GC a lot. Maybe this is a > little too much compared to real apps (even if for some of them, we may > wonder...). What I am saying is that while the results are as they are for you, I suspect that the result is not representative for G1 as it exercises a pathology that could (and unfortunately must if it is really the case) be resolved by a single command line switch by the user. The G1 GC algorithm would need prior knowledge of the application it is running to automatically resolve this. Having had a look at G1 behavior, the reason for the low performance is likely due to heap sizing heuristics issues, G1 does not expand the heap as aggressively as ZGC. The upside obviously is that it uses (much) less memory. ;) More technical explanation: * in presence of full collection ([0], in the process of being fixed right now), G1 does not expand the heap, running with maybe half of what ZGC uses. This is due to the behavior of the application. 
* even if the bug is worked around via command line options (-XX:MaxHeapFreeRatio=100), the runtime of the application is too short: even with that fix applied, it takes too long to get to the same heap size as ZGC (for reasons we can discuss if you want).

* the mentioned issue with the large objects, i.e. G1 wasting too much memory, also contributes.

Interestingly, I only observed this on slower systems; these issues do not show on faster ones, e.g. on some x64 workstation (limited to 10 threads). On that workstation, G1 is 2x faster than ZGC with the settings you gave already. However, on some smallish AArch64 VM it is around the same performance (slightly slower). This is probably what you are seeing on your laptop (which may also experience aggressive throttling without precautions).

TL;DR: If you set the minimum heap size and the region size, G1 is 2x faster than ZGC (with -Xms4g -Xmx4g -XX:G1HeapRegionSize=8m) on that slower AArch64 machine too.

(Fwiw, for maximum throughput we recommend setting minimum and maximum heap size to the same value irrespective of the garbage collector; see the recommendations in our performance guide [1]. It also describes the issue with humongous objects. We are working on improving both issues right now.)

Another observation is that with ZGC, although overall throughput is faster than with G1 in your original example, its allocation stalls are in the range of hundreds of milliseconds, while G1 pauses are at most 50ms. So the "experience" with that original web app may be better with G1 even if it is slower overall :P (We do not recommend running latency-oriented programs at that CPU load level either way, but just noticing.)

> * the stack I showed about finding a new empty page/region allocation is present in both cases (with jfr and without jfr). But in the case with jfr, it is much more wider : it takes much more samples.

Problems tend to exacerbate themselves, i.e.
after a certain threshold of allocation rate beyond what it can sustain, performance can quickly (non-linearly) deteriorate, e.g. because of the need to use different, slower algorithms. Without JFR I am already seeing that almost all GCs are caused by allocation stalls. Adding to that will not help.

When looking around in the ZGC logs a bit, with StartFlightRecording there seems to be much more so-called in-place object movement (i.e. instead of copying live objects to a new place and then freeing the old, now empty space, the objects are moved "down" the heap to fill gaps), which is a lot more expensive. This shows in garbage collection pauses, changing from hundreds of ms to seconds.

As mentioned above, it looks like just that little extra memory usage causes ZGC to go into some very slow mode to free memory and avoid OOME.

Hth,
  Thomas

[0] https://bugs.openjdk.org/browse/JDK-8238686
[1] https://docs.oracle.com/en/java/javase/25/gctuning/garbage-first-garbage-collector-tuning.html

>
> Best regards,
>
> Fabrice
>
>
> Le 2026-01-12 14:18, Thomas Schatzl a écrit :
>
>> Hi,
>>
>> while not being able to answer the question about why using JFR
>> takes so much additional time, when reading about your benchmark setup
>> the following things came to my mind:
>>
>> * -XX:+UseCompressedOops for ZGC does nothing (ZGC does not support
>> compressed oops at all), and G1 will automatically use it. You can
>> leave it off.
>>
>> * G1 having a significantly worse throughput than ZGC is very rare:
>> even then the extent you show is quite large. Taking some of content
>> together (4g heap, Maps, huge string variables) indicates that you
>> might have run into a well-known pathology of G1 with large objects:
>> the application might waste up to 50% of your application due to these
>> humongous objects [0].
>> G1 might work better in JDK 26 too as some enhancement to some
>> particular case has been added. More is being worked on.
>> >> TL;DR: Your application might run much better with a large(r) >> G1HeapRegionSize setting. Or just upgrading to JDK 26. >> >> * While ZGC does not have that in some cases extreme memory wastage >> for large allocations, there is still some. Adding JFR might just push >> it over the edge (the stack you showed are about finding a new empty >> page/region for allocation, failing to do so, doing a GC, stalling and >> waiting). >> >> Hth, >> ? Thomas >> >> [0] https://tschatzl.github.io/2021/11/15/heap-regions-x-large.html >> >> >> On 11.01.26 19:23, Fabrice Bibonne wrote: >>> Hi all, >>> >>> ??I would like to report a case where starting jfr for an application >>> running with zgc causes a significant throughput degradation >>> (compared to when JFR is not started). >>> >>> ??My context : I was writing a little web app to illustrate a case >>> where the use of ZGC gives a better throughput than with G1. I >>> benchmarked with grafana k6 my application running with G1 and my >>> application running with ZGC ?: the runs with ZGC gave better >>> throughputs. I wanted to go a bit further in explanation so I began >>> again my benchmarks with JFR to be able to illustrate GC gains in >>> JMC. When I ran my web app with ZGC+JFR, I noticed a significant >>> throughput degradation in my benchmark (which was not the case with >>> G1+JFR). >>> >>> ??Although I did not measure an increase in overhead as such, I still >>> wanted to report this issue because the degradation in throughput >>> with JFR is such that it would not be usable as is on a production >>> service. >>> >>> I wrote a little application (not a web one) to reproduce the >>> problem : the application calls a little conversion service 200 times >>> with random numbers in parallel (to be like a web app in charge and >>> to pressure GC). The conversion service (a method named >>> `convertNumberToWords`) convert the number in a String looking for >>> the String in a Map with the number as th key. 
In order to >>> instantiate and destroy many objects at each call, the map is built by >>> parsing a huge String at each call. The application ends after 200 calls. >>> >>> Here are the steps to reproduce: >>> 1. Clone https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact >>> (make >>> sure you are on branch jfr+zgc_impact) >>> 2. Compile it (you must include numbers200k.zip in resources: it >>> contains a 36 MB text file whose contents are used to create the >>> huge String variable) >>> 3. In the root of the repository: >>> 3a. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -classpath target/classes poc.java.perf.write.TestPerf #ZGC without JFR` >>> 3b. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -XX:StartFlightRecording -classpath target/classes >>> poc.java.perf.write.TestPerf #ZGC with JFR` >>> 4. The real time of the second run (with JFR) will be considerably >>> higher than that of the first >>> >>> I ran these tests on my laptop: >>> - Dell Inc. Latitude 5591 >>> - openSUSE Tumbleweed 20260108 >>> - Kernel: 6.18.3-1-default (64-bit) >>> - 12 × Intel® Core™ i7-8850H CPU @ 2.60GHz >>> - RAM 16 GiB >>> - openjdk version "25.0.1" 2025-10-21 >>> - OpenJDK Runtime Environment (build 25.0.1+8-27) >>> - OpenJDK 64-Bit Server VM (build 25.0.1+8-27, mixed mode, sharing) >>> - many tabs open in Firefox! >>> >>> I also ran it in a container (eclipse-temurin:25) on my laptop and >>> on a Windows laptop and came to the same conclusions: here are the >>> measurements from the container: >>> >>> | Run with  | Real time (s) | >>> |-----------|---------------| >>> | ZGC alone | 7.473         | >>> | ZGC + jfr | 25.075        | >>> | G1 alone  | 10.195        | >>> | G1 + jfr  | 10.450        | >>> >>> >>> After all these tests I tried to run the app with another profiler >>> tool in order to understand where the issue is. 
I attach the flamegraph >>> from the jfr+zgc run: for the worker threads of the ForkJoinPool of >>> Stream, the stack traces of a majority of samples have the same top lines: >>> - PosixSemaphore::wait >>> - ZPageAllocator::alloc_page_stall >>> - ZPageAllocator::alloc_page_inner >>> - ZPageAllocator::alloc_page >>> >>> So many threads seem to spend their time waiting in the method >>> ZPageAllocator::alloc_page_stall when JFR is on. The JFR periodic >>> task thread also has a few samples where it waits at >>> ZPageAllocator::alloc_page_stall. I hope this will help you to find >>> the issue. >>> >>> Thank you very much for reading this email until the end. I hope this >>> is the right place for such feedback. Let me know if I should report >>> my problem elsewhere. Feel free to ask me more questions if you need. >>> >>> Thank you all for this amazing tool! >>> >>> From egahlin at openjdk.org Tue Jan 13 10:33:43 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 13 Jan 2026 10:33:43 GMT Subject: RFR: 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant [v2] In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 14:00:23 GMT, Markus Grönlund wrote: >> Greetings, >> >> When sampling threads in state _thread_in_native, there is a missing memory barrier when UseSystemMemoryBarrier is used, because it must be emitted manually. >> >> Testing: jdk_jfr >> >> Thanks >> Markus >> >> PS The "threads_lock" local variable was renamed to "lock" so as not to be confused with the global Threads_lock. > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > explicit fences Marked as reviewed by egahlin (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/29155#pullrequestreview-3655069713 From egahlin at openjdk.org Tue Jan 13 10:37:31 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 13 Jan 2026 10:37:31 GMT Subject: RFR: 8373106: JFR suspend/resume deadlock on macOS in pthreads library In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 21:29:26 GMT, Markus Grönlund wrote: > Greetings, > > this change effectively reverts [JDK-8358429](https://bugs.openjdk.org/browse/JDK-8358429), which was an attempt to minimize the time the Threads_lock is held during JFR sampling. That change was premised on the two reasons, known at the time, for why we held the Threads_lock during the entire sampling interval. > > After this change, subtle deadlocks happened on macOS, very intermittently, in the pthreads library, in that a suspended thread could be the owner of an internal process lock, a process lock that was then needed when sending the pthread_kill signal to resume it. > > By rolling back to holding the Threads_lock for the entire duration of the sampling interval (like we have done for many, many years in the era before JFR Cooperative Sampling), we prevent JavaThreads from calling os::create_thread(). > > I have decided to roll back the solution to the version we know works, instead of attempting a more granular solution, perhaps using sigprocmask() to create a critical section around pthread_create in os_bsd.cpp. This is something we might want to do later, but more time is then needed for falsifying / verifying the correct fix. > > Testing: jdk_jfr, stress testing > > Thanks > Markus > > PS Indirect barriers removed are explicitly re-inserted as per [JDK-8373485](https://bugs.openjdk.org/browse/JDK-8373485) Marked as reviewed by egahlin (Reviewer). 
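[Editor's note: for readers curious what the "more granular solution" mentioned above might look like, here is a rough, hypothetical sketch. The function name and the choice of signal are illustrative only; this is not the actual os_bsd.cpp change, just an illustration of blocking the sampler's suspend signal around pthread_create():]

```cpp
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

// Hypothetical sketch (not HotSpot code): block the sampler's
// suspend signal while pthread_create() runs, so this thread cannot
// be suspended while pthreads' internal process lock is held.
// 'suspend_sig' stands in for whatever signal the sampler uses.
static int create_thread_in_critical_section(pthread_t* t,
                                             void* (*fn)(void*),
                                             void* arg,
                                             int suspend_sig) {
  sigset_t block, old;
  sigemptyset(&block);
  sigaddset(&block, suspend_sig);
  pthread_sigmask(SIG_BLOCK, &block, &old);   // enter critical section
  int result = pthread_create(t, NULL, fn, arg);
  pthread_sigmask(SIG_SETMASK, &old, NULL);   // restore previous mask
  return result;
}

static void* trivial_worker(void* arg) {
  (void)arg;
  return NULL;
}
```

The masked window covers exactly the interval in which pthread_create() may hold the library-internal lock, which is what makes the suspend/resume handshake safe again.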
------------- PR Review: https://git.openjdk.org/jdk/pull/29178#pullrequestreview-3655081510 From mgronlun at openjdk.org Tue Jan 13 11:45:28 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 13 Jan 2026 11:45:28 GMT Subject: RFR: 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant [v2] In-Reply-To: References: Message-ID: <0QURWowNF1yaO7n1MYUHNS-MRVgvXrqX4GXP64R3vQE=.15dd2eea-5b51-41ef-9b19-5c9fd2da670b@github.com> On Mon, 12 Jan 2026 14:16:51 GMT, Aleksey Shipilev wrote: >> Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: >> >> explicit fences > > All right, that reads better, thanks. Thanks for your reviews @shipilev and @egahlin! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29155#issuecomment-3743867509 From mgronlun at openjdk.org Tue Jan 13 11:48:07 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 13 Jan 2026 11:48:07 GMT Subject: Integrated: 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant In-Reply-To: References: Message-ID: On Sun, 11 Jan 2026 12:39:06 GMT, Markus Gr?nlund wrote: > Greetings, > > When sampling threads in state _thread_in_native, there is a missing memory barrier when UseSystemMemoryBarrier is used, because it must be emitted manually. > > Testing: jdk_jfr > > Thanks > Markus > > PS "threads_lock" local variable was renamed to "lock" not to confuse with the global Threads_lock. This pull request has now been integrated. 
Changeset: 543a9722 Author: Markus Gr?nlund URL: https://git.openjdk.org/jdk/commit/543a972222118155e4c72c6f2d32d154c5dfd442 Stats: 11 lines in 1 file changed: 7 ins; 0 del; 4 mod 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant Reviewed-by: shade, egahlin ------------- PR: https://git.openjdk.org/jdk/pull/29155 From mgronlun at openjdk.org Tue Jan 13 12:05:41 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 13 Jan 2026 12:05:41 GMT Subject: RFR: 8373106: JFR suspend/resume deadlock on macOS in pthreads library [v2] In-Reply-To: References: Message-ID: > Greetings, > > this change effectively reverts [JDK-8358429](https://bugs.openjdk.org/browse/JDK-8358429), which was an attempt to minimize the time the Threads_lock is held during JFR sampling. That change was premised on the, at the time, two known reasons for why we held the Threads_lock during the entire sampling interval. > > After this change, subtle deadlocks happened on macOS, very intermittently, in the pthreads library, in that a suspended thread could be the owner of an internal process lock, a process lock that was then needed when sending pthread_kill signal to resume it. > > By rolling back to holding the Threads_lock for the entire duration of the sampling interval (like we have done for many many years in the era before JFR Cooperative Sampling), we prevent JavaThreads from calling os::create_thread(). > > I have decided to rollback the solution to the version we know work, instead of attempting a more granular solution, perhaps using sigprocmask() to create a critical section around pthread_create in os_bsd.cpp. This is something we might want to do later, but more time is then needed for falsifying / verifying the correct fix. 
> > Testing: jdk_jfr, stress testing > > Thanks > Markus > > PS Indirect barriers removed are explicitly re-inserted as per [JDK-8373485](https://bugs.openjdk.org/browse/JDK-8373485) Markus Gr?nlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - remove extraneous assertion - Merge branch 'master' into 8373106 - 8373106 ------------- Changes: https://git.openjdk.org/jdk/pull/29178/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29178&range=01 Stats: 62 lines in 1 file changed: 12 ins; 18 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/29178.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29178/head:pull/29178 PR: https://git.openjdk.org/jdk/pull/29178 From mgronlun at openjdk.org Tue Jan 13 12:24:38 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 13 Jan 2026 12:24:38 GMT Subject: [jdk26] RFR: 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant Message-ID: <34IxtvJSmrvH3eMEUTonmlsiz5eFXCmnAXymcvgo4jw=.51316cb5-4b87-49a7-9bf4-145cf984c940@github.com> 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant ------------- Commit messages: - Backport 543a972222118155e4c72c6f2d32d154c5dfd442 Changes: https://git.openjdk.org/jdk/pull/29189/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29189&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373485 Stats: 11 lines in 1 file changed: 7 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/29189.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29189/head:pull/29189 PR: https://git.openjdk.org/jdk/pull/29189 From shade at openjdk.org Tue Jan 13 12:24:39 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Jan 2026 12:24:39 GMT Subject: [jdk26] RFR: 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant In-Reply-To: 
<34IxtvJSmrvH3eMEUTonmlsiz5eFXCmnAXymcvgo4jw=.51316cb5-4b87-49a7-9bf4-145cf984c940@github.com> References: <34IxtvJSmrvH3eMEUTonmlsiz5eFXCmnAXymcvgo4jw=.51316cb5-4b87-49a7-9bf4-145cf984c940@github.com> Message-ID: On Tue, 13 Jan 2026 12:14:17 GMT, Markus Gr?nlund wrote: > 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29189#pullrequestreview-3655493911 From egahlin at openjdk.org Tue Jan 13 13:21:54 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 13 Jan 2026 13:21:54 GMT Subject: RFR: 8373106: JFR suspend/resume deadlock on macOS in pthreads library [v2] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 12:05:41 GMT, Markus Gr?nlund wrote: >> Greetings, >> >> this change effectively reverts [JDK-8358429](https://bugs.openjdk.org/browse/JDK-8358429), which was an attempt to minimize the time the Threads_lock is held during JFR sampling. That change was premised on the, at the time, two known reasons for why we held the Threads_lock during the entire sampling interval. >> >> After this change, subtle deadlocks happened on macOS, very intermittently, in the pthreads library, in that a suspended thread could be the owner of an internal process lock, a process lock that was then needed when sending pthread_kill signal to resume it. >> >> By rolling back to holding the Threads_lock for the entire duration of the sampling interval (like we have done for many many years in the era before JFR Cooperative Sampling), we prevent JavaThreads from calling os::create_thread(). >> >> I have decided to rollback the solution to the version we know work, instead of attempting a more granular solution, perhaps using sigprocmask() to create a critical section around pthread_create in os_bsd.cpp. This is something we might want to do later, but more time is then needed for falsifying / verifying the correct fix. 
>> >> Testing: jdk_jfr, stress testing >> >> Thanks >> Markus >> >> PS Indirect barriers removed are explicitly re-inserted as per [JDK-8373485](https://bugs.openjdk.org/browse/JDK-8373485) > > Markus Gr?nlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - remove extraneous assertion > - Merge branch 'master' into 8373106 > - 8373106 Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29178#pullrequestreview-3655758172 From egahlin at openjdk.org Tue Jan 13 13:30:16 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 13 Jan 2026 13:30:16 GMT Subject: [jdk26] RFR: 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant In-Reply-To: <34IxtvJSmrvH3eMEUTonmlsiz5eFXCmnAXymcvgo4jw=.51316cb5-4b87-49a7-9bf4-145cf984c940@github.com> References: <34IxtvJSmrvH3eMEUTonmlsiz5eFXCmnAXymcvgo4jw=.51316cb5-4b87-49a7-9bf4-145cf984c940@github.com> Message-ID: On Tue, 13 Jan 2026 12:14:17 GMT, Markus Gr?nlund wrote: > 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29189#pullrequestreview-3655791568 From mgronlun at openjdk.org Tue Jan 13 13:39:45 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 13 Jan 2026 13:39:45 GMT Subject: [jdk26] RFR: 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant In-Reply-To: References: <34IxtvJSmrvH3eMEUTonmlsiz5eFXCmnAXymcvgo4jw=.51316cb5-4b87-49a7-9bf4-145cf984c940@github.com> Message-ID: On Tue, 13 Jan 2026 12:18:46 GMT, Aleksey Shipilev wrote: >> 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant > > Marked as reviewed by shade (Reviewer). Thanks @shipilev and @egahlin for your reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29189#issuecomment-3744387302 From mgronlun at openjdk.org Tue Jan 13 13:43:07 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 13 Jan 2026 13:43:07 GMT Subject: [jdk26] Integrated: 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant In-Reply-To: <34IxtvJSmrvH3eMEUTonmlsiz5eFXCmnAXymcvgo4jw=.51316cb5-4b87-49a7-9bf4-145cf984c940@github.com> References: <34IxtvJSmrvH3eMEUTonmlsiz5eFXCmnAXymcvgo4jw=.51316cb5-4b87-49a7-9bf4-145cf984c940@github.com> Message-ID: <1pwZMVoxLuPdhjOc72ousIFh0iH0gJwhzAXWQlDIX-Q=.6ea1592b-ca97-4fbf-a1f3-62f5254e93af@github.com> On Tue, 13 Jan 2026 12:14:17 GMT, Markus Gr?nlund wrote: > 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant This pull request has now been integrated. Changeset: 58be8702 Author: Markus Gr?nlund URL: https://git.openjdk.org/jdk/commit/58be8702d8b2434b810e8f142d631827ddf758a0 Stats: 11 lines in 1 file changed: 7 ins; 0 del; 4 mod 8373485: JFR Crash during sampling: assert(jt->has_last_Java_frame()) failed: invariant Reviewed-by: shade, egahlin Backport-of: 543a972222118155e4c72c6f2d32d154c5dfd442 ------------- PR: https://git.openjdk.org/jdk/pull/29189 From egahlin at openjdk.org Tue Jan 13 14:06:13 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 13 Jan 2026 14:06:13 GMT Subject: [jdk26] RFR: 8372321: TestBackToBackSensitive fails intermittently after JDK-8365972 Message-ID: <24Tldr_YSbD5O2w5hXET_zCFQb4G-ZwnVM9C4VdPykA=.3f29601b-14ac-47a0-b62f-744fb1095b69@github.com> 8372321: TestBackToBackSensitive fails intermittently after JDK-8365972 ------------- Commit messages: - Backport 556bddfd9439d1bad698ab5134317ce263a36b04 Changes: https://git.openjdk.org/jdk/pull/29191/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29191&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372321 Stats: 47 lines in 1 file changed: 21 ins; 14 del; 12 mod 
Patch: https://git.openjdk.org/jdk/pull/29191.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29191/head:pull/29191 PR: https://git.openjdk.org/jdk/pull/29191 From mgronlun at openjdk.org Tue Jan 13 14:30:03 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 13 Jan 2026 14:30:03 GMT Subject: [jdk26] RFR: 8372321: TestBackToBackSensitive fails intermittently after JDK-8365972 In-Reply-To: <24Tldr_YSbD5O2w5hXET_zCFQb4G-ZwnVM9C4VdPykA=.3f29601b-14ac-47a0-b62f-744fb1095b69@github.com> References: <24Tldr_YSbD5O2w5hXET_zCFQb4G-ZwnVM9C4VdPykA=.3f29601b-14ac-47a0-b62f-744fb1095b69@github.com> Message-ID: On Tue, 13 Jan 2026 13:51:44 GMT, Erik Gahlin wrote: > 8372321: TestBackToBackSensitive fails intermittently after JDK-8365972 Marked as reviewed by mgronlun (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29191#pullrequestreview-3656094767 From mgronlun at openjdk.org Tue Jan 13 18:06:39 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 13 Jan 2026 18:06:39 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: <8LD4JmIZnVSwmhLeVZROok-0h-nCD1TxlaSRHe586-E=.99a49bfa-444f-4ddf-b206-0a75fe1dad23@github.com> Message-ID: On Thu, 8 Jan 2026 01:32:44 GMT, Yasumasa Suenaga wrote: >>> TestEmergencyDumpAtOOM.java has passed on both AIX and Linux on PPC64. Thanks! >> >> Thanks Martin. > > Thanks a lot @mgronlun ! Looks good in general. > > Can we wait for `service.emit_leakprofiler_events()` to finish in the JFR recorder thread before the crash at `report_java_out_of_memory()` in debug.cpp? That is, is `abort()` called before the recorder thread finishes dumping events? Thanks for your review @YaSuenag. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29094#issuecomment-3745680784 From mgronlun at openjdk.org Tue Jan 13 18:08:33 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 13 Jan 2026 18:08:33 GMT Subject: Integrated: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 14:14:19 GMT, Markus Gr?nlund wrote: > Alternative for solving [JDK-8371014](https://bugs.openjdk.org/browse/JDK-8371014) > > Also includes a fix for [JDK-8373257](https://bugs.openjdk.org/browse/JDK-8373257) > > Testing: jdk_jfr, stress testing, manual testing with CrashOnOutOfMemoryError, tier1-6 This pull request has now been integrated. Changeset: f23752a7 Author: Markus Gr?nlund URL: https://git.openjdk.org/jdk/commit/f23752a75ee3d3af0853eff9c678d2496bb1cf58 Stats: 292 lines in 15 files changed: 219 ins; 18 del; 55 mod 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented Reviewed-by: ysuenaga ------------- PR: https://git.openjdk.org/jdk/pull/29094 From mgronlun at openjdk.org Tue Jan 13 19:40:00 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 13 Jan 2026 19:40:00 GMT Subject: RFR: 8373106: JFR suspend/resume deadlock on macOS in pthreads library [v2] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 13:18:28 GMT, Erik Gahlin wrote: >> Markus Gr?nlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - remove extraneous assertion >> - Merge branch 'master' into 8373106 >> - 8373106 > > Marked as reviewed by egahlin (Reviewer). Thanks @egahlin for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29178#issuecomment-3746145756 From mgronlun at openjdk.org Tue Jan 13 19:43:47 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 13 Jan 2026 19:43:47 GMT Subject: Integrated: 8373106: JFR suspend/resume deadlock on macOS in pthreads library In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 21:29:26 GMT, Markus Gr?nlund wrote: > Greetings, > > this change effectively reverts [JDK-8358429](https://bugs.openjdk.org/browse/JDK-8358429), which was an attempt to minimize the time the Threads_lock is held during JFR sampling. That change was premised on the, at the time, two known reasons for why we held the Threads_lock during the entire sampling interval. > > After this change, subtle deadlocks happened on macOS, very intermittently, in the pthreads library, in that a suspended thread could be the owner of an internal process lock, a process lock that was then needed when sending pthread_kill signal to resume it. > > By rolling back to holding the Threads_lock for the entire duration of the sampling interval (like we have done for many many years in the era before JFR Cooperative Sampling), we prevent JavaThreads from calling os::create_thread(). > > I have decided to rollback the solution to the version we know work, instead of attempting a more granular solution, perhaps using sigprocmask() to create a critical section around pthread_create in os_bsd.cpp. This is something we might want to do later, but more time is then needed for falsifying / verifying the correct fix. > > Testing: jdk_jfr, stress testing > > Thanks > Markus > > PS Indirect barriers removed are explicitly re-inserted as per [JDK-8373485](https://bugs.openjdk.org/browse/JDK-8373485) This pull request has now been integrated. 
Changeset: b070367b Author: Markus Gr?nlund URL: https://git.openjdk.org/jdk/commit/b070367bdf980ef1c257cab485927db39b544241 Stats: 62 lines in 1 file changed: 12 ins; 18 del; 32 mod 8373106: JFR suspend/resume deadlock on macOS in pthreads library Reviewed-by: egahlin ------------- PR: https://git.openjdk.org/jdk/pull/29178 From mgronlun at openjdk.org Tue Jan 13 19:58:48 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 13 Jan 2026 19:58:48 GMT Subject: [jdk26] RFR: 8373106: JFR suspend/resume deadlock on macOS in pthreads library Message-ID: 8373106: JFR suspend/resume deadlock on macOS in pthreads library ------------- Commit messages: - Backport b070367bdf980ef1c257cab485927db39b544241 Changes: https://git.openjdk.org/jdk/pull/29207/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29207&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373106 Stats: 62 lines in 1 file changed: 12 ins; 18 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/29207.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29207/head:pull/29207 PR: https://git.openjdk.org/jdk/pull/29207 From egahlin at openjdk.org Tue Jan 13 21:25:00 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 13 Jan 2026 21:25:00 GMT Subject: [jdk26] RFR: 8373106: JFR suspend/resume deadlock on macOS in pthreads library In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 19:50:57 GMT, Markus Gr?nlund wrote: > 8373106: JFR suspend/resume deadlock on macOS in pthreads library Marked as reviewed by egahlin (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/29207#pullrequestreview-3657990642 From mgronlun at openjdk.org Tue Jan 13 21:29:38 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 13 Jan 2026 21:29:38 GMT Subject: [jdk26] RFR: 8373106: JFR suspend/resume deadlock on macOS in pthreads library In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 21:21:40 GMT, Erik Gahlin wrote: >> 8373106: JFR suspend/resume deadlock on macOS in pthreads library > > Marked as reviewed by egahlin (Reviewer). Thanks @egahlin for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29207#issuecomment-3746647223 From mgronlun at openjdk.org Tue Jan 13 21:31:29 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 13 Jan 2026 21:31:29 GMT Subject: [jdk26] Integrated: 8373106: JFR suspend/resume deadlock on macOS in pthreads library In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 19:50:57 GMT, Markus Gr?nlund wrote: > 8373106: JFR suspend/resume deadlock on macOS in pthreads library This pull request has now been integrated. 
Changeset: a45364a2 Author: Markus Gr?nlund URL: https://git.openjdk.org/jdk/commit/a45364a28b058739eb58bea24a219d7816d042e6 Stats: 62 lines in 1 file changed: 12 ins; 18 del; 32 mod 8373106: JFR suspend/resume deadlock on macOS in pthreads library Reviewed-by: egahlin Backport-of: b070367bdf980ef1c257cab485927db39b544241 ------------- PR: https://git.openjdk.org/jdk/pull/29207 From egahlin at openjdk.org Wed Jan 14 00:16:11 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 14 Jan 2026 00:16:11 GMT Subject: [jdk26] Integrated: 8372321: TestBackToBackSensitive fails intermittently after JDK-8365972 In-Reply-To: <24Tldr_YSbD5O2w5hXET_zCFQb4G-ZwnVM9C4VdPykA=.3f29601b-14ac-47a0-b62f-744fb1095b69@github.com> References: <24Tldr_YSbD5O2w5hXET_zCFQb4G-ZwnVM9C4VdPykA=.3f29601b-14ac-47a0-b62f-744fb1095b69@github.com> Message-ID: On Tue, 13 Jan 2026 13:51:44 GMT, Erik Gahlin wrote: > 8372321: TestBackToBackSensitive fails intermittently after JDK-8365972 This pull request has now been integrated. Changeset: 1bf35d7b Author: Erik Gahlin URL: https://git.openjdk.org/jdk/commit/1bf35d7bd0a8771e8656800a613de6a01057fc38 Stats: 47 lines in 1 file changed: 21 ins; 14 del; 12 mod 8372321: TestBackToBackSensitive fails intermittently after JDK-8365972 Reviewed-by: mgronlun Backport-of: 556bddfd9439d1bad698ab5134317ce263a36b04 ------------- PR: https://git.openjdk.org/jdk/pull/29191 From mgronlun at openjdk.org Wed Jan 14 09:04:48 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 14 Jan 2026 09:04:48 GMT Subject: [jdk26] RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 18:14:09 GMT, Markus Gr?nlund wrote: > 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented Can you please also review the backport to 26 @YaSuenag ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29203#issuecomment-3748504204 From ysuenaga at openjdk.org Wed Jan 14 09:53:31 2026 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Wed, 14 Jan 2026 09:53:31 GMT Subject: [jdk26] RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: <4cVV_QKVhvDsK5TIJ6xmux4INP9msOEl0W4MyQU8SV4=.68ccb9b0-5968-467b-a5f4-9c902cdd3fb0@github.com> On Tue, 13 Jan 2026 18:14:09 GMT, Markus Gr?nlund wrote: > 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented Marked as reviewed by ysuenaga (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29203#pullrequestreview-3659821661 From mgronlun at openjdk.org Wed Jan 14 11:02:56 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 14 Jan 2026 11:02:56 GMT Subject: [jdk26] RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: <4cVV_QKVhvDsK5TIJ6xmux4INP9msOEl0W4MyQU8SV4=.68ccb9b0-5968-467b-a5f4-9c902cdd3fb0@github.com> References: <4cVV_QKVhvDsK5TIJ6xmux4INP9msOEl0W4MyQU8SV4=.68ccb9b0-5968-467b-a5f4-9c902cdd3fb0@github.com> Message-ID: <5Fl8Ynrvx4YIKLdJs85xf-6qk42Fkwt5XWsZXhF4caU=.e17f5327-6df9-4072-9d70-325aa5ef2249@github.com> On Wed, 14 Jan 2026 09:51:04 GMT, Yasumasa Suenaga wrote: >> 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented > > Marked as reviewed by ysuenaga (Reviewer). Thanks for your review @YaSuenag! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29203#issuecomment-3748984614 From mgronlun at openjdk.org Wed Jan 14 11:07:20 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 14 Jan 2026 11:07:20 GMT Subject: [jdk26] Integrated: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 18:14:09 GMT, Markus Gr?nlund wrote: > 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented This pull request has now been integrated. Changeset: f3bdee89 Author: Markus Gr?nlund URL: https://git.openjdk.org/jdk/commit/f3bdee89ed1acd8a61989dd580f11ff184166520 Stats: 292 lines in 15 files changed: 219 ins; 18 del; 55 mod 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented Reviewed-by: ysuenaga Backport-of: f23752a75ee3d3af0853eff9c678d2496bb1cf58 ------------- PR: https://git.openjdk.org/jdk/pull/29203 From mdoerr at openjdk.org Wed Jan 14 12:07:27 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 14 Jan 2026 12:07:27 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 14:14:19 GMT, Markus Gr?nlund wrote: > Alternative for solving [JDK-8371014](https://bugs.openjdk.org/browse/JDK-8371014) > > Also includes a fix for [JDK-8373257](https://bugs.openjdk.org/browse/JDK-8373257) > > Testing: jdk_jfr, stress testing, manual testing with CrashOnOutOfMemoryError, tier1-6 Thanks for fixing and backporting it! I had taken a quick look and think it is good, but not a full review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29094#issuecomment-3749247796 From mgronlun at openjdk.org Wed Jan 14 15:19:27 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 14 Jan 2026 15:19:27 GMT Subject: RFR: 8374445: Fix -Wzero-as-null-pointer-constant warnings in JfrSet In-Reply-To: References: Message-ID: On Sat, 3 Jan 2026 08:21:15 GMT, Kim Barrett wrote: > Please review this change to fix JfrSet to avoid triggering > -Wzero-as-null-pointer-constant warnings when that warning is enabled. > > The old code uses an entry value with representation 0 to indicate the entry > doesn't have a value. It compares an entry value against literal 0 to check > for that. If the key type is a pointer type, this involves an implicit 0 => > null pointer constant conversion, so we get a warning when that warning is > enabled. > > Instead we initialize entry values to a value-initialized key, and compare > against a value-initialized key. This changes the (currently undocumented) > requirements on the key type. The key type is no longer required to be > trivially constructible (to permit memset-based initialization), but is now > required to be value-initializable. That's currently a wash, since all of the > in-use key types are fundamental types (traceid (u8) and Klass*). > > Testing: mach5 tier1-3 (tier3 is where most jfr tests are run) src/hotspot/share/jfr/utilities/jfrSet.hpp line 72: > 70: } > 71: for (unsigned i = 0; i < table_size; ++i) { > 72: ::new (&table[i]) K{}; Is this the new (placement pun intended) way to do a placement new, using the outer scope operator ::? Or is it because we don't know what Hotspot type this is? src/hotspot/share/jfr/utilities/jfrSet.hpp line 142: > 140: for (unsigned i = 0; i < old_table_size; ++i) { > 141: const K k = old_table[i]; > 142: if (k != K{}) { Are these K{}'s compile constant expressions? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29022#discussion_r2690861905 PR Review Comment: https://git.openjdk.org/jdk/pull/29022#discussion_r2690859072 From markus.gronlund at oracle.com Wed Jan 14 18:15:21 2026 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Wed, 14 Jan 2026 18:15:21 +0000 Subject: Using JFR both with ZGC degrades application throughput In-Reply-To: <80f97dba0b628057de3b7cd2ef4c3bea@courriel.eco> References: <80f97dba0b628057de3b7cd2ef4c3bea@courriel.eco> Message-ID: Hi Fabrice, Thank you very much for reporting this and also for providing a great reproducer. We have made some progress towards understanding the problem space, at least. To help you continue with your demonstrations, explanations, and comparisons, I only need you to do the following: In the jdk/lib/jfr directory, there are two files that control the default and profile sets of JFR events: default.jfc and profile.jfc, respectively.

<event name="jdk.OldObjectSample">
  <setting name="enabled">false</setting>
  <setting name="stackTrace">false</setting>
  <setting name="cutoff">0 ns</setting>
</event>

Turn off the jdk.OldObjectSample event by setting enabled to false. This effectively turns off JFR's capability to monitor memory leaks in the background. With this small change, you should be back on track for proper comparisons, also when using JFR. Let me know if you have any questions. We will be thinking about how to solve this properly. Cheers for now Regards Markus From: hotspot-jfr-dev On Behalf Of Fabrice Bibonne Sent: Monday, 12 January 2026 16:59 To: hotspot-jfr-dev at openjdk.org Subject: Re: Using JFR both with ZGC degrades application throughput Here is a unique source code file for the reproducer (the big String is generated at startup, as you suggested). It changes the results a little, but the run with zgc + jfr still takes a lot of time. Thank you for having a look. Fabrice Le 2026-01-12 10:56, Erik Gahlin a écrit : Hi Fabrice, Thanks for reporting! Could you post the source code for the reproducer here? 
The 36 MB file could probably be replaced with a String::repeat expression. JFR does use some memory, which could impact available heap and performance, although the degradation you're seeing seems awfully high. Thanks Erik ________________________________________ From: hotspot-jfr-dev > on behalf of Fabrice Bibonne > Sent: Sunday, January 11, 2026 7:23 PM To: hotspot-jfr-dev at openjdk.org Subject: Using JFR both with ZGC degrades application throughput Hi all, I would like to report a case where starting JFR for an application running with ZGC causes a significant throughput degradation (compared to when JFR is not started). My context: I was writing a little web app to illustrate a case where the use of ZGC gives a better throughput than with G1. I benchmarked with grafana k6 my application running with G1 and my application running with ZGC: the runs with ZGC gave better throughputs. I wanted to go a bit further in explanation, so I ran my benchmarks again with JFR to be able to illustrate GC gains in JMC. When I ran my web app with ZGC+JFR, I noticed a significant throughput degradation in my benchmark (which was not the case with G1+JFR). Although I did not measure an increase in overhead as such, I still wanted to report this issue because the degradation in throughput with JFR is such that it would not be usable as is on a production service. I wrote a little application (not a web one) to reproduce the problem: the application calls a little conversion service 200 times with random numbers in parallel (to behave like a web app under load and to put pressure on the GC). The conversion service (a method named `convertNumberToWords`) converts the number into a String by looking up the String in a Map with the number as the key. In order to instantiate and destroy many objects at each call, the map is built by parsing a huge String at each call. The application ends after 200 calls. Here are the steps to reproduce: 1. 
Clone https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact (make sure you are on branch jfr+zgc_impact) 2. Compile it (you must include numbers200k.zip in resources: it contains a 36 MB text file whose contents are used to create the huge String variable) 3. In the root of the repository: 3a. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -classpath target/classes poc.java.perf.write.TestPerf #ZGC without JFR` 3b. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -XX:StartFlightRecording -classpath target/classes poc.java.perf.write.TestPerf #ZGC with JFR` 4. The real time of the second run (with JFR) will be considerably higher than that of the first. I ran these tests on my laptop: - Dell Inc. Latitude 5591 - openSUSE Tumbleweed 20260108 - Kernel: 6.18.3-1-default (64-bit) - 12 × Intel® Core™ i7-8850H CPU @ 2.60GHz - RAM 16 GiB - openjdk version "25.0.1" 2025-10-21 - OpenJDK Runtime Environment (build 25.0.1+8-27) - OpenJDK 64-Bit Server VM (build 25.0.1+8-27, mixed mode, sharing) - many tabs open in Firefox! I also ran it in a container (eclipse-temurin:25) on my laptop and with a Windows laptop and came to the same conclusions: here are the measurements from the container: | Run with | Real time (s) | |-----------|---------------| | ZGC alone | 7.473 | | ZGC + jfr | 25.075 | | G1 alone | 10.195 | | G1 + jfr | 10.450 | After all these tests I tried to run the app with another profiler tool in order to understand where the issue is. I attach the flamegraph when running jfr+zgc: for the worker threads of the ForkJoinPool of Stream, stack traces of a majority of samples have the same top lines: - PosixSemaphore::wait - ZPageAllocator::alloc_page_stall - ZPageAllocator::alloc_page_inner - ZPageAllocator::alloc_page So many threads seem to spend their time waiting in the method ZPageAllocator::alloc_page_stall when JFR is on. The JFR periodic tasks thread also has a few samples where it waits at ZPageAllocator::alloc_page_stall. 
I hope this will help you to find the issue. Thank you very much for reading this email until the end. I hope this is the right place for such feedback. Let me know if I should report my problem elsewhere. Feel free to ask me more questions if you need. Thank you all for this amazing tool! -------------- next part -------------- An HTML attachment was scrubbed... URL: From markus.gronlund at oracle.com Wed Jan 14 18:30:15 2026 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Wed, 14 Jan 2026 18:30:15 +0000 Subject: Using JFR both with ZGC degrades application throughput In-Reply-To: References: <80f97dba0b628057de3b7cd2ef4c3bea@courriel.eco> Message-ID: Hi again, I just remembered we have improved our ergonomics over the years. Therefore, there is a much easier way for you to do this without configuring anything in the .jfc files: you can simply override event settings on the command line. [1] -XX:StartFlightRecording:jdk.OldObjectSample#enabled=false Way easier! Cheers Markus [1] https://egahlin.github.io/2022/05/31/improved-ergonomics.html Confidential- Oracle Internal From: hotspot-jfr-dev On Behalf Of Markus Gronlund Sent: Wednesday, 14 January 2026 19:15 To: Fabrice Bibonne Cc: hotspot-jfr-dev at openjdk.org Subject: RE: Using JFR both with ZGC degrades application throughput Hi Fabrice, Thank you very much for reporting this and also for providing a great reproducer. We have made some progress towards understanding the problem space, at least. To help you continue with your demonstrations, explanations, and comparisons, I only need you to do the following: In the jdk/lib/jfr directory, there are two files that control the default and profile sets of JFR events: default.jfc and profile.jfc, respectively. The jdk.OldObjectSample entry in those files carries three settings: enabled (false), stackTrace (false), and cutoff (0 ns). Turn off the jdk.OldObjectSample event by setting enabled to false. This effectively turns off JFR's capability to monitor memory leaks in the background. 
With this small change, you should be back on track for proper comparisons, also when using JFR. Let me know if you have any questions. We will be thinking about how to solve this properly. Cheers for now Regards Markus Confidential- Oracle Internal From: hotspot-jfr-dev > On Behalf Of Fabrice Bibonne Sent: Monday, 12 January 2026 16:59 To: hotspot-jfr-dev at openjdk.org Subject: Re: Using JFR both with ZGC degrades application throughput Here is a unique source code file for the reproducer (the big String is generated when starting as you suggested). It changes a little the results but the run with zgc + jfr is still taking lot of time. Thanks you for having a look. Fabrice Le 2026-01-12 10:56, Erik Gahlin a ?crit : Hi Fabrice, Thanks for reporting! Could you post the source code for the reproducer here? The 36 MB file could probably be replaced with a String::repeat expression. JFR does use some memory, which could impact available heap and performance, although the degradation you?re seeing seems awfully high. Thanks Erik ________________________________________ From: hotspot-jfr-dev > on behalf of Fabrice Bibonne > Sent: Sunday, January 11, 2026 7:23 PM To: hotspot-jfr-dev at openjdk.org Subject: Using JFR both with ZGC degrades application throughput Hi all, I would like to report a case where starting jfr for an application running with zgc causes a significant throughput degradation (compared to when JFR is not started). My context : I was writing a little web app to illustrate a case where the use of ZGC gives a better throughput than with G1. I benchmarked with grafana k6 my application running with G1 and my application running with ZGC : the runs with ZGC gave better throughputs. I wanted to go a bit further in explanation so I began again my benchmarks with JFR to be able to illustrate GC gains in JMC. When I ran my web app with ZGC+JFR, I noticed a significant throughput degradation in my benchmark (which was not the case with G1+JFR). 
Although I did not measure an increase in overhead as such, I still wanted to report this issue because the degradation in throughput with JFR is such that it would not be usable as is on a production service. I wrote a little application (not a web one) to reproduce the problem : the application calls a little conversion service 200 times with random numbers in parallel (to be like a web app in charge and to pressure GC). The conversion service (a method named `convertNumberToWords`) convert the number in a String looking for the String in a Map with the number as th key. In order to instantiate and destroy many objects at each call, the map is built parsing a huge String at each call. Application ends after 200 calls. Here are the step to reproduce : 1. Clone https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact (be aware to be on branch jfr+zgc_impact) 2. Compile it (you must include numbers200k.zip in resources : it contains a 36 Mo text files whose contents are used to create the huge String variable) 3. in the root of repository : 3a. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -classpath target/classes poc.java.perf.write.TestPerf #ZGC without JFR` 3b. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops -XX:StartFlightRecording -classpath target/classes poc.java.perf.write.TestPerf #ZGC with JFR` 4. The real time of the second run (with JFR) will be considerably higher than that of the first I ran these tests on my laptop : - Dell Inc. Latitude 5591 - openSUSE Tumbleweed 20260108 - Kernel : 6.18.3-1-default (64-bit) - 12 ? Intel? Core? i7-8850H CPU @ 2.60GHz - RAM 16 Gio - openjdk version "25.0.1" 2025-10-21 - OpenJDK Runtime Environment (build 25.0.1+8-27) - OpenJDK 64-Bit Server VM (build 25.0.1+8-27, mixed mode, sharing) - many tabs opened in firefox ! 
I also ran it in a container (eclipse-temurin:25) on my laptop and with a windows laptop and came to the same conclusions : here are the measurements from the container : | Run with | Real time (s) | |-----------|---------------| | ZGC alone | 7.473 | | ZGC + jfr | 25.075 | | G1 alone | 10.195 | | G1 + jfr | 10.450 | After all these tests I tried to run the app with an other profiler tool in order to understand where is the issue. I join the flamegraph when running jfr+zgc : for the worker threads of the ForkJoinPool of Stream, stack traces of a majority of samples have the same top lines : - PosixSemaphore::wait - ZPageAllocator::alloc_page_stall - ZPageAllocator::alloc_page_inner - ZPageAllocator::alloc_page So many thread seem to spent their time waiting in the method ZPageAllocator::alloc_page_stall when the JFR is on. The JFR periodic tasks threads has also a few samples where it waits at ZPageAllocator::alloc_page_stall. I hope this will help you to find the issue. Thank you very much for reading this email until the end. I hope this is the good place for such a feedback. Let me know if I must report my problem elsewhere. Be free to ask me more questions if you need. Thank you all for this amazing tool ! -------------- next part -------------- An HTML attachment was scrubbed... URL: From fabrice.bibonne at courriel.eco Thu Jan 15 05:44:28 2026 From: fabrice.bibonne at courriel.eco (Fabrice Bibonne) Date: Thu, 15 Jan 2026 06:44:28 +0100 Subject: Using JFR both with ZGC degrades application throughput In-Reply-To: References: <80f97dba0b628057de3b7cd2ef4c3bea@courriel.eco> Message-ID: Hi, Yes turning off jdk.OldObjectSample event solved the issue : the real time execution of my sample with zgc and JFR recording with jdk.OldObjectSample turned off is now very close to that without JFR recording. Thank you very much. Best regards. 
Fabrice Le 2026-01-14 19:30, Markus Gronlund a ?crit : > Hi again, > > I just remembered we have improved our ergonomics over the years. > > Therefore, there is a much easier way for you to do this without > configuring anything in the .jfc files: you can simply override event > settings on the command line. [1] > > -XX:StartFlightRecording:jdk.OldObjectSample#enabled=false > > Way easier! > > Cheers > > Markus > > [1] https://egahlin.github.io/2022/05/31/improved-ergonomics.html [2] > > Confidential- Oracle Internal > > From: hotspot-jfr-dev On Behalf Of > Markus Gronlund > Sent: Wednesday, 14 January 2026 19:15 > To: Fabrice Bibonne > Cc: hotspot-jfr-dev at openjdk.org > Subject: RE: Using JFR both with ZGC degrades application throughput > > Hi Fabrice, > > Thank you very much for reporting this and also for providing a great > reproducer. > > We have made some progress towards understanding the problem space, at > least. > > To help you continue with your demonstrations, explanations, and > comparisons, I only need you to do the following: > > In the jdk/lib/jfr directory, there are two files that control the > default and profile sets of JFR events: default.jfc and profile.jfc, > respectively. > > > > false > > control="old-objects-stack-trace">false > > 0 ns > > > > Turn off the jdk.OldObjectSample event by setting enabled to false. > > This effectively turns off JFRs capability to monitor memory leaks in > the background. > > With this small change, you should be back on track for proper > comparisons, also when using JFR. > > Let me know if you have any questions. We will be thinking about how to > solve this properly. 
> > Cheers for now > > Regards > > Markus > > Confidential- Oracle Internal > > From: hotspot-jfr-dev On Behalf Of > Fabrice Bibonne > Sent: Monday, 12 January 2026 16:59 > To: hotspot-jfr-dev at openjdk.org > Subject: Re: Using JFR both with ZGC degrades application throughput > > Here is a unique source code file for the reproducer (the big String is > generated when starting as you suggested). It changes a little the > results but the run with zgc + jfr is still taking lot of time. > > Thanks you for having a look. > > Fabrice > > Le 2026-01-12 10:56, Erik Gahlin a ?crit : > >> Hi Fabrice, >> >> Thanks for reporting! >> >> Could you post the source code for the reproducer here? The 36 MB file >> could probably be replaced with a String::repeat expression. >> >> JFR does use some memory, which could impact available heap and >> performance, although the degradation you're seeing seems awfully >> high. >> >> Thanks >> Erik >> >> ________________________________________ >> From: hotspot-jfr-dev on behalf of >> Fabrice Bibonne >> Sent: Sunday, January 11, 2026 7:23 PM >> To: hotspot-jfr-dev at openjdk.org >> Subject: Using JFR both with ZGC degrades application throughput >> >> Hi all, >> >> I would like to report a case where starting jfr for an application >> running with zgc causes a significant throughput degradation (compared >> to when JFR is not started). >> >> My context : I was writing a little web app to illustrate a case where >> the use of ZGC gives a better throughput than with G1. I benchmarked >> with grafana k6 my application running with G1 and my application >> running with ZGC : the runs with ZGC gave better throughputs. I >> wanted to go a bit further in explanation so I began again my >> benchmarks with JFR to be able to illustrate GC gains in JMC. When I >> ran my web app with ZGC+JFR, I noticed a significant throughput >> degradation in my benchmark (which was not the case with G1+JFR). 
>> >> Although I did not measure an increase in overhead as such, I still >> wanted to report this issue because the degradation in throughput with >> JFR is such that it would not be usable as is on a production service. >> >> I wrote a little application (not a web one) to reproduce the problem >> : the application calls a little conversion service 200 times with >> random numbers in parallel (to be like a web app in charge and to >> pressure GC). The conversion service (a method named >> `convertNumberToWords`) convert the number in a String looking for the >> String in a Map with the number as th key. In order to instantiate and >> destroy many objects at each call, the map is built parsing a huge >> String at each call. Application ends after 200 calls. >> >> Here are the step to reproduce : >> 1. Clone https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact >> [1] (be aware to be on branch jfr+zgc_impact) >> 2. Compile it (you must include numbers200k.zip in resources : it >> contains a 36 Mo text files whose contents are used to create the huge >> String variable) >> 3. in the root of repository : >> 3a. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops >> -classpath target/classes poc.java.perf.write.TestPerf #ZGC without >> JFR` >> 3b. Run `time java -Xmx4g -XX:+UseZGC -XX:+UseCompressedOops >> -XX:StartFlightRecording -classpath target/classes >> poc.java.perf.write.TestPerf #ZGC with JFR` >> 4. The real time of the second run (with JFR) will be considerably >> higher than that of the first >> >> I ran these tests on my laptop : >> - Dell Inc. Latitude 5591 >> - openSUSE Tumbleweed 20260108 >> - Kernel : 6.18.3-1-default (64-bit) >> - 12 ? Intel(R) Core(tm) i7-8850H CPU @ 2.60GHz >> - RAM 16 Gio >> - openjdk version "25.0.1" 2025-10-21 >> - OpenJDK Runtime Environment (build 25.0.1+8-27) >> - OpenJDK 64-Bit Server VM (build 25.0.1+8-27, mixed mode, sharing) >> - many tabs opened in firefox ! 
>> >> I also ran it in a container (eclipse-temurin:25) on my laptop and >> with a windows laptop and came to the same conclusions : here are the >> measurements from the container : >> >> | Run with | Real time (s) | >> |-----------|---------------| >> | ZGC alone | 7.473 | >> | ZGC + jfr | 25.075 | >> | G1 alone | 10.195 | >> | G1 + jfr | 10.450 | >> >> After all these tests I tried to run the app with an other profiler >> tool in order to understand where is the issue. I join the flamegraph >> when running jfr+zgc : for the worker threads of the ForkJoinPool of >> Stream, stack traces of a majority of samples have the same top lines >> : >> - PosixSemaphore::wait >> - ZPageAllocator::alloc_page_stall >> - ZPageAllocator::alloc_page_inner >> - ZPageAllocator::alloc_page >> >> So many thread seem to spent their time waiting in the method >> ZPageAllocator::alloc_page_stall when the JFR is on. The JFR periodic >> tasks threads has also a few samples where it waits at >> ZPageAllocator::alloc_page_stall. I hope this will help you to find >> the issue. >> >> Thank you very much for reading this email until the end. I hope this >> is the good place for such a feedback. Let me know if I must report my >> problem elsewhere. Be free to ask me more questions if you need. >> >> Thank you all for this amazing tool ! Links: ------ [1] https://framagit.org/FBibonne/poc-java/-/tree/jfr+zgc_impact [2] https://egahlin.github.io/2022/05/31/improved-ergonomics.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuefe at openjdk.org Thu Jan 15 07:51:36 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 15 Jan 2026 07:51:36 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v7] In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 10:11:20 GMT, Thomas Stuefe wrote: >> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. 
It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. >> >> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. >> >> This RFE changes the algorithm to be non-recursive. >> >> Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. >> >> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. >> >> Testing: >> >> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out >> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical >> - Ran locally all jtreg tests in jdk/jfr >> - GHAs > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > do strides for arrays I close this PR in favor of opening a fresh one later using a different approach ------------- PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3753292176 From stuefe at openjdk.org Thu Jan 15 07:51:38 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 15 Jan 2026 07:51:38 GMT Subject: Withdrawn: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 15:54:04 GMT, Thomas Stuefe wrote: > A customer reported a crash when producing a JFR recording 
with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. > > We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This RFE changes the algorithm to be non-recursive. > > Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. > > Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. > > Testing: > > - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out > - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical > - Ran locally all jtreg tests in jdk/jfr > - GHAs This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/28659 From kbarrett at openjdk.org Thu Jan 15 16:24:45 2026 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 15 Jan 2026 16:24:45 GMT Subject: RFR: 8374445: Fix -Wzero-as-null-pointer-constant warnings in JfrSet In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 15:11:59 GMT, Markus Gr?nlund wrote: >> Please review this change to fix JfrSet to avoid triggering >> -Wzero-as-null-pointer-constant warnings when that warning is enabled. >> >> The old code uses an entry value with representation 0 to indicate the entry >> doesn't have a value. 
It compares an entry value against literal 0 to check >> for that. If the key type is a pointer type, this involves an implicit 0 => >> null pointer constant conversion, so we get a warning when that warning is >> enabled. >> >> Instead we initialize entry values to a value-initialized key, and compare >> against a value-initialized key. This changes the (currently undocumented) >> requirements on the key type. The key type is no longer required to be >> trivially constructible (to permit memset-based initialization), but is now >> required to be value-initializable. That's currently a wash, since all of the >> in-use key types are fundamental types (traceid (u8) and Klass*). >> >> Testing: mach5 tier1-3 (tier3 is where most jfr tests are run) > > src/hotspot/share/jfr/utilities/jfrSet.hpp line 72: > >> 70: } >> 71: for (unsigned i = 0; i < table_size; ++i) { >> 72: ::new (&table[i]) K{}; > > Is this the new (placement pun intended) way to do a placement new, using the outer scope operator ::? Or is it because we don't know what Hotspot type this is? It's the same old way one should always do it. If one wants global placement new, one should say so. An unqualified `new` expression does a class-based lookup of `operator new`, so if the class has one (and lots of ours do), that will be used. We don't want that here, regardless of the type of `K`. As it happens, for all current uses `K` is a fundamental type, so it doesn't matter. But it's clearer and future proof to be explicit. > src/hotspot/share/jfr/utilities/jfrSet.hpp line 142: > >> 140: for (unsigned i = 0; i < old_table_size; ++i) { >> 141: const K k = old_table[i]; >> 142: if (k != K{}) { > > Are these K{}'s compile constant expressions? For the types currently used, yes. This is a "value-initialized" (C++17 11.6/8) temporary. For fundamental types, that's a zero-initialized temporary. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29022#discussion_r2695057339 PR Review Comment: https://git.openjdk.org/jdk/pull/29022#discussion_r2695088728 From kbarrett at openjdk.org Thu Jan 15 16:50:14 2026 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 15 Jan 2026 16:50:14 GMT Subject: RFR: 8374445: Fix -Wzero-as-null-pointer-constant warnings in JfrSet [v2] In-Reply-To: References: Message-ID: <90e6X30EBPUPpUf2EebvJD4TLqA0NZ0bE5vF_ib69Fs=.65981273-2420-428c-ab0d-ae8ee5548a85@github.com> > Please review this change to fix JfrSet to avoid triggering > -Wzero-as-null-pointer-constant warnings when that warning is enabled. > > The old code uses an entry value with representation 0 to indicate the entry > doesn't have a value. It compares an entry value against literal 0 to check > for that. If the key type is a pointer type, this involves an implicit 0 => > null pointer constant conversion, so we get a warning when that warning is > enabled. > > Instead we initialize entry values to a value-initialized key, and compare > against a value-initialized key. This changes the (currently undocumented) > requirements on the key type. The key type is no longer required to be > trivially constructible (to permit memset-based initialization), but is now > required to be value-initializable. That's currently a wash, since all of the > in-use key types are fundamental types (traceid (u8) and Klass*). > > Testing: mach5 tier1-3 (tier3 is where most jfr tests are run) Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into jfrset-zero-as-null-pointer-warnings - fix -Wzero-as-null-poniter-constant warnings in jfrSet.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29022/files - new: https://git.openjdk.org/jdk/pull/29022/files/d2ee55ab..d54334e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29022&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29022&range=00-01 Stats: 55051 lines in 1117 files changed: 27415 ins; 10797 del; 16839 mod Patch: https://git.openjdk.org/jdk/pull/29022.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29022/head:pull/29022 PR: https://git.openjdk.org/jdk/pull/29022 From mgronlun at openjdk.org Thu Jan 15 17:12:28 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 15 Jan 2026 17:12:28 GMT Subject: RFR: 8374445: Fix -Wzero-as-null-pointer-constant warnings in JfrSet [v2] In-Reply-To: <90e6X30EBPUPpUf2EebvJD4TLqA0NZ0bE5vF_ib69Fs=.65981273-2420-428c-ab0d-ae8ee5548a85@github.com> References: <90e6X30EBPUPpUf2EebvJD4TLqA0NZ0bE5vF_ib69Fs=.65981273-2420-428c-ab0d-ae8ee5548a85@github.com> Message-ID: On Thu, 15 Jan 2026 16:50:14 GMT, Kim Barrett wrote: >> Please review this change to fix JfrSet to avoid triggering >> -Wzero-as-null-pointer-constant warnings when that warning is enabled. >> >> The old code uses an entry value with representation 0 to indicate the entry >> doesn't have a value. It compares an entry value against literal 0 to check >> for that. If the key type is a pointer type, this involves an implicit 0 => >> null pointer constant conversion, so we get a warning when that warning is >> enabled. >> >> Instead we initialize entry values to a value-initialized key, and compare >> against a value-initialized key. This changes the (currently undocumented) >> requirements on the key type. 
The key type is no longer required to be >> trivially constructible (to permit memset-based initialization), but is now >> required to be value-initializable. That's currently a wash, since all of the >> in-use key types are fundamental types (traceid (u8) and Klass*). >> >> Testing: mach5 tier1-3 (tier3 is where most jfr tests are run) > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into jfrset-zero-as-null-pointer-warnings > - fix -Wzero-as-null-poniter-constant warnings in jfrSet.hpp Look good, thanks Kim. ------------- Marked as reviewed by mgronlun (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29022#pullrequestreview-3666675208 From mgronlun at openjdk.org Thu Jan 15 17:12:41 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 15 Jan 2026 17:12:41 GMT Subject: RFR: 8374445: Fix -Wzero-as-null-pointer-constant warnings in JfrSet [v2] In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 16:15:52 GMT, Kim Barrett wrote: >> src/hotspot/share/jfr/utilities/jfrSet.hpp line 72: >> >>> 70: } >>> 71: for (unsigned i = 0; i < table_size; ++i) { >>> 72: ::new (&table[i]) K{}; >> >> Is this the new (placement pun intended) way to do a placement new, using the outer scope operator ::? Or is it because we don't know what Hotspot type this is? > > It's the same old way one should always do it. If one wants global placement > new, one should say so. An unqualified `new` expression does a class-based > lookup of `operator new`, so if the class has one (and lots of ours do), that > will be used. We don't want that here, regardless of the type of `K`. As it > happens, for all current uses `K` is a fundamental type, so it doesn't matter. > But it's clearer and future proof to be explicit. 
And the new location for global new is cppstdlib/new.hpp - ok. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29022#discussion_r2695233003 From kbarrett at openjdk.org Thu Jan 15 19:15:48 2026 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 15 Jan 2026 19:15:48 GMT Subject: RFR: 8374445: Fix -Wzero-as-null-pointer-constant warnings in JfrSet In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 17:34:11 GMT, Markus Gr?nlund wrote: >> Please review this change to fix JfrSet to avoid triggering >> -Wzero-as-null-pointer-constant warnings when that warning is enabled. >> >> The old code uses an entry value with representation 0 to indicate the entry >> doesn't have a value. It compares an entry value against literal 0 to check >> for that. If the key type is a pointer type, this involves an implicit 0 => >> null pointer constant conversion, so we get a warning when that warning is >> enabled. >> >> Instead we initialize entry values to a value-initialized key, and compare >> against a value-initialized key. This changes the (currently undocumented) >> requirements on the key type. The key type is no longer required to be >> trivially constructible (to permit memset-based initialization), but is now >> required to be value-initializable. That's currently a wash, since all of the >> in-use key types are fundamental types (traceid (u8) and Klass*). >> >> Testing: mach5 tier1-3 (tier3 is where most jfr tests are run) > > Will review this later Kim, sorry for the delay (26 stuff). 
Thanks for reviewing @mgronlun ------------- PR Comment: https://git.openjdk.org/jdk/pull/29022#issuecomment-3756446182 From kbarrett at openjdk.org Thu Jan 15 19:17:35 2026 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 15 Jan 2026 19:17:35 GMT Subject: Integrated: 8374445: Fix -Wzero-as-null-pointer-constant warnings in JfrSet In-Reply-To: References: Message-ID: On Sat, 3 Jan 2026 08:21:15 GMT, Kim Barrett wrote: > Please review this change to fix JfrSet to avoid triggering > -Wzero-as-null-pointer-constant warnings when that warning is enabled. > > The old code uses an entry value with representation 0 to indicate the entry > doesn't have a value. It compares an entry value against literal 0 to check > for that. If the key type is a pointer type, this involves an implicit 0 => > null pointer constant conversion, so we get a warning when that warning is > enabled. > > Instead we initialize entry values to a value-initialized key, and compare > against a value-initialized key. This changes the (currently undocumented) > requirements on the key type. The key type is no longer required to be > trivially constructible (to permit memset-based initialization), but is now > required to be value-initializable. That's currently a wash, since all of the > in-use key types are fundamental types (traceid (u8) and Klass*). > > Testing: mach5 tier1-3 (tier3 is where most jfr tests are run) This pull request has now been integrated. 
Changeset: a8b845e0 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/a8b845e08ce2f1fbe7d807cd963cb6b5e4df5ce6 Stats: 10 lines in 1 file changed: 3 ins; 0 del; 7 mod 8374445: Fix -Wzero-as-null-pointer-constant warnings in JfrSet Reviewed-by: mgronlun ------------- PR: https://git.openjdk.org/jdk/pull/29022 From dholmes at openjdk.org Thu Jan 15 21:24:05 2026 From: dholmes at openjdk.org (David Holmes) Date: Thu, 15 Jan 2026 21:24:05 GMT Subject: RFR: 8374445: Fix -Wzero-as-null-pointer-constant warnings in JfrSet In-Reply-To: References: Message-ID: <0ey_3JltJ7Pde7KfS-Cot6PUAx-r6AMfUv0RXtIVA8U=.5ece4ed0-0ef9-4259-b801-f92f01d4cc97@github.com> On Thu, 15 Jan 2026 19:13:47 GMT, Kim Barrett wrote: >> Will review this later Kim, sorry for the delay (26 stuff). > > Thanks for reviewing @mgronlun @kimbarrett don't you also need to adjust this for completeness: void clear() { memset(_table, 0, _table_size * sizeof(K)); } ------------- PR Comment: https://git.openjdk.org/jdk/pull/29022#issuecomment-3756916457 From kbarrett at openjdk.org Fri Jan 16 15:22:13 2026 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 16 Jan 2026 15:22:13 GMT Subject: RFR: 8374445: Fix -Wzero-as-null-pointer-constant warnings in JfrSet In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 19:13:47 GMT, Kim Barrett wrote: >> Will review this later Kim, sorry for the delay (26 stuff). > > Thanks for reviewing @mgronlun > @kimbarrett don't you also need to adjust this for completeness: > > ``` > void clear() { > memset(_table, 0, _table_size * sizeof(K)); > } > ``` Drat! Missed that. 
https://bugs.openjdk.org/browse/JDK-8375544 ------------- PR Comment: https://git.openjdk.org/jdk/pull/29022#issuecomment-3760548998 From fandreuzzi at openjdk.org Tue Jan 20 14:45:24 2026 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Tue, 20 Jan 2026 14:45:24 GMT Subject: RFR: 8375717: Outdated link in jdk.jfr.internal.JVM javadoc Message-ID: I noticed that several javadoc entries in `jdk.jfr.internal.JVM` are still referencing `createNativeJFR`, which has been removed as part of [JDK-8310661](https://bugs.openjdk.org/browse/JDK-8310661). I propose to replace it with `JVMSupport#createJFR()`. ------------- Commit messages: - replace Changes: https://git.openjdk.org/jdk/pull/29322/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29322&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375717 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/29322.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29322/head:pull/29322 PR: https://git.openjdk.org/jdk/pull/29322 From fandreuzzi at openjdk.org Tue Jan 20 15:50:35 2026 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Tue, 20 Jan 2026 15:50:35 GMT Subject: RFR: 8375717: Outdated link in jdk.jfr.internal.JVM javadoc In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 15:43:25 GMT, Erik Gahlin wrote: > Curious, any particular reason for fixing this? Is it to get rid of all the warnings? The JVM class is internal, and its Javadoc is intended only for HotSpot developers, so it's not important. I was going through the code and could not find what the javadoc was referring to. I just found it confusing since the method does not exist anymore, but there's no warning about it indeed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29322#issuecomment-3773571821 From egahlin at openjdk.org Tue Jan 20 15:50:34 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 20 Jan 2026 15:50:34 GMT Subject: RFR: 8375717: Outdated link in jdk.jfr.internal.JVM javadoc In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 14:37:11 GMT, Francesco Andreuzzi wrote: > I noticed that several javadoc entries in `jdk.jfr.internal.JVM` reference `createNativeJFR`, which has been removed as part of [JDK-8310661](https://bugs.openjdk.org/browse/JDK-8310661). I propose to replace it with `JVMSupport#createJFR()`. Curious, any particular reason for fixing this? Is it to get rid of all the warnings? The JVM class is internal, and its Javadoc is intended only for HotSpot developers, so it's not important. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29322#issuecomment-3773552812 From kbarrett at openjdk.org Tue Jan 20 17:45:13 2026 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 20 Jan 2026 17:45:13 GMT Subject: RFR: 8375544: JfrSet::clear should not use memset Message-ID: <19EqFv_QUJiVGE-nFxUTfowQZvesS-oWGrE8SoAKTuU=.900c1b50-46bb-4a84-8dee-fd8db2a5a81b@github.com> Please review this change to JfrSet::clear to no longer use memset to clear the table data. Instead use value-initialization, similarly to what is done in the set initialization from JDK-8374445. 
Testing: mach5 tier1-3 ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/29327/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29327&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375544 Stats: 9 lines in 1 file changed: 8 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29327.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29327/head:pull/29327 PR: https://git.openjdk.org/jdk/pull/29327 From egahlin at openjdk.org Wed Jan 21 09:36:06 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 21 Jan 2026 09:36:06 GMT Subject: RFR: 8375717: Outdated link in jdk.jfr.internal.JVM javadoc In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 14:37:11 GMT, Francesco Andreuzzi wrote: > I noticed that several javadoc entries in `jdk.jfr.internal.JVM` reference `createNativeJFR`, which has been removed as part of [JDK-8310661](https://bugs.openjdk.org/browse/JDK-8310661). I propose to replace it with `JVMSupport#createJFR()`. Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29322#pullrequestreview-3686206529 From mgronlun at openjdk.org Wed Jan 21 10:35:25 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 21 Jan 2026 10:35:25 GMT Subject: RFR: 8375544: JfrSet::clear should not use memset In-Reply-To: <19EqFv_QUJiVGE-nFxUTfowQZvesS-oWGrE8SoAKTuU=.900c1b50-46bb-4a84-8dee-fd8db2a5a81b@github.com> References: <19EqFv_QUJiVGE-nFxUTfowQZvesS-oWGrE8SoAKTuU=.900c1b50-46bb-4a84-8dee-fd8db2a5a81b@github.com> Message-ID: On Tue, 20 Jan 2026 17:38:57 GMT, Kim Barrett wrote: > Please review this change to JfrSet::clear to no longer use memset to clear > the table data. Instead use value-initialization, similarly to what is done > in the set initialization from JDK-8374445. > > Testing: mach5 tier1-3 Marked as reviewed by mgronlun (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/29327#pullrequestreview-3686482632 From fandreuzzi at openjdk.org Wed Jan 21 10:45:45 2026 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Wed, 21 Jan 2026 10:45:45 GMT Subject: Integrated: 8375717: Outdated link in jdk.jfr.internal.JVM javadoc In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 14:37:11 GMT, Francesco Andreuzzi wrote: > I noticed that several javadoc entries in `jdk.jfr.internal.JVM` reference `createNativeJFR`, which has been removed as part of [JDK-8310661](https://bugs.openjdk.org/browse/JDK-8310661). I propose to replace it with `JVMSupport#createJFR()`. This pull request has now been integrated. Changeset: 5c7c2f09 Author: Francesco Andreuzzi URL: https://git.openjdk.org/jdk/commit/5c7c2f093b83a017970d9d05c258b4c0910bfc2c Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod 8375717: Outdated link in jdk.jfr.internal.JVM javadoc Reviewed-by: egahlin ------------- PR: https://git.openjdk.org/jdk/pull/29322 From egahlin at openjdk.org Wed Jan 21 14:10:41 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 21 Jan 2026 14:10:41 GMT Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath [v2] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 09:08:25 GMT, Bara' Hasheesh wrote: >> A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from start after the JVM initiates it's shutdown hooks >> >> A new test was added that fails without the change & passes with it >> >> I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86 > > Bara' Hasheesh has updated the pull request incrementally with one additional commit since the last revision: > > flags I think a dummy may be hard to test and ensure that it works for all possible scenarios. There are also behavioral questions, e.g. what should happen if a user calls Recording::dump on a dummy? 
Throwing an exception creates a burden for users of the API. We will need to think some more on how to fix this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28767#issuecomment-3778311717 From kbarrett at openjdk.org Wed Jan 21 14:56:00 2026 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 21 Jan 2026 14:56:00 GMT Subject: RFR: 8375544: JfrSet::clear should not use memset In-Reply-To: References: <19EqFv_QUJiVGE-nFxUTfowQZvesS-oWGrE8SoAKTuU=.900c1b50-46bb-4a84-8dee-fd8db2a5a81b@github.com> Message-ID: <0CHEE1CXNWMqeY7deqrXgKOkVHBKFGgL8Jo7Wt68lLs=.8485bd0c-e83e-4a90-a852-df3bf0cd9041@github.com> On Wed, 21 Jan 2026 10:33:04 GMT, Markus Gr?nlund wrote: >> Please review this change to JfrSet::clear to no longer use memset to clear >> the table data. Instead use value-initialization, similarly to what is done >> in the set initialization from JDK-8374445. >> >> Testing: mach5 tier1-3 > > Marked as reviewed by mgronlun (Reviewer). Thanks for reviewing @mgronlun ------------- PR Comment: https://git.openjdk.org/jdk/pull/29327#issuecomment-3778552162 From kbarrett at openjdk.org Wed Jan 21 14:57:50 2026 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 21 Jan 2026 14:57:50 GMT Subject: Integrated: 8375544: JfrSet::clear should not use memset In-Reply-To: <19EqFv_QUJiVGE-nFxUTfowQZvesS-oWGrE8SoAKTuU=.900c1b50-46bb-4a84-8dee-fd8db2a5a81b@github.com> References: <19EqFv_QUJiVGE-nFxUTfowQZvesS-oWGrE8SoAKTuU=.900c1b50-46bb-4a84-8dee-fd8db2a5a81b@github.com> Message-ID: <1tMbxgFIWRFc1n8gJE20YnmNtPLxoQXM9WOQo5AZiCs=.90bfa104-d5ff-489e-8b82-71433ba5528d@github.com> On Tue, 20 Jan 2026 17:38:57 GMT, Kim Barrett wrote: > Please review this change to JfrSet::clear to no longer use memset to clear > the table data. Instead use value-initialization, similarly to what is done > in the set initialization from JDK-8374445. > > Testing: mach5 tier1-3 This pull request has now been integrated. 
Changeset: 3033e6f4 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/3033e6f421d0f6e0aea1d976a806d7abca7c6360 Stats: 9 lines in 1 file changed: 8 ins; 0 del; 1 mod 8375544: JfrSet::clear should not use memset Reviewed-by: mgronlun ------------- PR: https://git.openjdk.org/jdk/pull/29327 From duke at openjdk.org Wed Jan 21 15:37:21 2026 From: duke at openjdk.org (Bara' Hasheesh) Date: Wed, 21 Jan 2026 15:37:21 GMT Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath [v2] In-Reply-To: References: Message-ID: On Wed, 21 Jan 2026 14:06:23 GMT, Erik Gahlin wrote: > There are also behavioral questions, e.g. what should happen if a user calls Recording::dump on a dummy? Yes, which is why I initially went with an exception, as it's much clearer in terms of behaviour. > Throwing an exception creates a burden for users of the API. > > We will need to think some more on how to fix this. As you're aware, the JFR backend/components are cleaned up during the shutdown hook via a call to `recorder.destroy()`, so trying to do any JFR work after that doesn't make much sense. I can't really think of a way to make JFR "work" without keeping JFR components alive during a JVM death, which would create its own problems (when would these resources eventually be cleared?). **IMO the first thing we should do is to define what the behaviour should be; after that we can investigate/discuss how that can be done/achieved.** For throwing an exception, the current documentation is the following:

/**
 * Starts this recording.
 * <p>
 * It's recommended that the recording options and event settings are configured
 * before calling this method. The benefits of doing so are a more consistent
 * state when analyzing the recorded data, and improved performance because the
 * configuration can be applied atomically.
 * <p>
 * After a successful invocation of this method, this recording is in the
 * {@code RUNNING} state.
 *
 * @throws IllegalStateException if recording is already started or is in the
 *         {@code CLOSED} state
 */

This documentation does not make any guarantees that the call will be successful, so I can see some freedom in defining some new errors here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28767#issuecomment-3778890869 From lmesnik at openjdk.org Wed Jan 21 16:02:08 2026 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 21 Jan 2026 16:02:08 GMT Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath [v2] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 09:08:25 GMT, Bara' Hasheesh wrote: >> A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from starting after the JVM initiates its shutdown hooks. >> >> A new test was added that fails without the change & passes with it. >> >> I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86 > > Bara' Hasheesh has updated the pull request incrementally with one additional commit since the last revision: > > flags

src/jdk.jfr/share/classes/jdk/jfr/internal/PlatformRecorder.java line 560: > 558: copy.setStopTime(r.getStopTime()); > 559: copy.setFlushInterval(r.getFlushInterval()); > 560: copy.setDummyRecording(r.isDummyRecording()); Not a review, and I am not planning to review the fix itself. Could you please change the PR and bug summary? The VMDeath is a JVMTI event, so the bug title is misleading. Something like "Deadlock between flight recorder & VM shutdown" would be better.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2713227258 From jaroslav.bachorik at datadoghq.com Wed Jan 21 21:03:38 2026 From: jaroslav.bachorik at datadoghq.com (Jaroslav Bachorík) Date: Wed, 21 Jan 2026 22:03:38 +0100 Subject: RFC: Display contextual event fields in jfr view command Message-ID:

Hello,

I'd like to propose adding context display support to the `jfr view` command. This would allow users to see which @Contextual events were active when other events occurred, without requiring any changes to the JFR recording format or runtime.

Background

Back in 2021, there was a discussion on this list about adding a Recording Context concept to JFR (thread starting at 2021-June/002777). Erik suggested an alternative to modifying the event format: use dedicated context events with begin/end markers and correlate them during recording analysis. This proposal implements exactly that approach on the tooling side. When users have events with @Contextual annotated fields (such as trace IDs, span IDs, or request contexts), they can now view which contexts were active during any event - all computed at analysis time from the existing recording data.

---

Current State

The `jfr print` command already supports displaying contextual events. When printing events, it shows active context fields inline:

jfr print recording.jfr

jdk.ThreadSleep {
  Context: Trace.traceId = "abc-123-def"
  Context: Trace.service = "order-service"
  startTime = 12:00:01.000
  duration = 50 ms
  ...
}

This works well for detailed event inspection, but the `jfr view` command (which displays events in a tabular format) has no equivalent capability.

---

The Problem

When using `jfr view` to analyze recordings from distributed systems, users cannot see which contexts were active. The tabular format is often preferred for scanning many events quickly, but without context information users must:

1. Note the timestamp of the event of interest
2.
Switch to `jfr print` or manually search for overlapping contextual events
3. Match by thread ID to avoid cross-thread confusion
4. Repeat for every event they want to analyze

This breaks the workflow when trying to correlate events with their contexts at scale.

---

Proposed Solution

Add a `--show-context` flag to `jfr view` that automatically displays contextual event fields as additional columns:

jfr view --show-context jdk.ThreadSleep recording.jfr

ThreadSleep

Time      Sleep Time  Trace.traceId  Trace.service
----------------------------------------------------------------
12:00:01  50 ms       abc-123-def    order-service
12:00:02  100 ms      abc-123-def    order-service
12:00:03  25 ms       N/A            N/A

The context matching rule is: a contextual event is active when contextStart <= eventStart AND contextEnd >= eventStart. Users can optionally filter which context types to display:

jfr view --show-context=Span,Trace WorkEvent recording.jfr

---

Why This Approach?

1. No runtime overhead - context correlation happens entirely at analysis time
2. No format changes - works with existing recordings that have @Contextual events
3. Backward compatible - recordings remain readable by older tools
4. Flexible - users choose which contexts to display
5. Proven pattern - based on the timeline approach already used in PrettyWriter

---

[PoC] Implementation Notes

The implementation tracks context per-thread using a timeline-based approach similar to PrettyWriter.java. Events are buffered in a priority queue ordered by timestamp. Contextual events contribute both start and end timestamps, and active contexts are tracked per-thread to prevent cross-thread leakage. Memory is bounded (~1M events) to handle large recordings. Queries without --show-context bypass this entirely, so there's no overhead for existing usage.
I've also added support for referencing contextual fields in GROUP BY clauses for the `jfr query` command (debug builds), enabling aggregation queries like:

SELECT COUNT(*), Trace.traceId FROM WorkEvent GROUP BY Trace.traceId

---

Questions for Discussion

1. Is the matching rule (contextStart <= eventStart) correct? An alternative would be to require the event to fall entirely within the context.
2. Should there be a maximum number of context columns to prevent very wide output?
3. Is 1M events a reasonable buffer size? This balances memory (~100MB) with accuracy for long-running contexts.
4. The `jfr print` command already shows context - should there be a way to disable it for consistency, or is the current always-on behavior correct?

I'd welcome feedback on the approach before proceeding further.

Thanks,
Jaroslav

From erik.gahlin at oracle.com Mon Jan 26 12:34:17 2026 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Mon, 26 Jan 2026 12:34:17 +0000 Subject: RFC: Display contextual event fields in jfr view command In-Reply-To: References: Message-ID:

Hi Jaroslav,

The 'jfr print' command is meant for presentations, demos, and debugging. It was never designed for application troubleshooting. The contextual support added in JDK 25 was included to demonstrate to application developers how the @Contextual annotation can be used, and to show third parties how contextual support can be implemented using the jdk.jfr.consumer API.

The 'view' command, on the other hand, was designed for troubleshooting and can be used on a live process, so it should not use excessive memory or CPU. You added a command-line flag, --show-context, perhaps to prevent additional overhead from contextual processing, but before adding flags, I think it might be a good time to step back and think about how we can best present contextual information to users. A flag is usually hard for users to find.
It would be better to add support in JMC, so users can discover contexts and then drill deeper by clicking in the GUI. I'm also wondering if contextual support belongs in the query language. It's not clear how columns of nested contexts should be identified. It may be better to create something like FormRenderer that only handles event types. We have also discussed adding a bit in the chunk header if a contextual event has been emitted. This would allow a parser to have a fast path when there are no contextual events. Thanks Erik ________________________________________ From: hotspot-jfr-dev on behalf of Jaroslav Bachor?k Sent: Wednesday, January 21, 2026 10:03 PM To: hotspot-jfr-dev Subject: RFC: Display contextual event fields in jfr view command Hello, I'd like to propose adding context display support to the `jfr view` command. This would allow users to see which @Contextual events were active when other events occurred, without requiring any changes to the JFR recording format or runtime. Background Back in 2021, there was a discussion on this list about adding a Recording Context concept to JFR (thread starting at 2021-June/002777). Erik suggested an alternative to modifying the event format: use dedicated context events with begin/end markers and correlate them during recording analysis. This proposal implements exactly that approach on the tooling side. When users have events with @Contextual annotated fields (such as trace IDs, span IDs, or request contexts), they can now view which contexts were active during any event - all computed at analysis time from the existing recording data. --- Current State The `jfr print` command already supports displaying contextual events. When printing events, it shows active context fields inline: jfr print recording.jfr jdk.ThreadSleep { Context: Trace.traceId = "abc-123-def" Context: Trace.service = "order-service" startTime = 12:00:01.000 duration = 50 ms ... 
} This works well for detailed event inspection, but the `jfr view` command (which displays events in a tabular format) has no equivalent capability. --- The Problem When using `jfr view` to analyze recordings from distributed systems, users cannot see which contexts were active. The tabular format is often preferred for scanning many events quickly, but without context information users must: 1. Note the timestamp of the event of interest 2. Switch to `jfr print` or manually search for overlapping contextual events 3. Match by thread ID to avoid cross-thread confusion 4. Repeat for every event they want to analyze This breaks the workflow when trying to correlate events with their contexts at scale. --- Proposed Solution Add a `--show-context` flag to `jfr view` that automatically displays contextual event fields as additional columns: jfr view --show-context jdk.ThreadSleep recording.jfr ThreadSleep Time Sleep Time Trace.traceId Trace.service ---------------------------------------------------------------- 12:00:01 50 ms abc-123-def order-service 12:00:02 100 ms abc-123-def order-service 12:00:03 25 ms N/A N/A The context matching rule is: a contextual event is active when contextStart <= eventStart AND contextEnd >= eventStart. Users can optionally filter which context types to display: jfr view --show-context=Span,Trace WorkEvent recording.jfr --- Why This Approach? 1. No runtime overhead - context correlation happens entirely at analysis time 2. No format changes - works with existing recordings that have @Contextual events 3. Backward compatible - recordings remain readable by older tools 4. Flexible - users choose which contexts to display 5. Proven pattern - based on the timeline approach already used in PrettyWriter --- [PoC] Implementation Notes The implementation tracks context per-thread using a timeline-based approach similar to PrettyWriter.java. Events are buffered in a priority queue ordered by timestamp. 
Contextual events contribute both start and end timestamps, and active contexts are tracked per-thread to prevent cross-thread leakage. Memory is bounded (~1M events) to handle large recordings. Queries without --show-context bypass this entirely, so there's no overhead for existing usage. I've also added support for referencing contextual fields in GROUP BY clauses for the `jfr query` command (debug builds), enabling aggregation queries like: SELECT COUNT(*), Trace.traceId FROM WorkEvent GROUP BY Trace.traceId --- Questions for Discussion 1. Is the matching rule (contextStart <= eventStart) correct? An alternative would be to require the event to fall entirely within the context. 2. Should there be a maximum number of context columns to prevent very wide output? 3. Is 1M events a reasonable buffer size? This balances memory (~100MB) with accuracy for long-running contexts. 4. The `jfr print` command already shows context - should there be a way to disable it for consistency, or is the current always-on behavior correct? I'd welcome feedback on the approach before proceeding further. Thanks, Jaroslacv From erik.gahlin at oracle.com Mon Jan 26 15:35:51 2026 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Mon, 26 Jan 2026 15:35:51 +0000 Subject: RFC: Display contextual event fields in jfr view command In-Reply-To: References: Message-ID: I meant, something like PrettyWriter but with columns (not FormRenderer). Erik ________________________________________ From: hotspot-jfr-dev on behalf of Erik Gahlin Sent: Monday, January 26, 2026 1:34 PM To: Jaroslav Bachor?k; hotspot-jfr-dev Subject: Re: RFC: Display contextual event fields in jfr view command Hi Jaroslav, The 'jfr print' command is meant for presentations, demos, and debugging. It was never designed for application troubleshooting. 
The contextual support added in JDK 25 was included to demonstrate to application developers how the @Contextual annotation can be used, and to show third parties how contextual support can be implemented using the jdk.jfr.consumer API. The 'view' command, on the other hand, was designed for troubleshooting and can be used on a live process, so it should not use excessive memory or CPU. You added a command-line flag, --show-context, perhaps to prevent additional overhead from contextual processing, but before adding flags, I think it might be a good time to step back and think about how we best can present contextual information to users. A flag is usually hard for users to find. It would be better to add support in JMC, so users can discover contexts and then drill deeper by clicking in the GUI. I'm also wondering if contextual support belongs in the query language. It's not clear how columns of nested contexts should be identified. It may be better to create something like FormRenderer that only handles event types. We have also discussed adding a bit in the chunk header if a contextual event has been emitted. This would allow a parser to have a fast path when there are no contextual events. Thanks Erik ________________________________________ From: hotspot-jfr-dev on behalf of Jaroslav Bachor?k Sent: Wednesday, January 21, 2026 10:03 PM To: hotspot-jfr-dev Subject: RFC: Display contextual event fields in jfr view command Hello, I'd like to propose adding context display support to the `jfr view` command. This would allow users to see which @Contextual events were active when other events occurred, without requiring any changes to the JFR recording format or runtime. Background Back in 2021, there was a discussion on this list about adding a Recording Context concept to JFR (thread starting at 2021-June/002777). 
Erik suggested an alternative to modifying the event format: use dedicated context events with begin/end markers and correlate them during recording analysis. This proposal implements exactly that approach on the tooling side. When users have events with @Contextual annotated fields (such as trace IDs, span IDs, or request contexts), they can now view which contexts were active during any event - all computed at analysis time from the existing recording data. --- Current State The `jfr print` command already supports displaying contextual events. When printing events, it shows active context fields inline: jfr print recording.jfr jdk.ThreadSleep { Context: Trace.traceId = "abc-123-def" Context: Trace.service = "order-service" startTime = 12:00:01.000 duration = 50 ms ... } This works well for detailed event inspection, but the `jfr view` command (which displays events in a tabular format) has no equivalent capability. --- The Problem When using `jfr view` to analyze recordings from distributed systems, users cannot see which contexts were active. The tabular format is often preferred for scanning many events quickly, but without context information users must: 1. Note the timestamp of the event of interest 2. Switch to `jfr print` or manually search for overlapping contextual events 3. Match by thread ID to avoid cross-thread confusion 4. Repeat for every event they want to analyze This breaks the workflow when trying to correlate events with their contexts at scale. 
--- Proposed Solution Add a `--show-context` flag to `jfr view` that automatically displays contextual event fields as additional columns: jfr view --show-context jdk.ThreadSleep recording.jfr ThreadSleep Time Sleep Time Trace.traceId Trace.service ---------------------------------------------------------------- 12:00:01 50 ms abc-123-def order-service 12:00:02 100 ms abc-123-def order-service 12:00:03 25 ms N/A N/A The context matching rule is: a contextual event is active when contextStart <= eventStart AND contextEnd >= eventStart. Users can optionally filter which context types to display: jfr view --show-context=Span,Trace WorkEvent recording.jfr --- Why This Approach? 1. No runtime overhead - context correlation happens entirely at analysis time 2. No format changes - works with existing recordings that have @Contextual events 3. Backward compatible - recordings remain readable by older tools 4. Flexible - users choose which contexts to display 5. Proven pattern - based on the timeline approach already used in PrettyWriter --- [PoC] Implementation Notes The implementation tracks context per-thread using a timeline-based approach similar to PrettyWriter.java. Events are buffered in a priority queue ordered by timestamp. Contextual events contribute both start and end timestamps, and active contexts are tracked per-thread to prevent cross-thread leakage. Memory is bounded (~1M events) to handle large recordings. Queries without --show-context bypass this entirely, so there's no overhead for existing usage. I've also added support for referencing contextual fields in GROUP BY clauses for the `jfr query` command (debug builds), enabling aggregation queries like: SELECT COUNT(*), Trace.traceId FROM WorkEvent GROUP BY Trace.traceId --- Questions for Discussion 1. Is the matching rule (contextStart <= eventStart) correct? An alternative would be to require the event to fall entirely within the context. 2. 
Should there be a maximum number of context columns to prevent very wide output? 3. Is 1M events a reasonable buffer size? This balances memory (~100MB) with accuracy for long-running contexts. 4. The `jfr print` command already shows context - should there be a way to disable it for consistency, or is the current always-on behavior correct? I'd welcome feedback on the approach before proceeding further. Thanks, Jaroslav From stuefe at openjdk.org Tue Jan 27 12:53:01 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 27 Jan 2026 12:53:01 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive Message-ID: This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. ---- A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. ### Memory usage of old vs new algorithm: The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16). For a depth of 3200, we get typical probe stack sizes of 100KB..200KB.
But we also cap probestack size, similar to how we cap the max. graph depth. In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. ### Possible improvements/simplifications in the future: DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. ### Observable differences There is one observable side effect to the changed algorithm. The non-recursive algorithm processes oops and roots in reverse order. That means we may see different GC roots in JMC now compared to the recursive version of this algorithm. For example, for a static main class member: - One order of root processing may process the CLDG first; this causes the CLDs to be iterated, which iterates the Klass, which iterates the associated j.l.Class object. The reference stack starts at the j.l.Class object, and the root displayed in JMC says "ClassLoader". - Another root processing order may process the global handles first; this causes the AppClassLoader to be processed first, which causes the main class Mirror to be processed, and we also hit the static member, but now the root in JMC is displayed as "Global Object Handle: VM Global" I think that effect is benign - a root is a root. 
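The recursion-to-iteration rewrite described above follows the textbook pattern of moving pending references from the native call stack onto an explicit, heap-allocated stack, which also explains the reversed visit order. A generic sketch of the pattern (illustrative Java only - the actual patch is C++ in dfsClosure.cpp, and all names here are hypothetical):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IterativeDfs {
    // Hypothetical stand-in for an object graph node with outgoing references.
    record Node(int id, List<Node> refs) {}

    // Depth-first traversal without recursion: pending references live on an
    // explicit stack whose size is bounded by the traversal state, not by the
    // thread's native stack.
    static List<Integer> visitOrder(Node root) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>();  // the "probe stack"
        pending.push(root);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (!marked.add(n.id())) {
                continue;  // already visited
            }
            order.add(n.id());
            // Push children in reverse so the first reference is popped first;
            // without this, the iterative form visits siblings in reverse,
            // which is the kind of ordering difference noted above.
            for (int i = n.refs().size() - 1; i >= 0; i--) {
                pending.push(n.refs().get(i));
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Node c = new Node(3, List.of());
        Node b = new Node(2, List.of(c));
        Node a = new Node(1, List.of(b, c));  // c is reachable twice
        System.out.println(visitOrder(a)); // [1, 2, 3]
    }
}
```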
But, as a future improvement, I think displaying the first (or first n) objects of the reference stack would be very helpful to the analyst (possibly more so than just printing the roots). But maybe that feature exists already; I am no JMC expert. ## Testing - I ran extensive tests manually to make sure the resulting jfr files showed the same information (apart from the observable differences mentioned above). - I manually tested that we handle max_depth reached and probestack exhaustion gracefully - I added a new test to execute DFS-only on a very small stack size. Works fine (crashes with stock JVM). - I added a new test that exercises the new array chunking in the tracer - I made sure the patch fixes https://bugs.openjdk.org/browse/JDK-8371630 by running the TestWaste.java with a tiny stack size - crashes without patch, works with patch ------------- Commit messages: - fix after JDK-8375040 - tests - wip - revert unnecessary changes - wip - small stack test - tweaking DFS-BFS test - fix windows build warning - reduce diff - fix performance problem on bfs-dfs mixed mode - ... and 8 more: https://git.openjdk.org/jdk/compare/cba7d88c...5d79624b Changes: https://git.openjdk.org/jdk/pull/29382/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373096 Stats: 543 lines in 8 files changed: 502 ins; 8 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/29382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29382/head:pull/29382 PR: https://git.openjdk.org/jdk/pull/29382 From stuefe at openjdk.org Tue Jan 27 12:53:02 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 27 Jan 2026 12:53:02 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 10:18:05 GMT, Thomas Stuefe wrote: > This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. 
> > ---- > > A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). > > We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. > > ### Memory usage of old vs new algorithm: > > The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. > > The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. > > In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. > > ### Possible improvements/simplifications in the future: > > DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. > > I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. 
Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. > > ### Observable differences > > There is one observable side effect to the changed algorithm. The non-recursive algorithm processes oops a... Ping @roberttoyonaga , @egahlin ------------- PR Comment: https://git.openjdk.org/jdk/pull/29382#issuecomment-3805038334 From duke at openjdk.org Tue Jan 27 20:03:15 2026 From: duke at openjdk.org (Robert Toyonaga) Date: Tue, 27 Jan 2026 20:03:15 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive In-Reply-To: References: Message-ID: <9F7u-mBUQGEfA8o6PGa9XtHd0wa8ie6jBD444Hq2L5M=.701a8395-fc36-4cbc-b514-a589b93ceb14@github.com> On Fri, 23 Jan 2026 10:18:05 GMT, Thomas Stuefe wrote: > This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. > > ---- > > A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). > > We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. > > ### Memory usage of old vs new algorithm: > > The new algorithm uses, on average, a bit less memory than the old one. 
The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. > > The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. > > In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. > > ### Possible improvements/simplifications in the future: > > DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. > > I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. > > ### Observable differences > > There is one observable side effect to the changed algorithm. The non-recursive algorithm processes oops a... This looks good to me! And it fixes the [problem we talked about earlier](https://github.com/openjdk/jdk/pull/28659#discussion_r2632525732). I have left one minor comment below. src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 217: > 215: _current_pointee = _current_ref.dereference(); > 216: > 217: _num_objects_processed++; For large arrays that have many chunks, each chunk will count as another "object" processed. Is this okay? 
------------- Marked as reviewed by roberttoyonaga at github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/jdk/pull/29382#pullrequestreview-3712943595 PR Review Comment: https://git.openjdk.org/jdk/pull/29382#discussion_r2733410597 From duke at openjdk.org Tue Jan 27 20:06:02 2026 From: duke at openjdk.org (Robert Toyonaga) Date: Tue, 27 Jan 2026 20:06:02 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 10:18:05 GMT, Thomas Stuefe wrote: > This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. > > ---- > > A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). > > We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. > > ### Memory usage of old vs new algorithm: > > The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. > > The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. 
> > In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. > > ### Possible improvements/simplifications in the future: > > DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. > > I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. > > ### Observable differences > > There is one observable side effect to the changed algorithm. The non-recursive algorithm processes oops a... src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 125: > 123: // smaller. Not a problem at all. > 124: // > 125: // But we could run into weird pathological object graphs. Therfore we also Suggestion: // But we could run into weird pathological object graphs. Therefore we also ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29382#discussion_r2733627437 From stuefe at openjdk.org Wed Jan 28 07:46:29 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 28 Jan 2026 07:46:29 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v2] In-Reply-To: References: Message-ID: > This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. > > ---- > > A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. 
This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). > > We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. > > ### Memory usage of old vs new algorithm: > > The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. > > The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. > > In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. > > ### Possible improvements/simplifications in the future: > > DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. > > I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. 
But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. > > ### Observable differences > > There is one observable side effect to the changed algorithm. The non-recursive algorithm processes oops a... Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp Co-authored-by: Robert Toyonaga ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29382/files - new: https://git.openjdk.org/jdk/pull/29382/files/5d79624b..af5c5585 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29382/head:pull/29382 PR: https://git.openjdk.org/jdk/pull/29382 From stuefe at openjdk.org Wed Jan 28 07:53:47 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 28 Jan 2026 07:53:47 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v3] In-Reply-To: References: Message-ID: > This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. > > ---- > > A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). > > We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. 
In this case, the VMThread's stack was too small. > > This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. > > ### Memory usage of old vs new algorithm: > > The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. > > The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. > > In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. > > ### Possible improvements/simplifications in the future: > > DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. > > I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. > > ### Observable differences > > There is one observable side effect to the changed algorithm. The non-recursive algorithm processes oops a... 
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: dont incremement _num_objects_processed for follow-up chunks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29382/files - new: https://git.openjdk.org/jdk/pull/29382/files/af5c5585..66276985 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=01-02 Stats: 6 lines in 1 file changed: 4 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29382/head:pull/29382 PR: https://git.openjdk.org/jdk/pull/29382 From stuefe at openjdk.org Wed Jan 28 07:53:49 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 28 Jan 2026 07:53:49 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v3] In-Reply-To: <9F7u-mBUQGEfA8o6PGa9XtHd0wa8ie6jBD444Hq2L5M=.701a8395-fc36-4cbc-b514-a589b93ceb14@github.com> References: <9F7u-mBUQGEfA8o6PGa9XtHd0wa8ie6jBD444Hq2L5M=.701a8395-fc36-4cbc-b514-a589b93ceb14@github.com> Message-ID: On Tue, 27 Jan 2026 18:57:56 GMT, Robert Toyonaga wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> dont incremement _num_objects_processed for follow-up chunks > > src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 217: > >> 215: _current_pointee = _current_ref.dereference(); >> 216: >> 217: _num_objects_processed++; > > For large arrays that have many chunks, each chunk will count as another "object" processed. Is this okay? Thank you for catching that; I fixed it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29382#discussion_r2735291019 From stuefe at openjdk.org Wed Jan 28 08:33:00 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 28 Jan 2026 08:33:00 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v3] In-Reply-To: <9F7u-mBUQGEfA8o6PGa9XtHd0wa8ie6jBD444Hq2L5M=.701a8395-fc36-4cbc-b514-a589b93ceb14@github.com> References: <9F7u-mBUQGEfA8o6PGa9XtHd0wa8ie6jBD444Hq2L5M=.701a8395-fc36-4cbc-b514-a589b93ceb14@github.com> Message-ID: On Tue, 27 Jan 2026 20:00:51 GMT, Robert Toyonaga wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> dont incremement _num_objects_processed for follow-up chunks > > This looks good to me! And it fixes the [problem we talked about earlier](https://github.com/openjdk/jdk/pull/28659#discussion_r2632525732). I have left one minor comment below. Thank you, @roberttoyonaga ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29382#issuecomment-3809764432 From egahlin at openjdk.org Wed Jan 28 11:01:46 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 28 Jan 2026 11:01:46 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v3] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 07:53:47 GMT, Thomas Stuefe wrote: >> This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. >> >> ---- >> >> A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). >> >> We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. 
That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. >> >> This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. >> >> ### Memory usage of old vs new algorithm: >> >> The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. >> >> The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. >> >> In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. >> >> ### Possible improvements/simplifications in the future: >> >> DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. >> >> I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. 
>> >> ### Observable differences >> >> There is one observable side effect to the changed a... > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > dont incremement _num_objects_processed for follow-up chunks src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 152: > 150: assert(_probe_stack.is_empty(), "We should have drained the probe stack?"); > 151: } > 152: log_info(jfr, system, dfs)("DFS: objects processed: " UINT64_FORMAT "," I think it would be better to use the already existing oldobject log tag instead of a new dfs tag. Also, info is a bit verbose, debug might be better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29382#discussion_r2736092382 From egahlin at openjdk.org Wed Jan 28 20:41:54 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 28 Jan 2026 20:41:54 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v3] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 07:53:47 GMT, Thomas Stuefe wrote: >> This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. >> >> ---- >> >> A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). >> >> We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. >> >> This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. 
>> >> ### Memory usage of old vs new algorithm: >> >> The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. >> >> The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. >> >> In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. >> >> ### Possible improvements/simplifications in the future: >> >> DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. >> >> I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. >> >> ### Observable differences >> >> There is one observable side effect to the changed a... 
> > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > dont incremement _num_objects_processed for follow-up chunks test/jdk/jdk/jfr/jcmd/TestJcmdDumpPathToGCRootsDFSBase.java line 53: > 51: try (Recording r = new Recording()) { > 52: Map p = new HashMap<>(settings); > 53: p.put(EventNames.OldObjectSample + "#" + Enabled.NAME, "true"); No need to set disk to true, it's true by default. It's much easier to enable an event this way: r.enable(EventNames.OldObjectSample); test/jdk/jdk/jfr/jcmd/TestJcmdDumpPathToGCRootsDFSBase.java line 67: > 65: File recording = new File(jfrFileName + r.getId() + ".jfr"); > 66: recording.delete(); > 67: JcmdHelper.jcmd("JFR.dump", "name=dodo", pathToGcRoots, "filename=" + recording.getAbsolutePath()); Why do we need to do this using jcmd? test/jdk/jdk/jfr/jcmd/TestJcmdDumpPathToGCRootsDFSBase.java line 68: > 66: recording.delete(); > 67: JcmdHelper.jcmd("JFR.dump", "name=dodo", pathToGcRoots, "filename=" + recording.getAbsolutePath()); > 68: r.setSettings(Collections.emptyMap()); No need to clear settings ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29382#discussion_r2738506314 PR Review Comment: https://git.openjdk.org/jdk/pull/29382#discussion_r2738511154 PR Review Comment: https://git.openjdk.org/jdk/pull/29382#discussion_r2738513128 From duke at openjdk.org Thu Jan 29 10:46:13 2026 From: duke at openjdk.org (Bara' Hasheesh) Date: Thu, 29 Jan 2026 10:46:13 GMT Subject: Withdrawn: 8373439: Deadlock between flight recorder & VM shutdown In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:19:51 GMT, Bara' Hasheesh wrote: > description will be added once the expected is agreed on This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.org/jdk/pull/28767 From stuefe at openjdk.org Thu Jan 29 11:04:04 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Jan 2026 11:04:04 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v3] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 20:36:34 GMT, Erik Gahlin wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> dont incremement _num_objects_processed for follow-up chunks > > test/jdk/jdk/jfr/jcmd/TestJcmdDumpPathToGCRootsDFSBase.java line 67: > >> 65: File recording = new File(jfrFileName + r.getId() + ".jfr"); >> 66: recording.delete(); >> 67: JcmdHelper.jcmd("JFR.dump", "name=dodo", pathToGcRoots, "filename=" + recording.getAbsolutePath()); > > Why do we need to do this using jcmd? Hmm, I just followed the same approach as `TestJcmdDumpPathToGCRoots`. I guess the alternative would be to start a child process JVM with -XX:StartFlightRecording dumponexit=true? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29382#discussion_r2741088293 From stuefe at openjdk.org Thu Jan 29 11:56:23 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Jan 2026 11:56:23 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v4] In-Reply-To: References: Message-ID: > This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. > > ---- > > A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). > > We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. 
That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. > > ### Memory usage of old vs new algorithm: > > The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. > > The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. > > In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. > > ### Possible improvements/simplifications in the future: > > DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. > > I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. 
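The cost model quoted above (16 bytes per probe-stack entry, one entry per outgoing reference per depth level) can be sanity-checked with a quick calculation. The sketch below is only an illustration of that arithmetic; `probeStackBytes` is a name invented for the example:

```java
public class ProbeStackCost {
    // Cost model from the PR description: each probe-stack entry is 16 bytes,
    // and the stack holds (avg outgoing refs per object) entries per depth level.
    static long probeStackBytes(long avgOutgoingRefs, long depth) {
        return avgOutgoingRefs * depth * 16;
    }

    public static void main(String[] args) {
        // Depth 3200 (the JDK-8371630 case) with 2..4 outgoing refs per object
        // lands in the 100KB..200KB range mentioned in the description:
        System.out.println(probeStackBytes(2, 3200)); // 102400 bytes, ~100KB
        System.out.println(probeStackBytes(4, 3200)); // 204800 bytes, ~200KB
    }
}
```

Compare with the old cost of (avg stack frame size) * depth: at 3200 frames even modest frames approach the ~1MB figure cited for JDK-8371630.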
> > ### Observable differences > > There is one observable side effect to the changed algorithm. The non-recursive algorithm processes oops a... Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 23 additional commits since the last revision: - erics test remarks - different ul log tag - Merge branch 'master' into JFR-leak-profiler-path-to-gc-roots-non-recursive-take2-with-tracing - dont incremement _num_objects_processed for follow-up chunks - Update src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp Co-authored-by: Robert Toyonaga - fix after JDK-8375040 - tests - wip - revert unnecessary changes - wip - ... and 13 more: https://git.openjdk.org/jdk/compare/dd83f006...dd8c1bfd ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29382/files - new: https://git.openjdk.org/jdk/pull/29382/files/66276985..dd8c1bfd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=02-03 Stats: 5269 lines in 146 files changed: 3544 ins; 904 del; 821 mod Patch: https://git.openjdk.org/jdk/pull/29382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29382/head:pull/29382 PR: https://git.openjdk.org/jdk/pull/29382 From stuefe at openjdk.org Thu Jan 29 11:56:24 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Jan 2026 11:56:24 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v3] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 07:53:47 GMT, Thomas Stuefe wrote: >> This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. >> >> ---- >> >> A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. 
https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). >> >> We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. >> >> This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. >> >> ### Memory usage of old vs new algorithm: >> >> The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. >> >> The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. >> >> In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. >> >> ### Possible improvements/simplifications in the future: >> >> DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. >> >> I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. 
But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. >> >> ### Observable differences >> >> There is one observable side effect to the changed a... > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > dont incremement _num_objects_processed for follow-up chunks @egahlin thank you for reviewing this. I changed the UL tag as suggested and changed the test (also removed the cutoff setting since infinity is default) ------------- PR Comment: https://git.openjdk.org/jdk/pull/29382#issuecomment-3817180949 From stuefe at openjdk.org Thu Jan 29 12:01:20 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Jan 2026 12:01:20 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v5] In-Reply-To: References: Message-ID: > This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. > > ---- > > A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). > > We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. > > ### Memory usage of old vs new algorithm: > > The new algorithm uses, on average, a bit less memory than the old one. 
The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. > > The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. > > In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. > > ### Possible improvements/simplifications in the future: > > DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. > > I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. > > ### Observable differences > > There is one observable side effect to the changed algorithm. The non-recursive algorithm processes oops a... 
Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - remove unnecessary diff - copyrights ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29382/files - new: https://git.openjdk.org/jdk/pull/29382/files/dd8c1bfd..ecc2bb74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=03-04 Stats: 8 lines in 7 files changed: 0 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/29382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29382/head:pull/29382 PR: https://git.openjdk.org/jdk/pull/29382 From egahlin at openjdk.org Thu Jan 29 12:05:28 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Thu, 29 Jan 2026 12:05:28 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v3] In-Reply-To: References: Message-ID: On Thu, 29 Jan 2026 11:00:49 GMT, Thomas Stuefe wrote: >> test/jdk/jdk/jfr/jcmd/TestJcmdDumpPathToGCRootsDFSBase.java line 67: >> >>> 65: File recording = new File(jfrFileName + r.getId() + ".jfr"); >>> 66: recording.delete(); >>> 67: JcmdHelper.jcmd("JFR.dump", "name=dodo", pathToGcRoots, "filename=" + recording.getAbsolutePath()); >> >> Why do we need to do this using jcmd? > > Hmm, I just followed the same approach as `TestJcmdDumpPathToGCRoots`. I guess the alternative would be to start a child process JVM with -XX:StartFlightRecording dumponexit=true? This is how I would implement it: [patch.txt](https://github.com/user-attachments/files/24935579/patch.txt) - Put all relevant information in the test file so that, when it fails, it can be easily analyzed without having to flip back and forth between a base class during triage, for example. - Dump data programmatically, as it is quicker and easier to understand. - Use the Events class to dump and extract and recording data. - Place helper methods in the OldObjects class for reuse. 
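A minimal sketch of dumping programmatically instead of round-tripping through jcmd, along the lines suggested above. Only `jdk.jfr.Recording` and the `jdk.OldObjectSample` event name are real; the class and method names are invented for the example, and the workload is elided:

```java
import jdk.jfr.Recording;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ProgrammaticDump {
    // Dump a recording directly via the API rather than shelling out to jcmd.
    // Enabling the event by name is enough; the "disk" setting already
    // defaults to true (per the review comment earlier in the thread).
    static long recordAndDump(Path out) throws IOException {
        try (Recording r = new Recording()) {
            r.enable("jdk.OldObjectSample");
            r.start();
            // ... workload allocating long-lived objects would go here ...
            r.stop();
            r.dump(out); // programmatic equivalent of "jcmd <pid> JFR.dump"
        }
        return Files.size(out);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("oos", ".jfr");
        System.out.println(recordAndDump(out) > 0);
    }
}
```

Even an empty recording produces a non-empty .jfr file (chunk header plus metadata), so the size check above is only a smoke test.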
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29382#discussion_r2741301803 From stuefe at openjdk.org Thu Jan 29 14:57:04 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Jan 2026 14:57:04 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v6] In-Reply-To: References: Message-ID: > This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. > > ---- > > A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). > > We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. > > ### Memory usage of old vs new algorithm: > > The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. > > The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. > > In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. 
> > ### Possible improvements/simplifications in the future: > > DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. > > I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. > > ### Observable differences > > There is one observable side effect to the changed algorithm. The non-recursive algorithm processes oops a... Thomas Stuefe has updated the pull request incrementally with three additional commits since the last revision: - remove unnecessary copyright change - remove debug output - Erics test suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29382/files - new: https://git.openjdk.org/jdk/pull/29382/files/ecc2bb74..957e001a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=04-05 Stats: 442 lines in 9 files changed: 195 ins; 245 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29382/head:pull/29382 PR: https://git.openjdk.org/jdk/pull/29382 From stuefe at openjdk.org Thu Jan 29 14:57:07 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Jan 2026 14:57:07 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v5] In-Reply-To: References: Message-ID: 
<_13pfLfGnYCREnI81qi-sM3dyeaCVyy6KHJBRxTfrqE=.5a239109-0190-4cd3-abec-d9a13354eb16@github.com> On Thu, 29 Jan 2026 12:01:20 GMT, Thomas Stuefe wrote: >> This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. >> >> ---- >> >> A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). >> >> We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. >> >> This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. >> >> ### Memory usage of old vs new algorithm: >> >> The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. >> >> The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. >> >> In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. >> >> ### Possible improvements/simplifications in the future: >> >> DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. 
IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. >> >> I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. >> >> ### Observable differences >> >> There is one observable side effect to the changed a... > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - remove unnecessary diff > - copyrights @egahlin yes, that is nicer and simpler. I adapted your approach. Not sure if this helps, but back in December, when I rewrote this, I drew up a quick visio to help with reviews. I'll attach the pdf. [JFR-leakprofiler-DFS.pdf](https://github.com/user-attachments/files/24940854/JFR-leakprofiler-DFS.pdf) ------------- PR Comment: https://git.openjdk.org/jdk/pull/29382#issuecomment-3818189519 From egahlin at openjdk.org Fri Jan 30 13:12:48 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Fri, 30 Jan 2026 13:12:48 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v5] In-Reply-To: <_13pfLfGnYCREnI81qi-sM3dyeaCVyy6KHJBRxTfrqE=.5a239109-0190-4cd3-abec-d9a13354eb16@github.com> References: <_13pfLfGnYCREnI81qi-sM3dyeaCVyy6KHJBRxTfrqE=.5a239109-0190-4cd3-abec-d9a13354eb16@github.com> Message-ID: On Thu, 29 Jan 2026 14:53:01 GMT, Thomas Stuefe wrote: > @egahlin yes, that is nicer and simpler. I adapted your approach. > > Not sure if this helps, but back in December, when I rewrote this, I drew up a quick visio to help with reviews. I'll attach the pdf. 
[JFR-leakprofiler-DFS.pdf](https://github.com/user-attachments/files/24940854/JFR-leakprofiler-DFS.pdf) Thanks, I need some more time to review your PR, but the change is now more confined and should not be hard to backport. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29382#issuecomment-3823663593
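The recursion-free traversal discussed in this thread boils down to replacing the call stack with an explicit stack of (node, depth) probes, with the depth cap applied per probe. The sketch below is a generic textbook illustration of that transformation, not the actual `DFSClosure` code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IterativeDfs {
    // Non-recursive DFS over an adjacency list, using an explicit stack of
    // "probes" instead of the call stack. Returns nodes in visit order,
    // matching the preorder a recursive DFS would produce.
    static List<Integer> dfs(Map<Integer, List<Integer>> graph, int root, int maxDepth) {
        List<Integer> visited = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Deque<int[]> probes = new ArrayDeque<>(); // {node, depth} pairs
        probes.push(new int[] {root, 0});
        while (!probes.isEmpty()) {
            int[] probe = probes.pop();
            int node = probe[0], depth = probe[1];
            if (depth > maxDepth || !seen.add(node)) {
                continue; // depth-capped or already visited
            }
            visited.add(node);
            // All outgoing refs are pushed at once; this is why memory grows
            // with (outgoing refs per node) * depth rather than frame size.
            List<Integer> refs = graph.getOrDefault(node, List.of());
            for (int i = refs.size() - 1; i >= 0; i--) { // reverse keeps preorder
                probes.push(new int[] {refs.get(i), depth + 1});
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> g = Map.of(
            1, List.of(2, 3),
            2, List.of(4),
            3, List.of(4));
        System.out.println(dfs(g, 1, 10)); // [1, 2, 4, 3]
    }
}
```

Capping the probe stack itself (rather than only the depth) would be the analogue of the explicit probe-stack size cap mentioned in the PR description.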