From mbaesken at openjdk.org Mon Feb 2 12:12:05 2026 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Feb 2026 12:12:05 GMT Subject: RFR: 8376889: Enhance JfrRecorder::on_create_vm_3() assert output Message-ID: There is an assert in JfrRecorder::on_create_vm_3 checking for JVMTI_PHASE_LIVE, but in case another phase is found it is not printed. This should be improved. ------------- Commit messages: - JDK-8376889 Changes: https://git.openjdk.org/jdk/pull/29521/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29521&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8376889 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29521/head:pull/29521 PR: https://git.openjdk.org/jdk/pull/29521 From mdoerr at openjdk.org Mon Feb 2 20:34:23 2026 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Feb 2026 20:34:23 GMT Subject: RFR: 8376889: Enhance JfrRecorder::on_create_vm_3() assert output In-Reply-To: References: Message-ID: On Mon, 2 Feb 2026 12:05:25 GMT, Matthias Baesken wrote: > There is an assert in JfrRecorder::on_create_vm_3 checking for JVMTI_PHASE_LIVE, but in case another phase is found it is not printed. > This should be improved. LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29521#pullrequestreview-3741600343 From asteiner at openjdk.org Tue Feb 3 08:42:52 2026 From: asteiner at openjdk.org (Andreas Steiner) Date: Tue, 3 Feb 2026 08:42:52 GMT Subject: RFR: 8376889: Enhance JfrRecorder::on_create_vm_3() assert output In-Reply-To: References: Message-ID: On Mon, 2 Feb 2026 12:05:25 GMT, Matthias Baesken wrote: > There is an assert in JfrRecorder::on_create_vm_3 checking for JVMTI_PHASE_LIVE, but in case another phase is found it is not printed. > This should be improved. Marked as reviewed by asteiner (Author). 
-------------

PR Review: https://git.openjdk.org/jdk/pull/29521#pullrequestreview-3743861061

From mgronlun at openjdk.org Tue Feb 3 08:42:51 2026
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Tue, 3 Feb 2026 08:42:51 GMT
Subject: RFR: 8376889: Enhance JfrRecorder::on_create_vm_3() assert output
In-Reply-To:
References:
Message-ID:

On Mon, 2 Feb 2026 12:05:25 GMT, Matthias Baesken wrote:

> There is an assert in JfrRecorder::on_create_vm_3 checking for JVMTI_PHASE_LIVE, but in case another phase is found it is not printed.
> This should be improved.

Marked as reviewed by mgronlun (Reviewer).

Ideally, a mapping of the enum to its string representation would be better. But I checked, and none exists, and I don't think it's worth creating only for this extremely strange corner case.

-------------

PR Review: https://git.openjdk.org/jdk/pull/29521#pullrequestreview-3743860439
PR Comment: https://git.openjdk.org/jdk/pull/29521#issuecomment-3839900536

From mbaesken at openjdk.org Tue Feb 3 12:02:42 2026
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 3 Feb 2026 12:02:42 GMT
Subject: RFR: 8376889: Enhance JfrRecorder::on_create_vm_3() assert output
In-Reply-To:
References:
Message-ID:

On Mon, 2 Feb 2026 12:05:25 GMT, Matthias Baesken wrote:

> There is an assert in JfrRecorder::on_create_vm_3 checking for JVMTI_PHASE_LIVE, but in case another phase is found it is not printed.
> This should be improved.

Thanks for the reviews!
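
For illustration only: the review above weighs printing the raw phase value against adding an enum-to-string mapping. The sketch below shows what such a mapping could look like in standalone C++. `PhaseSketch`, `phase_name` and `phase_check_message` are hypothetical names, not HotSpot code; the numeric values are assumed to mirror the `jvmtiPhase` constants in jvmti.h; the thread does not show the integrated patch itself.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Stand-in for jvmtiPhase; values are assumed to match jvmti.h.
enum PhaseSketch {
  PHASE_ONLOAD     = 1,
  PHASE_PRIMORDIAL = 2,
  PHASE_LIVE       = 4,
  PHASE_START      = 6,
  PHASE_DEAD       = 8
};

// Hypothetical helper: readable name for a phase value.
const char* phase_name(int phase) {
  switch (phase) {
    case PHASE_ONLOAD:     return "ONLOAD";
    case PHASE_PRIMORDIAL: return "PRIMORDIAL";
    case PHASE_LIVE:       return "LIVE";
    case PHASE_START:      return "START";
    case PHASE_DEAD:       return "DEAD";
    default:               return "UNKNOWN";
  }
}

// Builds the text a failing phase check could report: name plus raw value.
std::string phase_check_message(int phase) {
  char buf[64];
  const int n = std::snprintf(buf, sizeof(buf), "expected LIVE, found %s (%d)",
                              phase_name(phase), phase);
  assert(n > 0 && n < static_cast<int>(sizeof(buf)));  // message was not truncated
  (void)n;  // avoid an unused-variable warning when asserts are compiled out
  return std::string(buf);
}
```

With only the numeric value printed, a reader has to look up what a given number means; a mapping like this trades a few lines of code for that lookup.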
------------- PR Comment: https://git.openjdk.org/jdk/pull/29521#issuecomment-3840883252 From mbaesken at openjdk.org Tue Feb 3 12:02:43 2026 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 3 Feb 2026 12:02:43 GMT Subject: Integrated: 8376889: Enhance JfrRecorder::on_create_vm_3() assert output In-Reply-To: References: Message-ID: <_ZFX5Zmund0CIygD_vhv1MY5-4-dGPwx-Ibt974cRiU=.2c86cde8-cb52-4c57-98e1-d5cc0af6bf41@github.com> On Mon, 2 Feb 2026 12:05:25 GMT, Matthias Baesken wrote: > There is an assert in JfrRecorder::on_create_vm_3 checking for JVMTI_PHASE_LIVE, but in case another phase is found it is not printed. > This should be improved. This pull request has now been integrated. Changeset: a5b4c079 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/a5b4c0795d88db3d02d31fb4740612c6a53f7204 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8376889: Enhance JfrRecorder::on_create_vm_3() assert output Reviewed-by: mdoerr, mgronlun, asteiner ------------- PR: https://git.openjdk.org/jdk/pull/29521 From egahlin at openjdk.org Tue Feb 3 14:08:54 2026 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 3 Feb 2026 14:08:54 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v6] In-Reply-To: References: Message-ID: On Thu, 29 Jan 2026 14:57:04 GMT, Thomas Stuefe wrote: >> This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. >> >> ---- >> >> A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). >> >> We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. 
In this case, the VMThread's stack was too small. >> >> This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. >> >> ### Memory usage of old vs new algorithm: >> >> The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. >> >> The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. >> >> In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. >> >> ### Possible improvements/simplifications in the future: >> >> DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. >> >> I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. >> >> ### Observable differences >> >> There is one observable side effect to the changed a... 
>
> Thomas Stuefe has updated the pull request incrementally with three additional commits since the last revision:
>
>  - remove unnecessary copyright change
>  - remove debug output
>  - Erics test suggestions

There is still a log_info in DFSClosure destructor that should be log_debug.

I wonder if you have looked at performance. For example, in which order is it best to check for nullptr, oop has been visited or whether the stack is full? You added a _num_objects_processed variable, but it's never used, so you might want to remove it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29382#issuecomment-3841536590

From egahlin at openjdk.org Tue Feb 3 14:25:46 2026
From: egahlin at openjdk.org (Erik Gahlin)
Date: Tue, 3 Feb 2026 14:25:46 GMT
Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v6]
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jan 2026 14:57:04 GMT, Thomas Stuefe wrote:

>> This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659.
>>
>> ----
>>
>> A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc).
>>
>> We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
>>
>> This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read.
>>
>> ### Memory usage of old vs new algorithm:
>>
>> The new algorithm uses, on average, a bit less memory than the old one.
The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. >> >> The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. >> >> In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. >> >> ### Possible improvements/simplifications in the future: >> >> DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. >> >> I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. >> >> ### Observable differences >> >> There is one observable side effect to the changed a... > > Thomas Stuefe has updated the pull request incrementally with three additional commits since the last revision: > > - remove unnecessary copyright change > - remove debug output > - Erics test suggestions For the future, I think we want to keep BFS. Initially, I only had DFS, but the chains became so weird that I had to implement BFS. Regarding ordering, I think we want an order that makes the most sense to the user. ClassLoader is easier for users to understand than Global Object Handle. 
I'm not sure if that code is still present, but we had a specific order in which we processed roots.

I think BFS will be able to cover most of the heap, and if there is a "linked list" where DFS would need to take over, the order is unlikely to matter at that point.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29382#issuecomment-3841633851

From stuefe at openjdk.org Tue Feb 3 14:45:28 2026
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 3 Feb 2026 14:45:28 GMT
Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 3 Feb 2026 14:05:51 GMT, Erik Gahlin wrote:

> I wonder if you have looked at performance. For example, in which order is it best to check for nullptr, oop has been visited or whether the stack is full? You added a _num_objects_processed variable, but it's never used, so you might want to remove it.

Will do.

> For the future, I think we want to keep BFS. Initially, I only had DFS, but the chains became so weird that I had to implement BFS.

Sure; you are the maintainer, after all.

>
> Regarding ordering, I think we want an order that makes the most sense to the user. ClassLoader is easier for users to understand than Global Object Handle. I'm not sure if that code is still present, but we had a specific order in which we processed roots.

Did you mean this?
https://github.com/openjdk/jdk/blob/99bc98357dab78bef2cce7a10c98d13d1e5730e3/src/hotspot/share/jfr/leakprofiler/chains/rootSetClosure.cpp#L87-L97

I can reverse the order in there and thus get the (roughly) reversed order. I already tested that, but refrained from adding it to the patch.

What do you think about printing the actual (first, first and second ...) objects that are referenced by the roots? I think that would often help a lot. It helped me a lot in understanding what happened. In fact, I just traced the whole chain during development, hence the logging added to add_chain().
-------------

PR Comment: https://git.openjdk.org/jdk/pull/29382#issuecomment-3841744021

From egahlin at openjdk.org Wed Feb 4 12:46:44 2026
From: egahlin at openjdk.org (Erik Gahlin)
Date: Wed, 4 Feb 2026 12:46:44 GMT
Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 3 Feb 2026 14:42:03 GMT, Thomas Stuefe wrote:

> Did you mean this?
> https://github.com/openjdk/jdk/blob/99bc98357dab78bef2cce7a10c98d13d1e5730e3/src/hotspot/share/jfr/leakprofiler/chains/rootSetClosure.cpp#L87-L97

Yes, we start with classes first because that typically makes the most sense to users. The problem with starting with threads is that you can get odd chains if one thread holds references to other threads. Generally, stacks are harder to understand compared to an inner class (e.g. a listener) holding a reference to the outer class.

> I can reverse the order in there and thus get the (roughly) reversed order. I already tested that, but refrained from adding it to the patch.

I think we want to process roots in the same order as in BFS.

> What do you think about printing the actual (first, first and second ...) objects that are referenced by the roots? I think that would often help a lot. It helped me a lot in understanding what happened.

I have used 'jfr print --events OldObjectSample' when debugging issues. At one point, I had a JavaFX GUI that allowed me to explore chains, but I think JMC can do that today. I have not studied non-leaks that much. I don't have a strong opinion on what to show in trace mode, as long as it doesn't slow down product builds.
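
As a rough illustration of the technique under review in this thread (replacing native recursion with an explicit, size-capped stack), here is a minimal standalone sketch over a toy graph. It is not the code in dfsClosure.cpp: `Graph`, `find_path` and `max_stack` are made-up names, the order of the three checks before a push is just one possible choice, and nodes are marked when pushed, so the visit order can differ from a textbook recursive DFS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy object graph: node i references the nodes listed in g[i].
// A negative or out-of-range entry stands in for a null reference.
using Graph = std::vector<std::vector<int>>;

// Depth-first style search from `root` to `target`, driven by an explicit
// stack that never holds more than `max_stack` entries.
// Returns the chain root..target, or an empty vector if no chain was found
// within the stack budget.
std::vector<int> find_path(const Graph& g, int root, int target,
                           std::size_t max_stack) {
  assert(max_stack > 0);
  const int n = static_cast<int>(g.size());
  if (root < 0 || root >= n) return {};

  std::vector<bool> marked(g.size(), false);  // plays the role of mark bits
  std::vector<int> parent(g.size(), -1);      // for rebuilding the chain
  std::vector<int> stack;                     // pending nodes, on the heap
  stack.reserve(max_stack);

  marked[root] = true;
  stack.push_back(root);

  while (!stack.empty()) {
    const int cur = stack.back();
    stack.pop_back();
    if (cur == target) {
      std::vector<int> path;
      for (int v = cur; v != -1; v = parent[v]) path.push_back(v);
      return std::vector<int>(path.rbegin(), path.rend());
    }
    for (int ref : g[cur]) {
      if (ref < 0 || ref >= n) continue;        // "null" reference
      if (stack.size() >= max_stack) continue;  // budget exhausted: drop it
      if (marked[ref]) continue;                // already seen
      marked[ref] = true;
      parent[ref] = cur;
      stack.push_back(ref);
    }
  }
  return {};
}
```

For example, with `Graph g = {{1, 2}, {3}, {}, {4}, {}};` the call `find_path(g, 0, 4, 16)` yields the chain 0, 1, 3, 4. With a budget that is too small, references are silently dropped and a reachable target may be missed, which is the trade-off of capping the stack.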
------------- PR Comment: https://git.openjdk.org/jdk/pull/29382#issuecomment-3847232784 From stuefe at openjdk.org Thu Feb 5 10:31:37 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 5 Feb 2026 10:31:37 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v7] In-Reply-To: References: Message-ID: > This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. > > ---- > > A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc). > > We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read. > > ### Memory usage of old vs new algorithm: > > The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. > > The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. > > In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. > > ### Possible improvements/simplifications in the future: > > DFS works perfectly well alone now. 
It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. > > I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. > > ### Observable differences > > There is one observable side effect to the changed algorithm. The non-recursive algorithm processes oops a... Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 31 additional commits since the last revision: - nullness check first - Merge branch 'master' into JFR-leak-profiler-path-to-gc-roots-non-recursive-take2-with-tracing - Merge branch 'master' into JFR-leak-profiler-path-to-gc-roots-non-recursive-take2-with-tracing - remove unnecessary copyright change - remove debug output - Erics test suggestions - remove unnecessary diff - copyrights - erics test remarks - different ul log tag - ... 
and 21 more: https://git.openjdk.org/jdk/compare/2817ec85...df13e722

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/29382/files
  - new: https://git.openjdk.org/jdk/pull/29382/files/957e001a..df13e722

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=05-06

  Stats: 50301 lines in 667 files changed: 24375 ins; 16740 del; 9186 mod
  Patch: https://git.openjdk.org/jdk/pull/29382.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29382/head:pull/29382

PR: https://git.openjdk.org/jdk/pull/29382

From stuefe at openjdk.org Thu Feb 5 10:31:38 2026
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 5 Feb 2026 10:31:38 GMT
Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 3 Feb 2026 14:42:03 GMT, Thomas Stuefe wrote:

> I wonder if you have looked at performance. For example, in which order is it best to check for nullptr, oop has been visited or whether the stack is full? You added a _num_objects_processed variable, but it's never used, so you might want to remove it.

It is better to check if the stack is full first. 10 million references take about 10.7 seconds (median user CPU time) with the mark bit search first, and 9.7 seconds with the stack fullness check first. That makes sense since the stack fullness check is just comparing two integers, whereas the mark bit query is more involved.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29382#issuecomment-3852622930

From stuefe at openjdk.org Thu Feb 5 10:44:58 2026
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 5 Feb 2026 10:44:58 GMT
Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v6]
In-Reply-To:
References:
Message-ID:

On Thu, 5 Feb 2026 10:27:19 GMT, Thomas Stuefe wrote:

> > I wonder if you have looked at performance.
For example, in which order is it best to check for nullptr, oop has been visited or whether the stack is full? You added a _num_objects_processed variable, but it's never used, so you might want to remove it.
>
> It is better to check if the stack is full first. 10 million references take about 10.7 seconds (median user CPU time) with the mark bit search first, and 9.7 seconds with the stack fullness check first. That makes sense since the stack fullness check is just comparing two integers, whereas the mark bit query is more involved.

And note that the new solution is faster than the old recursive one, which takes about 11.5 seconds user CPU time on my machine.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29382#issuecomment-3852705760

From egahlin at openjdk.org Mon Feb 9 21:27:03 2026
From: egahlin at openjdk.org (Erik Gahlin)
Date: Mon, 9 Feb 2026 21:27:03 GMT
Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 5 Feb 2026 10:31:37 GMT, Thomas Stuefe wrote:

>> This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659.
>>
>> ----
>>
>> A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc).
>>
>> We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
>>
>> This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read.
>> >> ### Memory usage of old vs new algorithm: >> >> The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. >> >> The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. >> >> In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp. >> >> ### Possible improvements/simplifications in the future: >> >> DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later. >> >> I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult. >> >> ### Observable differences >> >> There is one observable side effect to the changed a... > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 31 additional commits since the last revision:
>
>  - nullness check first
>  - Merge branch 'master' into JFR-leak-profiler-path-to-gc-roots-non-recursive-take2-with-tracing
>  - Merge branch 'master' into JFR-leak-profiler-path-to-gc-roots-non-recursive-take2-with-tracing
>  - remove unnecessary copyright change
>  - remove debug output
>  - Erics test suggestions
>  - remove unnecessary diff
>  - copyrights
>  - erics test remarks
>  - different ul log tag
>  - ... and 21 more: https://git.openjdk.org/jdk/compare/2e778d3d...df13e722

src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 153:

> 151:     assert(_probe_stack.is_empty(), "We should have drained the probe stack?");
> 152:   }
> 153:   log_info(jfr, system, oldobject)("DFS: objects processed: " UINT64_FORMAT ","

log_trace or log_debug

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29382#discussion_r2782546225

From jaroslav.bachorik at datadoghq.com Tue Feb 10 12:00:37 2026
From: jaroslav.bachorik at datadoghq.com (Jaroslav Bachorík)
Date: Tue, 10 Feb 2026 13:00:37 +0100
Subject: RFC: Display contextual event fields in jfr view command
In-Reply-To:
References:
Message-ID:

Hi Erik,

Thanks for the reply. Please bear in mind that this is just an exploratory RFC; nothing is set in stone.

When we started talking about using JFR events for expressing context, we also talked about having support in the 'jfr' tool that would be able to show the context in a concise manner. What 'concise manner' means is still not completely clear, hence this RFC to start the discussion about what would make sense from the end-user point of view.

From your reply it seems like having context support in the 'jfr' tool is not strictly necessary as long as there is JMC support? Am I reading this correctly?

Cheers,

Jaroslav

On Mon, Jan 26, 2026 at 1:34 PM Erik Gahlin wrote:

> Hi Jaroslav,
>
> The 'jfr print' command is meant for presentations, demos, and debugging.
> It was never designed for application troubleshooting. The contextual
> support added in JDK 25 was included to demonstrate to application
> developers how the @Contextual annotation can be used, and to show third
> parties how contextual support can be implemented using the
> jdk.jfr.consumer API.
>
> The 'view' command, on the other hand, was designed for troubleshooting
> and can be used on a live process, so it should not use excessive memory or
> CPU.
>
> You added a command-line flag, --show-context, perhaps to prevent
> additional overhead from contextual processing, but before adding flags, I
> think it might be a good time to step back and think about how we best can
> present contextual information to users. A flag is usually hard for users
> to find. It would be better to add support in JMC, so users can discover
> contexts and then drill deeper by clicking in the GUI.
>
> I'm also wondering if contextual support belongs in the query language.
> It's not clear how columns of nested contexts should be identified. It may
> be better to create something like FormRenderer that only handles event
> types.
>
> We have also discussed adding a bit in the chunk header if a contextual
> event has been emitted. This would allow a parser to have a fast path when
> there are no contextual events.
>
> Thanks
> Erik
> ________________________________________
> From: hotspot-jfr-dev on behalf of
> Jaroslav Bachorík
> Sent: Wednesday, January 21, 2026 10:03 PM
> To: hotspot-jfr-dev
> Subject: RFC: Display contextual event fields in jfr view command
>
> Hello,
>
> I'd like to propose adding context display support to the `jfr view`
> command. This would allow users to see which @Contextual events were active
> when other events occurred, without requiring any changes to the JFR
> recording format or runtime.
>
> Background
>
> Back in 2021, there was a discussion on this list about adding a Recording
> Context concept to JFR (thread starting at 2021-June/002777).
> Erik suggested an alternative to modifying the event format: use dedicated
> context events with begin/end markers and correlate them during recording
> analysis.
>
> This proposal implements exactly that approach on the tooling side. When
> users have events with @Contextual annotated fields (such as trace IDs,
> span IDs, or request contexts), they can now view which contexts were
> active during any event - all computed at analysis time from the existing
> recording data.
> ---
>
> Current State
>
> The `jfr print` command already supports displaying contextual events.
> When printing events, it shows active context fields inline:
>
>   jfr print recording.jfr
>
>   jdk.ThreadSleep {
>     Context: Trace.traceId = "abc-123-def"
>     Context: Trace.service = "order-service"
>     startTime = 12:00:01.000
>     duration = 50 ms
>     ...
>   }
>
> This works well for detailed event inspection, but the `jfr view` command
> (which displays events in a tabular format) has no equivalent capability.
> ---
>
> The Problem
>
> When using `jfr view` to analyze recordings from distributed systems,
> users cannot see which contexts were active. The tabular format is often
> preferred for scanning many events quickly, but without context information
> users must:
>
> 1. Note the timestamp of the event of interest
> 2. Switch to `jfr print` or manually search for overlapping contextual events
> 3. Match by thread ID to avoid cross-thread confusion
> 4. Repeat for every event they want to analyze
>
> This breaks the workflow when trying to correlate events with their
> contexts at scale.
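
The manual correlation in steps 1 to 3 above can be sketched roughly as follows. This is a standalone illustration, not the jdk.jfr.consumer API: `EventSketch`, `is_active_for` and `active_contexts` are hypothetical names, timestamps are plain integers, and the overlap rule (same thread, context interval covers the event's start time) is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of a recorded event.
struct EventSketch {
  std::string  type;       // e.g. "Trace" or "jdk.ThreadSleep"
  std::int64_t thread_id;  // thread that emitted the event
  std::int64_t start;      // start timestamp, arbitrary ticks
  std::int64_t end;        // end timestamp, arbitrary ticks
};

// Assumed rule: a contextual event is active for `ev` when both are on the
// same thread and the context interval covers the start time of `ev`.
bool is_active_for(const EventSketch& ctx, const EventSketch& ev) {
  assert(ctx.start <= ctx.end);
  return ctx.thread_id == ev.thread_id &&
         ctx.start <= ev.start &&
         ctx.end >= ev.start;
}

// Collects the types of all contextual events that are active for `ev`.
std::vector<std::string> active_contexts(
    const std::vector<EventSketch>& contexts, const EventSketch& ev) {
  std::vector<std::string> names;
  for (const EventSketch& ctx : contexts) {
    if (is_active_for(ctx, ev)) names.push_back(ctx.type);
  }
  return names;
}
```

A linear scan per event like this is quadratic over a whole recording, which is one reason a tool would want a smarter per-thread structure than the manual procedure.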
> ---
>
> Proposed Solution
>
> Add a `--show-context` flag to `jfr view` that automatically displays
> contextual event fields as additional columns:
>
>   jfr view --show-context jdk.ThreadSleep recording.jfr
>
>                           ThreadSleep
>
>   Time      Sleep Time  Trace.traceId  Trace.service
>   ----------------------------------------------------------------
>   12:00:01  50 ms       abc-123-def    order-service
>   12:00:02  100 ms      abc-123-def    order-service
>   12:00:03  25 ms       N/A            N/A
>
> The context matching rule is: a contextual event is active when
> contextStart <= eventStart AND contextEnd >= eventStart.
>
> Users can optionally filter which context types to display:
>
>   jfr view --show-context=Span,Trace WorkEvent recording.jfr
> ---
>
> Why This Approach?
>
> 1. No runtime overhead - context correlation happens entirely at analysis time
> 2. No format changes - works with existing recordings that have @Contextual events
> 3. Backward compatible - recordings remain readable by older tools
> 4. Flexible - users choose which contexts to display
> 5. Proven pattern - based on the timeline approach already used in PrettyWriter
> ---
>
> [PoC] Implementation Notes
>
> The implementation tracks context per-thread using a timeline-based
> approach similar to PrettyWriter.java. Events are buffered in a priority
> queue ordered by timestamp. Contextual events contribute both start and end
> timestamps, and active contexts are tracked per-thread to prevent
> cross-thread leakage. Memory is bounded (~1M events) to handle large
> recordings. Queries without --show-context bypass this entirely, so there's
> no overhead for existing usage.
>
> I've also added support for referencing contextual fields in GROUP BY
> clauses for the `jfr query` command (debug builds), enabling aggregation
> queries like:
>
>   SELECT COUNT(*), Trace.traceId FROM WorkEvent GROUP BY Trace.traceId
> ---
>
> Questions for Discussion
>
> 1. Is the matching rule (contextStart <= eventStart AND contextEnd >= eventStart) correct?
>    An alternative would be to require the event to fall entirely within the
>    context.
> 2. Should there be a maximum number of context columns to prevent very
>    wide output?
> 3. Is 1M events a reasonable buffer size? This balances memory (~100MB)
>    with accuracy for long-running contexts.
> 4. The `jfr print` command already shows context - should there be a way
>    to disable it for consistency, or is the current always-on behavior correct?
>
>
> I'd welcome feedback on the approach before proceeding further.
>
> Thanks,
>
> Jaroslav
>

From jsikstro at openjdk.org Wed Feb 11 12:41:16 2026
From: jsikstro at openjdk.org (Joel Sikström)
Date: Wed, 11 Feb 2026 12:41:16 GMT
Subject: RFR: 8377665: JFR: Symbol table not setup for early class unloading
Message-ID:

Hello,

We observe a crash when running a major GC very early, which unloads classes, calling into JFR which is not properly set up for class unloading this early. A simple fix for this is to move the creation of the symbol table to the early setup of JFR, which allows us to successfully call `JfrCheckpointManager::on_unloading_classes()`.
Testing: * Crash reproducer no longer crashes * Running through Oracle's tier1-4 ------------- Commit messages: - 8377665: JFR: Symbol table not setup for early class unloading Changes: https://git.openjdk.org/jdk/pull/29672/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29672&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8377665 Stats: 8 lines in 1 file changed: 5 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29672/head:pull/29672 PR: https://git.openjdk.org/jdk/pull/29672 From eosterlund at openjdk.org Wed Feb 11 16:32:44 2026 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 11 Feb 2026 16:32:44 GMT Subject: RFR: 8377665: JFR: Symbol table not setup for early class unloading In-Reply-To: References: Message-ID: On Wed, 11 Feb 2026 12:31:24 GMT, Joel Sikström wrote: > Hello, > > We observe a crash when running a major GC very early, which unloads classes, calling into JFR which is not properly set up for class unloading this early. A simple fix for this is to move the creation of the symbol table to the early setup of JFR, which allows us to successfully call `JfrCheckpointManager::on_unloading_classes()`. > > Testing: > * Crash reproducer no longer crashes > * Running through Oracle's tier1-4 Looks good! ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29672#pullrequestreview-3785901022 From mgronlun at openjdk.org Wed Feb 11 20:16:49 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 11 Feb 2026 20:16:49 GMT Subject: RFR: 8377665: JFR: Symbol table not setup for early class unloading In-Reply-To: References: Message-ID: On Wed, 11 Feb 2026 12:31:24 GMT, Joel Sikström wrote: > Hello, > > We observe a crash when running a major GC very early, which unloads classes, calling into JFR which is not properly set up for class unloading this early.
A simple fix for this is to move the creation of the symbol table to the early setup of JFR, which allows us to successfully call `JfrCheckpointManager::on_unloading_classes()`. > > Testing: > * Crash reproducer no longer crashes > * Running through Oracle's tier1-4 Please see the comment about _initial_type_set state. Looks good, Joel. Thanks for finding and fixing this. I stepped it through and found a minor bug we should fix (related to unloading). Can you also please add:

diff --git a/src/hotspot/share/jfr/recorder/checkpoint/types/jfrTypeSet.cpp b/src/hotspot/share/jfr/recorder/checkpoint/types/jfrTypeSet.cpp
index b1c502e17f8..3dd9ea41d3d 100644
--- a/src/hotspot/share/jfr/recorder/checkpoint/types/jfrTypeSet.cpp
+++ b/src/hotspot/share/jfr/recorder/checkpoint/types/jfrTypeSet.cpp
@@ -1265,7 +1265,7 @@ static size_t teardown() {
 JfrKlassUnloading::clear();
 _artifacts->clear();
 _initial_type_set = true;
- } else {
+ } else if (is_initial_typeset_for_chunk()) {
 _initial_type_set = false;
 }

Else unloading() will modify the _initial_type_set flag, which we don't want. Thanks Markus ------------- Changes requested by mgronlun (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29672#pullrequestreview-3787174383 PR Comment: https://git.openjdk.org/jdk/pull/29672#issuecomment-3886872090 From sjohanss at openjdk.org Thu Feb 12 08:29:35 2026 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 12 Feb 2026 08:29:35 GMT Subject: RFR: 8377665: JFR: Symbol table not setup for early class unloading In-Reply-To: References: Message-ID: On Wed, 11 Feb 2026 12:31:24 GMT, Joel Sikström wrote: > Hello, > > We observe a crash when running a major GC very early, which unloads classes, calling into JFR which is not properly set up for class unloading this early. A simple fix for this is to move the creation of the symbol table to the early setup of JFR, which allows us to successfully call `JfrCheckpointManager::on_unloading_classes()`.
> > Testing: > * Crash reproducer no longer crashes > * Running through Oracle's tier1-4 src/hotspot/share/jfr/recorder/jfrRecorder.cpp line 111: > 109: return false; > 110: } > 111: Could/should this be moved in under `is_started_on_commandline()` like for the checkpoint manager above? From what I understand, the only time we can get to the problematic case with an uninitialized table is when started on the command-line and it would then make sense to only do this initialization for this case. Or am I missing something? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29672#discussion_r2797415457 From jsikstro at openjdk.org Thu Feb 12 08:49:21 2026 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 12 Feb 2026 08:49:21 GMT Subject: RFR: 8377665: JFR: Symbol table not setup for early class unloading [v2] In-Reply-To: References: Message-ID: > Hello, > > We observe a crash when running a major GC very early, which unloads classes, calling into JFR which is not properly set up for class unloading this early. A simple fix for this is to move the creation of the symbol table to the early setup of JFR, which allows us to successfully call `JfrCheckpointManager::on_unloading_classes()`.
> > Testing: > * Crash reproducer no longer crashes > * Running through Oracle's tier1-4 Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: Markus _initial_type_set feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29672/files - new: https://git.openjdk.org/jdk/pull/29672/files/b1d5650f..788a3c40 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29672&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29672&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29672/head:pull/29672 PR: https://git.openjdk.org/jdk/pull/29672 From mgronlun at openjdk.org Thu Feb 12 11:21:50 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 12 Feb 2026 11:21:50 GMT Subject: RFR: 8377665: JFR: Symbol table not setup for early class unloading [v2] In-Reply-To: References: Message-ID: <2n5ckB-WJJeat5VNTgC50WSPPpEeffoabpOtJZzPIQo=.85f5885d-daa8-4427-bf34-0f1e93d00dd6@github.com> On Thu, 12 Feb 2026 08:49:21 GMT, Joel Sikström wrote: >> Hello, >> >> We observe a crash when running a major GC very early, which unloads classes, calling into JFR which is not properly set up for class unloading this early. A simple fix for this is to move the creation of the symbol table to the early setup of JFR, which allows us to successfully call `JfrCheckpointManager::on_unloading_classes()`. >> >> Testing: >> * Crash reproducer no longer crashes >> * Running through Oracle's tier1-4 > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > Markus _initial_type_set feedback Please hold off on integrating this for now; we get a failure in test jdk/jfr/jvm/TestCreateNative.java (it tries to create and recreate JFR multiple times).
It's good to run through all the JFR tests: open/test/jdk/:jdk_jfr ------------- PR Comment: https://git.openjdk.org/jdk/pull/29672#issuecomment-3890292936 PR Comment: https://git.openjdk.org/jdk/pull/29672#issuecomment-3890299739 From jsikstro at openjdk.org Thu Feb 12 11:21:51 2026 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 12 Feb 2026 11:21:51 GMT Subject: RFR: 8377665: JFR: Symbol table not setup for early class unloading [v2] In-Reply-To: <2n5ckB-WJJeat5VNTgC50WSPPpEeffoabpOtJZzPIQo=.85f5885d-daa8-4427-bf34-0f1e93d00dd6@github.com> References: <2n5ckB-WJJeat5VNTgC50WSPPpEeffoabpOtJZzPIQo=.85f5885d-daa8-4427-bf34-0f1e93d00dd6@github.com> Message-ID: On Thu, 12 Feb 2026 11:16:46 GMT, Markus Grönlund wrote: > Please hold off on integrating this for now; we get a failure in test jdk/jfr/jvm/TestCreateNative.java (it tries to create and recreate JFR multiple times). Yes, I noticed that in my test run now as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29672#issuecomment-3890304785 From mgronlun at openjdk.org Thu Feb 12 13:48:20 2026 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 12 Feb 2026 13:48:20 GMT Subject: RFR: 8377665: JFR: Symbol table not setup for early class unloading [v2] In-Reply-To: References: Message-ID: On Thu, 12 Feb 2026 08:49:21 GMT, Joel Sikström wrote: >> Hello, >> >> We observe a crash when running a major GC very early, which unloads classes, calling into JFR which is not properly set up for class unloading this early. A simple fix for this is to move the creation of the symbol table to the early setup of JFR, which allows us to successfully call `JfrCheckpointManager::on_unloading_classes()`.
>> >> Testing: >> * Crash reproducer no longer crashes >> * Running through Oracle's tier1-4 > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > Markus _initial_type_set feedback I am looking into how we would like to fix this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29672#issuecomment-3891059970 From jsikstro at openjdk.org Thu Feb 12 13:55:27 2026 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 12 Feb 2026 13:55:27 GMT Subject: RFR: 8377665: JFR: Symbol table not setup for early class unloading [v2] In-Reply-To: References: Message-ID: On Thu, 12 Feb 2026 13:45:47 GMT, Markus Grönlund wrote: > I am looking into how we would like to fix this. Awesome! Thank you for the help. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29672#issuecomment-3891092340 From dholmes at openjdk.org Fri Feb 13 06:52:01 2026 From: dholmes at openjdk.org (David Holmes) Date: Fri, 13 Feb 2026 06:52:01 GMT Subject: RFR: 8377798: Hotspot build on macOS aarch64 with unused-functions warning reports some unused functions In-Reply-To: References: Message-ID: On Thu, 12 Feb 2026 15:59:26 GMT, Matthias Baesken wrote: > We currently set a warning for unused functions for gcc and clang, but later disable it for clang in the libjvm build. > I checked why it might be disabled for clang and there are a few functions/methods reported as unused, probably we can remove some or all of those ?
> > macOS aarch64 product build shows: > > > /myjdk/jdk/src/hotspot/share/jfr/recorder/checkpoint/jfrCheckpointManager.cpp:179:20: warning: unused function 'is_thread_local' [-Wunused-function] > static inline bool is_thread_local(ConstBufferPtr buffer) { > ^ > /myjdk/jdk/src/hotspot/share/jfr/recorder/checkpoint/jfrCheckpointManager.cpp:184:20: warning: unused function 'is_virtual_thread_local' [-Wunused-function] > static inline bool is_virtual_thread_local(ConstBufferPtr buffer) { > ^ > 2 warnings generated. > /myjdk/jdk/src/hotspot/share/jfr/support/jfrDeprecationManager.cpp:197:20: warning: unused function 'jfr_is_started_on_command_line' [-Wunused-function] > static inline bool jfr_is_started_on_command_line() { > ^ > 1 warning generated. > /myjdk/jdk/src/hotspot/share/jfr/periodic/sampling/jfrSampleRequest.cpp:61:20: warning: unused function 'sp_in_stack' [-Wunused-function] > static inline bool sp_in_stack(const JfrSampleRequest& request, JavaThread* jt) { > ^ > /myjdk/jdk/src/hotspot/share/jfr/periodic/sampling/jfrSampleRequest.cpp:69:20: warning: unused function 'update_interpreter_frame_sender_pc' [-Wunused-function] > static inline void update_interpreter_frame_sender_pc(JfrSampleRequest& request, intptr_t* fp) { > ^ > /myjdk/jdk/src/hotspot/share/jfr/periodic/sampling/jfrSampleRequest.cpp:79:23: warning: unused function 'interpreter_frame_return_address' [-Wunused-function] > static inline address interpreter_frame_return_address(const JfrSampleRequest& request) { > ^ > /myjdk/jdk/src/hotspot/share/jfr/periodic/sampling/jfrSampleRequest.cpp:93:20: warning: unused function 'update_frame_sender_sp' [-Wunused-function] > static inline void update_frame_sender_sp(JfrSampleRequest& request, intptr_t* fp) { > ^ > /myjdk/jdk/src/hotspot/share/jfr/periodic/sampling/jfrSampleRequest.cpp:101:20: warning: unused function 'update_sp' [-Wunused-function] > static inline void update_sp(JfrSampleRequest& request, int frame_size) { > ^ > 
/myjdk/jdk/src/hotspot/share/jfr/periodic/sampling/jfrSampleRequest.cpp:106:20: warning: unused function 'update_pc' [-Wunused-function] > static inline void update_pc(J... src/hotspot/os/posix/perfMemory_posix.cpp line 497: > 495: } > 496: > 497: #ifndef __APPLE__ Shouldn't this be a big `ifdef __APPLE__` before `get_user_name(uid_t uid) ` and this becomes the `#else` part? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29695#discussion_r2802590736 From mbaesken at openjdk.org Fri Feb 13 08:50:04 2026 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 13 Feb 2026 08:50:04 GMT Subject: RFR: 8377798: Hotspot build on macOS aarch64 with unused-functions warning reports some unused functions In-Reply-To: References: Message-ID: On Fri, 13 Feb 2026 06:49:25 GMT, David Holmes wrote: >> We currently set a warning for unused functions for gcc and clang, but later disable it for clang in the libjvm build. >> I checked why it might be disabled for clang and there are a few functions/methods reported as unused, probably we can remove some or all of those ? >> >> macOS aarch64 product build shows: >> >> >> /myjdk/jdk/src/hotspot/share/jfr/recorder/checkpoint/jfrCheckpointManager.cpp:179:20: warning: unused function 'is_thread_local' [-Wunused-function] >> static inline bool is_thread_local(ConstBufferPtr buffer) { >> ^ >> /myjdk/jdk/src/hotspot/share/jfr/recorder/checkpoint/jfrCheckpointManager.cpp:184:20: warning: unused function 'is_virtual_thread_local' [-Wunused-function] >> static inline bool is_virtual_thread_local(ConstBufferPtr buffer) { >> ^ >> 2 warnings generated. >> /myjdk/jdk/src/hotspot/share/jfr/support/jfrDeprecationManager.cpp:197:20: warning: unused function 'jfr_is_started_on_command_line' [-Wunused-function] >> static inline bool jfr_is_started_on_command_line() { >> ^ >> 1 warning generated. 
>> /myjdk/jdk/src/hotspot/share/jfr/periodic/sampling/jfrSampleRequest.cpp:61:20: warning: unused function 'sp_in_stack' [-Wunused-function] >> static inline bool sp_in_stack(const JfrSampleRequest& request, JavaThread* jt) { >> ^ >> /myjdk/jdk/src/hotspot/share/jfr/periodic/sampling/jfrSampleRequest.cpp:69:20: warning: unused function 'update_interpreter_frame_sender_pc' [-Wunused-function] >> static inline void update_interpreter_frame_sender_pc(JfrSampleRequest& request, intptr_t* fp) { >> ^ >> /myjdk/jdk/src/hotspot/share/jfr/periodic/sampling/jfrSampleRequest.cpp:79:23: warning: unused function 'interpreter_frame_return_address' [-Wunused-function] >> static inline address interpreter_frame_return_address(const JfrSampleRequest& request) { >> ^ >> /myjdk/jdk/src/hotspot/share/jfr/periodic/sampling/jfrSampleRequest.cpp:93:20: warning: unused function 'update_frame_sender_sp' [-Wunused-function] >> static inline void update_frame_sender_sp(JfrSampleRequest& request, intptr_t* fp) { >> ^ >> /myjdk/jdk/src/hotspot/share/jfr/periodic/sampling/jfrSampleRequest.cpp:101:20: warning: unused function 'update_sp' [-Wunused-function] >> static inline void update_sp(JfrSampleRequest& request, int frame_size) { >> ^ >> /myjdk/jdk/src/hotspot/share/jfr/periodic/sampling/jfrSampleRequest.cpp:106:20: warning: unused funct... > > src/hotspot/os/posix/perfMemory_posix.cpp line 497: > >> 495: } >> 496: >> 497: #ifndef __APPLE__ > > Shouldn't this be a big `ifdef __APPLE__` before `get_user_name(uid_t uid) ` and this becomes the `#else` part? Seems mmap_create_shared uses the one-parameter get_user_name across POSIX platforms, so we cannot do this. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29695#discussion_r2802996000 From erik.gahlin at oracle.com Mon Feb 16 08:07:43 2026 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Mon, 16 Feb 2026 08:07:43 +0000 Subject: In-Process Data Redaction Message-ID: Hi, This is a follow-up to our earlier discussion about scrubbing: https://mail.openjdk.org/pipermail/hotspot-jfr-dev/2025-June/007574.html I implemented support for selectively redacting sensitive information in-process using two command-line options, redact-key and redact-argument. I disabled command-line arguments for the SystemProcess event entirely. It would probably be better to do this in a separate PR, but I included it here to make the binary file test pass. For more information, I created a JEP draft: https://bugs.openjdk.org/browse/JDK-8372760 A draft PR can be found here: https://github.com/openjdk/jdk/pull/29736 Erik From stuefe at openjdk.org Mon Feb 16 10:02:46 2026 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 16 Feb 2026 10:02:46 GMT Subject: RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v6] In-Reply-To: References: Message-ID: <3vvjymmqpb7HI9Fz0HDPfhPXBODY40gho1sOnQBh43Q=.10dfd250-7f32-4805-bd98-5d62ee969328@github.com> On Wed, 4 Feb 2026 12:43:54 GMT, Erik Gahlin wrote: >>> I wonder if you have looked at performance. For example, in which order is it best to check for nullptr, oop has been visited or whether the stack is full? You added a _num_objects_processed variable, but it's never used, so you might want to remove it. >> >> Will do. >> >>> For the future, I think we want to keep BFS. Initially, I only had DFS, but the chains became so weird that I had to implement BFS. >> >> Sure; you are the maintainer, after all. >> >>> >>> Regarding ordering, I think we want an order that makes the most sense to the user. ClassLoader is easier for users to understand than Global Object Handle.
I'm not sure if that code is still present, but we had a specific order in which we processed roots. >> >> Did you mean this? >> >> >> https://github.com/openjdk/jdk/blob/99bc98357dab78bef2cce7a10c98d13d1e5730e3/src/hotspot/share/jfr/leakprofiler/chains/rootSetClosure.cpp#L87-L97 >> >> I can reverse the order in there and thus get the (roughly) reversed order. I already tested that, but refrained from adding it to the patch. >> >> What do you think about printing the actual (first, first and second ...) objects that are referenced by the roots? I think that would often help a lot. It helped me a lot in understanding what happened. >> >> In fact, I just traced the whole chain during development, hence the logging added to add_chain(). > >> Did you mean this? >> https://github.com/openjdk/jdk/blob/99bc98357dab78bef2cce7a10c98d13d1e5730e3/src/hotspot/share/jfr/leakprofiler/chains/rootSetClosure.cpp#L87-L97 > > Yes, we start with classes first because that typically makes the most sense to users. The problem with starting with threads is that you can get odd chains if one thread holds references to other threads. Generally, stacks are harder to understand compared to an inner class (e.g. a listener) holding a reference to the outer class. > >> I can reverse the order in there and thus get the (roughly) reversed order. I already tested that, but refrained from adding it to the patch. > > I think we want to process roots in the same order as in BFS. > >> What do you think about printing the actual (first, first and second ...) objects that are referenced by the roots? I think that would often help a lot. It helped me a lot in understanding what happened. > > I have used 'jfr print --events OldObjectSample' when debugging issues. At one point, I had a JavaFX GUI that allowed me to explore chains, but I think JMC can do that today. 
I have not studied non-leaks that much. I don't have a strong opinion on what to show in trace mode, as long as it doesn't slow down product builds. @egahlin @roberttoyonaga I put this back into draft and propose a minimally invasive patch that "works well enough"; either until I get the non-recursiveness bulletproof for all corner cases or until we get rid of DFS. Reason: I keep running into very tricky corner cases in the BFS+DFS mixed mode at the handoff point to DFS caused by: - The fact that the new non-recursive DFS can stop at any point (in theory) if the DFS probe stack runs full; that creates a new stop condition that did not exist before. Previously, DFS would stop at a maximum stack depth >>> 1. Now, in theory, DFS can stop at recursion level 1. That means we cannot guarantee that the oop handed down to DFS is fully iterated. - That causes problems since the BFS->DFS handoff oop is iterated twice. Once up in BFS, once in DFS. If the handoff oop is a large object array and it does not get fully iterated by DFS, BFS will reenter DFS with the same oop. This causes the runtime to scale quadratically with the array size (similar to https://bugs.openjdk.org/browse/JDK-8373490). All of this is solvable; no solution appeals to me, and they all make the patch even more complex. In light of the impending DFS removal, I will pull this back to draft and propose a simpler patch (the main concern is to get a patch out the door quickly for our customer). ------------- PR Comment: https://git.openjdk.org/jdk/pull/29382#issuecomment-3907529456 From erik.gahlin at oracle.com Mon Feb 16 16:03:34 2026 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Mon, 16 Feb 2026 16:03:34 +0000 Subject: RFC: Display contextual event fields in jfr view command In-Reply-To: References: Message-ID: Hi Jaroslav, I think contexts can be useful in the JFR tool, but the biggest problem we face right now is a lack of discoverability.
If people don't know about the feature or can't see how it can be used, they won't create contextual events. JMC support is therefore much more important at this stage. It is also easier to use, since you can drill down, which is hard to do in the terminal. I imagine a Contexts page in JMC with a combo box at the top where users can select available contextual events, for example Order or Trace. Below is a table that lists all contextual events for that type and their field values, for example Order ID and Duration. Below that is another table that lists all events that happened during the context, e.g. Socket Read or Thread Park. Having contextual values in the Properties view and the Thread Graph tooltip would also be useful. If we had something like that, it would be much easier to convince people to create contextual events. On a personal level, my biggest concern with doing anything more advanced with respect to contextual events for the JFR tool is that I have never seen a real @Contextual event in practice. Designing command line flags or commands that will be around for decades is therefore hard. Erik ________________________________________ From: Jaroslav Bachorík Sent: Tuesday, February 10, 2026 1:00 PM To: Erik Gahlin Cc: hotspot-jfr-dev Subject: [External] : Re: RFC: Display contextual event fields in jfr view command Hi Erik, Thanks for the reply. Please, bear in mind that this is just an exploratory RFC, nothing is set in stone. When we started talking about using JFR events for expressing context we also talked about having support in the 'jfr' tool that would be able to show the context in a concise manner. What is 'concise manner' is still not completely clear and hence this RFC to start the discussion about what would make sense from the end-user point of view. From your reply it seems like having context support in the 'jfr' tool is not strictly necessary as long as there is JMC support? Am I reading this correctly?
Cheers, Jaroslav On Mon, Jan 26, 2026 at 1:34 PM Erik Gahlin wrote: Hi Jaroslav, The 'jfr print' command is meant for presentations, demos, and debugging. It was never designed for application troubleshooting. The contextual support added in JDK 25 was included to demonstrate to application developers how the @Contextual annotation can be used, and to show third parties how contextual support can be implemented using the jdk.jfr.consumer API. The 'view' command, on the other hand, was designed for troubleshooting and can be used on a live process, so it should not use excessive memory or CPU. You added a command-line flag, --show-context, perhaps to prevent additional overhead from contextual processing, but before adding flags, I think it might be a good time to step back and think about how we best can present contextual information to users. A flag is usually hard for users to find. It would be better to add support in JMC, so users can discover contexts and then drill deeper by clicking in the GUI. I'm also wondering if contextual support belongs in the query language. It's not clear how columns of nested contexts should be identified. It may be better to create something like FormRenderer that only handles event types. We have also discussed adding a bit in the chunk header if a contextual event has been emitted. This would allow a parser to have a fast path when there are no contextual events. Thanks Erik ________________________________________ From: hotspot-jfr-dev on behalf of Jaroslav Bachorík Sent: Wednesday, January 21, 2026 10:03 PM To: hotspot-jfr-dev Subject: RFC: Display contextual event fields in jfr view command Hello, I'd like to propose adding context display support to the `jfr view` command. This would allow users to see which @Contextual events were active when other events occurred, without requiring any changes to the JFR recording format or runtime.
Background Back in 2021, there was a discussion on this list about adding a Recording Context concept to JFR (thread starting at 2021-June/002777). Erik suggested an alternative to modifying the event format: use dedicated context events with begin/end markers and correlate them during recording analysis. This proposal implements exactly that approach on the tooling side. When users have events with @Contextual annotated fields (such as trace IDs, span IDs, or request contexts), they can now view which contexts were active during any event - all computed at analysis time from the existing recording data. --- Current State The `jfr print` command already supports displaying contextual events. When printing events, it shows active context fields inline: jfr print recording.jfr jdk.ThreadSleep { Context: Trace.traceId = "abc-123-def" Context: Trace.service = "order-service" startTime = 12:00:01.000 duration = 50 ms ... } This works well for detailed event inspection, but the `jfr view` command (which displays events in a tabular format) has no equivalent capability. --- The Problem When using `jfr view` to analyze recordings from distributed systems, users cannot see which contexts were active. The tabular format is often preferred for scanning many events quickly, but without context information users must: 1. Note the timestamp of the event of interest 2. Switch to `jfr print` or manually search for overlapping contextual events 3. Match by thread ID to avoid cross-thread confusion 4. Repeat for every event they want to analyze This breaks the workflow when trying to correlate events with their contexts at scale. 
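The manual correlation steps listed above amount to a per-thread interval lookup. A minimal sketch in Java, assuming a hypothetical `ContextSpan` record rather than the real jdk.jfr.consumer types, and assuming that "overlapping" means the context interval contains the event's start time (both ends inclusive):

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these types are not part of the JDK.
class ContextLookup {

    // Stand-in for a contextual event: a named interval recorded on one thread.
    record ContextSpan(String name, long threadId, Instant start, Instant end) {}

    // Returns the names of the contexts on the given thread whose interval
    // contains eventStart (both ends inclusive), in input order.
    static List<String> activeContexts(List<ContextSpan> spans, long threadId, Instant eventStart) {
        List<String> active = new ArrayList<>();
        for (ContextSpan span : spans) {
            boolean sameThread = span.threadId() == threadId;
            boolean containsStart = !span.start().isAfter(eventStart)
                    && !span.end().isBefore(eventStart);
            if (sameThread && containsStart) {
                active.add(span.name());
            }
        }
        return active;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2026-01-21T12:00:00Z");
        List<ContextSpan> spans = List.of(
                new ContextSpan("Trace", 1L, t0, t0.plusSeconds(2)),
                new ContextSpan("Trace", 2L, t0, t0.plusSeconds(10)));
        // Thread 1, one second in: its own Trace context is still open.
        System.out.println(activeContexts(spans, 1L, t0.plusSeconds(1)));
        // Thread 1, three seconds in: its context has ended, and the
        // context that is still open on thread 2 must not leak across.
        System.out.println(activeContexts(spans, 1L, t0.plusSeconds(3)));
    }
}
```

The linear scan keeps the sketch short; doing this by hand for every event of interest is the repetitive part that the steps above describe.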
--- Proposed Solution Add a `--show-context` flag to `jfr view` that automatically displays contextual event fields as additional columns: jfr view --show-context jdk.ThreadSleep recording.jfr ThreadSleep Time Sleep Time Trace.traceId Trace.service ---------------------------------------------------------------- 12:00:01 50 ms abc-123-def order-service 12:00:02 100 ms abc-123-def order-service 12:00:03 25 ms N/A N/A The context matching rule is: a contextual event is active when contextStart <= eventStart AND contextEnd >= eventStart. Users can optionally filter which context types to display: jfr view --show-context=Span,Trace WorkEvent recording.jfr --- Why This Approach? 1. No runtime overhead - context correlation happens entirely at analysis time 2. No format changes - works with existing recordings that have @Contextual events 3. Backward compatible - recordings remain readable by older tools 4. Flexible - users choose which contexts to display 5. Proven pattern - based on the timeline approach already used in PrettyWriter --- [PoC] Implementation Notes The implementation tracks context per-thread using a timeline-based approach similar to PrettyWriter.java. Events are buffered in a priority queue ordered by timestamp. Contextual events contribute both start and end timestamps, and active contexts are tracked per-thread to prevent cross-thread leakage. Memory is bounded (~1M events) to handle large recordings. Queries without --show-context bypass this entirely, so there's no overhead for existing usage. I've also added support for referencing contextual fields in GROUP BY clauses for the `jfr query` command (debug builds), enabling aggregation queries like: SELECT COUNT(*), Trace.traceId FROM WorkEvent GROUP BY Trace.traceId --- Questions for Discussion 1. Is the matching rule (contextStart <= eventStart) correct? An alternative would be to require the event to fall entirely within the context. 2. 
Should there be a maximum number of context columns to prevent very wide output? 3. Is 1M events a reasonable buffer size? This balances memory (~100MB) with accuracy for long-running contexts. 4. The `jfr print` command already shows context - should there be a way to disable it for consistency, or is the current always-on behavior correct? I'd welcome feedback on the approach before proceeding further. Thanks, Jaroslav From jsikstro at openjdk.org Tue Feb 17 09:31:00 2026 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 17 Feb 2026 09:31:00 GMT Subject: RFR: 8377665: JFR: Symbol table not setup for early class unloading [v2] In-Reply-To: References: Message-ID: On Thu, 12 Feb 2026 08:49:21 GMT, Joel Sikström wrote: >> Hello, >> >> We observe a crash when running a major GC very early, which unloads classes, calling into JFR which is not properly set up for class unloading this early. A simple fix for this is to move the creation of the symbol table to the early setup of JFR, which allows us to successfully call `JfrCheckpointManager::on_unloading_classes()`. >> >> Testing: >> * Crash reproducer no longer crashes >> * Running through Oracle's tier1-4 > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > Markus _initial_type_set feedback After offline discussion with Markus I'm closing this PR and handing the issue over to him after this turned out to require more work than anticipated.
------------- PR Comment: https://git.openjdk.org/jdk/pull/29672#issuecomment-3913329275 From jsikstro at openjdk.org Tue Feb 17 09:31:01 2026 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 17 Feb 2026 09:31:01 GMT Subject: Withdrawn: 8377665: JFR: Symbol table not setup for early class unloading In-Reply-To: References: Message-ID: On Wed, 11 Feb 2026 12:31:24 GMT, Joel Sikström wrote: > Hello, > > We observe a crash when running a major GC very early, which unloads classes, calling into JFR which is not properly set up for class unloading this early. A simple fix for this is to move the creation of the symbol table to the early setup of JFR, which allows us to successfully call `JfrCheckpointManager::on_unloading_classes()`. > > Testing: > * Crash reproducer no longer crashes > * Running through Oracle's tier1-4 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/29672