From mgronlun at openjdk.org Mon Dec 1 12:17:21 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 1 Dec 2025 12:17:21 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading [v7] In-Reply-To: References: Message-ID: > Greetings, > > this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. > > To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements. > > Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. > > Testing: jdk_jfr, manual AOT verification, stress testing > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: adjustments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28505/files - new: https://git.openjdk.org/jdk/pull/28505/files/c0e1124e..ba54d2af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=05-06 Stats: 11 lines in 4 files changed: 0 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/28505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505 PR: https://git.openjdk.org/jdk/pull/28505 From mgronlun at openjdk.org Mon Dec 1 20:36:37 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 1 Dec 2025 20:36:37 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading [v8] In-Reply-To: References: Message-ID: > Greetings, > > this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. > > To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. 
This will be useful also for planned future enhancements. > > Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. > > Testing: jdk_jfr, manual AOT verification, stress testing > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: GROUP BY definedClass is redundant ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28505/files - new: https://git.openjdk.org/jdk/pull/28505/files/ba54d2af..9782480e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505 PR: https://git.openjdk.org/jdk/pull/28505 From mgronlun at openjdk.org Mon Dec 1 22:11:41 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 1 Dec 2025 22:11:41 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading [v9] In-Reply-To: References: Message-ID: > Greetings, > > this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. > > To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements. > > Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. 
> > Testing: jdk_jfr, manual AOT verification, stress testing > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: Remove class definitions view in favor of jdk.ClassDefine event view ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28505/files - new: https://git.openjdk.org/jdk/pull/28505/files/9782480e..6a3a8c16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=07-08 Stats: 8 lines in 1 file changed: 0 ins; 8 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505 PR: https://git.openjdk.org/jdk/pull/28505 From mgronlun at openjdk.org Mon Dec 1 23:12:29 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 1 Dec 2025 23:12:29 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading [v10] In-Reply-To: References: Message-ID: > Greetings, > > this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. > > To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements. > > Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. 
> > Testing: jdk_jfr, manual AOT verification, stress testing > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: correct trunctation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28505/files - new: https://git.openjdk.org/jdk/pull/28505/files/6a3a8c16..b659a814 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505 PR: https://git.openjdk.org/jdk/pull/28505 From dholmes at openjdk.org Tue Dec 2 02:24:47 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 2 Dec 2025 02:24:47 GMT Subject: RFR: 8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 16:04:56 GMT, Kerem Kat wrote: > It still fails in oracle environment as described in [JDK-8371630](https://bugs.openjdk.org/browse/JDK-8371630). Unfortunately this only went to the JFR mailing list so I did not see it till I tried to track it down. Even more unfortunately no-one on the JFR mailing list deigned to look at this. The fix is incorrect - see comment. test/jdk/ProblemList.txt line 753: > 751: jdk/jfr/event/oldobject/TestShenandoah.java 8342951 generic-all > 752: jdk/jfr/event/runtime/TestResidentSetSizeEvent.java 8309846 aix-ppc64 > 753: jdk/jfr/jvm/TestWaste.java 8372587 generic-all Suggestion: jdk/jfr/jvm/TestWaste.java 8371630 generic-all You used the wrong bug id. ------------- Changes requested by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28539#pullrequestreview-3527885093 PR Review Comment: https://git.openjdk.org/jdk/pull/28539#discussion_r2579362343 From krk at openjdk.org Tue Dec 2 10:26:15 2025 From: krk at openjdk.org (Kerem Kat) Date: Tue, 2 Dec 2025 10:26:15 GMT Subject: RFR: 8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList [v2] In-Reply-To: References: Message-ID: > It still fails in oracle environment as described in [JDK-8371630](https://bugs.openjdk.org/browse/JDK-8371630). Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: Update test/jdk/ProblemList.txt Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28539/files - new: https://git.openjdk.org/jdk/pull/28539/files/1d625192..ab4f9245 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28539&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28539&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28539/head:pull/28539 PR: https://git.openjdk.org/jdk/pull/28539 From krk at openjdk.org Tue Dec 2 10:26:18 2025 From: krk at openjdk.org (Kerem Kat) Date: Tue, 2 Dec 2025 10:26:18 GMT Subject: RFR: 8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 02:20:40 GMT, David Holmes wrote: >> Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/jdk/ProblemList.txt >> >> Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > > test/jdk/ProblemList.txt line 753: > >> 751: jdk/jfr/event/oldobject/TestShenandoah.java 8342951 generic-all >> 752: jdk/jfr/event/runtime/TestResidentSetSizeEvent.java 8309846 aix-ppc64 >> 753: jdk/jfr/jvm/TestWaste.java 8372587 
generic-all > > Suggestion: > > jdk/jfr/jvm/TestWaste.java 8371630 generic-all > > You used the wrong bug id. Thanks, I did use the subtask id. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28539#discussion_r2580553213 From krk at openjdk.org Tue Dec 2 13:23:07 2025 From: krk at openjdk.org (Kerem Kat) Date: Tue, 2 Dec 2025 13:23:07 GMT Subject: RFR: 8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList [v2] In-Reply-To: References: Message-ID: <71nBpRWdBsXEFaQMsne-_vwcwSOEoYOgRkQRpyRn6wI=.d39ad7b5-22eb-44ac-9027-2e3742480cc2@github.com> On Tue, 2 Dec 2025 10:26:15 GMT, Kerem Kat wrote: >> It still fails in oracle environment as described in [JDK-8371630](https://bugs.openjdk.org/browse/JDK-8371630). > > Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: > > Update test/jdk/ProblemList.txt > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> Failing test is unrelated: > TEST: gc/TestAllocHumongousFragment.java#generational Internal Error (/home/runner/work/jdk/jdk/src/hotspot/share/oops/compressedOops.inline.hpp:58), pid=8050, tid=8054 # assert(Universe::is_in_heap(result)) failed: object not in heap 0x00000000fc300000 # # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-krk-ab4f92453ba582b6c94007ba80e74d6a025d20e5) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-krk-ab4f92453ba582b6c94007ba80e74d6a025d20e5, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, shenandoah gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x72fcfe] CompressedOops::decode_not_null(narrowOop)+0x13e ------------- PR Comment: https://git.openjdk.org/jdk/pull/28539#issuecomment-3602025405 From mgronlun at openjdk.org Tue Dec 2 15:59:00 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 2 Dec 2025 15:59:00 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading 
[v11] In-Reply-To: References: Message-ID: > Greetings, > > this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. > > To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements. > > Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. > > Testing: jdk_jfr, manual AOT verification, stress testing > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28505/files - new: https://git.openjdk.org/jdk/pull/28505/files/b659a814..66e63a40 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=09-10 Stats: 10 lines in 2 files changed: 0 ins; 8 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505 PR: https://git.openjdk.org/jdk/pull/28505 From mgronlun at openjdk.org Tue Dec 2 16:11:17 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 2 Dec 2025 16:11:17 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading [v12] In-Reply-To: References: Message-ID: > Greetings, > > this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. > > To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements. > > Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. 
> > Testing: jdk_jfr, manual AOT verification, stress testing > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: restore longest-class-loading view ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28505/files - new: https://git.openjdk.org/jdk/pull/28505/files/66e63a40..3a44c10e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=10-11 Stats: 6 lines in 1 file changed: 0 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505 PR: https://git.openjdk.org/jdk/pull/28505 From mgronlun at openjdk.org Tue Dec 2 16:50:47 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 2 Dec 2025 16:50:47 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading [v13] In-Reply-To: References: Message-ID: <4EZHq7TmhEpxqD3WRm4D8LHVXPuJabCRjTXOi0pPtZE=.866ecc1e-b0be-4797-a6f9-2a434a087bda@github.com> > Greetings, > > this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. > > To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements. > > Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. 
> > Testing: jdk_jfr, manual AOT verification, stress testing > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: use strcmp instead of strncmp for equals ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28505/files - new: https://git.openjdk.org/jdk/pull/28505/files/3a44c10e..c2e36d2f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505 PR: https://git.openjdk.org/jdk/pull/28505 From duke at openjdk.org Tue Dec 2 16:55:03 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Tue, 2 Dec 2025 16:55:03 GMT Subject: RFR: 8370715: JFR: Races are possible when dumping recordings [v4] In-Reply-To: References: Message-ID: > #### Summary > This PR changes the JFR snapshot dumping code so that multiple JFR recordings (potentially from different processes) racing to write the same dump destination won't mix their data. > > #### Problem > The dump destination file is created and/or wiped when a recording is started, but not wiped again before actually copying over the chunks during a dump. So in the window of time between creating/wiping and dumping chunks, another recording could write to the same dump destination and have its chunks added to the snapshot. This can happen with either a single JVM or multiple JVMs that are racing. > > #### Proposed fix > This PR ensures that any data previously written to the dump destination is wiped before the new recording's data is written. File locking is also done while chunks are being written. 
> > Testing: > - new test `jdk/jdk/jfr/api/recording/dump/TestDumpOverwrite.java` > - Tier 1 Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision: change WARN to INFO ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28460/files - new: https://git.openjdk.org/jdk/pull/28460/files/f3f7da42..fe408389 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28460&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28460&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28460.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28460/head:pull/28460 PR: https://git.openjdk.org/jdk/pull/28460 From egahlin at openjdk.org Tue Dec 2 16:55:06 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 2 Dec 2025 16:55:06 GMT Subject: RFR: 8370715: JFR: Races are possible when dumping recordings [v3] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 17:51:02 GMT, Robert Toyonaga wrote: >> #### Summary >> This PR changes the JFR snapshot dumping code so that multiple JFR recordings (potentially from different processes) racing to write the same dump destination won't mix their data. >> >> #### Problem >> The dump destination file is created and/or wiped when a recording is started, but not wiped again before actually copying over the chunks during a dump. So in the window of time between creating/wiping and dumping chunks, another recording could write to the same dump destination and have its chunks added to the snapshot. This can happen with either a single JVM or multiple JVMs that are racing. >> >> #### Proposed fix >> This PR ensures that any data previously written to the dump destination is wiped before the new recording's data is written. File locking is also done while chunks are being written. 
>> >> Testing: >> - new test `jdk/jdk/jfr/api/recording/dump/TestDumpOverwrite.java` >> - Tier 1 > > Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision: > > vm.flagless and fix copyright header Looks good, but I wonder if LogLevel.INFO should be used instead of WARN, similar to the Transferred bytes log entry? The fix now ensures that dump files are in a consistent state, which is good, but I'm not sure writing a file simultaneously should constitute a warning. If the process overwrites a file 1 ns before or 1 ns after the lock, a warning is not issued. In all cases, one of the files will be overwritten (since that is what the user has requested). This isn't a major concern for me since it's unlikely to occur in practice, so I'm not insisting on a fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28460#issuecomment-3602058130 From duke at openjdk.org Tue Dec 2 17:02:36 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Tue, 2 Dec 2025 17:02:36 GMT Subject: RFR: 8370715: JFR: Races are possible when dumping recordings [v4] In-Reply-To: References: Message-ID: <-IjRKg6OdsOfzLikT9D8WeYBkTrOv-G9ZyJKREfRhgU=.f59e39e8-a500-4c55-8970-3ee57d020968@github.com> On Tue, 2 Dec 2025 16:55:03 GMT, Robert Toyonaga wrote: >> #### Summary >> This PR changes the JFR snapshot dumping code so that multiple JFR recordings (potentially from different processes) racing to write the same dump destination won't mix their data. >> >> #### Problem >> The dump destination file is created and/or wiped when a recording is started, but not wiped again before actually copying over the chunks during a dump. So in the window of time between creating/wiping and dumping chunks, another recording could write to the same dump destination and have its chunks added to the snapshot. This can happen with either a single JVM or multiple JVMs that are racing. 
>> >> #### Proposed fix >> This PR ensures that any data previously written to the dump destination is wiped before the new recording's data is written. File locking is also done while chunks are being written. >> >> Testing: >> - new test `jdk/jdk/jfr/api/recording/dump/TestDumpOverwrite.java` >> - Tier 1 > > Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision: > > change WARN to INFO Yes, I am also okay with changing WARN to INFO. As long as there is some record the user can find of what happened. Thank you for the review feedback @egahlin ------------- PR Comment: https://git.openjdk.org/jdk/pull/28460#issuecomment-3603061200 From egahlin at openjdk.org Tue Dec 2 17:32:55 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 2 Dec 2025 17:32:55 GMT Subject: RFR: 8370715: JFR: Races are possible when dumping recordings [v4] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 16:55:03 GMT, Robert Toyonaga wrote: >> #### Summary >> This PR changes the JFR snapshot dumping code so that multiple JFR recordings (potentially from different processes) racing to write the same dump destination won't mix their data. >> >> #### Problem >> The dump destination file is created and/or wiped when a recording is started, but not wiped again before actually copying over the chunks during a dump. So in the window of time between creating/wiping and dumping chunks, another recording could write to the same dump destination and have its chunks added to the snapshot. This can happen with either a single JVM or multiple JVMs that are racing. >> >> #### Proposed fix >> This PR ensures that any data previously written to the dump destination is wiped before the new recording's data is written. File locking is also done while chunks are being written. 
>> >> Testing: >> - new test `jdk/jdk/jfr/api/recording/dump/TestDumpOverwrite.java` >> - Tier 1 > > Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision: > > change WARN to INFO Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28460#pullrequestreview-3531349417 From duke at openjdk.org Tue Dec 2 17:46:35 2025 From: duke at openjdk.org (duke) Date: Tue, 2 Dec 2025 17:46:35 GMT Subject: RFR: 8370715: JFR: Races are possible when dumping recordings [v4] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 16:55:03 GMT, Robert Toyonaga wrote: >> #### Summary >> This PR changes the JFR snapshot dumping code so that multiple JFR recordings (potentially from different processes) racing to write the same dump destination won't mix their data. >> >> #### Problem >> The dump destination file is created and/or wiped when a recording is started, but not wiped again before actually copying over the chunks during a dump. So in the window of time between creating/wiping and dumping chunks, another recording could write to the same dump destination and have its chunks added to the snapshot. This can happen with either a single JVM or multiple JVMs that are racing. >> >> #### Proposed fix >> This PR ensures that any data previously written to the dump destination is wiped before the new recording's data is written. File locking is also done while chunks are being written. >> >> Testing: >> - new test `jdk/jdk/jfr/api/recording/dump/TestDumpOverwrite.java` >> - Tier 1 > > Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision: > > change WARN to INFO @roberttoyonaga Your change (at version fe40838989a25ac6166686b382287d427342609d) is now ready to be sponsored by a Committer. 
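
For readers following the 8370715 thread: the proposed fix quoted above (wipe the destination, then hold a file lock while the chunks are written) can be sketched in Java roughly as follows. This is only a minimal illustration under stated assumptions, not the code in PR 28460; the class name `DumpOverwriteSketch` and the helper `dump` are hypothetical, and chunks are modelled as plain byte arrays.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class DumpOverwriteSketch {

    // Hypothetical helper (not from the PR): writes all chunks to the
    // destination while holding an exclusive file lock.
    static void dump(Path destination, List<byte[]> chunks) throws IOException {
        try (FileChannel channel = FileChannel.open(destination,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) { // blocks until another process releases the lock
            // Wipe anything written to the destination earlier, so that
            // data from a different recording cannot end up in this dump.
            channel.truncate(0);
            for (byte[] chunk : chunks) {
                ByteBuffer buffer = ByteBuffer.wrap(chunk);
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
            }
        } // lock is released first, then the channel is closed
    }

    public static void main(String[] args) throws IOException {
        Path destination = Files.createTempFile("dump", ".jfr");
        // Simulate stale content left behind by another recording.
        Files.write(destination, "stale data from another recording".getBytes());
        dump(destination, List.of("chunk-1".getBytes(), "chunk-2".getBytes()));
        // Only the new chunks remain in the file.
        System.out.println(new String(Files.readAllBytes(destination)));
    }
}
```

One caveat with this sketch: `FileChannel.lock()` locks are held on behalf of the whole JVM, so a second thread in the same JVM that tries to lock the same file gets an `OverlappingFileLockException` instead of blocking. Covering the single-JVM race mentioned in the PR description would therefore need additional in-process synchronization.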
------------- PR Comment: https://git.openjdk.org/jdk/pull/28460#issuecomment-3603251639 From dholmes at openjdk.org Tue Dec 2 22:03:20 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 2 Dec 2025 22:03:20 GMT Subject: RFR: 8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:26:15 GMT, Kerem Kat wrote: >> It still fails in oracle environment as described in [JDK-8371630](https://bugs.openjdk.org/browse/JDK-8371630). > > Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: > > Update test/jdk/ProblemList.txt > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> Good. Please integrate. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28539#pullrequestreview-3532192517 From ysuenaga at openjdk.org Wed Dec 3 10:27:04 2025 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Wed, 3 Dec 2025 10:27:04 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented Message-ID: The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms. JFR emergency dump would be kicked at `VMError::report_and_die()`, then Java thread for JFR would not work due to secondary signal handler for error reporting. Passed all of jdk_jfr tests on Linux AMD64. 
------------- Commit messages: - Fix typo - Delete TestEmergencyDumpAtOOM.java from ProblemList - 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented Changes: https://git.openjdk.org/jdk/pull/28563/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28563&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371014 Stats: 31 lines in 8 files changed: 23 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28563.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28563/head:pull/28563 PR: https://git.openjdk.org/jdk/pull/28563 From mbaesken at openjdk.org Wed Dec 3 10:27:05 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 3 Dec 2025 10:27:05 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Sat, 29 Nov 2025 06:06:16 GMT, Yasumasa Suenaga wrote: > The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms. > > JFR emergency dump would be kicked at `VMError::report_and_die()`, then Java thread for JFR would not work due to secondary signal handler for error reporting. > > Passed all of jdk_jfr tests on Linux AMD64. With your PR added, we do not observe the error in test TestEmergencyDumpAtOOM any more. src/hotspot/share/jfr/jfr.cpp line 159: > 157: > 158: void Jfr::on_vm_error_report(outputStream* st) { > 159: assert(!JfrRecorder::is_recording(), "JFR should be stopped at erorr reporting"); 'erorr' - please fix the little typo ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28563#issuecomment-3605855107 PR Review Comment: https://git.openjdk.org/jdk/pull/28563#discussion_r2584283966 From duke at openjdk.org Wed Dec 3 10:41:46 2025 From: duke at openjdk.org (duke) Date: Wed, 3 Dec 2025 10:41:46 GMT Subject: RFR: 8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:26:15 GMT, Kerem Kat wrote: >> It still fails in oracle environment as described in [JDK-8371630](https://bugs.openjdk.org/browse/JDK-8371630). > > Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: > > Update test/jdk/ProblemList.txt > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> @krk Your change (at version ab4f92453ba582b6c94007ba80e74d6a025d20e5) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28539#issuecomment-3606187497 From mgronlun at openjdk.org Wed Dec 3 10:45:43 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 3 Dec 2025 10:45:43 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading [v14] In-Reply-To: References: Message-ID: <2tMUYu-TFZeFFfpE_054SK4rzCZZgs_UMiNudMx440M=.ce00a309-abf1-4325-9781-65dc0b5c6d59@github.com> > Greetings, > > this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. > > To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements. > > Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. 
> > Testing: jdk_jfr, manual AOT verification, stress testing > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: copyright header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28505/files - new: https://git.openjdk.org/jdk/pull/28505/files/c2e36d2f..85bf2d25 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=12-13 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505 PR: https://git.openjdk.org/jdk/pull/28505 From egahlin at openjdk.org Wed Dec 3 10:58:54 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 3 Dec 2025 10:58:54 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading [v14] In-Reply-To: <2tMUYu-TFZeFFfpE_054SK4rzCZZgs_UMiNudMx440M=.ce00a309-abf1-4325-9781-65dc0b5c6d59@github.com> References: <2tMUYu-TFZeFFfpE_054SK4rzCZZgs_UMiNudMx440M=.ce00a309-abf1-4325-9781-65dc0b5c6d59@github.com> Message-ID: On Wed, 3 Dec 2025 10:45:43 GMT, Markus Grönlund wrote: >> Greetings, >> >> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. >> >> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements. >> >> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. >> >> Testing: jdk_jfr, manual AOT verification, stress testing >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > copyright header Nice work! ------------- Marked as reviewed by egahlin (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28505#pullrequestreview-3534470248 From krk at openjdk.org Wed Dec 3 13:05:13 2025 From: krk at openjdk.org (Kerem Kat) Date: Wed, 3 Dec 2025 13:05:13 GMT Subject: Integrated: 8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 16:04:56 GMT, Kerem Kat wrote: > It still fails in oracle environment as described in [JDK-8371630](https://bugs.openjdk.org/browse/JDK-8371630). This pull request has now been integrated. Changeset: abb75ba6 Author: Kerem Kat Committer: Volker Simonis URL: https://git.openjdk.org/jdk/commit/abb75ba656ebe14e9e8e1d4a1765d64dfce9e661 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/28539 From coleenp at openjdk.org Wed Dec 3 13:30:44 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 3 Dec 2025 13:30:44 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading [v14] In-Reply-To: <2tMUYu-TFZeFFfpE_054SK4rzCZZgs_UMiNudMx440M=.ce00a309-abf1-4325-9781-65dc0b5c6d59@github.com> References: <2tMUYu-TFZeFFfpE_054SK4rzCZZgs_UMiNudMx440M=.ce00a309-abf1-4325-9781-65dc0b5c6d59@github.com> Message-ID: On Wed, 3 Dec 2025 10:45:43 GMT, Markus Grönlund wrote: >> Greetings, >> >> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. >> >> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements. >> >> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. 
>> >> Testing: jdk_jfr, manual AOT verification, stress testing >> >> Thanks >> Markus > > Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: > > copyright header Runtime changes look fine. Didn't review the new concurrent hash table. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28505#pullrequestreview-3535046644 From mgronlun at openjdk.org Wed Dec 3 13:54:07 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 3 Dec 2025 13:54:07 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading [v15] In-Reply-To: References: Message-ID: > Greetings, > > this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. > > To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements. > > Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. 
> > Testing: jdk_jfr, manual AOT verification, stress testing > > Thanks > Markus Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: restore AOT modifications ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28505/files - new: https://git.openjdk.org/jdk/pull/28505/files/85bf2d25..601fbc0b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=13-14 Stats: 23 lines in 3 files changed: 0 ins; 18 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505 PR: https://git.openjdk.org/jdk/pull/28505 From mgronlun at openjdk.org Wed Dec 3 14:00:34 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 3 Dec 2025 14:00:34 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading [v16] In-Reply-To: References: Message-ID: > Greetings, > > this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. > > To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements. > > Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. 
> > Testing: jdk_jfr, manual AOT verification, stress testing > > Thanks > Markus Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: restore aotClassLocation.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28505/files - new: https://git.openjdk.org/jdk/pull/28505/files/601fbc0b..0d1461f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=14-15 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505 PR: https://git.openjdk.org/jdk/pull/28505 From egahlin at openjdk.org Wed Dec 3 14:11:24 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 3 Dec 2025 14:11:24 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading [v16] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 14:00:34 GMT, Markus Gr?nlund wrote: >> Greetings, >> >> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. >> >> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements. >> >> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. >> >> Testing: jdk_jfr, manual AOT verification, stress testing >> >> Thanks >> Markus > > Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: > > restore aotClassLocation.cpp Marked as reviewed by egahlin (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28505#pullrequestreview-3535245425 From coleenp at openjdk.org Wed Dec 3 14:46:25 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 3 Dec 2025 14:46:25 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading [v16] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 14:00:34 GMT, Markus Gr?nlund wrote: >> Greetings, >> >> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. >> >> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements. >> >> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. >> >> Testing: jdk_jfr, manual AOT verification, stress testing >> >> Thanks >> Markus > > Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: > > restore aotClassLocation.cpp Looks good. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28505#pullrequestreview-3535405541 From egahlin at openjdk.org Wed Dec 3 14:51:25 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 3 Dec 2025 14:51:25 GMT Subject: RFR: 8373024: JFR: CPU throttle rate can't handle incorrect values Message-ID: Could I get a review of a PR that hardens the CPU throttle rate setting? 
Testing: test/jdk/jdk/jfr Thanks Erik ------------- Commit messages: - Initial Changes: https://git.openjdk.org/jdk/pull/28636/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28636&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373024 Stats: 5 lines in 3 files changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28636.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28636/head:pull/28636 PR: https://git.openjdk.org/jdk/pull/28636 From mgronlun at openjdk.org Wed Dec 3 15:15:29 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 3 Dec 2025 15:15:29 GMT Subject: RFR: 8373024: JFR: CPU throttle rate can't handle incorrect values In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 14:41:36 GMT, Erik Gahlin wrote: > Could I get a review of a PR that hardens the CPU throttle rate setting? > > Testing: test/jdk/jdk/jfr > > Thanks > Erik Marked as reviewed by mgronlun (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28636#pullrequestreview-3535543932 From mgronlun at openjdk.org Wed Dec 3 18:16:28 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 3 Dec 2025 18:16:28 GMT Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for class loading [v16] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 14:08:23 GMT, Erik Gahlin wrote: >> Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: >> >> restore aotClassLocation.cpp > > Marked as reviewed by egahlin (Reviewer). Thanks @egahlin and @coleenp for your reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28505#issuecomment-3608156678 From mgronlun at openjdk.org Wed Dec 3 18:16:31 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 3 Dec 2025 18:16:31 GMT Subject: Integrated: 8365400: Enhance JFR to emit file and module metadata for class loading In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 12:10:55 GMT, Markus Gr?nlund wrote: > Greetings, > > this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event. > > To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements. > > Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths. > > Testing: jdk_jfr, manual AOT verification, stress testing > > Thanks > Markus This pull request has now been integrated. Changeset: e93b10d0 Author: Markus Gr?nlund URL: https://git.openjdk.org/jdk/commit/e93b10d08456f720e303771a882e79660911e1eb Stats: 1372 lines in 33 files changed: 1035 ins; 162 del; 175 mod 8365400: Enhance JFR to emit file and module metadata for class loading Reviewed-by: coleenp, egahlin ------------- PR: https://git.openjdk.org/jdk/pull/28505 From mdoerr at openjdk.org Wed Dec 3 21:35:55 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 3 Dec 2025 21:35:55 GMT Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented In-Reply-To: References: Message-ID: On Sat, 29 Nov 2025 06:06:16 GMT, Yasumasa Suenaga wrote: > The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms. > > JFR emergency dump would be kicked at `VMError::report_and_die()`, then Java thread for JFR would not work due to secondary signal handler for error reporting. > > Passed all of jdk_jfr tests on Linux AMD64. 
I think this makes sense, but should also be reviewed by JFR folks. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28563#pullrequestreview-3537010567 From egahlin at openjdk.org Thu Dec 4 08:04:08 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Thu, 4 Dec 2025 08:04:08 GMT Subject: Integrated: 8373024: JFR: CPU throttle rate can't handle incorrect values In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 14:41:36 GMT, Erik Gahlin wrote: > Could I get a review of a PR that hardens the CPU throttle rate setting? > > Testing: test/jdk/jdk/jfr > > Thanks > Erik This pull request has now been integrated. Changeset: 63a10e00 Author: Erik Gahlin URL: https://git.openjdk.org/jdk/commit/63a10e0099111d69b167abf99d1a00084c4d6c1e Stats: 5 lines in 3 files changed: 1 ins; 0 del; 4 mod 8373024: JFR: CPU throttle rate can't handle incorrect values Reviewed-by: mgronlun ------------- PR: https://git.openjdk.org/jdk/pull/28636 From mgronlun at openjdk.org Thu Dec 4 10:16:41 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 4 Dec 2025 10:16:41 GMT Subject: RFR: 8373062: JFR build failure with CDS disabled Message-ID: Greetings, [JDK-8365400](https://bugs.openjdk.org/browse/JDK-8365400) caused a build problem when passing build option "--disable-cds" because of missing conditionals. 
Testing: manually building with "--disable-cds" Thanks Markus ------------- Commit messages: - 8373062 Changes: https://git.openjdk.org/jdk/pull/28656/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28656&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373062 Stats: 7 lines in 4 files changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28656.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28656/head:pull/28656 PR: https://git.openjdk.org/jdk/pull/28656 From mgronlun at openjdk.org Thu Dec 4 10:29:35 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 4 Dec 2025 10:29:35 GMT Subject: RFR: 8373062: JFR build failure with CDS disabled [v2] In-Reply-To: References: Message-ID: <3UTq-N9HJwvAaxkyFlRMfjpCJVWL7pM5g3S4qFiRGQo=.ed758eaf-4e0d-4b48-8226-1f3df4fbdaca@github.com> > Greetings, > > [JDK-8365400](https://bugs.openjdk.org/browse/JDK-8365400) caused a build problem when passing build option "--disable-cds" because of missing conditionals. 
> > Testing: manually building with "--disable-cds" > > Thanks > Markus Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: post_class_load_event wrongly placed inside INCLUDE_CDS section ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28656/files - new: https://git.openjdk.org/jdk/pull/28656/files/513642a6..369297f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28656&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28656&range=00-01 Stats: 22 lines in 1 file changed: 11 ins; 11 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28656.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28656/head:pull/28656 PR: https://git.openjdk.org/jdk/pull/28656 From egahlin at openjdk.org Thu Dec 4 10:45:26 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Thu, 4 Dec 2025 10:45:26 GMT Subject: RFR: 8373062: JFR build failure with CDS disabled [v3] In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 10:42:56 GMT, Markus Gr?nlund wrote: >> Greetings, >> >> [JDK-8365400](https://bugs.openjdk.org/browse/JDK-8365400) caused a build problem when passing build option "--disable-cds" because of missing conditionals. >> >> Testing: manually building with "--disable-cds" >> >> Thanks >> Markus > > Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: > > apa Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28656#pullrequestreview-3539289435 From mgronlun at openjdk.org Thu Dec 4 10:45:25 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 4 Dec 2025 10:45:25 GMT Subject: RFR: 8373062: JFR build failure with CDS disabled [v3] In-Reply-To: References: Message-ID: > Greetings, > > [JDK-8365400](https://bugs.openjdk.org/browse/JDK-8365400) caused a build problem when passing build option "--disable-cds" because of missing conditionals. 
> > Testing: manually building with "--disable-cds" > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: apa ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28656/files - new: https://git.openjdk.org/jdk/pull/28656/files/369297f5..0e8612c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28656&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28656&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28656.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28656/head:pull/28656 PR: https://git.openjdk.org/jdk/pull/28656 From mgronlun at openjdk.org Thu Dec 4 12:27:53 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 4 Dec 2025 12:27:53 GMT Subject: RFR: 8373062: JFR build failure with CDS disabled [v3] In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 10:42:01 GMT, Erik Gahlin wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> apa > > Marked as reviewed by egahlin (Reviewer). Thanks @egahlin for the review. I am going to proceed with integration to ensure no build issues remain for RDP1. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28656#issuecomment-3611987285 From mgronlun at openjdk.org Thu Dec 4 12:27:55 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 4 Dec 2025 12:27:55 GMT Subject: Integrated: 8373062: JFR build failure with CDS disabled In-Reply-To: References: Message-ID: <50F7UB-iOcEhzjtgSeuYCHLBtS0EMmTNqZs_2wTM9Hs=.5a87c2be-06d6-4719-9285-1d6487016dee@github.com> On Thu, 4 Dec 2025 10:09:50 GMT, Markus Grönlund wrote: > Greetings, > > [JDK-8365400](https://bugs.openjdk.org/browse/JDK-8365400) caused a build problem when passing build option "--disable-cds" because of missing conditionals.
> > Testing: manually building with "--disable-cds" > > Thanks > Markus This pull request has now been integrated. Changeset: bcbdf90f Author: Markus Grönlund URL: https://git.openjdk.org/jdk/commit/bcbdf90fce44ad87e7728ba0febef0951e361589 Stats: 29 lines in 5 files changed: 16 ins; 11 del; 2 mod 8373062: JFR build failure with CDS disabled Reviewed-by: egahlin ------------- PR: https://git.openjdk.org/jdk/pull/28656 From stuefe at openjdk.org Thu Dec 4 13:56:16 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 Dec 2025 13:56:16 GMT Subject: RFR: 8370715: JFR: Races are possible when dumping recordings [v4] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 16:55:03 GMT, Robert Toyonaga wrote: >> #### Summary >> This PR changes the JFR snapshot dumping code so that multiple JFR recordings (potentially from different processes) racing to write the same dump destination won't mix their data. >> >> #### Problem >> The dump destination file is created and/or wiped when a recording is started, but not wiped again before actually copying over the chunks during a dump. So in the window of time between creating/wiping and dumping chunks, another recording could write to the same dump destination and have its chunks added to the snapshot. This can happen with either a single JVM or multiple JVMs that are racing. >> >> #### Proposed fix >> This PR ensures that any data previously written to the dump destination is wiped before the new recording's data is written. File locking is also done while chunks are being written. >> >> Testing: >> - new test `jdk/jdk/jfr/api/recording/dump/TestDumpOverwrite.java` >> - Tier 1 > > Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision: > > change WARN to INFO Good! Had a quick look at the GHA failures, they are unrelated. ------------- Marked as reviewed by stuefe (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28460#pullrequestreview-3540155334 PR Comment: https://git.openjdk.org/jdk/pull/28460#issuecomment-3612377195 From duke at openjdk.org Thu Dec 4 13:59:52 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Thu, 4 Dec 2025 13:59:52 GMT Subject: Integrated: 8370715: JFR: Races are possible when dumping recordings In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 21:02:56 GMT, Robert Toyonaga wrote: > #### Summary > This PR changes the JFR snapshot dumping code so that multiple JFR recordings (potentially from different processes) racing to write the same dump destination won't mix their data. > > #### Problem > The dump destination file is created and/or wiped when a recording is started, but not wiped again before actually copying over the chunks during a dump. So in the window of time between creating/wiping and dumping chunks, another recording could write to the same dump destination and have its chunks added to the snapshot. This can happen with either a single JVM or multiple JVMs that are racing. > > #### Proposed fix > This PR ensures that any data previously written to the dump destination is wiped before the new recording's data is written. File locking is also done while chunks are being written. > > Testing: > - new test `jdk/jdk/jfr/api/recording/dump/TestDumpOverwrite.java` > - Tier 1 This pull request has now been integrated.
Changeset: c4ec983d Author: Robert Toyonaga Committer: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/c4ec983da57ee8aea71e88d5de2570c5d65a69df Stats: 88 lines in 2 files changed: 87 ins; 0 del; 1 mod 8370715: JFR: Races are possible when dumping recordings Reviewed-by: egahlin, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/28460 From stuefe at openjdk.org Fri Dec 5 05:52:32 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 5 Dec 2025 05:52:32 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive Message-ID: A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. This RFE changes the algorithm to be non-recursive. Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. 
Testing: - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical - Ran locally all jtreg tests in jdk/jfr - GHAs ------------- Commit messages: - remove test output - Copyright - start Changes: https://git.openjdk.org/jdk/pull/28659/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373096 Stats: 70 lines in 2 files changed: 40 ins; 17 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/28659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28659/head:pull/28659 PR: https://git.openjdk.org/jdk/pull/28659 From stuefe at openjdk.org Sat Dec 6 06:15:54 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 6 Dec 2025 06:15:54 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 15:54:04 GMT, Thomas Stuefe wrote: > A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. > > We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This RFE changes the algorithm to be non-recursive. > > Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. 
> > Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. > > Testing: > > - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out > - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical > - Ran locally all jtreg tests in jdk/jfr > - GHAs Ping @egahlin ------------- PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3619621653 From egahlin at openjdk.org Mon Dec 8 18:08:09 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Mon, 8 Dec 2025 18:08:09 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive In-Reply-To: References: Message-ID: On Sat, 6 Dec 2025 06:13:44 GMT, Thomas Stuefe wrote: > Ping @egahlin This looks like a good fix and refactoring. I agree, increasing the depth is better done separately. I will need to look at this more and run some tests. We now return when we find a marked object: `if (_mark_bits->is_marked(pointee)) { return; }` Previously we just aborted that closure. I need to think about whether that matters. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3628350221 From duke at openjdk.org Tue Dec 9 01:22:00 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Tue, 9 Dec 2025 01:22:00 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 15:54:04 GMT, Thomas Stuefe wrote: > A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
> > We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This RFE changes the algorithm to be non-recursive. > > Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. > > Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. > > Testing: > > - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out > - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical > - Ran locally all jtreg tests in jdk/jfr > - GHAs src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 103: > 101: } else { > 102: if (_mark_bits->is_marked(pointee)) { > 103: return; I think this improvement is a good idea! But maybe this line should be replaced with a `continue`, otherwise we can terminate the DFS prematurely and skip evaluation of other chains extending from other references already pushed to the stack. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28659#discussion_r2600708367 From stuefe at openjdk.org Tue Dec 9 05:55:57 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 9 Dec 2025 05:55:57 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 01:19:16 GMT, Robert Toyonaga wrote: >> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. >> >> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. >> >> This RFE changes the algorithm to be non-recursive. >> >> Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. >> >> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. 
>> >> Testing: >> >> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out >> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical >> - Ran locally all jtreg tests in jdk/jfr >> - GHAs > > src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 103: > >> 101: } else { >> 102: if (_mark_bits->is_marked(pointee)) { >> 103: return; > > I think this improvement is a good idea! But maybe this line should be replaced with a `continue`, otherwise we can terminate the DFS prematurely and skip evaluation of other chains extending from other references already pushed to the stack. Good catch! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28659#discussion_r2601196718 From stuefe at openjdk.org Tue Dec 9 08:13:11 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 9 Dec 2025 08:13:11 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v2] In-Reply-To: References: Message-ID: > A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. > > We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This RFE changes the algorithm to be non-recursive. > > Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. 
The search is still depth-first. > > Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. > > Testing: > > - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out > - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical > - Ran locally all jtreg tests in jdk/jfr > - GHAs Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28659/files - new: https://git.openjdk.org/jdk/pull/28659/files/5ac152db..e1a4736b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=00-01 Stats: 16 lines in 2 files changed: 9 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/28659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28659/head:pull/28659 PR: https://git.openjdk.org/jdk/pull/28659 From stuefe at openjdk.org Tue Dec 9 11:26:45 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 9 Dec 2025 11:26:45 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v3] In-Reply-To: References: Message-ID: > A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. > > We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. 
In this case, the VMThread's stack was too small. > > This RFE changes the algorithm to be non-recursive. > > Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. > > Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. > > Testing: > > - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out > - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical > - Ran locally all jtreg tests in jdk/jfr > - GHAs Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - revert accidental checkin - test improvements ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28659/files - new: https://git.openjdk.org/jdk/pull/28659/files/e1a4736b..d5ee7c4b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=01-02 Stats: 101 lines in 1 file changed: 81 ins; 3 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/28659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28659/head:pull/28659 PR: https://git.openjdk.org/jdk/pull/28659 From stuefe at openjdk.org Tue Dec 9 12:08:16 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 9 Dec 2025 12:08:16 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v4] In-Reply-To: References: Message-ID: > A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. 
It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. > > We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This RFE changes the algorithm to be non-recursive. > > Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. > > Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. > > Testing: > > - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out > - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical > - Ran locally all jtreg tests in jdk/jfr > - GHAs Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: final fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28659/files - new: https://git.openjdk.org/jdk/pull/28659/files/d5ee7c4b..73497c30 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28659/head:pull/28659 PR: https://git.openjdk.org/jdk/pull/28659 From stuefe at openjdk.org Tue Dec 9 13:21:56 2025 From: stuefe at 
openjdk.org (Thomas Stuefe) Date: Tue, 9 Dec 2025 13:21:56 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v4] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 12:08:16 GMT, Thomas Stuefe wrote: >> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. >> >> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. >> >> This RFE changes the algorithm to be non-recursive. >> >> Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. >> >> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. >> >> Testing: >> >> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out >> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical >> - Ran locally all jtreg tests in jdk/jfr >> - GHAs > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > final fixes I see that we have a problem with very broad objects or large object arrays that are nested with this approach, as the space-time complexity of traversing the net with this approach becomes too large. 
I'll try to modify the patch to take that into account. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3632232783 From fthevenet at openjdk.org Wed Dec 10 14:27:35 2025 From: fthevenet at openjdk.org (Frederic Thevenet) Date: Wed, 10 Dec 2025 14:27:35 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v4] In-Reply-To: References: Message-ID: <0El1-nrTE4DNmVq0AjNmSejSbqW_ysuSMs2DEj20UKA=.a1ed9c4b-1f8c-400f-9ae7-db44081e356c@github.com> On Tue, 9 Dec 2025 12:08:16 GMT, Thomas Stuefe wrote: >> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. >> >> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. >> >> This RFE changes the algorithm to be non-recursive. >> >> Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. >> >> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. 
>> >> Testing: >> >> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out >> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical >> - Ran locally all jtreg tests in jdk/jfr >> - GHAs > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > final fixes src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 177: > 175: assert(!ref.is_null(), "invariant"); > 176: const oop pointee = ref.dereference(); > 177: assert(pointee != nullptr, "invariant"); Small thing: is this still useful, since `pointee` is no longer used with this change? We assert that `ref.dereference() != nullptr` right after we pop it from the stack in `drain_probe_stack` anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28659#discussion_r2606872452 From stuefe at openjdk.org Thu Dec 11 05:33:23 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 11 Dec 2025 05:33:23 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v4] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 12:08:16 GMT, Thomas Stuefe wrote: >> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. >> >> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. >> >> This RFE changes the algorithm to be non-recursive.
>> >> Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. >> >> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. >> >> Testing: >> >> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out >> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical >> - Ran locally all jtreg tests in jdk/jfr >> - GHAs > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > final fixes Update: the performance problem with large Arrays I see is pre-existing, and I will post an RFE separately. But my patch can benefit from striping large arrays in order to avoid large probing stacks. I will do that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3640209385 From stuefe at openjdk.org Thu Dec 11 07:59:34 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 11 Dec 2025 07:59:34 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v4] In-Reply-To: References: Message-ID: <083O8MLtieturWtthx5z8jOY4Vij_HjQUOpVX3ZqxUc=.76fff71d-1e4c-4dbc-9d6a-e721a5658e67@github.com> On Tue, 9 Dec 2025 12:08:16 GMT, Thomas Stuefe wrote: >> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. >> >> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). 
That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. >> >> This RFE changes the algorithm to be non-recursive. >> >> Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. >> >> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. >> >> Testing: >> >> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out >> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical >> - Ran locally all jtreg tests in jdk/jfr >> - GHAs > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > final fixes For the performance problem, see https://bugs.openjdk.org/browse/JDK-8373490 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3640696702 From egahlin at openjdk.org Thu Dec 11 08:24:23 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Thu, 11 Dec 2025 08:24:23 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v4] In-Reply-To: <083O8MLtieturWtthx5z8jOY4Vij_HjQUOpVX3ZqxUc=.76fff71d-1e4c-4dbc-9d6a-e721a5658e67@github.com> References: <083O8MLtieturWtthx5z8jOY4Vij_HjQUOpVX3ZqxUc=.76fff71d-1e4c-4dbc-9d6a-e721a5658e67@github.com> Message-ID: On Thu, 11 Dec 2025 07:57:18 GMT, Thomas Stuefe wrote: > For the performance problem, see https://bugs.openjdk.org/browse/JDK-8373490 Our long-term plan is to get rid of DFS 
[1], but it's good to have a fix that we can backport. [1] https://bugs.openjdk.org/browse/JDK-8245430 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3640773953 From stuefe at openjdk.org Fri Dec 12 07:57:14 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 12 Dec 2025 07:57:14 GMT Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for large object arrays Message-ID: I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions: 1) We have large object arrays on the heap 2) We start with BFS, but at some point, we fallback to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57/src/hotspot/share/jfr/leakprofiler/chains/bfsClosure.cpp#L142-L144). This, in turn, is more likely to happen with large object arrays. Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two. The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by decreasing the size of the BFS edge queue in the code. In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results: - Object Array size of 200,000 elements: BFS only, all is well (~5 seconds) - Object Array size of 250,000 elements: BFS/DFS mixed, (~200 seconds) - Object Array size of 300,000 elements: BFS/DFS mixed, (~650 seconds) The run time is the square of the array size, see explanation below. 
With the default edge queue size of 32 MB, we only hit that DFS fallback path when processing arrays of more than 2 million elements, and for arrays as big as these, the search basically hangs forever. Examining the problem more closely, I see: - BFS search starts iterating over the object array. It will find that its edge queue is too small, and will drop down to DFS for every single object element. That in itself is not the problem. - It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O(N^2) with respect to the array size. The reason for this is a missing mark check at the border between BFS and DFS. Tests: - I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs). I also manually verified the resulting JFR file and confirmed that I see the gc roots listed for the array elements.
- I ran JFR jtreg tests manually ------------- Commit messages: - fix - start Changes: https://git.openjdk.org/jdk/pull/28781/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28781&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373490 Stats: 208 lines in 2 files changed: 202 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28781/head:pull/28781 PR: https://git.openjdk.org/jdk/pull/28781 From stuefe at openjdk.org Fri Dec 12 08:01:51 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 12 Dec 2025 08:01:51 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v4] In-Reply-To: References: <083O8MLtieturWtthx5z8jOY4Vij_HjQUOpVX3ZqxUc=.76fff71d-1e4c-4dbc-9d6a-e721a5658e67@github.com> Message-ID: On Thu, 11 Dec 2025 08:21:29 GMT, Erik Gahlin wrote: >> For the performance problem, see https://bugs.openjdk.org/browse/JDK-8373490 > >> For the performance problem, see https://bugs.openjdk.org/browse/JDK-8373490 > > Our long-term plan is to get rid of DFS [1], but it's good to have a fix that we can backport. > > [1] https://bugs.openjdk.org/browse/JDK-8245430 @egahlin @roberttoyonaga I posted a patch for the performance problem. The patch is very simple, and I would like to get that in first before progressing with this patch, to make backports easier (though it probably does not matter). Could you pls give me a quick review? The issue is rather simple. 
https://github.com/openjdk/jdk/pull/28781 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3645351477 From egahlin at openjdk.org Fri Dec 12 08:37:54 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Fri, 12 Dec 2025 08:37:54 GMT Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for large object arrays In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe wrote: > I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions: > > 1) We have large object arrays on the heap > > 2) We start with BFS, but at some point, we fallback to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57/src/hotspot/share/jfr/leakprofiler/chains/bfsClosure.cpp#L142-L144). This, in turn, is more likely to happen with large object arrays. > > Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two. > > The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by artificially decreasing the size of the BFS edge queue in the code. > > In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results: > > - Object Array size of 200,000 elements: BFS only, all is well (~5 seconds) > - Object Array size of 250,000 elements: BFS/DFS mixed, (~200 seconds) > - Object Array size of 300,000 elements: BFS/DFS mixed, (~650 seconds) > > The run time is the square of the array size, see explanation below. 
With the default edge queue size of 32MB, we only hit that DFS fallback path when processing arrays of >2mio, and for arrays as big as these, the search basically hangs forever. > > Examining the problem more closely, I see: > > - BFS search starts iterating over the object array. That many oops will saturate the queue. It will drop down to DFS for every single object element. That in itself is not the problem. > > - It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O^2 with respect to the array size. > > The reason for this is a missing mark check at the border between BFS and DFS. > > Tests: > > - I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs... Looks reasonable. I will run some tests before approving. (It's the end of the year, and I have vacation time I must take, so my availability for reviews is limited.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3645483611 From stuefe at openjdk.org Fri Dec 12 09:11:50 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 12 Dec 2025 09:11:50 GMT Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for large object arrays In-Reply-To: References: Message-ID: <0gQ7TMLhG1xkyrgjazvcoq_pCRewwhKaVnX7_wEQhcw=.9e18ff5c-41d9-4f6f-9b9d-2cc37694a669@github.com> On Fri, 12 Dec 2025 08:34:47 GMT, Erik Gahlin wrote: > Looks reasonable. I will run some tests before approving. > > (It's the end of the year, and I have vacation time I must take, so my availability for reviews is limited.) Same here :-) Hope you have a nice vacation, btw!
------------- PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3645590241 From duke at openjdk.org Fri Dec 12 13:37:07 2025 From: duke at openjdk.org (Bara' Hasheesh) Date: Fri, 12 Dec 2025 13:37:07 GMT Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath Message-ID: A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from starting after the JVM initiates its shutdown hooks. A new test was added that fails without the change and passes with it. I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86. ------------- Commit messages: - exception message - Move shutdown check to PlatformRecorder.start & missing space - 8373439: Fix deadlock between recorder start & VMDeath hook Changes: https://git.openjdk.org/jdk/pull/28767/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28767&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373439 Stats: 66 lines in 3 files changed: 64 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28767.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28767/head:pull/28767 PR: https://git.openjdk.org/jdk/pull/28767 From fandreuzzi at openjdk.org Fri Dec 12 13:37:09 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Fri, 12 Dec 2025 13:37:09 GMT Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath In-Reply-To: References: Message-ID: <7l8_qZiR8JS8IdXLaelvruMTGLYpekzQPYmjPB66GPI=.dfdaae86-1cf8-47d9-913d-6d8b747c643c@github.com> On Thu, 11 Dec 2025 15:19:51 GMT, Bara' Hasheesh wrote: > A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from start after the JVM initiates it's shutdown hooks > > A new test was added that fails without the change & passes with it > > I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86 test/jdk/jdk/jfr/api/recording/deadlock/TestShutdownDeadLock.java line 43: > 41: Recording r = new Recording(); > 42:
r.start(); > 43: r.stop(); Are these two lines needed for the reproducer? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2611098871 From duke at openjdk.org Fri Dec 12 13:37:09 2025 From: duke at openjdk.org (Bara' Hasheesh) Date: Fri, 12 Dec 2025 13:37:09 GMT Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath In-Reply-To: <7l8_qZiR8JS8IdXLaelvruMTGLYpekzQPYmjPB66GPI=.dfdaae86-1cf8-47d9-913d-6d8b747c643c@github.com> References: <7l8_qZiR8JS8IdXLaelvruMTGLYpekzQPYmjPB66GPI=.dfdaae86-1cf8-47d9-913d-6d8b747c643c@github.com> Message-ID: On Thu, 11 Dec 2025 15:44:48 GMT, Francesco Andreuzzi wrote: >> A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from start after the JVM initiates it's shutdown hooks >> >> A new test was added that fails without the change & passes with it >> >> I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86 > > test/jdk/jdk/jfr/api/recording/deadlock/TestShutdownDeadLock.java line 43: > >> 41: Recording r = new Recording(); >> 42: r.start(); >> 43: r.stop(); > > Are these two lines needed for the reproducer? These calls are made to guarantee that all JFR components are fully initialized (internal threads and other structures) and fully functional, in the sense that recordings can be processed normally.
While not needed as only `new Recording` is needed to start that creation process, but I would personally vote to keep them ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2611162129 From egahlin at openjdk.org Fri Dec 12 14:19:15 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Fri, 12 Dec 2025 14:19:15 GMT Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for large object arrays In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe wrote: > I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions: > > 1) We have large object arrays on the heap > > 2) We start with BFS, but at some point, we fallback to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57/src/hotspot/share/jfr/leakprofiler/chains/bfsClosure.cpp#L142-L144). This, in turn, is more likely to happen with large object arrays. > > Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two. > > The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by artificially decreasing the size of the BFS edge queue in the code. 
> > In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results: > > - Object Array size of 200,000 elements: BFS only, all is well (~5 seconds) > - Object Array size of 250,000 elements: BFS/DFS mixed, (~200 seconds) > - Object Array size of 300,000 elements: BFS/DFS mixed, (~650 seconds) > > The run time is the square of the array size, see explanation below. With the default edge queue size of 32MB, we only hit that DFS fallback path when processing arrays of >2mio, and for arrays as big as these, the search basically hangs forever. > > Examining the problem more closely, I see: > > - BFS search starts iterating over the object array. That many oops will saturate the queue. It will drop down to DFS for every single object element. That in itself is not the problem. > > - It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O^2 with respect to the array size. > > The reason for this is a missing mark check at the border between BFS and DFS. > > Tests: > > - I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs... Marked as reviewed by egahlin (Reviewer). 
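As a rough illustration of why the missing mark check described in the quoted text leads to quadratic run time, here is a toy cost model in Java. It is not HotSpot code: the class and method names are hypothetical, and the assumption that every unfiltered DFS fallback walks the whole array again is a simplification of the description above. It only counts element scans with and without a check at the BFS/DFS border.

```java
// Toy cost model only; this is NOT the HotSpot BFSClosure/DFSClosure code.
// All names here are hypothetical and exist for illustration.
final class MarkCheckSketch {

    // Returns how many array-element scans happen when the search falls back
    // to DFS once per array element. Assumption: each fallback that is not
    // filtered out walks all elements of the array again.
    static long elementScans(int arrayLength, boolean checkMarkAtBorder) {
        boolean arrayMarked = false; // stands in for the mark bit of the array oop
        long scans = 0;
        for (int element = 0; element < arrayLength; element++) {
            if (checkMarkAtBorder && arrayMarked) {
                continue; // array already visited: skip the repeated DFS
            }
            arrayMarked = true;
            scans += arrayLength; // one DFS pass over the array's elements
        }
        return scans;
    }

    public static void main(String[] args) {
        System.out.println(elementScans(1_000, false)); // N fallbacks, N scans each
        System.out.println(elementScans(1_000, true));  // a single pass of N scans
    }
}
```

Under these assumptions, the count grows with the square of the array length when the check is absent and linearly when it is present, which matches the O(N^2) behavior reported in the description.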
------------- PR Review: https://git.openjdk.org/jdk/pull/28781#pullrequestreview-3571992566 From stuefe at openjdk.org Fri Dec 12 14:49:16 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 12 Dec 2025 14:49:16 GMT Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:19:51 GMT, Bara' Hasheesh wrote: > A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from start after the JVM initiates it's shutdown hooks > > A new test was added that fails without the change & passes with it > > I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86 @Baraa-Hasheesh would this be a solution for https://bugs.openjdk.org/browse/JDK-8373257? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28767#issuecomment-3646763877 From duke at openjdk.org Fri Dec 12 16:01:29 2025 From: duke at openjdk.org (Bara' Hasheesh) Date: Fri, 12 Dec 2025 16:01:29 GMT Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 14:47:00 GMT, Thomas Stuefe wrote: >> A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from start after the JVM initiates it's shutdown hooks >> >> A new test was added that fails without the change & passes with it >> >> I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86 > > @Baraa-Hasheesh would this be a solution for https://bugs.openjdk.org/browse/JDK-8373257? 
@tstuefe From the ticket description, that looks like a different deadlock. The deadlock here happens because the `JfrPostBox` waits indefinitely for the `recorderthread` to process its message, but that thread has already exited due to the call chain `PlatformRecorder.destroy -> JVMSupport.destroyJFR -> JVM.destroyJFR`. Your case seems to be deadlocking somewhere else. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28767#issuecomment-3647155450 From egahlin at openjdk.org Fri Dec 12 17:10:54 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Fri, 12 Dec 2025 17:10:54 GMT Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:19:51 GMT, Bara' Hasheesh wrote: > A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from start after the JVM initiates it's shutdown hooks > > A new test was added that fails without the change & passes with it > > I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86 src/jdk.jfr/share/classes/jdk/jfr/internal/PlatformRecording.java line 108: > 106: synchronized (recorder) { > 107: if (PlatformRecorder.isInShutDown()) { > 108: throw new IllegalStateException("Flight recorder is already shutdown"); I need to think about whether throwing ISE is the best alternative here.
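For readers following the thread, the pattern under discussion (reject a start request once shutdown has begun, rather than wait on a worker thread that may already have exited) can be sketched in isolation. This is a minimal stand-in, not the JDK implementation: `ShutdownGuardSketch`, `beginShutdown`, and `isStarted` are hypothetical names, and only the exception type and message are taken from the quoted diff.

```java
// Hypothetical stand-in; NOT jdk.jfr.internal.PlatformRecorder or PlatformRecording.
final class ShutdownGuardSketch {
    private final Object lock = new Object();
    private boolean inShutdown; // guarded by lock
    private boolean started;    // guarded by lock

    // Assumed to be called from the shutdown path before worker threads are torn down.
    void beginShutdown() {
        synchronized (lock) {
            inShutdown = true;
        }
    }

    // Rejects new work once shutdown has begun instead of blocking on a
    // worker that may never respond.
    void start() {
        synchronized (lock) {
            if (inShutdown) {
                throw new IllegalStateException("Flight recorder is already shutdown");
            }
            started = true;
        }
    }

    boolean isStarted() {
        synchronized (lock) {
            return started;
        }
    }

    public static void main(String[] args) {
        ShutdownGuardSketch recorder = new ShutdownGuardSketch();
        recorder.beginShutdown();
        try {
            recorder.start();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whether an exception or a silent no-op is the better contract for callers is exactly the open question raised above; the sketch takes no position on that beyond mirroring the quoted lines.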
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2614980643 From egahlin at openjdk.org Fri Dec 12 17:17:52 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Fri, 12 Dec 2025 17:17:52 GMT Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for large object arrays In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe wrote: > I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions: > > 1) We have large object arrays on the heap > > 2) We start with BFS, but at some point, we fallback to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57/src/hotspot/share/jfr/leakprofiler/chains/bfsClosure.cpp#L142-L144). This, in turn, is more likely to happen with large object arrays. > > Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two. > > The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by artificially decreasing the size of the BFS edge queue in the code. > > In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results: > > - Object Array size of 200,000 elements: BFS only, all is well (~5 seconds) > - Object Array size of 250,000 elements: BFS/DFS mixed, (~200 seconds) > - Object Array size of 300,000 elements: BFS/DFS mixed, (~650 seconds) > > The run time is the square of the array size, see explanation below. 
With the default edge queue size of 32MB, we only hit that DFS fallback path when processing arrays of >2mio, and for arrays as big as these, the search basically hangs forever. > > Examining the problem more closely, I see: > > - BFS search starts iterating over the object array. That many oops will saturate the queue. It will drop down to DFS for every single object element. That in itself is not the problem. > > - It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O^2 with respect to the array size. > > The reason for this is a missing mark check at the border between BFS and DFS. > > Tests: > > - I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs... A better title for the bug might be "JFR: path-to-gc-roots=true is very slow for large object arrays". The leak profiler name is not something we have used externally. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3647430575 From duke at openjdk.org Fri Dec 12 19:03:53 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Fri, 12 Dec 2025 19:03:53 GMT Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for large object arrays In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe wrote: > I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions: > > 1) We have large object arrays on the heap > > 2) We start with BFS, but at some point, we fallback to DFS. 
This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57/src/hotspot/share/jfr/leakprofiler/chains/bfsClosure.cpp#L142-L144). This, in turn, is more likely to happen with large object arrays. > > Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two. > > The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by artificially decreasing the size of the BFS edge queue in the code. > > In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results: > > - Object Array size of 200,000 elements: BFS only, all is well (~5 seconds) > - Object Array size of 250,000 elements: BFS/DFS mixed, (~200 seconds) > - Object Array size of 300,000 elements: BFS/DFS mixed, (~650 seconds) > > The run time is the square of the array size, see explanation below. With the default edge queue size of 32MB, we only hit that DFS fallback path when processing arrays of >2mio, and for arrays as big as these, the search basically hangs forever. > > Examining the problem more closely, I see: > > - BFS search starts iterating over the object array. That many oops will saturate the queue. It will drop down to DFS for every single object element. That in itself is not the problem. > > - It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. 
So the runtime will be O^2 with respect to the array size. > > The reason for this is a missing mark check at the border between BFS and DFS. > > Tests: > > - I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs... This fix looks good to me ------------- Marked as reviewed by roberttoyonaga at github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/jdk/pull/28781#pullrequestreview-3573204588 From duke at openjdk.org Fri Dec 12 19:25:50 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Fri, 12 Dec 2025 19:25:50 GMT Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for large object arrays In-Reply-To: References: Message-ID: <_WHtdpClhCRzQlZJn-mA8KgNaG9g-nYbxKZ8P16ceMA=.bf58d6e9-67e4-491c-91bd-97c4c8bceb9c@github.com> On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe wrote: > I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions: > > 1) We have large object arrays on the heap > > 2) We start with BFS, but at some point, we fallback to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57/src/hotspot/share/jfr/leakprofiler/chains/bfsClosure.cpp#L142-L144). This, in turn, is more likely to happen with large object arrays. > > Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two. > > The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by artificially decreasing the size of the BFS edge queue in the code. 
>
> In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results:
>
> - Object Array size of 200,000 elements: BFS only, all is well (~5 seconds)
> - Object Array size of 250,000 elements: BFS/DFS mixed (~200 seconds)
> - Object Array size of 300,000 elements: BFS/DFS mixed (~650 seconds)
>
> The run time grows with the square of the array size; see explanation below. With the default edge queue size of 32 MB, we only hit that DFS fallback path when processing arrays of more than 2 million elements, and for arrays as big as these, the search basically hangs forever.
>
> Examining the problem more closely, I see:
>
> - BFS search starts iterating over the object array. That many oops will saturate the queue. It will drop down to DFS for every single object element. That in itself is not the problem.
>
> - It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O(N^2) with respect to the array size.
>
> The reason for this is a missing mark check at the border between BFS and DFS.
>
> Tests:
>
> - I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs...

Is there a way to add a timeout to the test for the failure case? Timeouts in unit tests are not completely reliable, but maybe something generous like 10s might be reasonable.
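
The quadratic behavior described in the quoted analysis can be illustrated with a toy model. The sketch below is not HotSpot code: `HandoffModel`, `inspections`, and the single boolean standing in for a mark bit are hypothetical names, and the model assumes the BFS queue is already saturated, so that every array element triggers a handoff to DFS. It only counts how many element inspections happen with and without a mark check at the BFS/DFS border.

```java
/** Toy model (not HotSpot code) of BFS handing work to DFS for one large object array. */
final class HandoffModel {

    /**
     * Returns how many array elements get inspected in total.
     * Assumption: the BFS queue is full, so each element causes a DFS handoff
     * that starts at the array itself and walks all of its elements.
     */
    static long inspections(int arrayLength, boolean checkMarkAtBorder) {
        boolean arrayHandedToDfs = false; // hypothetical stand-in for a mark bit on the array
        long inspected = 0;
        for (int element = 0; element < arrayLength; element++) {
            if (checkMarkAtBorder && arrayHandedToDfs) {
                continue; // DFS already walked this array; skip the repeat
            }
            inspected += arrayLength; // DFS walks every element of the array
            arrayHandedToDfs = true;
        }
        return inspected;
    }

    public static void main(String[] args) {
        System.out.println(inspections(1_000, false));
        System.out.println(inspections(1_000, true));
    }
}
```

For 1,000 elements the model counts 1,000,000 inspections without the check and 1,000 with it, which is the N times N versus N difference the analysis describes.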
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3647841771

From egahlin at openjdk.org Fri Dec 12 21:23:52 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Fri, 12 Dec 2025 21:23:52 GMT
Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for large object arrays
In-Reply-To: <_WHtdpClhCRzQlZJn-mA8KgNaG9g-nYbxKZ8P16ceMA=.bf58d6e9-67e4-491c-91bd-97c4c8bceb9c@github.com>
References: <_WHtdpClhCRzQlZJn-mA8KgNaG9g-nYbxKZ8P16ceMA=.bf58d6e9-67e4-491c-91bd-97c4c8bceb9c@github.com>
Message-ID: <5niLk8ntVOpDpjrN6jMsolLGSOuY7pEyDLXZJ513KWY=.3bb31fcd-574a-4120-8975-5f82d94ed998@github.com>

On Fri, 12 Dec 2025 19:23:34 GMT, Robert Toyonaga wrote:

> Is there a way to add a timeout to the test for the failure case? Timeouts in unit tests are not completely reliable, but maybe something generous like 10s might be reasonable.

I don't think we should have a timeout. Timeouts typically only lead to false positives, for example when you run on old hardware, under stress, or with some other option.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3648179388

From stuefe at openjdk.org Sun Dec 14 12:00:08 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sun, 14 Dec 2025 12:00:08 GMT
Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for large object arrays
In-Reply-To: <5niLk8ntVOpDpjrN6jMsolLGSOuY7pEyDLXZJ513KWY=.3bb31fcd-574a-4120-8975-5f82d94ed998@github.com>
References: <_WHtdpClhCRzQlZJn-mA8KgNaG9g-nYbxKZ8P16ceMA=.bf58d6e9-67e4-491c-91bd-97c4c8bceb9c@github.com> <5niLk8ntVOpDpjrN6jMsolLGSOuY7pEyDLXZJ513KWY=.3bb31fcd-574a-4120-8975-5f82d94ed998@github.com>
Message-ID:

On Fri, 12 Dec 2025 21:20:56 GMT, Erik Gahlin wrote:

> Is there a way to add a timeout to the test for the failure case? Timeouts in unit tests are not completely reliable, but maybe something generous like 10s might be reasonable.

Adding to what Erik wrote, there is the timeout of jtreg itself, of course.
By default, that is 120 seconds. But folks can increase or decrease that at command line level.

Many thanks for the speedy reviews, @egahlin and @roberttoyonaga!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3650778670
PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3650779585

From stuefe at openjdk.org Sun Dec 14 12:00:10 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sun, 14 Dec 2025 12:00:10 GMT
Subject: Integrated: 8373490: JFR Leak Profiler: path-to-gc-root very slow for large object arrays
In-Reply-To:
References:
Message-ID: <56ifPVrvcEVK4r7m9yF7KgXUuruA-bWqWB6vbbuzMm8=.c7adcfbe-36fa-470e-aa90-643d74ce327f@github.com>

On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe wrote:

> I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions:
>
> 1) We have large object arrays on the heap
>
> 2) We start with BFS, but at some point, we fall back to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57/src/hotspot/share/jfr/leakprofiler/chains/bfsClosure.cpp#L142-L144). This, in turn, is more likely to happen with large object arrays.
>
> Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two.
>
> The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by artificially decreasing the size of the BFS edge queue in the code.
>
> In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results:
>
> - Object Array size of 200,000 elements: BFS only, all is well (~5 seconds)
> - Object Array size of 250,000 elements: BFS/DFS mixed (~200 seconds)
> - Object Array size of 300,000 elements: BFS/DFS mixed (~650 seconds)
>
> The run time grows with the square of the array size; see explanation below. With the default edge queue size of 32 MB, we only hit that DFS fallback path when processing arrays of more than 2 million elements, and for arrays as big as these, the search basically hangs forever.
>
> Examining the problem more closely, I see:
>
> - BFS search starts iterating over the object array. That many oops will saturate the queue. It will drop down to DFS for every single object element. That in itself is not the problem.
>
> - It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O(N^2) with respect to the array size.
>
> The reason for this is a missing mark check at the border between BFS and DFS.
>
> Tests:
>
> - I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs...

This pull request has now been integrated.

Changeset: 99f90bef
Author: Thomas Stuefe
URL: https://git.openjdk.org/jdk/commit/99f90befafe9476de17e416d45a9875569171935
Stats: 208 lines in 2 files changed: 202 ins; 6 del; 0 mod

8373490: JFR Leak Profiler: path-to-gc-root very slow for large object arrays

Reviewed-by: egahlin

-------------

PR: https://git.openjdk.org/jdk/pull/28781

From egahlin at openjdk.org Mon Dec 15 09:43:41 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Mon, 15 Dec 2025 09:43:41 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath
In-Reply-To:
References:
Message-ID: <45-fSOH97K8FKgc1YN6w1tKz11nvfnFBOp25uN1s_TI=.cadf2c62-db15-434e-90dd-f18817ab9697@github.com>

On Fri, 12 Dec 2025 17:08:14 GMT, Erik Gahlin wrote:

>> A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from starting after the JVM initiates its shutdown hooks
>>
>> A new test was added that fails without the change & passes with it
>>
>> I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86
>
> src/jdk.jfr/share/classes/jdk/jfr/internal/PlatformRecording.java line 108:
>
>> 106: synchronized (recorder) {
>> 107: if (PlatformRecorder.isInShutDown()) {
>> 108: throw new IllegalStateException("Flight recorder is already shutdown");
>
> I need to think about whether throwing ISE is the best alternative here.

A user may call recording.start() at any time, and throwing an ISE would break the current API and require callers to guard against shutdown. This would affect not just the Recording class but also RecordingStream and FlightRecorder, and possibly the FlightRecorderMXBean. Another alternative is to add PlatformRecorder.isInShutdown checks in various places (stop, dump, etc.) to make it a dummy recording, but that becomes complicated. An internal checked exception could help ensure all paths are covered. A third alternative is to avoid destroying native JFR at shutdown. That said, we still need to clean up the disk repository.
Markus will be back in January. I think we should wait until he returns to get more input on this.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2618681154

From duke at openjdk.org Mon Dec 15 10:23:40 2025
From: duke at openjdk.org (Bara' Hasheesh)
Date: Mon, 15 Dec 2025 10:23:40 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath
In-Reply-To: <45-fSOH97K8FKgc1YN6w1tKz11nvfnFBOp25uN1s_TI=.cadf2c62-db15-434e-90dd-f18817ab9697@github.com>
References: <45-fSOH97K8FKgc1YN6w1tKz11nvfnFBOp25uN1s_TI=.cadf2c62-db15-434e-90dd-f18817ab9697@github.com>
Message-ID: <9MZ4FJ9vdY0QxuvgrnWj-5HlE8rz_xI3bKCyeCj-Upw=.581f8f7b-b8b9-4fc9-a608-e67cb6bb5ef7@github.com>

On Mon, 15 Dec 2025 09:40:37 GMT, Erik Gahlin wrote:

> A third alternative is to avoid destroying native JFR at shutdown

I think for this option it's better to wait for Markus, as you mentioned.

-------

For the other options, I don't like the first option of `isInShutdown` checks, as some APIs, such as the `stop` API, are designed to run fine after the flag is set.

As for the internal checked exception, the exception by itself won't be sufficient: the recording also needs to be transformed into a "dummy recording", because from a user's perspective the application continued normally, so the recording "started". Otherwise this recording becomes a deadlock hazard.

Now the aspect of a "dummy recording" is interesting. What do you think about adding a new `RecordingState`, call it dummy, for which we add checks where needed within the various APIs?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2618813671

From egahlin at openjdk.org Mon Dec 15 10:55:34 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Mon, 15 Dec 2025 10:55:34 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath
In-Reply-To: <9MZ4FJ9vdY0QxuvgrnWj-5HlE8rz_xI3bKCyeCj-Upw=.581f8f7b-b8b9-4fc9-a608-e67cb6bb5ef7@github.com>
References:
<45-fSOH97K8FKgc1YN6w1tKz11nvfnFBOp25uN1s_TI=.cadf2c62-db15-434e-90dd-f18817ab9697@github.com> <9MZ4FJ9vdY0QxuvgrnWj-5HlE8rz_xI3bKCyeCj-Upw=.581f8f7b-b8b9-4fc9-a608-e67cb6bb5ef7@github.com> Message-ID: On Mon, 15 Dec 2025 10:20:34 GMT, Bara' Hasheesh wrote: >> A user may call recording.start() at any time, and throwing an ISE would break the current API and require callers to guard against shutdown. This would affect not just the Recording class but also RecordingStream and FlightRecorder, and possibly the FlightRecorderMXBean. Another alternative is to add PlatformRecorder.isInShutdown checks in various places (stop, dump etc) to make it a dummy recording, but that becomes complicated. An internal checked exception could help ensure all paths are covered. A third alternative is to avoid destroying native JFR at shutdown. That said, we still need to clean up the disk repository. Markus will be back in January. I think we should wait until he returns to get more input on this. > >> A third alternative is to avoid destroying native JFR at shutdown > > I think for this option it's better to wait on Markus as you mentioned > > ------- > > For the other options, I don't like the first option of `isInShutdown` as some APIs, are designed to run fine after the flag is set, such as the `stop` API > > As for the internal checked exception, the exception by it self won't be sufficient, as the recoding also needs to be transformed into a "dummy recording" as from a user perspective the application continued normally so the recording "started", otherwise this recording becomes a deadlock hazard > > Now the aspect of a "dummy recording" is interesting, what do you think about adding a new `RecordingState` we call it dummy, which we add checks for where is needed within the various APIs I thought about it, but it doesn't solve the underlying problem and may break things for API users. We might as well use RUNNING and some internal flag. 
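
To make the "RUNNING plus an internal flag" idea concrete, here is a minimal sketch. It is not JFR code and not an agreed design (the thread leaves the design open, and the reply above notes that a dummy state does not solve the underlying problem). `ShutdownAwareRecording`, its nested `State` enum, `beginShutdown`, and `collectsData` are hypothetical names; the static flag merely stands in for the recorder-wide shutdown check discussed in the thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch, not JFR code: a recording that looks RUNNING to callers but is inert once shutdown began. */
final class ShutdownAwareRecording {
    enum State { NEW, RUNNING, STOPPED }

    // Stand-in for the recorder-wide shutdown flag mentioned in the thread.
    private static final AtomicBoolean IN_SHUTDOWN = new AtomicBoolean(false);

    private State state = State.NEW;
    private boolean inert; // internal flag, never exposed through getState()

    static void beginShutdown() {
        IN_SHUTDOWN.set(true);
    }

    synchronized void start() {
        if (state != State.NEW) {
            throw new IllegalStateException("Recording can only be started once");
        }
        // No exception during shutdown: the caller sees a normal start,
        // but the recording is flagged so that it does no real work.
        inert = IN_SHUTDOWN.get();
        state = State.RUNNING;
    }

    synchronized void stop() {
        if (state != State.RUNNING) {
            throw new IllegalStateException("Recording is not running");
        }
        state = State.STOPPED;
    }

    synchronized State getState() {
        return state;
    }

    /** True only if the recording is running and was started before shutdown began. */
    synchronized boolean collectsData() {
        return state == State.RUNNING && !inert;
    }
}
```

In this sketch the public state machine is unchanged, so existing callers would not need to handle a new state or a new exception; every internal path that does real work would have to consult the flag instead.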
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2618923020 From duke at openjdk.org Tue Dec 16 09:08:25 2025 From: duke at openjdk.org (Bara' Hasheesh) Date: Tue, 16 Dec 2025 09:08:25 GMT Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath [v2] In-Reply-To: References: Message-ID: > A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from start after the JVM initiates it's shutdown hooks > > A new test was added that fails without the change & passes with it > > I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86 Bara' Hasheesh has updated the pull request incrementally with one additional commit since the last revision: flags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28767/files - new: https://git.openjdk.org/jdk/pull/28767/files/d1eec7c3..e8811e0f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28767&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28767&range=00-01 Stats: 65 lines in 4 files changed: 53 ins; 6 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/28767.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28767/head:pull/28767 PR: https://git.openjdk.org/jdk/pull/28767 From egahlin at openjdk.org Tue Dec 16 09:41:03 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 16 Dec 2025 09:41:03 GMT Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath [v2] In-Reply-To: References: <45-fSOH97K8FKgc1YN6w1tKz11nvfnFBOp25uN1s_TI=.cadf2c62-db15-434e-90dd-f18817ab9697@github.com> <9MZ4FJ9vdY0QxuvgrnWj-5HlE8rz_xI3bKCyeCj-Upw=.581f8f7b-b8b9-4fc9-a608-e67cb6bb5ef7@github.com> Message-ID: On Mon, 15 Dec 2025 10:53:21 GMT, Erik Gahlin wrote: >>> A third alternative is to avoid destroying native JFR at shutdown >> >> I think for this option it's better to wait on Markus as you mentioned >> >> ------- >> >> For the other options, I don't like the first option of `isInShutdown` as some 
APIs, are designed to run fine after the flag is set, such as the `stop` API >> >> As for the internal checked exception, the exception by it self won't be sufficient, as the recoding also needs to be transformed into a "dummy recording" as from a user perspective the application continued normally so the recording "started", otherwise this recording becomes a deadlock hazard >> >> Now the aspect of a "dummy recording" is interesting, what do you think about adding a new `RecordingState` we call it dummy, which we add checks for where is needed within the various APIs > > I thought about it, but it doesn't solve the underlying problem and may break things for API users. We might as well use RUNNING and some internal flag. I don't think it makes sense to implement anything until we have resolved the design, which is best done after New Year's. Also, I will be on vacation, so my time for reviews will be limited. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2622512926 From duke at openjdk.org Tue Dec 16 09:41:04 2025 From: duke at openjdk.org (Bara' Hasheesh) Date: Tue, 16 Dec 2025 09:41:04 GMT Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath [v2] In-Reply-To: References: <45-fSOH97K8FKgc1YN6w1tKz11nvfnFBOp25uN1s_TI=.cadf2c62-db15-434e-90dd-f18817ab9697@github.com> <9MZ4FJ9vdY0QxuvgrnWj-5HlE8rz_xI3bKCyeCj-Upw=.581f8f7b-b8b9-4fc9-a608-e67cb6bb5ef7@github.com> Message-ID: On Tue, 16 Dec 2025 09:36:23 GMT, Erik Gahlin wrote: >> I thought about it, but it doesn't solve the underlying problem and may break things for API users. We might as well use RUNNING and some internal flag. > > I don't think it makes sense to implement anything until we have resolved the design, which is best done after New Year's. Also, I will be on vacation, so my time for reviews will be limited. 
Noted, I will put this on hold until next year ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2622519564 From stuefe at openjdk.org Wed Dec 17 09:55:04 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 17 Dec 2025 09:55:04 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v5] In-Reply-To: References: Message-ID: > A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. > > We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This RFE changes the algorithm to be non-recursive. > > Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. > > Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. > > Testing: > > - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out > - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical > - Ran locally all jtreg tests in jdk/jfr > - GHAs Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Merge branch 'master' into JFR-leak-profiler-path-to-gc-roots-non-recursive - final fixes - revert accidental checkin - test improvements - fix - remove test output - Copyright - start ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28659/files - new: https://git.openjdk.org/jdk/pull/28659/files/73497c30..10bc510b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=03-04 Stats: 45414 lines in 841 files changed: 30528 ins; 10541 del; 4345 mod Patch: https://git.openjdk.org/jdk/pull/28659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28659/head:pull/28659 PR: https://git.openjdk.org/jdk/pull/28659 From stuefe at openjdk.org Wed Dec 17 10:06:59 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 17 Dec 2025 10:06:59 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v6] In-Reply-To: References: Message-ID: > A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. > > We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. > > This RFE changes the algorithm to be non-recursive. > > Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. > > Note that after this patch, we could easily remove the max_depth limitation altogether. 
I left it in however since this was not the scope of this RFE. > > Testing: > > - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out > - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical > - Ran locally all jtreg tests in jdk/jfr > - GHAs Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - completely revert test changes - revert part of the test changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28659/files - new: https://git.openjdk.org/jdk/pull/28659/files/10bc510b..09886f48 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=04-05 Stats: 108 lines in 1 file changed: 9 ins; 87 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/28659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28659/head:pull/28659 PR: https://git.openjdk.org/jdk/pull/28659 From stuefe at openjdk.org Thu Dec 18 10:11:20 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 18 Dec 2025 10:11:20 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v7] In-Reply-To: References: Message-ID: > A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. > > We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. 
> > This RFE changes the algorithm to be non-recursive. > > Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. > > Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. > > Testing: > > - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out > - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical > - Ran locally all jtreg tests in jdk/jfr > - GHAs Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: do strides for arrays ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28659/files - new: https://git.openjdk.org/jdk/pull/28659/files/09886f48..94ce9065 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=05-06 Stats: 28 lines in 2 files changed: 21 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/28659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28659/head:pull/28659 PR: https://git.openjdk.org/jdk/pull/28659 From duke at openjdk.org Thu Dec 18 20:31:07 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Thu, 18 Dec 2025 20:31:07 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v7] In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 10:11:20 GMT, Thomas Stuefe wrote: >> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. 
It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation. >> >> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small. >> >> This RFE changes the algorithm to be non-recursive. >> >> Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first. >> >> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE. >> >> Testing: >> >> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out >> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical >> - Ran locally all jtreg tests in jdk/jfr >> - GHAs > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > do strides for arrays I think adding this array striding is a good idea to avoid flooding the stack due to large arrays. I left a comment about a possible problem below. src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 134: > 132: ProbeStackItem psi2 = psi; > 133: psi2.chunk ++; > 134: _probe_stack.push(psi2); Could it be a problem that `pointee` has already been marked? To accomplish the striding, the same `pointee` needs to be revisited with the new chunk count to evaluate the next range. 
However, the next time it's popped off the stack, it will get skipped over on line 108 since it's already been marked. I ran `TestJcmdDumpPathToGCRootsBFSDFS.java` and the test passes even after I add `assert(psi.chunk==0 )` in this block, which might indicate only the first range of each array is ever getting evaluated. ------------- PR Review: https://git.openjdk.org/jdk/pull/28659#pullrequestreview-3594914234 PR Review Comment: https://git.openjdk.org/jdk/pull/28659#discussion_r2632525732 From haosun at openjdk.org Fri Dec 19 06:50:19 2025 From: haosun at openjdk.org (Hao Sun) Date: Fri, 19 Dec 2025 06:50:19 GMT Subject: RFR: 8373122: JFR build failure with CDS disabled due to -Werror=unused-function after JDK-8365400 Message-ID: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com> `get_source(const AOTClassLocation* cl, JavaThread* jt)` is only used by `JfrClassDefineEvent::on_restoration()` within an `INCLUDE_CDS` guard. Add the same `INCLUDE_CDS` guard to mitigate the GCC warning. Tests: JDK build with CDS disabled passed on both x86_64 and aarch64. tier1~3 passed on both x86_64 and aarch64. 
------------- Commit messages: - 8373122: JFR build failure with CDS disabled due to -Werror=unused-function after JDK-8365400 Changes: https://git.openjdk.org/jdk/pull/28917/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28917&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373122 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28917.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28917/head:pull/28917 PR: https://git.openjdk.org/jdk/pull/28917 From stuefe at openjdk.org Fri Dec 19 07:09:58 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 19 Dec 2025 07:09:58 GMT Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v7] In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 20:26:14 GMT, Robert Toyonaga wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> do strides for arrays > > src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 134: > >> 132: ProbeStackItem psi2 = psi; >> 133: psi2.chunk ++; >> 134: _probe_stack.push(psi2); > > Could checking the marked status of `pointee` now become a problem ? To accomplish the striding, the same `pointee` needs to be revisited with the new chunk count to evaluate the next range. However, the next time it's popped off the stack, it will get skipped over on line 108 since it's already been marked. > > `TestJcmdDumpPathToGCRootsBFSDFS.java` passes even after I add `assert(psi.chunk==0 )` in this block after line 122, which might indicate only the first range of each array is ever getting evaluated. Yes, I saw this yesterday too. The current version does not work. I am rethinking the solution. 
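
The interaction between array striding and the mark check can be reproduced in a small model. The sketch below is not the HotSpot code and not the fix the author went on to choose: `StridedDfs`, `Item`, `STRIDE`, and `visit` are hypothetical names, and an identity-based set stands in for the mark bits. It shows an iterative depth-first walk that processes arrays in chunks and pushes a continuation item for the next chunk, with a switch that applies the "already marked" check to continuation items as well.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

/** Toy model (not HotSpot code) of an iterative DFS that walks arrays in fixed-size chunks. */
final class StridedDfs {
    static final int STRIDE = 4; // hypothetical chunk length

    /** One entry of the explicit stack: an object plus the array chunk to continue with. */
    private static final class Item {
        final Object node;
        final int chunk;

        Item(Object node, int chunk) {
            this.node = node;
            this.chunk = chunk;
        }
    }

    /**
     * Returns the number of distinct objects visited from root.
     * With markCheckOnContinuations == true, a continuation item for an already
     * marked array is dropped, which models the concern raised in the review.
     */
    static int visit(Object root, boolean markCheckOnContinuations) {
        Set<Object> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Item> stack = new ArrayDeque<>();
        stack.push(new Item(root, 0));
        int visited = 0;
        while (!stack.isEmpty()) {
            Item item = stack.pop();
            if (item.chunk == 0) {
                if (!marked.add(item.node)) {
                    continue; // first-time items: skip objects seen before
                }
                visited++;
            } else if (markCheckOnContinuations && marked.contains(item.node)) {
                continue; // the array was marked at chunk 0, so later chunks are lost
            }
            if (item.node instanceof Object[]) {
                Object[] array = (Object[]) item.node;
                int begin = item.chunk * STRIDE;
                int end = Math.min(begin + STRIDE, array.length);
                if (end < array.length) {
                    stack.push(new Item(array, item.chunk + 1)); // continuation for the next chunk
                }
                for (int i = begin; i < end; i++) {
                    if (array[i] != null) {
                        stack.push(new Item(array[i], 0));
                    }
                }
            }
        }
        return visited;
    }
}
```

For an array of 10 distinct leaf objects and a stride of 4, the model visits 11 objects when continuation items bypass the mark check, but only 5 (the array plus its first chunk) when they do not. That matches the observation that only the first range of each array would be evaluated.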
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28659#discussion_r2633959011 From fandreuzzi at openjdk.org Fri Dec 19 10:18:19 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Fri, 19 Dec 2025 10:18:19 GMT Subject: RFR: 8373122: JFR build failure with CDS disabled due to -Werror=unused-function after JDK-8365400 In-Reply-To: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com> References: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com> Message-ID: <3__AkRGeel74eUCetVlLDTiL18A3RxM_OlTHuO-vTt0=.3296278a-44a1-4095-b390-1fc510659d25@github.com> On Fri, 19 Dec 2025 06:43:05 GMT, Hao Sun wrote: > `get_source(const AOTClassLocation* cl, JavaThread* jt)` is only used by `JfrClassDefineEvent::on_restoration()` within an `INCLUDE_CDS` guard. > > Add the same `INCLUDE_CDS` guard to mitigate the GCC warning. > > Tests: JDK build with CDS disabled passed on both x86_64 and aarch64. > tier1~3 passed on both x86_64 and aarch64. Marked as reviewed by fandreuzzi (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28917#pullrequestreview-3597965372 From duke at openjdk.org Fri Dec 19 12:15:42 2025 From: duke at openjdk.org (duke) Date: Fri, 19 Dec 2025 12:15:42 GMT Subject: Withdrawn: 8365306: Provide OS Process Size and Libc statistic metrics to JFR In-Reply-To: References: Message-ID: <0qN4sAF2Ov5kJnH-f5HiWvk93t29ycVow454Rw9lsAs=.f548a8b3-387a-4f7b-9eb5-41f259a5e2f7@github.com> On Wed, 13 Aug 2025 09:42:57 GMT, Thomas Stuefe wrote: > This provides the following new metrics: > - `ProcessSize` event (new, periodic) > - vsize (for analyzing address-space fragmentation issues) > - RSS including subtypes (subtypes are useful for excluding atypical issues, e.g. 
kernel problems that cause large file buffer bloat) > - peak RSS > - process swap (if we swap we cannot trust the RSS values, plus it indicates bad sizing) > - pte size (to quickly see if we run with a super-large working set but an unsuitably small page size) > - `LibcStatistics` (new, periodic) > - outstanding malloc size (important counterpoint to whatever NMT tries to tell me, which alone is often misleading) > - retained malloc size (super-important for the same reason) > - number of libc trims the hotspot executed (needed to gauge the usefulness of the retain counter, and to see if a customer employs native heap auto trimming (`-XX:TrimNativeHeapInterval`)) > - `NativeHeapTrim` (new, event-driven) (for both manual and automatic trims) > - RSS before and RSS after > - RSS recovered by this trim > - whether it was an automatic or manual trim > - duration > - `JavaThreadStatistic` > - os thread counter (new field) (useful to understand the behavior of third-party code in our process if threads are created that bypass the JVM. E.g. some custom launchers do that.) > - nonJava thread counter (new field) (needed to interpret the os thread counter) > > Notes: > - we already have the `ResidentSetSize` event, and the new `ProcessSize` event is a superset of that. I don't know how these cases are handled. I'd prefer to throw the old event out, but JMC has a hard-coded chart for RSS, so I kept it in unless someone tells me to remove it. > > - Obviously, the libc events are very platform-specific. Still, I argue that these metrics are highly useful. We want people to use JFR and JMC; people include developers that are dealing with performance problems that require platform-specific knowledge to understand. See my comment in the JBS issue. > > I provided implementations, as far as possible, for Linux, macOS and Windows. > > Testing: > - ran the new tests manually and as part of GHAs This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/26756