From mgronlun at openjdk.org  Mon Dec  1 12:17:21 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Mon, 1 Dec 2025 12:17:21 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v7]
In-Reply-To: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
Message-ID: <QWOcsVhbwx4YOi3iD737sRtRuptx6zalMIhIO89ol7U=.2ff42b09-8372-457d-bade-6fccefc203c1@github.com>

> Greetings,
> 
> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
> 
> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
> 
> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
> 
> Testing: jdk_jfr, manual AOT verification, stress testing
> 
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  adjustments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28505/files
  - new: https://git.openjdk.org/jdk/pull/28505/files/c0e1124e..ba54d2af

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=05-06

  Stats: 11 lines in 4 files changed: 0 ins; 1 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/28505.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505

PR: https://git.openjdk.org/jdk/pull/28505

From mgronlun at openjdk.org  Mon Dec  1 20:36:37 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Mon, 1 Dec 2025 20:36:37 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v8]
In-Reply-To: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
Message-ID: <BUM0EuGruVXh4gaNIrPA5DNCN8envc2o1NdNppsQ6YI=.7e8b63d2-e755-4e31-b1d0-1c581aa0ab0f@github.com>

> Greetings,
> 
> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
> 
> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
> 
> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
> 
> Testing: jdk_jfr, manual AOT verification, stress testing
> 
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  GROUP BY definedClass is redundant

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28505/files
  - new: https://git.openjdk.org/jdk/pull/28505/files/ba54d2af..9782480e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=06-07

  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28505.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505

PR: https://git.openjdk.org/jdk/pull/28505

From mgronlun at openjdk.org  Mon Dec  1 22:11:41 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Mon, 1 Dec 2025 22:11:41 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v9]
In-Reply-To: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
Message-ID: <BvTiv0oGH_ssVsVfQVsKqNZIF99wa9s9pUifYHNkquw=.a4c7d86c-0fe0-4463-ad83-6b84a988371b@github.com>

> Greetings,
> 
> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
> 
> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
> 
> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
> 
> Testing: jdk_jfr, manual AOT verification, stress testing
> 
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  Remove class definitions view in favor of jdk.ClassDefine event view

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28505/files
  - new: https://git.openjdk.org/jdk/pull/28505/files/9782480e..6a3a8c16

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=07-08

  Stats: 8 lines in 1 file changed: 0 ins; 8 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28505.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505

PR: https://git.openjdk.org/jdk/pull/28505

From mgronlun at openjdk.org  Mon Dec  1 23:12:29 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Mon, 1 Dec 2025 23:12:29 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v10]
In-Reply-To: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
Message-ID: <O2xxZgwS-MaJkqzEZbtbVZZNZXfOEplQplUmgtg00w8=.cb1b48d6-9c35-45b1-a081-aa01fc987ea1@github.com>

> Greetings,
> 
> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
> 
> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
> 
> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
> 
> Testing: jdk_jfr, manual AOT verification, stress testing
> 
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  correct truncation

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28505/files
  - new: https://git.openjdk.org/jdk/pull/28505/files/6a3a8c16..b659a814

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=08-09

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/28505.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505

PR: https://git.openjdk.org/jdk/pull/28505

From dholmes at openjdk.org  Tue Dec  2 02:24:47 2025
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 2 Dec 2025 02:24:47 GMT
Subject: RFR: 8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList
In-Reply-To: <H247MCPHJP5uabQALazU-mg0l3EtbtM1L-5EZVkwemg=.03b1dbd6-e16e-406b-b118-ba45572e38d6@github.com>
References: <H247MCPHJP5uabQALazU-mg0l3EtbtM1L-5EZVkwemg=.03b1dbd6-e16e-406b-b118-ba45572e38d6@github.com>
Message-ID: <xEi9QPbFPrudDVIpAhnu14NHqCJQyzEA9jyjDAvoXNo=.74ea9a02-597a-41cc-a91e-2817f381cb36@github.com>

On Thu, 27 Nov 2025 16:04:56 GMT, Kerem Kat <krk at openjdk.org> wrote:

> It still fails in the Oracle environment as described in [JDK-8371630](https://bugs.openjdk.org/browse/JDK-8371630).

Unfortunately this only went to the JFR mailing list so I did not see it till I tried to track it down. Even more unfortunately no-one on the JFR mailing list deigned to look at this.

The fix is incorrect - see comment.

test/jdk/ProblemList.txt line 753:

> 751: jdk/jfr/event/oldobject/TestShenandoah.java                     8342951 generic-all
> 752: jdk/jfr/event/runtime/TestResidentSetSizeEvent.java             8309846 aix-ppc64
> 753: jdk/jfr/jvm/TestWaste.java                                      8372587 generic-all

Suggestion:

jdk/jfr/jvm/TestWaste.java                                      8371630 generic-all

You used the wrong bug id.
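
For readers unfamiliar with the file, the entry under discussion has three whitespace-separated columns: the test path, the bug id (or a comma-separated list of ids) that tracks the failure, and the platform label(s) on which the test is excluded. The following is a minimal Java sketch of that layout. The class `ProblemListEntry` and its `parse` method are hypothetical names used only for illustration; they are not part of jtreg or the JDK, and the sketch ignores comment lines and blank lines.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of jtreg or the JDK.
final class ProblemListEntry {
    final String test;            // e.g. jdk/jfr/jvm/TestWaste.java
    final List<String> bugIds;    // e.g. [8371630]
    final List<String> platforms; // e.g. [generic-all]

    private ProblemListEntry(String test, List<String> bugIds, List<String> platforms) {
        this.test = test;
        this.bugIds = bugIds;
        this.platforms = platforms;
    }

    // Parses one entry line of the assumed form
    //   <test> <bugid>[,<bugid>...] <platform>[,<platform>...]
    static ProblemListEntry parse(String line) {
        String[] columns = line.trim().split("\\s+");
        if (columns.length != 3) {
            throw new IllegalArgumentException("Expected 3 columns: " + line);
        }
        return new ProblemListEntry(
                columns[0],
                Arrays.asList(columns[1].split(",")),
                Arrays.asList(columns[2].split(",")));
    }
}
```

Parsing the suggested line this way yields the test `jdk/jfr/jvm/TestWaste.java`, the bug id `8371630`, and the platform `generic-all`. The bug id column is the one the review comment is about: it should name the issue that tracks the failure itself.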

-------------

Changes requested by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28539#pullrequestreview-3527885093
PR Review Comment: https://git.openjdk.org/jdk/pull/28539#discussion_r2579362343

From krk at openjdk.org  Tue Dec  2 10:26:15 2025
From: krk at openjdk.org (Kerem Kat)
Date: Tue, 2 Dec 2025 10:26:15 GMT
Subject: RFR: 8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList
 [v2]
In-Reply-To: <H247MCPHJP5uabQALazU-mg0l3EtbtM1L-5EZVkwemg=.03b1dbd6-e16e-406b-b118-ba45572e38d6@github.com>
References: <H247MCPHJP5uabQALazU-mg0l3EtbtM1L-5EZVkwemg=.03b1dbd6-e16e-406b-b118-ba45572e38d6@github.com>
Message-ID: <wY-O955cGahyVezW1hfZEr5A7NI-PfSEAZGPS6zRr_g=.e56a4ef1-c4e6-479f-8349-f8fde7eeb8b2@github.com>

> It still fails in the Oracle environment as described in [JDK-8371630](https://bugs.openjdk.org/browse/JDK-8371630).

Kerem Kat has updated the pull request incrementally with one additional commit since the last revision:

  Update test/jdk/ProblemList.txt
  
  Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28539/files
  - new: https://git.openjdk.org/jdk/pull/28539/files/1d625192..ab4f9245

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28539&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28539&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/28539.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28539/head:pull/28539

PR: https://git.openjdk.org/jdk/pull/28539

From krk at openjdk.org  Tue Dec  2 10:26:18 2025
From: krk at openjdk.org (Kerem Kat)
Date: Tue, 2 Dec 2025 10:26:18 GMT
Subject: RFR: 8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList
 [v2]
In-Reply-To: <xEi9QPbFPrudDVIpAhnu14NHqCJQyzEA9jyjDAvoXNo=.74ea9a02-597a-41cc-a91e-2817f381cb36@github.com>
References: <H247MCPHJP5uabQALazU-mg0l3EtbtM1L-5EZVkwemg=.03b1dbd6-e16e-406b-b118-ba45572e38d6@github.com>
 <xEi9QPbFPrudDVIpAhnu14NHqCJQyzEA9jyjDAvoXNo=.74ea9a02-597a-41cc-a91e-2817f381cb36@github.com>
Message-ID: <L-bRKMi3YyolsovtffGgtKco19LGPULT5ALy6mEz-hk=.e2832c60-e83c-4c4a-89ad-41e20dbfd4ed@github.com>

On Tue, 2 Dec 2025 02:20:40 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Kerem Kat has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update test/jdk/ProblemList.txt
>>   
>>   Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com>
>
> test/jdk/ProblemList.txt line 753:
> 
>> 751: jdk/jfr/event/oldobject/TestShenandoah.java                     8342951 generic-all
>> 752: jdk/jfr/event/runtime/TestResidentSetSizeEvent.java             8309846 aix-ppc64
>> 753: jdk/jfr/jvm/TestWaste.java                                      8372587 generic-all
> 
> Suggestion:
> 
> jdk/jfr/jvm/TestWaste.java                                      8371630 generic-all
> 
> You used the wrong bug id.

Thanks, I did use the subtask id.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28539#discussion_r2580553213

From krk at openjdk.org  Tue Dec  2 13:23:07 2025
From: krk at openjdk.org (Kerem Kat)
Date: Tue, 2 Dec 2025 13:23:07 GMT
Subject: RFR: 8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList
 [v2]
In-Reply-To: <wY-O955cGahyVezW1hfZEr5A7NI-PfSEAZGPS6zRr_g=.e56a4ef1-c4e6-479f-8349-f8fde7eeb8b2@github.com>
References: <H247MCPHJP5uabQALazU-mg0l3EtbtM1L-5EZVkwemg=.03b1dbd6-e16e-406b-b118-ba45572e38d6@github.com>
 <wY-O955cGahyVezW1hfZEr5A7NI-PfSEAZGPS6zRr_g=.e56a4ef1-c4e6-479f-8349-f8fde7eeb8b2@github.com>
Message-ID: <71nBpRWdBsXEFaQMsne-_vwcwSOEoYOgRkQRpyRn6wI=.d39ad7b5-22eb-44ac-9027-2e3742480cc2@github.com>

On Tue, 2 Dec 2025 10:26:15 GMT, Kerem Kat <krk at openjdk.org> wrote:

>> It still fails in the Oracle environment as described in [JDK-8371630](https://bugs.openjdk.org/browse/JDK-8371630).
>
> Kerem Kat has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update test/jdk/ProblemList.txt
>   
>   Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com>

Failing test is unrelated:

> TEST: gc/TestAllocHumongousFragment.java#generational


 Internal Error (/home/runner/work/jdk/jdk/src/hotspot/share/oops/compressedOops.inline.hpp:58), pid=8050, tid=8054
#  assert(Universe::is_in_heap(result)) failed: object not in heap 0x00000000fc300000
#
# JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-krk-ab4f92453ba582b6c94007ba80e74d6a025d20e5)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-krk-ab4f92453ba582b6c94007ba80e74d6a025d20e5, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, shenandoah gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x72fcfe]  CompressedOops::decode_not_null(narrowOop)+0x13e

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28539#issuecomment-3602025405

From mgronlun at openjdk.org  Tue Dec  2 15:59:00 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Tue, 2 Dec 2025 15:59:00 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v11]
In-Reply-To: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
Message-ID: <iqoQPPCntSTG9SuWwEIjHKS_l_vtQZf9jdpOVqkjPH8=.3ef2a5fd-9058-426e-8b55-8ba43816b538@github.com>

> Greetings,
> 
> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
> 
> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
> 
> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
> 
> Testing: jdk_jfr, manual AOT verification, stress testing
> 
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  fixes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28505/files
  - new: https://git.openjdk.org/jdk/pull/28505/files/b659a814..66e63a40

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=09-10

  Stats: 10 lines in 2 files changed: 0 ins; 8 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/28505.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505

PR: https://git.openjdk.org/jdk/pull/28505

From mgronlun at openjdk.org  Tue Dec  2 16:11:17 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Tue, 2 Dec 2025 16:11:17 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v12]
In-Reply-To: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
Message-ID: <bG4vHKUZzFIaLUsMCQBroZIpuVvqbYZjON8UV0hE43A=.625565db-4cec-4ccb-95e0-a852f110753a@github.com>

> Greetings,
> 
> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
> 
> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
> 
> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
> 
> Testing: jdk_jfr, manual AOT verification, stress testing
> 
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  restore longest-class-loading view

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28505/files
  - new: https://git.openjdk.org/jdk/pull/28505/files/66e63a40..3a44c10e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=10-11

  Stats: 6 lines in 1 file changed: 0 ins; 3 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/28505.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505

PR: https://git.openjdk.org/jdk/pull/28505

From mgronlun at openjdk.org  Tue Dec  2 16:50:47 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Tue, 2 Dec 2025 16:50:47 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v13]
In-Reply-To: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
Message-ID: <4EZHq7TmhEpxqD3WRm4D8LHVXPuJabCRjTXOi0pPtZE=.866ecc1e-b0be-4797-a6f9-2a434a087bda@github.com>

> Greetings,
> 
> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
> 
> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
> 
> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
> 
> Testing: jdk_jfr, manual AOT verification, stress testing
> 
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  use strcmp instead of strncmp for equals

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28505/files
  - new: https://git.openjdk.org/jdk/pull/28505/files/3a44c10e..c2e36d2f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=11-12

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/28505.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505

PR: https://git.openjdk.org/jdk/pull/28505

From duke at openjdk.org  Tue Dec  2 16:55:03 2025
From: duke at openjdk.org (Robert Toyonaga)
Date: Tue, 2 Dec 2025 16:55:03 GMT
Subject: RFR: 8370715: JFR: Races are possible when dumping recordings [v4]
In-Reply-To: <kp7YrMJipPoIvIgYHFztzJpoDm_opt1X-q_IQiJ_t-I=.af64c82f-64cd-405c-be25-21ef21b8bdc0@github.com>
References: <kp7YrMJipPoIvIgYHFztzJpoDm_opt1X-q_IQiJ_t-I=.af64c82f-64cd-405c-be25-21ef21b8bdc0@github.com>
Message-ID: <FttY7XYQZoRgrN-MAD_vhi1N2VqJLa5DQIoZrO9KFHY=.b538a62f-fbbe-461c-a587-8fd39824c49a@github.com>

> #### Summary
> This PR changes the JFR snapshot dumping code so that multiple JFR recordings (potentially from different processes) racing to write the same dump destination won't mix their data. 
> 
> #### Problem
> The dump destination file is created and/or wiped when a recording is started, but not wiped again before actually copying over the chunks during a dump. So in the window of time between creating/wiping and dumping chunks, another recording could write to the same dump destination and have its chunks added to the snapshot. This can happen with either a single JVM or multiple JVMs that are racing.
> 
> #### Proposed fix
> This PR ensures that any data previously written to the dump destination is wiped before the new recording's data is written.  File locking is also done while chunks are being written. 
> 
> Testing:
> - new test `jdk/jdk/jfr/api/recording/dump/TestDumpOverwrite.java`
> - Tier 1
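
The general idea behind the proposed fix (take a lock on the destination, wipe it, then write) can be sketched in plain Java NIO. This is only an illustration under stated assumptions, not the code from the PR: the class `ExclusiveDump` and its `overwrite` method are hypothetical names, and the sketch writes a byte array rather than copying JFR chunks.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not the implementation in the PR.
final class ExclusiveDump {
    private ExclusiveDump() {}

    // Replaces the contents of 'destination' with 'data' while holding an
    // exclusive file lock for the whole truncate-and-write sequence.
    static void overwrite(Path destination, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(destination,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            // Wipe anything written to the file before the lock was taken.
            channel.truncate(0);
            ByteBuffer buffer = ByteBuffer.wrap(data);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        } // The lock is released first, then the channel is closed.
    }
}
```

Calling `ExclusiveDump.overwrite(path, bytes)` twice on the same path leaves only the bytes from the second call in the file. Note that `FileLock` is held on behalf of the whole JVM, so it coordinates separate processes; threads inside one JVM that target the same file still need their own synchronization, since an overlapping lock request there fails with `OverlappingFileLockException`.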

Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision:

  change WARN to INFO

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28460/files
  - new: https://git.openjdk.org/jdk/pull/28460/files/f3f7da42..fe408389

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28460&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28460&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/28460.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28460/head:pull/28460

PR: https://git.openjdk.org/jdk/pull/28460

From egahlin at openjdk.org  Tue Dec  2 16:55:06 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Tue, 2 Dec 2025 16:55:06 GMT
Subject: RFR: 8370715: JFR: Races are possible when dumping recordings [v3]
In-Reply-To: <hlvXBQB4pPIq_gm7lenobIWV4EkG-THVhDjfT3wXaig=.c07bc5ad-eafe-4a54-9d84-12bf48830a23@github.com>
References: <kp7YrMJipPoIvIgYHFztzJpoDm_opt1X-q_IQiJ_t-I=.af64c82f-64cd-405c-be25-21ef21b8bdc0@github.com>
 <hlvXBQB4pPIq_gm7lenobIWV4EkG-THVhDjfT3wXaig=.c07bc5ad-eafe-4a54-9d84-12bf48830a23@github.com>
Message-ID: <b9OVetmBM1d-M6w3JED4pviUDaWRHKvHnAMXmLmIMBg=.692553dc-a15b-452c-97e0-7d5c14b3c4b1@github.com>

On Wed, 26 Nov 2025 17:51:02 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> #### Summary
>> This PR changes the JFR snapshot dumping code so that multiple JFR recordings (potentially from different processes) racing to write the same dump destination won't mix their data. 
>> 
>> #### Problem
>> The dump destination file is created and/or wiped when a recording is started, but not wiped again before actually copying over the chunks during a dump. So in the window of time between creating/wiping and dumping chunks, another recording could write to the same dump destination and have its chunks added to the snapshot. This can happen with either a single JVM or multiple JVMs that are racing.
>> 
>> #### Proposed fix
>> This PR ensures that any data previously written to the dump destination is wiped before the new recording's data is written.  File locking is also done while chunks are being written. 
>> 
>> Testing:
>> - new test `jdk/jdk/jfr/api/recording/dump/TestDumpOverwrite.java`
>> - Tier 1
>
> Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision:
> 
>   vm.flagless and fix copyright header

Looks good, but I wonder if LogLevel.INFO should be used instead of WARN, similar to the Transferred bytes log entry? 

The fix now ensures that dump files are in a consistent state, which is good, but I'm not sure writing a file simultaneously should constitute a warning. If the process overwrites a file 1 ns before or 1 ns after the lock, a warning is not issued. In all cases, one of the files will be overwritten (since that is what the user has requested). This isn't a major concern for me since it's unlikely to occur in practice, so I'm not insisting on a fix.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28460#issuecomment-3602058130

From duke at openjdk.org  Tue Dec  2 17:02:36 2025
From: duke at openjdk.org (Robert Toyonaga)
Date: Tue, 2 Dec 2025 17:02:36 GMT
Subject: RFR: 8370715: JFR: Races are possible when dumping recordings [v4]
In-Reply-To: <FttY7XYQZoRgrN-MAD_vhi1N2VqJLa5DQIoZrO9KFHY=.b538a62f-fbbe-461c-a587-8fd39824c49a@github.com>
References: <kp7YrMJipPoIvIgYHFztzJpoDm_opt1X-q_IQiJ_t-I=.af64c82f-64cd-405c-be25-21ef21b8bdc0@github.com>
 <FttY7XYQZoRgrN-MAD_vhi1N2VqJLa5DQIoZrO9KFHY=.b538a62f-fbbe-461c-a587-8fd39824c49a@github.com>
Message-ID: <-IjRKg6OdsOfzLikT9D8WeYBkTrOv-G9ZyJKREfRhgU=.f59e39e8-a500-4c55-8970-3ee57d020968@github.com>

On Tue, 2 Dec 2025 16:55:03 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> #### Summary
>> This PR changes the JFR snapshot dumping code so that multiple JFR recordings (potentially from different processes) racing to write the same dump destination won't mix their data. 
>> 
>> #### Problem
>> The dump destination file is created and/or wiped when a recording is started, but not wiped again before actually copying over the chunks during a dump. So in the window of time between creating/wiping and dumping chunks, another recording could write to the same dump destination and have its chunks added to the snapshot. This can happen with either a single JVM or multiple JVMs that are racing.
>> 
>> #### Proposed fix
>> This PR ensures that any data previously written to the dump destination is wiped before the new recording's data is written.  File locking is also done while chunks are being written. 
>> 
>> Testing:
>> - new test `jdk/jdk/jfr/api/recording/dump/TestDumpOverwrite.java`
>> - Tier 1
>
> Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision:
> 
>   change WARN to INFO

Yes, I am also okay with changing WARN to INFO, as long as there is some record the user can find of what happened.

Thank you for the review feedback @egahlin

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28460#issuecomment-3603061200

From egahlin at openjdk.org  Tue Dec  2 17:32:55 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Tue, 2 Dec 2025 17:32:55 GMT
Subject: RFR: 8370715: JFR: Races are possible when dumping recordings [v4]
In-Reply-To: <FttY7XYQZoRgrN-MAD_vhi1N2VqJLa5DQIoZrO9KFHY=.b538a62f-fbbe-461c-a587-8fd39824c49a@github.com>
References: <kp7YrMJipPoIvIgYHFztzJpoDm_opt1X-q_IQiJ_t-I=.af64c82f-64cd-405c-be25-21ef21b8bdc0@github.com>
 <FttY7XYQZoRgrN-MAD_vhi1N2VqJLa5DQIoZrO9KFHY=.b538a62f-fbbe-461c-a587-8fd39824c49a@github.com>
Message-ID: <KGTIS3t_mi8AYV5u4uxqY1DrhB965XF_ttonf0V3mn4=.fef1e417-523d-4462-a6dd-10237c095d83@github.com>

On Tue, 2 Dec 2025 16:55:03 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> #### Summary
>> This PR changes the JFR snapshot dumping code so that multiple JFR recordings (potentially from different processes) racing to write the same dump destination won't mix their data. 
>> 
>> #### Problem
>> The dump destination file is created and/or wiped when a recording is started, but not wiped again before actually copying over the chunks during a dump. So in the window of time between creating/wiping and dumping chunks, another recording could write to the same dump destination and have its chunks added to the snapshot. This can happen with either a single JVM or multiple JVMs that are racing.
>> 
>> #### Proposed fix
>> This PR ensures that any data previously written to the dump destination is wiped before the new recording's data is written.  File locking is also done while chunks are being written. 
>> 
>> Testing:
>> - new test `jdk/jdk/jfr/api/recording/dump/TestDumpOverwrite.java`
>> - Tier 1
>
> Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision:
> 
>   change WARN to INFO

Marked as reviewed by egahlin (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/28460#pullrequestreview-3531349417

From duke at openjdk.org  Tue Dec  2 17:46:35 2025
From: duke at openjdk.org (duke)
Date: Tue, 2 Dec 2025 17:46:35 GMT
Subject: RFR: 8370715: JFR: Races are possible when dumping recordings [v4]
In-Reply-To: <FttY7XYQZoRgrN-MAD_vhi1N2VqJLa5DQIoZrO9KFHY=.b538a62f-fbbe-461c-a587-8fd39824c49a@github.com>
References: <kp7YrMJipPoIvIgYHFztzJpoDm_opt1X-q_IQiJ_t-I=.af64c82f-64cd-405c-be25-21ef21b8bdc0@github.com>
 <FttY7XYQZoRgrN-MAD_vhi1N2VqJLa5DQIoZrO9KFHY=.b538a62f-fbbe-461c-a587-8fd39824c49a@github.com>
Message-ID: <M7K6HEfssvTO8ZnAKjw0qd4EYnhSERsao0KP7KkKgeA=.4b82fcf5-8f6a-462f-b68c-4e6855dcd507@github.com>

On Tue, 2 Dec 2025 16:55:03 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> #### Summary
>> This PR changes the JFR snapshot dumping code so that multiple JFR recordings (potentially from different processes) racing to write the same dump destination won't mix their data. 
>> 
>> #### Problem
>> The dump destination file is created and/or wiped when a recording is started, but not wiped again before actually copying over the chunks during a dump. So in the window of time between creating/wiping and dumping chunks, another recording could write to the same dump destination and have its chunks added to the snapshot. This can happen with either a single JVM or multiple JVMs that are racing.
>> 
>> #### Proposed fix
>> This PR ensures that any data previously written to the dump destination is wiped before the new recording's data is written.  File locking is also done while chunks are being written. 
>> 
>> Testing:
>> - new test `jdk/jdk/jfr/api/recording/dump/TestDumpOverwrite.java`
>> - Tier 1
>
> Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision:
> 
>   change WARN to INFO

@roberttoyonaga 
Your change (at version fe40838989a25ac6166686b382287d427342609d) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28460#issuecomment-3603251639

From dholmes at openjdk.org  Tue Dec  2 22:03:20 2025
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 2 Dec 2025 22:03:20 GMT
Subject: RFR: 8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList
 [v2]
In-Reply-To: <wY-O955cGahyVezW1hfZEr5A7NI-PfSEAZGPS6zRr_g=.e56a4ef1-c4e6-479f-8349-f8fde7eeb8b2@github.com>
References: <H247MCPHJP5uabQALazU-mg0l3EtbtM1L-5EZVkwemg=.03b1dbd6-e16e-406b-b118-ba45572e38d6@github.com>
 <wY-O955cGahyVezW1hfZEr5A7NI-PfSEAZGPS6zRr_g=.e56a4ef1-c4e6-479f-8349-f8fde7eeb8b2@github.com>
Message-ID: <V0ckFQlCLaUS-N647zOJGi5YZQv4mLLXPh0AmogPIz0=.90a7ed69-fcb5-498c-9e7d-7d19a72aa013@github.com>

On Tue, 2 Dec 2025 10:26:15 GMT, Kerem Kat <krk at openjdk.org> wrote:

>> It still fails in the Oracle environment as described in [JDK-8371630](https://bugs.openjdk.org/browse/JDK-8371630).
>
> Kerem Kat has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update test/jdk/ProblemList.txt
>   
>   Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com>

Good. Please integrate.

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28539#pullrequestreview-3532192517

From ysuenaga at openjdk.org  Wed Dec  3 10:27:04 2025
From: ysuenaga at openjdk.org (Yasumasa Suenaga)
Date: Wed, 3 Dec 2025 10:27:04 GMT
Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is
 incorrectly implemented
Message-ID: <SiBemHHhiHk3uDElvjRK16U2SNea7P1wsptA_4z7kxA=.0746951f-41b4-46b4-8397-40603dea6298@github.com>

The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms.

The JFR emergency dump is triggered in `VMError::report_and_die()`; at that point the Java thread for JFR does not work because of the secondary signal handler used for error reporting.

Passed all of jdk_jfr tests on Linux AMD64.

-------------

Commit messages:
 - Fix typo
 - Delete TestEmergencyDumpAtOOM.java from ProblemList
 - 8371014: Dump JFR recording on CrashOnOutOfMemoryError is incorrectly implemented

Changes: https://git.openjdk.org/jdk/pull/28563/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28563&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8371014
  Stats: 31 lines in 8 files changed: 23 ins; 3 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/28563.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28563/head:pull/28563

PR: https://git.openjdk.org/jdk/pull/28563

From mbaesken at openjdk.org  Wed Dec  3 10:27:05 2025
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 3 Dec 2025 10:27:05 GMT
Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is
 incorrectly implemented
In-Reply-To: <SiBemHHhiHk3uDElvjRK16U2SNea7P1wsptA_4z7kxA=.0746951f-41b4-46b4-8397-40603dea6298@github.com>
References: <SiBemHHhiHk3uDElvjRK16U2SNea7P1wsptA_4z7kxA=.0746951f-41b4-46b4-8397-40603dea6298@github.com>
Message-ID: <bZdqbLu_C8RrX6GWWFGMJiAZLsIfAuoshaAth5QggLY=.7394a63c-1a10-4450-81d0-1b3bc45ae146@github.com>

On Sat, 29 Nov 2025 06:06:16 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:

> The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms.
> 
> The JFR emergency dump is kicked off in `VMError::report_and_die()`, where the Java thread for JFR does not work because of the secondary signal handler used for error reporting.
> 
> Passed all of jdk_jfr tests on Linux AMD64.

With your PR added, we do not observe the error in test TestEmergencyDumpAtOOM any more.

src/hotspot/share/jfr/jfr.cpp line 159:

> 157: 
> 158: void Jfr::on_vm_error_report(outputStream* st) {
> 159:   assert(!JfrRecorder::is_recording(), "JFR should be stopped at erorr reporting");

'erorr' - please fix the little typo!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28563#issuecomment-3605855107
PR Review Comment: https://git.openjdk.org/jdk/pull/28563#discussion_r2584283966

From duke at openjdk.org  Wed Dec  3 10:41:46 2025
From: duke at openjdk.org (duke)
Date: Wed, 3 Dec 2025 10:41:46 GMT
Subject: RFR: 8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList
 [v2]
In-Reply-To: <wY-O955cGahyVezW1hfZEr5A7NI-PfSEAZGPS6zRr_g=.e56a4ef1-c4e6-479f-8349-f8fde7eeb8b2@github.com>
References: <H247MCPHJP5uabQALazU-mg0l3EtbtM1L-5EZVkwemg=.03b1dbd6-e16e-406b-b118-ba45572e38d6@github.com>
 <wY-O955cGahyVezW1hfZEr5A7NI-PfSEAZGPS6zRr_g=.e56a4ef1-c4e6-479f-8349-f8fde7eeb8b2@github.com>
Message-ID: <TIPvvS5RXPFXvBwKG8m5ETiC0rSUQRg9O__pXsva2qU=.ff1ca061-6ba2-4af0-8a3c-9c56797a3803@github.com>

On Tue, 2 Dec 2025 10:26:15 GMT, Kerem Kat <krk at openjdk.org> wrote:

>> It still fails in the Oracle environment as described in [JDK-8371630](https://bugs.openjdk.org/browse/JDK-8371630).
>
> Kerem Kat has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update test/jdk/ProblemList.txt
>   
>   Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com>

@krk 
Your change (at version ab4f92453ba582b6c94007ba80e74d6a025d20e5) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28539#issuecomment-3606187497

From mgronlun at openjdk.org  Wed Dec  3 10:45:43 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Wed, 3 Dec 2025 10:45:43 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v14]
In-Reply-To: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
Message-ID: <2tMUYu-TFZeFFfpE_054SK4rzCZZgs_UMiNudMx440M=.ce00a309-abf1-4325-9781-65dc0b5c6d59@github.com>

> Greetings,
> 
> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
> 
> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
> 
> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
> 
> Testing: jdk_jfr, manual AOT verification, stress testing
> 
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  copyright header

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28505/files
  - new: https://git.openjdk.org/jdk/pull/28505/files/c2e36d2f..85bf2d25

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=12-13

  Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/28505.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505

PR: https://git.openjdk.org/jdk/pull/28505

From egahlin at openjdk.org  Wed Dec  3 10:58:54 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Wed, 3 Dec 2025 10:58:54 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v14]
In-Reply-To: <2tMUYu-TFZeFFfpE_054SK4rzCZZgs_UMiNudMx440M=.ce00a309-abf1-4325-9781-65dc0b5c6d59@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
 <2tMUYu-TFZeFFfpE_054SK4rzCZZgs_UMiNudMx440M=.ce00a309-abf1-4325-9781-65dc0b5c6d59@github.com>
Message-ID: <oFk6YeygG-fx49QR6hCGTVzLmcMiDAXjjnTw36jhEK4=.145660c8-d67e-496e-b8d1-1fd157365d42@github.com>

On Wed, 3 Dec 2025 10:45:43 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:

>> Greetings,
>> 
>> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
>> 
>> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
>> 
>> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
>> 
>> Testing: jdk_jfr, manual AOT verification, stress testing
>> 
>> Thanks
>> Markus
>
> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
> 
>   copyright header

Nice work!

-------------

Marked as reviewed by egahlin (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28505#pullrequestreview-3534470248

From krk at openjdk.org  Wed Dec  3 13:05:13 2025
From: krk at openjdk.org (Kerem Kat)
Date: Wed, 3 Dec 2025 13:05:13 GMT
Subject: Integrated: 8372587: Put jdk/jfr/jvm/TestWaste.java into the
 ProblemList
In-Reply-To: <H247MCPHJP5uabQALazU-mg0l3EtbtM1L-5EZVkwemg=.03b1dbd6-e16e-406b-b118-ba45572e38d6@github.com>
References: <H247MCPHJP5uabQALazU-mg0l3EtbtM1L-5EZVkwemg=.03b1dbd6-e16e-406b-b118-ba45572e38d6@github.com>
Message-ID: <YGQdu8MGk5WU57biPdnyX1Y0k98TUmVVR5tmdOV7wX8=.ac4c4577-5e9b-41c6-a9f5-5011d6f1d6b0@github.com>

On Thu, 27 Nov 2025 16:04:56 GMT, Kerem Kat <krk at openjdk.org> wrote:

> It still fails in the Oracle environment as described in [JDK-8371630](https://bugs.openjdk.org/browse/JDK-8371630).

This pull request has now been integrated.

Changeset: abb75ba6
Author:    Kerem Kat <krk at openjdk.org>
Committer: Volker Simonis <simonis at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/abb75ba656ebe14e9e8e1d4a1765d64dfce9e661
Stats:     1 line in 1 file changed: 1 ins; 0 del; 0 mod

8372587: Put jdk/jfr/jvm/TestWaste.java into the ProblemList

Reviewed-by: dholmes

-------------

PR: https://git.openjdk.org/jdk/pull/28539

From coleenp at openjdk.org  Wed Dec  3 13:30:44 2025
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 3 Dec 2025 13:30:44 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v14]
In-Reply-To: <2tMUYu-TFZeFFfpE_054SK4rzCZZgs_UMiNudMx440M=.ce00a309-abf1-4325-9781-65dc0b5c6d59@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
 <2tMUYu-TFZeFFfpE_054SK4rzCZZgs_UMiNudMx440M=.ce00a309-abf1-4325-9781-65dc0b5c6d59@github.com>
Message-ID: <WwsgNB6sS5Vd3juIsd3lCjE3DVCxtiFEJ05s0HC-M1g=.f99daf63-482d-4922-b823-8c7672fcd0c0@github.com>

On Wed, 3 Dec 2025 10:45:43 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:

>> Greetings,
>> 
>> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
>> 
>> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
>> 
>> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
>> 
>> Testing: jdk_jfr, manual AOT verification, stress testing
>> 
>> Thanks
>> Markus
>
> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
> 
>   copyright header

Runtime changes look fine.  Didn't review the new concurrent hash table.

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28505#pullrequestreview-3535046644

From mgronlun at openjdk.org  Wed Dec  3 13:54:07 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Wed, 3 Dec 2025 13:54:07 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v15]
In-Reply-To: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
Message-ID: <yXte_WQrCfyhCozrGYHJMXiALSLQyS65xhpznp0ZlKY=.c92d2707-397e-4979-8018-65f69e351a48@github.com>

> Greetings,
> 
> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
> 
> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
> 
> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
> 
> Testing: jdk_jfr, manual AOT verification, stress testing
> 
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  restore AOT modifications

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28505/files
  - new: https://git.openjdk.org/jdk/pull/28505/files/85bf2d25..601fbc0b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=13-14

  Stats: 23 lines in 3 files changed: 0 ins; 18 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/28505.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505

PR: https://git.openjdk.org/jdk/pull/28505

From mgronlun at openjdk.org  Wed Dec  3 14:00:34 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Wed, 3 Dec 2025 14:00:34 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v16]
In-Reply-To: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
Message-ID: <C99OQSYha-tnuEf3K_pPD20cs4edtsQ2J1PgimbOhoI=.c0d83419-65e6-491a-ae74-e7ea82ac533f@github.com>

> Greetings,
> 
> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
> 
> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
> 
> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
> 
> Testing: jdk_jfr, manual AOT verification, stress testing
> 
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  restore aotClassLocation.cpp

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28505/files
  - new: https://git.openjdk.org/jdk/pull/28505/files/601fbc0b..0d1461f7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=15
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28505&range=14-15

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/28505.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28505/head:pull/28505

PR: https://git.openjdk.org/jdk/pull/28505

From egahlin at openjdk.org  Wed Dec  3 14:11:24 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Wed, 3 Dec 2025 14:11:24 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v16]
In-Reply-To: <C99OQSYha-tnuEf3K_pPD20cs4edtsQ2J1PgimbOhoI=.c0d83419-65e6-491a-ae74-e7ea82ac533f@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
 <C99OQSYha-tnuEf3K_pPD20cs4edtsQ2J1PgimbOhoI=.c0d83419-65e6-491a-ae74-e7ea82ac533f@github.com>
Message-ID: <RgLj0UYKzO7krYSOLt5AluNK4p839Tbq4twA4n7JoTs=.cab2c41f-ff46-4db2-a0d9-6ba89cde4e0c@github.com>

On Wed, 3 Dec 2025 14:00:34 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:

>> Greetings,
>> 
>> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
>> 
>> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
>> 
>> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
>> 
>> Testing: jdk_jfr, manual AOT verification, stress testing
>> 
>> Thanks
>> Markus
>
> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
> 
>   restore aotClassLocation.cpp

Marked as reviewed by egahlin (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/28505#pullrequestreview-3535245425

From coleenp at openjdk.org  Wed Dec  3 14:46:25 2025
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 3 Dec 2025 14:46:25 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v16]
In-Reply-To: <C99OQSYha-tnuEf3K_pPD20cs4edtsQ2J1PgimbOhoI=.c0d83419-65e6-491a-ae74-e7ea82ac533f@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
 <C99OQSYha-tnuEf3K_pPD20cs4edtsQ2J1PgimbOhoI=.c0d83419-65e6-491a-ae74-e7ea82ac533f@github.com>
Message-ID: <MLQU5BcTCstEeP-_Ci3oqo8AvS414r1z4NMzLva9aBg=.c3e8b30c-31cf-46eb-9567-ed718cd70e96@github.com>

On Wed, 3 Dec 2025 14:00:34 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:

>> Greetings,
>> 
>> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
>> 
>> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
>> 
>> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
>> 
>> Testing: jdk_jfr, manual AOT verification, stress testing
>> 
>> Thanks
>> Markus
>
> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
> 
>   restore aotClassLocation.cpp

Looks good.

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28505#pullrequestreview-3535405541

From egahlin at openjdk.org  Wed Dec  3 14:51:25 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Wed, 3 Dec 2025 14:51:25 GMT
Subject: RFR: 8373024: JFR: CPU throttle rate can't handle incorrect values
Message-ID: <dTLToNAmt9Snf5Utxn1TvqxJGM9QV6kxiS53f_evMeo=.7ab6c8e0-0be3-4231-b2be-deae5679feca@github.com>

Could I get a review of a PR that hardens the CPU throttle rate setting?

Testing: test/jdk/jdk/jfr

Thanks
Erik

-------------

Commit messages:
 - Initial

Changes: https://git.openjdk.org/jdk/pull/28636/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28636&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8373024
  Stats: 5 lines in 3 files changed: 1 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/28636.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28636/head:pull/28636

PR: https://git.openjdk.org/jdk/pull/28636

From mgronlun at openjdk.org  Wed Dec  3 15:15:29 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Wed, 3 Dec 2025 15:15:29 GMT
Subject: RFR: 8373024: JFR: CPU throttle rate can't handle incorrect values
In-Reply-To: <dTLToNAmt9Snf5Utxn1TvqxJGM9QV6kxiS53f_evMeo=.7ab6c8e0-0be3-4231-b2be-deae5679feca@github.com>
References: <dTLToNAmt9Snf5Utxn1TvqxJGM9QV6kxiS53f_evMeo=.7ab6c8e0-0be3-4231-b2be-deae5679feca@github.com>
Message-ID: <XDiF5EcYb2vvEMY4gLUbnhSjaIrv1VMIOn_0uwfszLU=.8d6d4f6e-87e9-4673-a6a8-fcf5470b2b1e@github.com>

On Wed, 3 Dec 2025 14:41:36 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

> Could I get a review of a PR that hardens the CPU throttle rate setting?
> 
> Testing: test/jdk/jdk/jfr
> 
> Thanks
> Erik

Marked as reviewed by mgronlun (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/28636#pullrequestreview-3535543932

From mgronlun at openjdk.org  Wed Dec  3 18:16:28 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Wed, 3 Dec 2025 18:16:28 GMT
Subject: RFR: 8365400: Enhance JFR to emit file and module metadata for
 class loading [v16]
In-Reply-To: <RgLj0UYKzO7krYSOLt5AluNK4p839Tbq4twA4n7JoTs=.cab2c41f-ff46-4db2-a0d9-6ba89cde4e0c@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
 <C99OQSYha-tnuEf3K_pPD20cs4edtsQ2J1PgimbOhoI=.c0d83419-65e6-491a-ae74-e7ea82ac533f@github.com>
 <RgLj0UYKzO7krYSOLt5AluNK4p839Tbq4twA4n7JoTs=.cab2c41f-ff46-4db2-a0d9-6ba89cde4e0c@github.com>
Message-ID: <XhoWzEGwt3cLxl3cQezgclQ7Dwxwtg-Z0WDu1x4Hgqs=.94b99d87-4ed0-4bb7-94fa-2880d9da46d1@github.com>

On Wed, 3 Dec 2025 14:08:23 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

>> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   restore aotClassLocation.cpp
>
> Marked as reviewed by egahlin (Reviewer).

Thanks @egahlin and @coleenp for your reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28505#issuecomment-3608156678

From mgronlun at openjdk.org  Wed Dec  3 18:16:31 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Wed, 3 Dec 2025 18:16:31 GMT
Subject: Integrated: 8365400: Enhance JFR to emit file and module metadata for
 class loading
In-Reply-To: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
References: <GdhWHZhomFQs19oFkwcZhEYcS6BDIXIX9p1d5hI_w9g=.3aea7d5c-ebc1-429c-97d2-ca26408590a9@github.com>
Message-ID: <AmNsUpjMSIqwY7RTVl4U0eOP8V0D5HvNrmr9BCt_oCs=.8df6d522-79b1-4927-b9a5-38419e5fcf52@github.com>

On Wed, 26 Nov 2025 12:10:55 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:

> Greetings,
> 
> this enhancement adds a "source" field, label "Source" to the jdk.ClassDefine event.
> 
> To enable this functionality, JFR needs a concurrent symbol table. We can build a simpler version of a concurrent hash table, taking advantage of the JFR epoch system. This will be useful also for planned future enhancements.
> 
> Extensions are made to AOT to consistently report identical canonical paths for classes as non-AOT code paths.
> 
> Testing: jdk_jfr, manual AOT verification, stress testing
> 
> Thanks
> Markus

This pull request has now been integrated.

Changeset: e93b10d0
Author:    Markus Grönlund <mgronlun at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/e93b10d08456f720e303771a882e79660911e1eb
Stats:     1372 lines in 33 files changed: 1035 ins; 162 del; 175 mod

8365400: Enhance JFR to emit file and module metadata for class loading

Reviewed-by: coleenp, egahlin

-------------

PR: https://git.openjdk.org/jdk/pull/28505

From mdoerr at openjdk.org  Wed Dec  3 21:35:55 2025
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 3 Dec 2025 21:35:55 GMT
Subject: RFR: 8371014: Dump JFR recording on CrashOnOutOfMemoryError is
 incorrectly implemented
In-Reply-To: <SiBemHHhiHk3uDElvjRK16U2SNea7P1wsptA_4z7kxA=.0746951f-41b4-46b4-8397-40603dea6298@github.com>
References: <SiBemHHhiHk3uDElvjRK16U2SNea7P1wsptA_4z7kxA=.0746951f-41b4-46b4-8397-40603dea6298@github.com>
Message-ID: <DGFl3s8vmgJ3OKXcZqge33rSgTlnGuS3DxN2_UnIX5c=.9d97c05d-bab2-4f50-9b8f-c3b73e66666a@github.com>

On Sat, 29 Nov 2025 06:06:16 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:

> The jtreg test TestEmergencyDumpAtOOM.java runs into the following error on ppc64 platforms.
> 
> The JFR emergency dump is kicked off in `VMError::report_and_die()`, where the Java thread for JFR does not work because of the secondary signal handler used for error reporting.
> 
> Passed all of jdk_jfr tests on Linux AMD64.

I think this makes sense, but should also be reviewed by JFR folks.

-------------

Marked as reviewed by mdoerr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28563#pullrequestreview-3537010567

From egahlin at openjdk.org  Thu Dec  4 08:04:08 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Thu, 4 Dec 2025 08:04:08 GMT
Subject: Integrated: 8373024: JFR: CPU throttle rate can't handle incorrect
 values
In-Reply-To: <dTLToNAmt9Snf5Utxn1TvqxJGM9QV6kxiS53f_evMeo=.7ab6c8e0-0be3-4231-b2be-deae5679feca@github.com>
References: <dTLToNAmt9Snf5Utxn1TvqxJGM9QV6kxiS53f_evMeo=.7ab6c8e0-0be3-4231-b2be-deae5679feca@github.com>
Message-ID: <Oxun1eN9vQxbxI8hWSonOpTtfHTOaLgGja4zwpEFmjw=.566175e1-a226-458a-a40b-88e79e3a906d@github.com>

On Wed, 3 Dec 2025 14:41:36 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

> Could I get a review of a PR that hardens the CPU throttle rate setting?
> 
> Testing: test/jdk/jdk/jfr
> 
> Thanks
> Erik

This pull request has now been integrated.

Changeset: 63a10e00
Author:    Erik Gahlin <egahlin at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/63a10e0099111d69b167abf99d1a00084c4d6c1e
Stats:     5 lines in 3 files changed: 1 ins; 0 del; 4 mod

8373024: JFR: CPU throttle rate can't handle incorrect values

Reviewed-by: mgronlun

-------------

PR: https://git.openjdk.org/jdk/pull/28636

From mgronlun at openjdk.org  Thu Dec  4 10:16:41 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Thu, 4 Dec 2025 10:16:41 GMT
Subject: RFR: 8373062: JFR build failure with CDS disabled
Message-ID: <VBzvkarY-LB9n0CZheJ2JKxRe98VkeRqotTD8P4-vmY=.5ce21498-d7d2-4289-875e-46a01bf4b3e5@github.com>

Greetings,

[JDK-8365400](https://bugs.openjdk.org/browse/JDK-8365400) caused a build problem when passing build option "--disable-cds" because of missing conditionals.

Testing: manually building with "--disable-cds"

Thanks
Markus

-------------

Commit messages:
 - 8373062

Changes: https://git.openjdk.org/jdk/pull/28656/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28656&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8373062
  Stats: 7 lines in 4 files changed: 5 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/28656.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28656/head:pull/28656

PR: https://git.openjdk.org/jdk/pull/28656

From mgronlun at openjdk.org  Thu Dec  4 10:29:35 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Thu, 4 Dec 2025 10:29:35 GMT
Subject: RFR: 8373062: JFR build failure with CDS disabled [v2]
In-Reply-To: <VBzvkarY-LB9n0CZheJ2JKxRe98VkeRqotTD8P4-vmY=.5ce21498-d7d2-4289-875e-46a01bf4b3e5@github.com>
References: <VBzvkarY-LB9n0CZheJ2JKxRe98VkeRqotTD8P4-vmY=.5ce21498-d7d2-4289-875e-46a01bf4b3e5@github.com>
Message-ID: <3UTq-N9HJwvAaxkyFlRMfjpCJVWL7pM5g3S4qFiRGQo=.ed758eaf-4e0d-4b48-8226-1f3df4fbdaca@github.com>

> Greetings,
> 
> [JDK-8365400](https://bugs.openjdk.org/browse/JDK-8365400) caused a build problem when passing build option "--disable-cds" because of missing conditionals.
> 
> Testing: manually building with "--disable-cds"
> 
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  post_class_load_event wrongly placed inside INCLUDE_CDS section

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28656/files
  - new: https://git.openjdk.org/jdk/pull/28656/files/513642a6..369297f5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28656&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28656&range=00-01

  Stats: 22 lines in 1 file changed: 11 ins; 11 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28656.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28656/head:pull/28656

PR: https://git.openjdk.org/jdk/pull/28656

From egahlin at openjdk.org  Thu Dec  4 10:45:26 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Thu, 4 Dec 2025 10:45:26 GMT
Subject: RFR: 8373062: JFR build failure with CDS disabled [v3]
In-Reply-To: <DtNs9GWLIlUKi2sKtPBYJYGmSfuE5o845M5iiND0e_Y=.ee698d53-ff62-49af-a652-662b98f996a2@github.com>
References: <VBzvkarY-LB9n0CZheJ2JKxRe98VkeRqotTD8P4-vmY=.5ce21498-d7d2-4289-875e-46a01bf4b3e5@github.com>
 <DtNs9GWLIlUKi2sKtPBYJYGmSfuE5o845M5iiND0e_Y=.ee698d53-ff62-49af-a652-662b98f996a2@github.com>
Message-ID: <lUv-P8eS0_SVT4RaQ9XXEtko5s8pQ9kBBhsFOlEUlNM=.1c1c280a-74a9-4085-9e8d-bfc724ad8728@github.com>

On Thu, 4 Dec 2025 10:42:56 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:

>> Greetings,
>> 
>> [JDK-8365400](https://bugs.openjdk.org/browse/JDK-8365400) caused a build problem when passing build option "--disable-cds" because of missing conditionals.
>> 
>> Testing: manually building with "--disable-cds"
>> 
>> Thanks
>> Markus
>
> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
> 
>   apa

Marked as reviewed by egahlin (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/28656#pullrequestreview-3539289435

From mgronlun at openjdk.org  Thu Dec  4 10:45:25 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Thu, 4 Dec 2025 10:45:25 GMT
Subject: RFR: 8373062: JFR build failure with CDS disabled [v3]
In-Reply-To: <VBzvkarY-LB9n0CZheJ2JKxRe98VkeRqotTD8P4-vmY=.5ce21498-d7d2-4289-875e-46a01bf4b3e5@github.com>
References: <VBzvkarY-LB9n0CZheJ2JKxRe98VkeRqotTD8P4-vmY=.5ce21498-d7d2-4289-875e-46a01bf4b3e5@github.com>
Message-ID: <DtNs9GWLIlUKi2sKtPBYJYGmSfuE5o845M5iiND0e_Y=.ee698d53-ff62-49af-a652-662b98f996a2@github.com>

> Greetings,
> 
> [JDK-8365400](https://bugs.openjdk.org/browse/JDK-8365400) caused a build problem when passing build option "--disable-cds" because of missing conditionals.
> 
> Testing: manually building with "--disable-cds"
> 
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  apa

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28656/files
  - new: https://git.openjdk.org/jdk/pull/28656/files/369297f5..0e8612c7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28656&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28656&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/28656.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28656/head:pull/28656

PR: https://git.openjdk.org/jdk/pull/28656

From mgronlun at openjdk.org  Thu Dec  4 12:27:53 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Thu, 4 Dec 2025 12:27:53 GMT
Subject: RFR: 8373062: JFR build failure with CDS disabled [v3]
In-Reply-To: <lUv-P8eS0_SVT4RaQ9XXEtko5s8pQ9kBBhsFOlEUlNM=.1c1c280a-74a9-4085-9e8d-bfc724ad8728@github.com>
References: <VBzvkarY-LB9n0CZheJ2JKxRe98VkeRqotTD8P4-vmY=.5ce21498-d7d2-4289-875e-46a01bf4b3e5@github.com>
 <DtNs9GWLIlUKi2sKtPBYJYGmSfuE5o845M5iiND0e_Y=.ee698d53-ff62-49af-a652-662b98f996a2@github.com>
 <lUv-P8eS0_SVT4RaQ9XXEtko5s8pQ9kBBhsFOlEUlNM=.1c1c280a-74a9-4085-9e8d-bfc724ad8728@github.com>
Message-ID: <XIedijQHRYNQDq3i2GtalJ7bIP9EMqXW19cyG5HrjfQ=.483cc306-bb9e-45c4-bfd5-cb8c6f982c21@github.com>

On Thu, 4 Dec 2025 10:42:01 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

>> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   apa
>
> Marked as reviewed by egahlin (Reviewer).

Thanks @egahlin for the review. I am going to proceed with integration to ensure no build issues remain for RDP1.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28656#issuecomment-3611987285

From mgronlun at openjdk.org  Thu Dec  4 12:27:55 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Thu, 4 Dec 2025 12:27:55 GMT
Subject: Integrated: 8373062: JFR build failure with CDS disabled
In-Reply-To: <VBzvkarY-LB9n0CZheJ2JKxRe98VkeRqotTD8P4-vmY=.5ce21498-d7d2-4289-875e-46a01bf4b3e5@github.com>
References: <VBzvkarY-LB9n0CZheJ2JKxRe98VkeRqotTD8P4-vmY=.5ce21498-d7d2-4289-875e-46a01bf4b3e5@github.com>
Message-ID: <50F7UB-iOcEhzjtgSeuYCHLBtS0EMmTNqZs_2wTM9Hs=.5a87c2be-06d6-4719-9285-1d6487016dee@github.com>

On Thu, 4 Dec 2025 10:09:50 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:

> Greetings,
> 
> [JDK-8365400](https://bugs.openjdk.org/browse/JDK-8365400) caused a build problem when passing build option "--disable-cds" because of missing conditionals.
> 
> Testing: manually building with "--disable-cds"
> 
> Thanks
> Markus

This pull request has now been integrated.

Changeset: bcbdf90f
Author:    Markus Grönlund <mgronlun at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/bcbdf90fce44ad87e7728ba0febef0951e361589
Stats:     29 lines in 5 files changed: 16 ins; 11 del; 2 mod

8373062: JFR build failure with CDS disabled

Reviewed-by: egahlin

-------------

PR: https://git.openjdk.org/jdk/pull/28656

From stuefe at openjdk.org  Thu Dec  4 13:56:16 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Dec 2025 13:56:16 GMT
Subject: RFR: 8370715: JFR: Races are possible when dumping recordings [v4]
In-Reply-To: <FttY7XYQZoRgrN-MAD_vhi1N2VqJLa5DQIoZrO9KFHY=.b538a62f-fbbe-461c-a587-8fd39824c49a@github.com>
References: <kp7YrMJipPoIvIgYHFztzJpoDm_opt1X-q_IQiJ_t-I=.af64c82f-64cd-405c-be25-21ef21b8bdc0@github.com>
 <FttY7XYQZoRgrN-MAD_vhi1N2VqJLa5DQIoZrO9KFHY=.b538a62f-fbbe-461c-a587-8fd39824c49a@github.com>
Message-ID: <vdmgZB5dTcuBT9WPAWxk2nzKfZbABv9mreqTa0akW6Y=.9c0a79f8-2c99-4067-8bc2-2cc230891919@github.com>

On Tue, 2 Dec 2025 16:55:03 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> #### Summary
>> This PR changes the JFR snapshot dumping code so that multiple JFR recordings (potentially from different processes) racing to write the same dump destination won't mix their data. 
>> 
>> #### Problem
>> The dump destination file is created and/or wiped when a recording is started, but not wiped again before actually copying over the chunks during a dump. So in the window of time between creating/wiping and dumping chunks, another recording could write to the same dump destination and have its chunks added to the snapshot. This can happen with either a single JVM or multiple JVMs that are racing.
>> 
>> #### Proposed fix
>> This PR ensures that any data previously written to the dump destination is wiped before the new recording's data is written.  File locking is also done while chunks are being written. 
>> 
>> Testing:
>> - new test `jdk/jdk/jfr/api/recording/dump/TestDumpOverwrite.java`
>> - Tier 1
>
> Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision:
> 
>   change WARN to INFO

Good!

Had a quick look at the GHA failures, they are unrelated.

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28460#pullrequestreview-3540155334
PR Comment: https://git.openjdk.org/jdk/pull/28460#issuecomment-3612377195

From duke at openjdk.org  Thu Dec  4 13:59:52 2025
From: duke at openjdk.org (Robert Toyonaga)
Date: Thu, 4 Dec 2025 13:59:52 GMT
Subject: Integrated: 8370715: JFR: Races are possible when dumping recordings
In-Reply-To: <kp7YrMJipPoIvIgYHFztzJpoDm_opt1X-q_IQiJ_t-I=.af64c82f-64cd-405c-be25-21ef21b8bdc0@github.com>
References: <kp7YrMJipPoIvIgYHFztzJpoDm_opt1X-q_IQiJ_t-I=.af64c82f-64cd-405c-be25-21ef21b8bdc0@github.com>
Message-ID: <LnO6_5z0m87rBGyLZTXH9T9WFqPV1QMxFufmoCqLFfE=.676f332a-1985-4804-9354-88cd831b6748@github.com>

On Fri, 21 Nov 2025 21:02:56 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

> #### Summary
> This PR changes the JFR snapshot dumping code so that multiple JFR recordings (potentially from different processes) racing to write the same dump destination won't mix their data. 
> 
> #### Problem
> The dump destination file is created and/or wiped when a recording is started, but not wiped again before actually copying over the chunks during a dump. So in the window of time between creating/wiping and dumping chunks, another recording could write to the same dump destination and have its chunks added to the snapshot. This can happen with either a single JVM or multiple JVMs that are racing.
> 
> #### Proposed fix
> This PR ensures that any data previously written to the dump destination is wiped before the new recording's data is written.  File locking is also done while chunks are being written. 
> 
> Testing:
> - new test `jdk/jdk/jfr/api/recording/dump/TestDumpOverwrite.java`
> - Tier 1

This pull request has now been integrated.

Changeset: c4ec983d
Author:    Robert Toyonaga <rtoyonag at redhat.com>
Committer: Thomas Stuefe <stuefe at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/c4ec983da57ee8aea71e88d5de2570c5d65a69df
Stats:     88 lines in 2 files changed: 87 ins; 0 del; 1 mod

8370715: JFR: Races are possible when dumping recordings

Reviewed-by: egahlin, stuefe

-------------

PR: https://git.openjdk.org/jdk/pull/28460

From stuefe at openjdk.org  Fri Dec  5 05:52:32 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 5 Dec 2025 05:52:32 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive
Message-ID: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>

A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.

We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.

This RFE changes the algorithm to be non-recursive. 

Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.

Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.

Testing:

- Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
- Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
- Ran locally all jtreg tests in jdk/jfr
- GHAs
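
For readers unfamiliar with the technique, here is a minimal, self-contained sketch of a depth-first search driven by an explicit stack instead of recursion. It is not the code from this patch: `Graph`, `Probe`, `probe_stack` and `dfs_iterative` are hypothetical names on a toy graph of integer node ids, and child ids are assumed to be valid indices. It only illustrates the two properties described above: stack usage no longer depends on the native thread stack, and the last reference of a node is processed first.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy graph: node i references the nodes listed in g[i].
using Graph = std::vector<std::vector<std::size_t>>;

// One pending edge target together with its distance from the root.
struct Probe {
  std::size_t node;
  std::size_t depth;
};

// Depth-first traversal without recursion. Returns nodes in visit order.
// max_depth plays the role of a search depth limit (an assumption made
// for this sketch, not the HotSpot constant).
std::vector<std::size_t> dfs_iterative(const Graph& g, std::size_t root,
                                       std::size_t max_depth) {
  std::vector<std::size_t> order;
  if (root >= g.size()) return order;
  std::vector<bool> marked(g.size(), false);
  std::vector<Probe> probe_stack;  // lives on the heap, not the thread stack
  probe_stack.push_back({root, 0});
  while (!probe_stack.empty()) {
    const Probe p = probe_stack.back();
    probe_stack.pop_back();
    if (marked[p.node]) continue;        // already reached via another path
    marked[p.node] = true;
    order.push_back(p.node);
    if (p.depth >= max_depth) continue;  // do not follow references deeper
    // References are pushed first to last, so they are popped last to first.
    for (std::size_t child : g[p.node]) {
      if (!marked[child]) probe_stack.push_back({child, p.depth + 1});
    }
  }
  return order;
}
```

For a graph where node 0 references 1 and 2, and node 1 references 3, the visit order is 0, 2, 1, 3: the last reference of node 0 is followed first, yet the search is still depth-first.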

-------------

Commit messages:
 - remove test output
 - Copyright
 - start

Changes: https://git.openjdk.org/jdk/pull/28659/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8373096
  Stats: 70 lines in 2 files changed: 40 ins; 17 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/28659.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28659/head:pull/28659

PR: https://git.openjdk.org/jdk/pull/28659

From stuefe at openjdk.org  Sat Dec  6 06:15:54 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 6 Dec 2025 06:15:54 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive
In-Reply-To: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
Message-ID: <bifTzGIUAyyYeIpmqk7tVAB-Y6YqW-g5KBxFrgEaTew=.c125eeb6-7d04-40b6-b9f9-5fe8330c55c4@github.com>

On Thu, 4 Dec 2025 15:54:04 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
> 
> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
> 
> This RFE changes the algorithm to be non-recursive. 
> 
> Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
> 
> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
> 
> Testing:
> 
> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
> - Ran locally all jtreg tests in jdk/jfr
> - GHAs

Ping @egahlin

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3619621653

From egahlin at openjdk.org  Mon Dec  8 18:08:09 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Mon, 8 Dec 2025 18:08:09 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive
In-Reply-To: <bifTzGIUAyyYeIpmqk7tVAB-Y6YqW-g5KBxFrgEaTew=.c125eeb6-7d04-40b6-b9f9-5fe8330c55c4@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
 <bifTzGIUAyyYeIpmqk7tVAB-Y6YqW-g5KBxFrgEaTew=.c125eeb6-7d04-40b6-b9f9-5fe8330c55c4@github.com>
Message-ID: <xvy0Z7mSkjCma0_jwSxOdsTHUQQnL0nrcWkO8nfAb_Q=.aa6ed219-de18-47df-adbb-532d7388e6bc@github.com>

On Sat, 6 Dec 2025 06:13:44 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> Ping @egahlin

This looks like a good fix and refactoring. I agree, increasing the depth is better done separately. I will need to look at this more and run some tests. We now return when we find a marked object.


102       if (_mark_bits->is_marked(pointee)) {
103         return;
104       }


Previously we just aborted that closure. I need to think about whether it matters.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3628350221

From duke at openjdk.org  Tue Dec  9 01:22:00 2025
From: duke at openjdk.org (Robert Toyonaga)
Date: Tue, 9 Dec 2025 01:22:00 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive
In-Reply-To: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
Message-ID: <RK8QLZ5JhDC8d8wpkggMyWrtHalEnTEhb6_Efm9btlU=.d21596c5-a84a-412a-91b0-3acd6f07545f@github.com>

On Thu, 4 Dec 2025 15:54:04 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
> 
> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
> 
> This RFE changes the algorithm to be non-recursive. 
> 
> Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
> 
> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
> 
> Testing:
> 
> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
> - Ran locally all jtreg tests in jdk/jfr
> - GHAs

src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 103:

> 101:     } else {
> 102:       if (_mark_bits->is_marked(pointee)) {
> 103:         return;

I think this improvement is a good idea!  But maybe this line should be replaced with a `continue`, otherwise we can terminate the DFS prematurely and skip evaluation of other chains extending from other references already pushed to the stack.
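
To make the difference concrete, here is a small sketch on a toy graph. It is not the patch code: `Graph`, `OnMarked` and `count_visited` are hypothetical names, and the stack holds plain node ids. The only thing that varies is what happens when an already-marked node is popped.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy graph: node i references the nodes listed in g[i].
using Graph = std::vector<std::vector<std::size_t>>;

// What to do when a popped node turns out to be marked already.
enum class OnMarked { Return, Continue };

// Drains an explicit stack starting at root and counts the visited nodes.
std::size_t count_visited(const Graph& g, std::size_t root, OnMarked policy) {
  std::vector<bool> marked(g.size(), false);
  std::vector<std::size_t> probe_stack{root};
  std::size_t visited = 0;
  while (!probe_stack.empty()) {
    const std::size_t node = probe_stack.back();
    probe_stack.pop_back();
    if (marked[node]) {
      // Return abandons everything still on the stack;
      // Continue only skips this one entry.
      if (policy == OnMarked::Return) return visited;
      continue;
    }
    marked[node] = true;
    ++visited;
    for (std::size_t child : g[node]) probe_stack.push_back(child);
  }
  return visited;
}
```

With node 0 referencing 1 and 2, and node 2 referencing 0 again, the `Return` variant stops when it pops the back-reference to 0 and never visits node 1, which is still on the stack. The `Continue` variant visits all three nodes.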

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28659#discussion_r2600708367

From stuefe at openjdk.org  Tue Dec  9 05:55:57 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 9 Dec 2025 05:55:57 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive
In-Reply-To: <RK8QLZ5JhDC8d8wpkggMyWrtHalEnTEhb6_Efm9btlU=.d21596c5-a84a-412a-91b0-3acd6f07545f@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
 <RK8QLZ5JhDC8d8wpkggMyWrtHalEnTEhb6_Efm9btlU=.d21596c5-a84a-412a-91b0-3acd6f07545f@github.com>
Message-ID: <i1PkejT3IdcSiWw3kli25jtrN6N9kivTom_BaJWfrCs=.6cea2091-641a-4e2c-9f1f-3e7e1151df38@github.com>

On Tue, 9 Dec 2025 01:19:16 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
>> 
>> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
>> 
>> This RFE changes the algorithm to be non-recursive. 
>> 
>> Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
>> 
>> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
>> 
>> Testing:
>> 
>> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
>> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
>> - Ran locally all jtreg tests in jdk/jfr
>> - GHAs
>
> src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 103:
> 
>> 101:     } else {
>> 102:       if (_mark_bits->is_marked(pointee)) {
>> 103:         return;
> 
> I think this improvement is a good idea!  But maybe this line should be replaced with a `continue`, otherwise we can terminate the DFS prematurely and skip evaluation of other chains extending from other references already pushed to the stack.

Good catch!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28659#discussion_r2601196718

From stuefe at openjdk.org  Tue Dec  9 08:13:11 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 9 Dec 2025 08:13:11 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive [v2]
In-Reply-To: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
Message-ID: <UdNpvd8-ssgctaeW_Z0Ha5rtxO9_MJZ_111fpKJ6Kks=.61f5b976-e37d-47d5-a697-5792b1cfa487@github.com>

> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
> 
> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
> 
> This RFE changes the algorithm to be non-recursive. 
> 
> Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
> 
> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
> 
> Testing:
> 
> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
> - Ran locally all jtreg tests in jdk/jfr
> - GHAs

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  fix

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28659/files
  - new: https://git.openjdk.org/jdk/pull/28659/files/5ac152db..e1a4736b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=00-01

  Stats: 16 lines in 2 files changed: 9 ins; 1 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/28659.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28659/head:pull/28659

PR: https://git.openjdk.org/jdk/pull/28659

From stuefe at openjdk.org  Tue Dec  9 11:26:45 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 9 Dec 2025 11:26:45 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive [v3]
In-Reply-To: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
Message-ID: <A7u_DfliAiblu14-WO8PMfO0SPO_kVytOCB5biGWb04=.6d20b12c-a899-46e7-9b0a-fcd4ccd12141@github.com>

> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
> 
> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
> 
> This RFE changes the algorithm to be non-recursive. 
> 
> Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
> 
> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
> 
> Testing:
> 
> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
> - Ran locally all jtreg tests in jdk/jfr
> - GHAs

Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:

 - revert accidental checkin
 - test improvements

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28659/files
  - new: https://git.openjdk.org/jdk/pull/28659/files/e1a4736b..d5ee7c4b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=01-02

  Stats: 101 lines in 1 file changed: 81 ins; 3 del; 17 mod
  Patch: https://git.openjdk.org/jdk/pull/28659.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28659/head:pull/28659

PR: https://git.openjdk.org/jdk/pull/28659

From stuefe at openjdk.org  Tue Dec  9 12:08:16 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 9 Dec 2025 12:08:16 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive [v4]
In-Reply-To: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
Message-ID: <obgj-E3z8-mCjfg3Z9P5tP6uBmQZCzfnbM-eIGaJ6aY=.98e1975a-25d5-4f43-ac58-83d41d6eb603@github.com>

> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
> 
> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
> 
> This RFE changes the algorithm to be non-recursive. 
> 
> Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
> 
> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
> 
> Testing:
> 
> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
> - Ran locally all jtreg tests in jdk/jfr
> - GHAs

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  final fixes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28659/files
  - new: https://git.openjdk.org/jdk/pull/28659/files/d5ee7c4b..73497c30

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=02-03

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/28659.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28659/head:pull/28659

PR: https://git.openjdk.org/jdk/pull/28659

From stuefe at openjdk.org  Tue Dec  9 13:21:56 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 9 Dec 2025 13:21:56 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive [v4]
In-Reply-To: <obgj-E3z8-mCjfg3Z9P5tP6uBmQZCzfnbM-eIGaJ6aY=.98e1975a-25d5-4f43-ac58-83d41d6eb603@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
 <obgj-E3z8-mCjfg3Z9P5tP6uBmQZCzfnbM-eIGaJ6aY=.98e1975a-25d5-4f43-ac58-83d41d6eb603@github.com>
Message-ID: <P9hJVl3bYTYpcV898yiuLPzDrEZUXWsQdhKKz6omF-k=.ff0f1840-603a-4802-b4c3-c62cfd3acc32@github.com>

On Tue, 9 Dec 2025 12:08:16 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
>> 
>> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
>> 
>> This RFE changes the algorithm to be non-recursive. 
>> 
>> Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
>> 
>> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
>> 
>> Testing:
>> 
>> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
>> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
>> - Ran locally all jtreg tests in jdk/jfr
>> - GHAs
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   final fixes

I see that this approach has a problem with very broad objects and with nested large object arrays: the time and space needed to traverse the object graph becomes too large. I'll try to modify the patch to take that into account.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3632232783

From fthevenet at openjdk.org  Wed Dec 10 14:27:35 2025
From: fthevenet at openjdk.org (Frederic Thevenet)
Date: Wed, 10 Dec 2025 14:27:35 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive [v4]
In-Reply-To: <obgj-E3z8-mCjfg3Z9P5tP6uBmQZCzfnbM-eIGaJ6aY=.98e1975a-25d5-4f43-ac58-83d41d6eb603@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
 <obgj-E3z8-mCjfg3Z9P5tP6uBmQZCzfnbM-eIGaJ6aY=.98e1975a-25d5-4f43-ac58-83d41d6eb603@github.com>
Message-ID: <0El1-nrTE4DNmVq0AjNmSejSbqW_ysuSMs2DEj20UKA=.a1ed9c4b-1f8c-400f-9ae7-db44081e356c@github.com>

On Tue, 9 Dec 2025 12:08:16 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
>> 
>> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
>> 
>> This RFE changes the algorithm to be non-recursive. 
>> 
>> Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
>> 
>> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
>> 
>> Testing:
>> 
>> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
>> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
>> - Ran locally all jtreg tests in jdk/jfr
>> - GHAs
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   final fixes

src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 177:

> 175:   assert(!ref.is_null(), "invariant");
> 176:   const oop pointee = ref.dereference();
> 177:   assert(pointee != nullptr, "invariant");

Small thing: is this still useful, since `pointee` is no longer used with this change?
We assert that `ref.dereference() != nullptr`  right after we pop it from the stack in `drain_probe_stack` anyway.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28659#discussion_r2606872452

From stuefe at openjdk.org  Thu Dec 11 05:33:23 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 11 Dec 2025 05:33:23 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive [v4]
In-Reply-To: <obgj-E3z8-mCjfg3Z9P5tP6uBmQZCzfnbM-eIGaJ6aY=.98e1975a-25d5-4f43-ac58-83d41d6eb603@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
 <obgj-E3z8-mCjfg3Z9P5tP6uBmQZCzfnbM-eIGaJ6aY=.98e1975a-25d5-4f43-ac58-83d41d6eb603@github.com>
Message-ID: <V_YOnjYDlY1nx7xUBnDLrUOWGlx3_QNR6xnsAXAisRI=.1c4338eb-38df-4e16-b328-4c39091f821a@github.com>

On Tue, 9 Dec 2025 12:08:16 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
>> 
>> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
>> 
>> This RFE changes the algorithm to be non-recursive. 
>> 
>> Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
>> 
>> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
>> 
>> Testing:
>> 
>> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
>> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
>> - Ran locally all jtreg tests in jdk/jfr
>> - GHAs
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   final fixes

Update: the performance problem I see with large arrays is pre-existing, and I will post an RFE for it separately. But my patch can benefit from striping large arrays in order to avoid large probing stacks. I will do that.
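
One possible way striping could work, purely as an illustration and not necessarily what the updated patch will do: instead of pushing every element of an array onto the probing stack at once, push a single entry describing a range, process one stripe of it, and push a follow-up entry for the remainder. The names `ArrayChunk` and `walk_striped` and the `stripe` parameter are assumptions made for this sketch, and array elements are treated as leaves to keep it short.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Half-open index range [begin, end) inside a hypothetical object array.
struct ArrayChunk {
  std::size_t begin;
  std::size_t end;
};

// Visits all indices of an array with `length` elements, at most `stripe`
// elements per stack entry. Appends the visited indices to `visited` and
// returns the largest stack size observed.
std::size_t walk_striped(std::size_t length, std::size_t stripe,
                         std::vector<std::size_t>& visited) {
  stripe = std::max<std::size_t>(stripe, 1);  // guard against a zero stripe
  std::vector<ArrayChunk> probe_stack;
  std::size_t peak = 0;
  if (length > 0) probe_stack.push_back({0, length});
  while (!probe_stack.empty()) {
    const ArrayChunk c = probe_stack.back();
    probe_stack.pop_back();
    const std::size_t split = std::min(c.end, c.begin + stripe);
    // Push one follow-up entry for the rest instead of all elements.
    if (split < c.end) probe_stack.push_back({split, c.end});
    peak = std::max(peak, probe_stack.size());
    for (std::size_t i = c.begin; i < split; ++i) visited.push_back(i);
  }
  return peak;
}
```

In this sketch the stack never holds more than one entry per array, whatever the array length. Pushing each element individually would need as many entries as the array has elements.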

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3640209385

From stuefe at openjdk.org  Thu Dec 11 07:59:34 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 11 Dec 2025 07:59:34 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive [v4]
In-Reply-To: <obgj-E3z8-mCjfg3Z9P5tP6uBmQZCzfnbM-eIGaJ6aY=.98e1975a-25d5-4f43-ac58-83d41d6eb603@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
 <obgj-E3z8-mCjfg3Z9P5tP6uBmQZCzfnbM-eIGaJ6aY=.98e1975a-25d5-4f43-ac58-83d41d6eb603@github.com>
Message-ID: <083O8MLtieturWtthx5z8jOY4Vij_HjQUOpVX3ZqxUc=.76fff71d-1e4c-4dbc-9d6a-e721a5658e67@github.com>

On Tue, 9 Dec 2025 12:08:16 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
>> 
>> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
>> 
>> This RFE changes the algorithm to be non-recursive. 
>> 
>> Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
>> 
>> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
>> 
>> Testing:
>> 
>> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
>> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
>> - Ran locally all jtreg tests in jdk/jfr
>> - GHAs
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   final fixes

For the performance problem, see https://bugs.openjdk.org/browse/JDK-8373490

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3640696702

From egahlin at openjdk.org  Thu Dec 11 08:24:23 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Thu, 11 Dec 2025 08:24:23 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive [v4]
In-Reply-To: <083O8MLtieturWtthx5z8jOY4Vij_HjQUOpVX3ZqxUc=.76fff71d-1e4c-4dbc-9d6a-e721a5658e67@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
 <obgj-E3z8-mCjfg3Z9P5tP6uBmQZCzfnbM-eIGaJ6aY=.98e1975a-25d5-4f43-ac58-83d41d6eb603@github.com>
 <083O8MLtieturWtthx5z8jOY4Vij_HjQUOpVX3ZqxUc=.76fff71d-1e4c-4dbc-9d6a-e721a5658e67@github.com>
Message-ID: <Fs-GmmMZhEX2zTXPxwVBtj2H_iBF7iq2MihQ8cW-uzQ=.f4199178-f0e1-428d-99a8-492df5382278@github.com>

On Thu, 11 Dec 2025 07:57:18 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> For the performance problem, see https://bugs.openjdk.org/browse/JDK-8373490

Our long-term plan is to get rid of DFS [1], but it's good to have a fix that we can backport. 

[1] https://bugs.openjdk.org/browse/JDK-8245430

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3640773953

From stuefe at openjdk.org  Fri Dec 12 07:57:14 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 12 Dec 2025 07:57:14 GMT
Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for large
 object arrays
Message-ID: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>

I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions:

1) We have large object arrays on the heap

2) We start with BFS, but at some point, we fall back to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57/src/hotspot/share/jfr/leakprofiler/chains/bfsClosure.cpp#L142-L144). This, in turn, is more likely to happen with large object arrays.

Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two.

The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by decreasing the size of the BFS edge queue in the code.

In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results:

- Object Array size of 200,000 elements: BFS only, all is well (~5 seconds)
- Object Array size of 250,000 elements: BFS/DFS mixed (~200 seconds)
- Object Array size of 300,000 elements: BFS/DFS mixed (~650 seconds)

The run time grows with the square of the array size; see the explanation below. With the default edge queue size of 32 MB, we only hit that DFS fallback path when processing arrays of more than 2 million elements, and for arrays as big as these, the search basically hangs forever.

Examining the problem more closely, I see:

- BFS search starts iterating over the object array. It will find that its edge queue is too small, and will drop down to DFS for every single object element. That in itself is not the problem.

- It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O(N^2) with respect to the array size N.

The reason for this is a missing mark check at the border between BFS and DFS. 
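
To illustrate the effect described above, here is a minimal toy model in Java. It is not the HotSpot code: `Node`, `dfs`, `handOver` and `arrayOf` are hypothetical names, and the "mark" is modelled as a plain `HashSet`. It only assumes what the description states, namely that the array is handed to the DFS side once per element, and that a mark check at the hand-over point prevents the repeated scans.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BfsDfsBorderSketch {
    /** Toy heap object: identity plus outgoing references (hypothetical, not an oop). */
    static final class Node {
        final List<Node> refs = new ArrayList<>();
    }

    /** Number of reference slots the DFS side has looked at. */
    static long scans;

    /** Builds a toy "object array" with n leaf elements. */
    static Node arrayOf(int n) {
        Node array = new Node();
        for (int i = 0; i < n; i++) {
            array.refs.add(new Node());
        }
        return array;
    }

    /** Toy DFS: scans every reference of 'start', descending only into unmarked objects. */
    static void dfs(Node start, Set<Node> marked) {
        for (Node ref : start.refs) {
            scans++;
            if (marked.add(ref)) {
                dfs(ref, marked);
            }
        }
    }

    /**
     * Models a BFS whose edge queue is already saturated: for every element of
     * the array, the array itself is handed over to DFS. With the mark check at
     * the border, the array is scanned once; without it, once per element.
     */
    static long handOver(Node array, boolean checkMarkAtBorder) {
        scans = 0;
        Set<Node> marked = new HashSet<>();
        for (int i = 0; i < array.refs.size(); i++) {
            if (checkMarkAtBorder && !marked.add(array)) {
                continue; // already handed over once, nothing new to find
            }
            dfs(array, marked);
        }
        return scans;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("without mark check: " + handOver(arrayOf(n), false));
        System.out.println("with mark check:    " + handOver(arrayOf(n), true));
    }
}
```

In this model, an array of N elements costs N * N slot scans without the check and N scans with it, which matches the quadratic behavior reported above.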

Tests:

- I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs). I also manually verified the resulting JFR file and confirmed that I see the gc roots listed for the array elements.
- I ran JFR jtreg tests manually

-------------

Commit messages:
 - fix
 - start

Changes: https://git.openjdk.org/jdk/pull/28781/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28781&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8373490
  Stats: 208 lines in 2 files changed: 202 ins; 6 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28781.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28781/head:pull/28781

PR: https://git.openjdk.org/jdk/pull/28781

From stuefe at openjdk.org  Fri Dec 12 08:01:51 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 12 Dec 2025 08:01:51 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive [v4]
In-Reply-To: <Fs-GmmMZhEX2zTXPxwVBtj2H_iBF7iq2MihQ8cW-uzQ=.f4199178-f0e1-428d-99a8-492df5382278@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
 <obgj-E3z8-mCjfg3Z9P5tP6uBmQZCzfnbM-eIGaJ6aY=.98e1975a-25d5-4f43-ac58-83d41d6eb603@github.com>
 <083O8MLtieturWtthx5z8jOY4Vij_HjQUOpVX3ZqxUc=.76fff71d-1e4c-4dbc-9d6a-e721a5658e67@github.com>
 <Fs-GmmMZhEX2zTXPxwVBtj2H_iBF7iq2MihQ8cW-uzQ=.f4199178-f0e1-428d-99a8-492df5382278@github.com>
Message-ID: <k7nbbjMjMwr2BGd2tVQM0UIJfjpvVxz_LjgzeuSbgAw=.3ea4c099-8f3b-4443-a8c0-f3246a7ccde3@github.com>

On Thu, 11 Dec 2025 08:21:29 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

>> For the performance problem, see https://bugs.openjdk.org/browse/JDK-8373490
> 
> Our long-term plan is to get rid of DFS [1], but it's good to have a fix that we can backport. 
> 
> [1] https://bugs.openjdk.org/browse/JDK-8245430

@egahlin @roberttoyonaga I posted a patch for the performance problem. The patch is very simple, and I would like to get it in first before progressing with this patch, to make backports easier (though it probably does not matter). Could you please give me a quick review? The issue is rather simple.

https://github.com/openjdk/jdk/pull/28781

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28659#issuecomment-3645351477

From egahlin at openjdk.org  Fri Dec 12 08:37:54 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Fri, 12 Dec 2025 08:37:54 GMT
Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for
 large object arrays
In-Reply-To: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
References: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
Message-ID: <aWCQNBnuHz455ASqCn5RKZZ8AVo-a49RZ81Jy9SmPbc=.632f17eb-61b6-4de3-ae1b-8d9facb2b992@github.com>

On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions: ...

Looks reasonable. I will run some tests before approving.

(It's the end of the year, and I have vacation time I must take, so my availability for reviews is limited.)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3645483611

From stuefe at openjdk.org  Fri Dec 12 09:11:50 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 12 Dec 2025 09:11:50 GMT
Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for
 large object arrays
In-Reply-To: <aWCQNBnuHz455ASqCn5RKZZ8AVo-a49RZ81Jy9SmPbc=.632f17eb-61b6-4de3-ae1b-8d9facb2b992@github.com>
References: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
 <aWCQNBnuHz455ASqCn5RKZZ8AVo-a49RZ81Jy9SmPbc=.632f17eb-61b6-4de3-ae1b-8d9facb2b992@github.com>
Message-ID: <0gQ7TMLhG1xkyrgjazvcoq_pCRewwhKaVnX7_wEQhcw=.9e18ff5c-41d9-4f6f-9b9d-2cc37694a669@github.com>

On Fri, 12 Dec 2025 08:34:47 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

> Looks reasonable. I will run some tests before approving.
> 
> (It's the end of the year, and I have vacation time I must take, so my availability for reviews is limited.)

Same here :-)

Hope you have a nice vacation, btw!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3645590241

From duke at openjdk.org  Fri Dec 12 13:37:07 2025
From: duke at openjdk.org (Bara' Hasheesh)
Date: Fri, 12 Dec 2025 13:37:07 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath
Message-ID: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>

A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from starting after the JVM initiates its shutdown hooks.

A new test was added that fails without the change & passes with it 

I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86
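
As a rough illustration of the idea, the following self-contained Java sketch models a shutdown guard inside `start`. `ToyRecorder` and `ToyRecording` are hypothetical stand-ins, not the real `jdk.jfr.internal` classes, and the exception type and message are assumptions made for the sketch.

```java
public class ShutdownGuardSketch {
    /** Hypothetical stand-in for the recorder; only tracks the shutdown flag. */
    static final class ToyRecorder {
        private volatile boolean inShutDown;

        boolean isInShutDown() {
            return inShutDown;
        }

        /** Called from the (modelled) shutdown hook. */
        void markShutDown() {
            inShutDown = true;
        }
    }

    /** Hypothetical stand-in for a recording that refuses to start during shutdown. */
    static final class ToyRecording {
        private final ToyRecorder recorder;
        private boolean started;

        ToyRecording(ToyRecorder recorder) {
            this.recorder = recorder;
        }

        void start() {
            synchronized (recorder) {
                // Guard: do not hand work to a recorder that is going away.
                if (recorder.isInShutDown()) {
                    throw new IllegalStateException("Flight recorder is already shutdown");
                }
                started = true;
            }
        }

        boolean isStarted() {
            return started;
        }
    }

    /** Returns true if start() succeeds while the recorder is still alive. */
    static boolean startsBeforeShutdown() {
        ToyRecording r = new ToyRecording(new ToyRecorder());
        r.start();
        return r.isStarted();
    }

    /** Returns true if start() is rejected once shutdown has begun. */
    static boolean rejectedAfterShutdown() {
        ToyRecorder recorder = new ToyRecorder();
        recorder.markShutDown();
        try {
            new ToyRecording(recorder).start();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("starts before shutdown: " + startsBeforeShutdown());
        System.out.println("rejected after shutdown: " + rejectedAfterShutdown());
    }
}
```

Failing fast under the same lock that `start` already holds keeps the check and the state change atomic with respect to other callers in this model.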

-------------

Commit messages:
 - exception message
 - Move shutdown check to PlatformRecorder.start & missing space
 - 8373439: Fix deadlock between recorder start & VMDeath hook

Changes: https://git.openjdk.org/jdk/pull/28767/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28767&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8373439
  Stats: 66 lines in 3 files changed: 64 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/28767.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28767/head:pull/28767

PR: https://git.openjdk.org/jdk/pull/28767

From fandreuzzi at openjdk.org  Fri Dec 12 13:37:09 2025
From: fandreuzzi at openjdk.org (Francesco Andreuzzi)
Date: Fri, 12 Dec 2025 13:37:09 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath
In-Reply-To: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
References: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
Message-ID: <7l8_qZiR8JS8IdXLaelvruMTGLYpekzQPYmjPB66GPI=.dfdaae86-1cf8-47d9-913d-6d8b747c643c@github.com>

On Thu, 11 Dec 2025 15:19:51 GMT, Bara' Hasheesh <duke at openjdk.org> wrote:

> A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from starting after the JVM initiates its shutdown hooks.
> 
> A new test was added that fails without the change & passes with it 
> 
> I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86

test/jdk/jdk/jfr/api/recording/deadlock/TestShutdownDeadLock.java line 43:

> 41:         Recording r = new Recording();
> 42:         r.start();
> 43:         r.stop();

Are these two lines needed for the reproducer?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2611098871

From duke at openjdk.org  Fri Dec 12 13:37:09 2025
From: duke at openjdk.org (Bara' Hasheesh)
Date: Fri, 12 Dec 2025 13:37:09 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath
In-Reply-To: <7l8_qZiR8JS8IdXLaelvruMTGLYpekzQPYmjPB66GPI=.dfdaae86-1cf8-47d9-913d-6d8b747c643c@github.com>
References: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
 <7l8_qZiR8JS8IdXLaelvruMTGLYpekzQPYmjPB66GPI=.dfdaae86-1cf8-47d9-913d-6d8b747c643c@github.com>
Message-ID: <qTu3tJ6PVGE8nXtAhzGIcRvGoX8_1M11C6DL57XgE3E=.1e0be2c3-e415-4f8c-bf40-84086ef2e958@github.com>

On Thu, 11 Dec 2025 15:44:48 GMT, Francesco Andreuzzi <fandreuzzi at openjdk.org> wrote:

>> A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from starting after the JVM initiates its shutdown hooks.
>> 
>> A new test was added that fails without the change & passes with it 
>> 
>> I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86
>
> test/jdk/jdk/jfr/api/recording/deadlock/TestShutdownDeadLock.java line 43:
> 
>> 41:         Recording r = new Recording();
>> 42:         r.start();
>> 43:         r.stop();
> 
> Are these two lines needed for the reproducer?

These calls are made to guarantee that all JFR components are fully initialized (internal threads & other structures) & fully functional, i.e. that recordings can be processed normally.

They are not strictly needed, since `new Recording` alone is enough to start that initialization, but I would personally vote to keep them.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2611162129

From egahlin at openjdk.org  Fri Dec 12 14:19:15 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Fri, 12 Dec 2025 14:19:15 GMT
Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for
 large object arrays
In-Reply-To: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
References: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
Message-ID: <yaKkMaFdIJ1hT3zqYYcvJST7TulnrwhyrIDXH3TTfkU=.6ff599dc-a601-4dfa-9d04-255e27a5e694@github.com>

On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions: ...

Marked as reviewed by egahlin (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/28781#pullrequestreview-3571992566

From stuefe at openjdk.org  Fri Dec 12 14:49:16 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 12 Dec 2025 14:49:16 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath
In-Reply-To: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
References: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
Message-ID: <okNr-dPsFMcKt85NmTlSJc-Ra9cGxsnGF9b5YNCAJ7o=.a56f183e-3579-4cd8-8626-3261ac3fe6dc@github.com>

On Thu, 11 Dec 2025 15:19:51 GMT, Bara' Hasheesh <duke at openjdk.org> wrote:

> A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from starting after the JVM initiates its shutdown hooks.
> 
> A new test was added that fails without the change & passes with it 
> 
> I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86

@Baraa-Hasheesh would this be a solution for https://bugs.openjdk.org/browse/JDK-8373257?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28767#issuecomment-3646763877

From duke at openjdk.org  Fri Dec 12 16:01:29 2025
From: duke at openjdk.org (Bara' Hasheesh)
Date: Fri, 12 Dec 2025 16:01:29 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath
In-Reply-To: <okNr-dPsFMcKt85NmTlSJc-Ra9cGxsnGF9b5YNCAJ7o=.a56f183e-3579-4cd8-8626-3261ac3fe6dc@github.com>
References: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
 <okNr-dPsFMcKt85NmTlSJc-Ra9cGxsnGF9b5YNCAJ7o=.a56f183e-3579-4cd8-8626-3261ac3fe6dc@github.com>
Message-ID: <vhQxc53rikVgACMEIfHh16atGoxRioEP-Ed04pYgAYQ=.d704fbb9-34e8-4ca7-9800-d7904887002d@github.com>

On Fri, 12 Dec 2025 14:47:00 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from starting after the JVM initiates its shutdown hooks.
>> 
>> A new test was added that fails without the change & passes with it 
>> 
>> I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86
>
> @Baraa-Hasheesh would this be a solution for https://bugs.openjdk.org/browse/JDK-8373257?

@tstuefe From the ticket description that looks like a different deadlock 

The deadlock here happens because the `JfrPostBox` waits indefinitely for the `recorderthread` to process its message, but that thread has already exited due to the call chain `PlatformRecorder.destroy -> JVMSupport.destroyJFR -> JVM.destroyJFR`.
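
The shape of this hang can be sketched in plain Java. This is only an analogy under stated assumptions: the real post box lives in HotSpot, and `Message`, `STOP` and `postAndWait` are hypothetical names. A bounded wait is used here solely so that the sketch terminates; an unbounded wait in the second case would block forever.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PostBoxSketch {
    /** Hypothetical message; the worker releases the latch once it has handled it. */
    static final class Message {
        final CountDownLatch processed = new CountDownLatch(1);
    }

    /** Sentinel that tells the worker thread to exit. */
    private static final Message STOP = new Message();

    /**
     * Posts one message to a worker thread and waits for it to be processed.
     * Returns true if the message was processed within the timeout.
     */
    static boolean postAndWait(boolean workerAlreadyExited, long timeoutMillis) {
        BlockingQueue<Message> inbox = new LinkedBlockingQueue<>();
        Thread worker = new Thread(() -> {
            try {
                for (Message m = inbox.take(); m != STOP; m = inbox.take()) {
                    m.processed.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
        try {
            if (workerAlreadyExited) {
                inbox.put(STOP); // models the recorder thread being torn down first
                worker.join();
            }
            Message msg = new Message();
            inbox.put(msg); // nobody is left to take this if the worker has exited
            boolean done = msg.processed.await(timeoutMillis, TimeUnit.MILLISECONDS);
            if (!workerAlreadyExited) {
                inbox.put(STOP);
                worker.join();
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("worker alive:  processed = " + postAndWait(false, 5000));
        System.out.println("worker exited: processed = " + postAndWait(true, 200));
    }
}
```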

Your case seems to be deadlocking somewhere else

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28767#issuecomment-3647155450

From egahlin at openjdk.org  Fri Dec 12 17:10:54 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Fri, 12 Dec 2025 17:10:54 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath
In-Reply-To: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
References: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
Message-ID: <dSYWkf9SH9BUUpnpUW6LIvk3gk7uaUdVzxDTk-CDrgk=.35455dde-42f2-472f-a751-864571658acc@github.com>

On Thu, 11 Dec 2025 15:19:51 GMT, Bara' Hasheesh <duke at openjdk.org> wrote:

> A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from starting after the JVM initiates its shutdown hooks.
> 
> A new test was added that fails without the change & passes with it 
> 
> I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86

src/jdk.jfr/share/classes/jdk/jfr/internal/PlatformRecording.java line 108:

> 106:         synchronized (recorder) {
> 107:             if (PlatformRecorder.isInShutDown()) {
> 108:                 throw new IllegalStateException("Flight recorder is already shutdown");

I need to think about whether throwing ISE is the best alternative here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2614980643

From egahlin at openjdk.org  Fri Dec 12 17:17:52 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Fri, 12 Dec 2025 17:17:52 GMT
Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for
 large object arrays
In-Reply-To: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
References: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
Message-ID: <KxXn1AH7yfh2Ze1KLQk6jPp9ujHJVScR7F6AvSrZpbM=.50bf2da4-effb-4095-8c9e-d7998f0415ab@github.com>

On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions: ...

A better title for the bug might be "JFR: path-to-gc-roots=true is very slow for large object arrays".

The leak profiler name is not something we have used externally.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3647430575

From duke at openjdk.org  Fri Dec 12 19:03:53 2025
From: duke at openjdk.org (Robert Toyonaga)
Date: Fri, 12 Dec 2025 19:03:53 GMT
Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for
 large object arrays
In-Reply-To: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
References: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
Message-ID: <oqGIvyrq7YQY3xK23y8c8_yLdO-Rbz8pzhBzent1G24=.8277cdb9-f884-4e2d-8dd3-732a02bd504d@github.com>

On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions: ...

This fix looks good to me

-------------

Marked as reviewed by roberttoyonaga at github.com (no known OpenJDK username).

PR Review: https://git.openjdk.org/jdk/pull/28781#pullrequestreview-3573204588

From duke at openjdk.org  Fri Dec 12 19:25:50 2025
From: duke at openjdk.org (Robert Toyonaga)
Date: Fri, 12 Dec 2025 19:25:50 GMT
Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for
 large object arrays
In-Reply-To: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
References: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
Message-ID: <_WHtdpClhCRzQlZJn-mA8KgNaG9g-nYbxKZ8P16ceMA=.bf58d6e9-67e4-491c-91bd-97c4c8bceb9c@github.com>

On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions: ...

Is there a way to add a timeout to the test for the failure case? Timeouts in unit tests are not completely reliable, but maybe something generous like 10s might be reasonable.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3647841771

From egahlin at openjdk.org  Fri Dec 12 21:23:52 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Fri, 12 Dec 2025 21:23:52 GMT
Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for
 large object arrays
In-Reply-To: <_WHtdpClhCRzQlZJn-mA8KgNaG9g-nYbxKZ8P16ceMA=.bf58d6e9-67e4-491c-91bd-97c4c8bceb9c@github.com>
References: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
 <_WHtdpClhCRzQlZJn-mA8KgNaG9g-nYbxKZ8P16ceMA=.bf58d6e9-67e4-491c-91bd-97c4c8bceb9c@github.com>
Message-ID: <5niLk8ntVOpDpjrN6jMsolLGSOuY7pEyDLXZJ513KWY=.3bb31fcd-574a-4120-8975-5f82d94ed998@github.com>

On Fri, 12 Dec 2025 19:23:34 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

> Is there a way to add a timeout to the test for the failure case? Timeouts in unit tests are not completely reliable, but maybe something generous like 10s might be reasonable.

I don't think we should have a timeout. Timeouts typically only lead to false positives, for example when you run on old hardware, under stress, or with some other option.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3648179388

From stuefe at openjdk.org  Sun Dec 14 12:00:08 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sun, 14 Dec 2025 12:00:08 GMT
Subject: RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for
 large object arrays
In-Reply-To: <5niLk8ntVOpDpjrN6jMsolLGSOuY7pEyDLXZJ513KWY=.3bb31fcd-574a-4120-8975-5f82d94ed998@github.com>
References: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
 <_WHtdpClhCRzQlZJn-mA8KgNaG9g-nYbxKZ8P16ceMA=.bf58d6e9-67e4-491c-91bd-97c4c8bceb9c@github.com>
 <5niLk8ntVOpDpjrN6jMsolLGSOuY7pEyDLXZJ513KWY=.3bb31fcd-574a-4120-8975-5f82d94ed998@github.com>
Message-ID: <ojIxohXm8UpL5FxDdbOdMifN-jBmuAo0WIR_NUCB-hk=.3925c2cb-9c77-4bbf-91cc-851cd2f3fd4f@github.com>

On Fri, 12 Dec 2025 21:20:56 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

> Is there a way to add a timeout to the test for the failure case? Timeouts in unit tests are not completely reliable, but maybe something generous like 10s might be reasonable.

Adding to what Erik wrote, there is the timeout of jtreg itself, of course: 120 seconds by default. But folks can increase or decrease that at the command-line level.

Many thanks for the speedy reviews, @egahlin and @roberttoyonaga !

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3650778670
PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3650779585

From stuefe at openjdk.org  Sun Dec 14 12:00:10 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sun, 14 Dec 2025 12:00:10 GMT
Subject: Integrated: 8373490: JFR Leak Profiler: path-to-gc-root very slow for
 large object arrays
In-Reply-To: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
References: <w5XHAqg3qV__B8VkGddzS-PylGgO9ioRV-EEyiPsf_0=.0ed7dba6-bac6-429c-99b9-78908478bc40@github.com>
Message-ID: <56ifPVrvcEVK4r7m9yF7KgXUuruA-bWqWB6vbbuzMm8=.c7adcfbe-36fa-470e-aa90-643d74ce327f@github.com>

On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions:
> 
> 1) We have large object arrays on the heap
> 
> 2) We start with BFS, but at some point, we fall back to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57/src/hotspot/share/jfr/leakprofiler/chains/bfsClosure.cpp#L142-L144). This, in turn, is more likely to happen with large object arrays.
> 
> Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two.
> 
> The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by artificially decreasing the size of the BFS edge queue in the code.
> 
> In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results:
> 
> - Object Array size of 200,000 elements: BFS only, all is well (~5 seconds)
> - Object Array size of 250,000 elements: BFS/DFS mixed, (~200 seconds)
> - Object Array size of 300,000 elements: BFS/DFS mixed, (~650 seconds)
> 
> The run time grows with the square of the array size, see explanation below. With the default edge queue size of 32MB, we only hit that DFS fallback path when processing arrays of more than 2 million elements, and for arrays as big as these, the search basically hangs forever.
> 
> Examining the problem more closely, I see:
> 
> - BFS search starts iterating over the object array. That many oops will saturate the queue. It will drop down to DFS for every single object element. That in itself is not the problem.
> 
> - It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O(N^2) with respect to the array size.
> 
> The reason for this is a missing mark check at the border between BFS and DFS. 
> 
> Tests:
> 
> - I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs...
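
The quadratic behavior described in the quoted text can be illustrated with a small toy model in Java. This is only a sketch under simplifying assumptions, not the HotSpot code: the class and method names (`BfsDfsBorderSketch`, `elementVisits`) are hypothetical, a single array stands in for the heap, and one boolean stands in for the mark bit of the array oop. It counts how many element visits occur when the array is handed to DFS once per element edge, with and without a mark check at the BFS/DFS border.

```java
final class BfsDfsBorderSketch {

    /**
     * Counts element visits for one array of n references whose edges
     * overflow the BFS queue (toy model, not the HotSpot implementation).
     */
    static long elementVisits(int n, boolean markCheckAtBorder) {
        boolean arrayMarked = false; // stands in for the mark bit of the array oop
        long visits = 0;
        // The BFS side falls back to DFS once per element edge of the array.
        for (int edge = 0; edge < n; edge++) {
            if (markCheckAtBorder && arrayMarked) {
                continue; // array was already traversed by an earlier fallback
            }
            arrayMarked = true;
            visits += n; // the DFS walks all n elements of the array
        }
        return visits;
    }

    public static void main(String[] args) {
        int n = 1_000;
        System.out.println("without mark check: " + elementVisits(n, false));
        System.out.println("with mark check:    " + elementVisits(n, true));
    }
}
```

In this model the visit count is n * n without the check and n with it, which matches the O(N^2) versus linear behavior described in the quoted text.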

This pull request has now been integrated.

Changeset: 99f90bef
Author:    Thomas Stuefe <stuefe at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/99f90befafe9476de17e416d45a9875569171935
Stats:     208 lines in 2 files changed: 202 ins; 6 del; 0 mod

8373490: JFR Leak Profiler: path-to-gc-root very slow for large object arrays

Reviewed-by: egahlin

-------------

PR: https://git.openjdk.org/jdk/pull/28781

From egahlin at openjdk.org  Mon Dec 15 09:43:41 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Mon, 15 Dec 2025 09:43:41 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath
In-Reply-To: <dSYWkf9SH9BUUpnpUW6LIvk3gk7uaUdVzxDTk-CDrgk=.35455dde-42f2-472f-a751-864571658acc@github.com>
References: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
 <dSYWkf9SH9BUUpnpUW6LIvk3gk7uaUdVzxDTk-CDrgk=.35455dde-42f2-472f-a751-864571658acc@github.com>
Message-ID: <45-fSOH97K8FKgc1YN6w1tKz11nvfnFBOp25uN1s_TI=.cadf2c62-db15-434e-90dd-f18817ab9697@github.com>

On Fri, 12 Dec 2025 17:08:14 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

>> A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from starting after the JVM initiates its shutdown hooks
>> 
>> A new test was added that fails without the change & passes with it 
>> 
>> I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86
>
> src/jdk.jfr/share/classes/jdk/jfr/internal/PlatformRecording.java line 108:
> 
>> 106:         synchronized (recorder) {
>> 107:             if (PlatformRecorder.isInShutDown()) {
>> 108:                 throw new IllegalStateException("Flight recorder is already shutdown");
> 
> I need to think about if throwing ISE is the best alternative here.

A user may call recording.start() at any time, and throwing an ISE would break the current API and require callers to guard against shutdown. This would affect not just the Recording class but also RecordingStream and FlightRecorder, and possibly the FlightRecorderMXBean. Another alternative is to add PlatformRecorder.isInShutdown checks in various places (stop, dump etc) to make it a dummy recording, but that becomes complicated. An internal checked exception could help ensure all paths are covered. A third alternative is to avoid destroying native JFR at shutdown. That said, we still need to clean up the disk repository. Markus will be back in January. I think we should wait until he returns to get more input on this.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2618681154

From duke at openjdk.org  Mon Dec 15 10:23:40 2025
From: duke at openjdk.org (Bara' Hasheesh)
Date: Mon, 15 Dec 2025 10:23:40 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath
In-Reply-To: <45-fSOH97K8FKgc1YN6w1tKz11nvfnFBOp25uN1s_TI=.cadf2c62-db15-434e-90dd-f18817ab9697@github.com>
References: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
 <dSYWkf9SH9BUUpnpUW6LIvk3gk7uaUdVzxDTk-CDrgk=.35455dde-42f2-472f-a751-864571658acc@github.com>
 <45-fSOH97K8FKgc1YN6w1tKz11nvfnFBOp25uN1s_TI=.cadf2c62-db15-434e-90dd-f18817ab9697@github.com>
Message-ID: <9MZ4FJ9vdY0QxuvgrnWj-5HlE8rz_xI3bKCyeCj-Upw=.581f8f7b-b8b9-4fc9-a608-e67cb6bb5ef7@github.com>

On Mon, 15 Dec 2025 09:40:37 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

> A third alternative is to avoid destroying native JFR at shutdown 

I think for this option it's better to wait on Markus as you mentioned

-------

For the other options, I don't like the first option of `isInShutdown` checks, as some APIs, such as the `stop` API, are designed to run fine after the flag is set.

As for the internal checked exception, the exception by itself won't be sufficient: the recording also needs to be transformed into a "dummy recording", because from the user's perspective the application continued normally, so the recording "started". Otherwise this recording becomes a deadlock hazard.

The idea of a "dummy recording" is interesting. What do you think about adding a new `RecordingState`, call it dummy, and adding checks for it where needed within the various APIs?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2618813671

From egahlin at openjdk.org  Mon Dec 15 10:55:34 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Mon, 15 Dec 2025 10:55:34 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath
In-Reply-To: <9MZ4FJ9vdY0QxuvgrnWj-5HlE8rz_xI3bKCyeCj-Upw=.581f8f7b-b8b9-4fc9-a608-e67cb6bb5ef7@github.com>
References: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
 <dSYWkf9SH9BUUpnpUW6LIvk3gk7uaUdVzxDTk-CDrgk=.35455dde-42f2-472f-a751-864571658acc@github.com>
 <45-fSOH97K8FKgc1YN6w1tKz11nvfnFBOp25uN1s_TI=.cadf2c62-db15-434e-90dd-f18817ab9697@github.com>
 <9MZ4FJ9vdY0QxuvgrnWj-5HlE8rz_xI3bKCyeCj-Upw=.581f8f7b-b8b9-4fc9-a608-e67cb6bb5ef7@github.com>
Message-ID: <N7V_HZ7qWk_si14I_OMAR44mYcb43HSsNf0_2_hTYd0=.29e9f768-d6c9-455b-b8b0-b942ccc11485@github.com>

On Mon, 15 Dec 2025 10:20:34 GMT, Bara' Hasheesh <duke at openjdk.org> wrote:

>> A user may call recording.start() at any time, and throwing an ISE would break the current API and require callers to guard against shutdown. This would affect not just the Recording class but also RecordingStream and FlightRecorder, and possibly the FlightRecorderMXBean. Another alternative is to add PlatformRecorder.isInShutdown checks in various places (stop, dump etc) to make it a dummy recording, but that becomes complicated. An internal checked exception could help ensure all paths are covered. A third alternative is to avoid destroying native JFR at shutdown. That said, we still need to clean up the disk repository. Markus will be back in January. I think we should wait until he returns to get more input on this.
>
>> A third alternative is to avoid destroying native JFR at shutdown 
> 
> I think for this option it's better to wait on Markus as you mentioned
> 
> -------
> 
> For the other options, I don't like the first option of `isInShutdown` checks, as some APIs, such as the `stop` API, are designed to run fine after the flag is set.
> 
> As for the internal checked exception, the exception by itself won't be sufficient: the recording also needs to be transformed into a "dummy recording", because from the user's perspective the application continued normally, so the recording "started". Otherwise this recording becomes a deadlock hazard.
> 
> The idea of a "dummy recording" is interesting. What do you think about adding a new `RecordingState`, call it dummy, and adding checks for it where needed within the various APIs?

I thought about it, but it doesn't solve the underlying problem and may break things for API users. We might as well use RUNNING and some internal flag.
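
To make the "RUNNING and some internal flag" idea concrete, here is a minimal, self-contained Java sketch. It is an assumption-laden illustration, not JFR code: `DummyRecordingSketch`, its `State` enum, the `inShutdown` supplier, and the `isDummy` accessor are all hypothetical names, and the boolean result of `stop` is just this sketch's way of showing that nothing was recorded.

```java
import java.util.function.BooleanSupplier;

final class DummyRecordingSketch {

    // Publicly visible states; note there is no extra DUMMY state.
    enum State { NEW, RUNNING, STOPPED }

    private final BooleanSupplier inShutdown; // hypothetical shutdown probe
    private State state = State.NEW;
    private boolean dummy; // internal flag, not part of the visible state

    DummyRecordingSketch(BooleanSupplier inShutdown) {
        this.inShutdown = inShutdown;
    }

    synchronized void start() {
        if (state != State.NEW) {
            throw new IllegalStateException("Recording can only be started once.");
        }
        // During shutdown the recording still appears to start, but does no work.
        dummy = inShutdown.getAsBoolean();
        state = State.RUNNING;
    }

    /** Returns true if real data was recorded, false for a dummy recording. */
    synchronized boolean stop() {
        if (state != State.RUNNING) {
            throw new IllegalStateException("Recording is not running.");
        }
        state = State.STOPPED;
        return !dummy;
    }

    synchronized State getState() {
        return state;
    }

    synchronized boolean isDummy() {
        return dummy;
    }
}
```

A caller sees the same state transitions whether or not shutdown has begun, which is the property that keeps the existing API contract intact in this sketch.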

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2618923020

From duke at openjdk.org  Tue Dec 16 09:08:25 2025
From: duke at openjdk.org (Bara' Hasheesh)
Date: Tue, 16 Dec 2025 09:08:25 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath [v2]
In-Reply-To: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
References: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
Message-ID: <e_z2p-au2lRxMfHw9jK7EjcYf9dRWx_m2dCrdHf_2t0=.22d83f8d-d61f-4aea-899a-9bdd4ab5ec0e@github.com>

> A simple `PlatformRecorder.isInShutDown` check is added to `PlatformRecording.start` to prevent any new recording from starting after the JVM initiates its shutdown hooks
> 
> A new test was added that fails without the change & passes with it 
> 
> I also ran `tier1`, `tier2` as well as `jdk_jfr` on Linux x86

Bara' Hasheesh has updated the pull request incrementally with one additional commit since the last revision:

  flags

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28767/files
  - new: https://git.openjdk.org/jdk/pull/28767/files/d1eec7c3..e8811e0f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28767&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28767&range=00-01

  Stats: 65 lines in 4 files changed: 53 ins; 6 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/28767.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28767/head:pull/28767

PR: https://git.openjdk.org/jdk/pull/28767

From egahlin at openjdk.org  Tue Dec 16 09:41:03 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Tue, 16 Dec 2025 09:41:03 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath [v2]
In-Reply-To: <N7V_HZ7qWk_si14I_OMAR44mYcb43HSsNf0_2_hTYd0=.29e9f768-d6c9-455b-b8b0-b942ccc11485@github.com>
References: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
 <dSYWkf9SH9BUUpnpUW6LIvk3gk7uaUdVzxDTk-CDrgk=.35455dde-42f2-472f-a751-864571658acc@github.com>
 <45-fSOH97K8FKgc1YN6w1tKz11nvfnFBOp25uN1s_TI=.cadf2c62-db15-434e-90dd-f18817ab9697@github.com>
 <9MZ4FJ9vdY0QxuvgrnWj-5HlE8rz_xI3bKCyeCj-Upw=.581f8f7b-b8b9-4fc9-a608-e67cb6bb5ef7@github.com>
 <N7V_HZ7qWk_si14I_OMAR44mYcb43HSsNf0_2_hTYd0=.29e9f768-d6c9-455b-b8b0-b942ccc11485@github.com>
Message-ID: <hYnmGPgz9L4H4dPQ6Ld6l8faht7r04R6CgeCY4vNBW0=.e2836e59-2311-4b3e-99c2-0479d0bf6149@github.com>

On Mon, 15 Dec 2025 10:53:21 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

>>> A third alternative is to avoid destroying native JFR at shutdown 
>> 
>> I think for this option it's better to wait on Markus as you mentioned
>> 
>> -------
>> 
>> For the other options, I don't like the first option of `isInShutdown` checks, as some APIs, such as the `stop` API, are designed to run fine after the flag is set.
>> 
>> As for the internal checked exception, the exception by itself won't be sufficient: the recording also needs to be transformed into a "dummy recording", because from the user's perspective the application continued normally, so the recording "started". Otherwise this recording becomes a deadlock hazard.
>> 
>> The idea of a "dummy recording" is interesting. What do you think about adding a new `RecordingState`, call it dummy, and adding checks for it where needed within the various APIs?
>
> I thought about it, but it doesn't solve the underlying problem and may break things for API users. We might as well use RUNNING and some internal flag.

I don't think it makes sense to implement anything until we have resolved the design, which is best done after New Year's. Also, I will be on vacation, so my time for reviews will be limited.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2622512926

From duke at openjdk.org  Tue Dec 16 09:41:04 2025
From: duke at openjdk.org (Bara' Hasheesh)
Date: Tue, 16 Dec 2025 09:41:04 GMT
Subject: RFR: 8373439: Deadlock between flight recorder & VMDeath [v2]
In-Reply-To: <hYnmGPgz9L4H4dPQ6Ld6l8faht7r04R6CgeCY4vNBW0=.e2836e59-2311-4b3e-99c2-0479d0bf6149@github.com>
References: <tPJTqGqqtUCtjq-VJSM_7kFMI0FKGiaknccBswDX744=.e19a6870-43be-42f0-b580-a13d31b691c1@github.com>
 <dSYWkf9SH9BUUpnpUW6LIvk3gk7uaUdVzxDTk-CDrgk=.35455dde-42f2-472f-a751-864571658acc@github.com>
 <45-fSOH97K8FKgc1YN6w1tKz11nvfnFBOp25uN1s_TI=.cadf2c62-db15-434e-90dd-f18817ab9697@github.com>
 <9MZ4FJ9vdY0QxuvgrnWj-5HlE8rz_xI3bKCyeCj-Upw=.581f8f7b-b8b9-4fc9-a608-e67cb6bb5ef7@github.com>
 <N7V_HZ7qWk_si14I_OMAR44mYcb43HSsNf0_2_hTYd0=.29e9f768-d6c9-455b-b8b0-b942ccc11485@github.com>
 <hYnmGPgz9L4H4dPQ6Ld6l8faht7r04R6CgeCY4vNBW0=.e2836e59-2311-4b3e-99c2-0479d0bf6149@github.com>
Message-ID: <Ni2SPpfcVCNMzQPNzvbDekLB_cADqEUYTOdGWGzhdII=.49b4792d-6aff-4179-95ce-98a83132ec79@github.com>

On Tue, 16 Dec 2025 09:36:23 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

>> I thought about it, but it doesn't solve the underlying problem and may break things for API users. We might as well use RUNNING and some internal flag.
>
> I don't think it makes sense to implement anything until we have resolved the design, which is best done after New Year's. Also, I will be on vacation, so my time for reviews will be limited.

Noted, I will put this on hold until next year

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28767#discussion_r2622519564

From stuefe at openjdk.org  Wed Dec 17 09:55:04 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 17 Dec 2025 09:55:04 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive [v5]
In-Reply-To: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
Message-ID: <uAEtGXmacVNGw9LcUTY-ViO1TxL1kmdBTglGnp04qz4=.2987e5e1-abf2-4ee1-bcdf-1c72aebbfa32@github.com>

> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
> 
> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
> 
> This RFE changes the algorithm to be non-recursive. 
> 
> Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
> 
> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
> 
> Testing:
> 
> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
> - Ran locally all jtreg tests in jdk/jfr
> - GHAs
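
The general technique the description refers to, replacing recursion with an explicit stack, can be sketched in Java as follows. This is a generic illustration on a small integer graph, not the `DFSClosure` code: the class `IterativeDfsSketch`, the method `visitOrder`, and the adjacency-map representation are assumptions made for the example. It also shows why the per-node child order is reversed: children are pushed in declaration order and popped last-in, first-out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IterativeDfsSketch {

    /** Depth-first traversal using a heap-allocated stack instead of recursion. */
    static List<Integer> visitOrder(Map<Integer, List<Integer>> edges, int root) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> marked = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            int node = stack.pop();
            if (!marked.add(node)) {
                continue; // already visited
            }
            order.add(node);
            // Children are pushed in declaration order, so the last child
            // ends up on top of the stack and is processed first.
            for (int child : edges.getOrDefault(node, List.of())) {
                stack.push(child);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> edges = Map.of(
                1, List.of(2, 3, 4),
                2, List.of(5));
        System.out.println(visitOrder(edges, 1));
    }
}
```

Because the pending work lives in the `ArrayDeque`, the traversal depth is limited by available heap rather than by the size of the calling thread's native stack.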

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:

 - Merge branch 'master' into JFR-leak-profiler-path-to-gc-roots-non-recursive
 - final fixes
 - revert accidental checkin
 - test improvements
 - fix
 - remove test output
 - Copyright
 - start

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28659/files
  - new: https://git.openjdk.org/jdk/pull/28659/files/73497c30..10bc510b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=03-04

  Stats: 45414 lines in 841 files changed: 30528 ins; 10541 del; 4345 mod
  Patch: https://git.openjdk.org/jdk/pull/28659.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28659/head:pull/28659

PR: https://git.openjdk.org/jdk/pull/28659

From stuefe at openjdk.org  Wed Dec 17 10:06:59 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 17 Dec 2025 10:06:59 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive [v6]
In-Reply-To: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
Message-ID: <xrWpvN3_d13zw4xjf50yz_IJVc2NpiExAUojkXMhrKs=.9a4b9cc0-ebaa-4c36-93b1-6340df92b58d@github.com>

> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
> 
> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
> 
> This RFE changes the algorithm to be non-recursive. 
> 
> Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
> 
> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
> 
> Testing:
> 
> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
> - Ran locally all jtreg tests in jdk/jfr
> - GHAs

Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:

 - completely revert test changes
 - revert part of the test changes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28659/files
  - new: https://git.openjdk.org/jdk/pull/28659/files/10bc510b..09886f48

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=04-05

  Stats: 108 lines in 1 file changed: 9 ins; 87 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/28659.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28659/head:pull/28659

PR: https://git.openjdk.org/jdk/pull/28659

From stuefe at openjdk.org  Thu Dec 18 10:11:20 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 18 Dec 2025 10:11:20 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive [v7]
In-Reply-To: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
Message-ID: <OsYqVBywfp3e-nJiLyRYCP0gD8JXNTp6oWRIXrgGaQ0=.17056c09-ce40-4119-9d03-ab3a98bdcaf3@github.com>

> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
> 
> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
> 
> This RFE changes the algorithm to be non-recursive. 
> 
> Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
> 
> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
> 
> Testing:
> 
> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
> - Ran locally all jtreg tests in jdk/jfr
> - GHAs

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  do strides for arrays

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28659/files
  - new: https://git.openjdk.org/jdk/pull/28659/files/09886f48..94ce9065

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28659&range=05-06

  Stats: 28 lines in 2 files changed: 21 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/28659.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28659/head:pull/28659

PR: https://git.openjdk.org/jdk/pull/28659

From duke at openjdk.org  Thu Dec 18 20:31:07 2025
From: duke at openjdk.org (Robert Toyonaga)
Date: Thu, 18 Dec 2025 20:31:07 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive [v7]
In-Reply-To: <OsYqVBywfp3e-nJiLyRYCP0gD8JXNTp6oWRIXrgGaQ0=.17056c09-ce40-4119-9d03-ab3a98bdcaf3@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
 <OsYqVBywfp3e-nJiLyRYCP0gD8JXNTp6oWRIXrgGaQ0=.17056c09-ce40-4119-9d03-ab3a98bdcaf3@github.com>
Message-ID: <wCQ-9DsMs95bsgTDUQ3ewAEZl1qxPIcmNi2onzxsMX0=.e0dd4726-49fe-425a-82d3-2f3f782694f4@github.com>

On Thu, 18 Dec 2025 10:11:20 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
>> 
>> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
>> 
>> This RFE changes the algorithm to be non-recursive. 
>> 
>> Note that as a result of this change, the order in which oop maps are walked per oop is reversed: last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
>> 
>> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
>> 
>> Testing:
>> 
>> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
>> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
>> - Ran locally all jtreg tests in jdk/jfr
>> - GHAs
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   do strides for arrays

I think adding this array striding is a good idea to avoid flooding the stack due to large arrays. I left a comment about a possible problem below.

src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 134:

> 132:             ProbeStackItem psi2 = psi;
> 133:             psi2.chunk ++;
> 134:             _probe_stack.push(psi2);

Could it be a problem that `pointee` has already been marked? To accomplish the striding, the same `pointee` needs to be revisited with the new chunk count to evaluate the next range.  However, the next time it's popped off the stack, it will get skipped over on line 108 since it's already been marked. 

I ran `TestJcmdDumpPathToGCRootsBFSDFS.java` and the test passes even after I add `assert(psi.chunk==0 )` in this block, which might indicate only the first range of each array is ever getting evaluated.
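
The suspected failure mode can be reproduced in a small Java toy model. It is a sketch under stated assumptions, not the code from `dfsClosure.cpp`: `ArrayStrideSketch`, `elementsVisited`, and the `markCheckOnFirstChunkOnly` switch are hypothetical, a single array is modeled, and one boolean stands in for its mark bit. The second mode, which applies the mark check only to chunk 0, is shown purely for contrast and is not a claim about how the PR will be fixed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ArrayStrideSketch {

    /**
     * Processes one array of the given length in chunks of `stride` elements
     * and returns how many elements were actually visited. `stride` must be > 0.
     */
    static int elementsVisited(int length, int stride, boolean markCheckOnFirstChunkOnly) {
        boolean arrayMarked = false;               // mark bit of the array object
        Deque<Integer> probeStack = new ArrayDeque<>();
        probeStack.push(0);                        // start with chunk 0
        int visited = 0;
        while (!probeStack.isEmpty()) {
            int chunk = probeStack.pop();
            boolean applyMarkCheck = !markCheckOnFirstChunkOnly || chunk == 0;
            if (applyMarkCheck && arrayMarked) {
                continue;                          // treated as already processed
            }
            arrayMarked = true;
            int begin = chunk * stride;
            int end = Math.min(begin + stride, length);
            if (end < length) {
                probeStack.push(chunk + 1);        // revisit the same array for the next range
            }
            visited += end - begin;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println("mark check on every pop: " + elementsVisited(10, 4, false));
        System.out.println("mark check on chunk 0:   " + elementsVisited(10, 4, true));
    }
}
```

With an unconditional mark check, the follow-up chunk is pushed but skipped when popped, so only the first range is ever visited in this model.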

-------------

PR Review: https://git.openjdk.org/jdk/pull/28659#pullrequestreview-3594914234
PR Review Comment: https://git.openjdk.org/jdk/pull/28659#discussion_r2632525732

From haosun at openjdk.org  Fri Dec 19 06:50:19 2025
From: haosun at openjdk.org (Hao Sun)
Date: Fri, 19 Dec 2025 06:50:19 GMT
Subject: RFR: 8373122: JFR build failure with CDS disabled due to
 -Werror=unused-function after JDK-8365400
Message-ID: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com>

`get_source(const AOTClassLocation* cl, JavaThread* jt)` is only used by `JfrClassDefineEvent::on_restoration()` within an `INCLUDE_CDS` guard.

Add the same `INCLUDE_CDS` guard to mitigate the GCC warning.

Tests: JDK build with CDS disabled passed on both x86_64 and aarch64.
       tier1~3 passed on both x86_64 and aarch64.

-------------

Commit messages:
 - 8373122: JFR build failure with CDS disabled due to -Werror=unused-function after JDK-8365400

Changes: https://git.openjdk.org/jdk/pull/28917/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28917&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8373122
  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28917.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28917/head:pull/28917

PR: https://git.openjdk.org/jdk/pull/28917

From stuefe at openjdk.org  Fri Dec 19 07:09:58 2025
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 19 Dec 2025 07:09:58 GMT
Subject: RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be
 non-recursive [v7]
In-Reply-To: <wCQ-9DsMs95bsgTDUQ3ewAEZl1qxPIcmNi2onzxsMX0=.e0dd4726-49fe-425a-82d3-2f3f782694f4@github.com>
References: <c9FfJQcwqSiSom29B99YL-YSvR4nPbSK3etoEF9PSYg=.49e1d683-78fd-454f-9b05-c7d3d4fcc406@github.com>
 <OsYqVBywfp3e-nJiLyRYCP0gD8JXNTp6oWRIXrgGaQ0=.17056c09-ce40-4119-9d03-ab3a98bdcaf3@github.com>
 <wCQ-9DsMs95bsgTDUQ3ewAEZl1qxPIcmNi2onzxsMX0=.e0dd4726-49fe-425a-82d3-2f3f782694f4@github.com>
Message-ID: <W7maE0twAZjWhVSoKH09hmLfEYv4gbDOiCMuz30UhSg=.fc06a3fa-8420-4906-8166-de8764976dfa@github.com>

On Thu, 18 Dec 2025 20:26:14 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   do strides for arrays
>
> src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 134:
> 
>> 132:             ProbeStackItem psi2 = psi;
>> 133:             psi2.chunk ++;
>> 134:             _probe_stack.push(psi2);
> 
> Could checking the marked status of `pointee` now become a problem? To accomplish the striding, the same `pointee` needs to be revisited with the new chunk count to evaluate the next range. However, the next time it's popped off the stack, it will get skipped over on line 108 since it's already been marked.
> 
> `TestJcmdDumpPathToGCRootsBFSDFS.java` passes even after I add `assert(psi.chunk==0 )` in this block after line 122, which might indicate only the first range of each array is ever getting evaluated.

Yes, I saw this yesterday too. The current version does not work. I am rethinking the solution.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28659#discussion_r2633959011

From fandreuzzi at openjdk.org  Fri Dec 19 10:18:19 2025
From: fandreuzzi at openjdk.org (Francesco Andreuzzi)
Date: Fri, 19 Dec 2025 10:18:19 GMT
Subject: RFR: 8373122: JFR build failure with CDS disabled due to
 -Werror=unused-function after JDK-8365400
In-Reply-To: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com>
References: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com>
Message-ID: <3__AkRGeel74eUCetVlLDTiL18A3RxM_OlTHuO-vTt0=.3296278a-44a1-4095-b390-1fc510659d25@github.com>

On Fri, 19 Dec 2025 06:43:05 GMT, Hao Sun <haosun at openjdk.org> wrote:

> `get_source(const AOTClassLocation* cl, JavaThread* jt)` is only used by `JfrClassDefineEvent::on_restoration()` within an `INCLUDE_CDS` guard.
> 
> Add the same `INCLUDE_CDS` guard to mitigate the GCC warning.
> 
> Tests: JDK build with CDS disabled passed on both x86_64 and aarch64.
>        tier1~3 passed on both x86_64 and aarch64.

Marked as reviewed by fandreuzzi (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/28917#pullrequestreview-3597965372

From duke at openjdk.org  Fri Dec 19 12:15:42 2025
From: duke at openjdk.org (duke)
Date: Fri, 19 Dec 2025 12:15:42 GMT
Subject: Withdrawn: 8365306: Provide OS Process Size and Libc statistic metrics
 to JFR
In-Reply-To: <r4zqfECC3D8jZuWpPIDcE6PRmCkiKTQqXmpaQX-lG6k=.d57bc06c-92e8-4a01-827c-0013fbb9bed0@github.com>
References: <r4zqfECC3D8jZuWpPIDcE6PRmCkiKTQqXmpaQX-lG6k=.d57bc06c-92e8-4a01-827c-0013fbb9bed0@github.com>
Message-ID: <0qN4sAF2Ov5kJnH-f5HiWvk93t29ycVow454Rw9lsAs=.f548a8b3-387a-4f7b-9eb5-41f259a5e2f7@github.com>

On Wed, 13 Aug 2025 09:42:57 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> This provides the following new metrics:
> - `ProcessSize` event (new, periodic) 
>   - vsize (for analyzing address-space fragmentation issues)
>   - RSS including subtypes (subtypes are useful for excluding atypical issues, e.g. kernel problems that cause large file buffer bloat)
>   - peak RSS 
>   - process swap (if we swap we cannot trust the RSS values, plus it indicates bad sizing)
>   - pte size (to quickly see if we run with a super-large working set but an unsuitably small page size)
> - `LibcStatistics` (new, periodic)
>   - outstanding malloc size (important counterpoint to whatever NMT tries to tell me, which alone is often misleading)
>   - retained malloc size (super-important for the same reason)
>   - number of libc trims the hotspot executed (needed to gauge the usefulness of the retain counter, and to see if a customer employs native heap auto trimming (`-XX:TrimNativeHeapInterval`))
> - `NativeHeapTrim` (new, event-driven) (for both manual and automatic trims)
>    - RSS before and RSS after
>    - RSS recovered by this trim
>    - whether it was an automatic or manual trim
>    - duration
> - `JavaThreadStatistic`
>   - os thread counter (new field) (useful to understand the behavior of third-party code in our process if threads are created that bypass the JVM. E.g. some custom launchers do that.)
>   - nonJava thread counter (new field) (needed to interpret the os thread counter)
> 
> Notes:
> - we already have `ResidentSetSize` event, and the new `ProcessSize` event is a superset of that. I don't know how these cases are handled. I'd prefer to throw the old event out, but JMC has a hard-coded chart for RSS, so I kept it in unless someone tells me to remove it.
> 
> - Obviously, the libc events are very platform-specific. Still, I argue that these metrics are highly useful. We want people to use JFR and JMC; people include developers that are dealing with performance problems that require platform-specific knowledge to understand. See my comment in the JBS issue.
> 
> I provided implementations, as far as possible, for Linux, macOS and Windows.
> 
> Testing:
> - ran the new tests manually and as part of GHAs

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/26756

From mgronlun at openjdk.org  Sun Dec 21 13:18:53 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Sun, 21 Dec 2025 13:18:53 GMT
Subject: RFR: 8373122: JFR build failure with CDS disabled due to
 -Werror=unused-function after JDK-8365400
In-Reply-To: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com>
References: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com>
Message-ID: <rkA3KyeES9ZPro54a1E6wLsBbWNbtfh0Ty03mBJAnU0=.957d79f1-81e9-4e7b-b83f-39e5c87e6b2b@github.com>

On Fri, 19 Dec 2025 06:43:05 GMT, Hao Sun <haosun at openjdk.org> wrote:

> `get_source(const AOTClassLocation* cl, JavaThread* jt)` is only used by `JfrClassDefineEvent::on_restoration()` within an `INCLUDE_CDS` guard.
> 
> Add the same `INCLUDE_CDS` guard to mitigate the GCC warning.
> 
> Tests: JDK build with CDS disabled passed on both x86_64 and aarch64.
>        tier1~3 passed on both x86_64 and aarch64.

Changes requested by mgronlun (Reviewer).

src/hotspot/share/jfr/support/jfrClassDefineEvent.cpp line 129:

> 127: 
> 128: #if INCLUDE_CDS
> 129: static traceid get_source(const AOTClassLocation* cl, JavaThread* jt) {

You can just move this static helper routine down and put it before line 177; then no new conditionals need to be introduced.
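
A minimal sketch of the idea, not taken from jfrClassDefineEvent.cpp: GCC's -Wunused-function fires for a file-local `static` function that is defined but never called, which is what happens when its only caller is compiled out. Placing the helper directly before that caller, inside the guard that already exists, avoids a second `#if`. `INCLUDE_FEATURE` is a hypothetical switch standing in for `INCLUDE_CDS`, and all function names here are made up for illustration.

```cpp
#include <cassert>

#ifndef INCLUDE_FEATURE
#define INCLUDE_FEATURE 1  // hypothetical build switch, stands in for INCLUDE_CDS
#endif

// Called in every configuration, so it is never reported as unused.
static int base_id() { return 40; }

#if INCLUDE_FEATURE
// The helper sits right before its only caller, inside the existing guard.
// When the feature is compiled out, both disappear together and no
// unused static function is left behind.
static int feature_offset() { return 2; }

int feature_id() { return base_id() + feature_offset(); }
#endif

int plain_id() { return base_id(); }
```

Building the same file with `-DINCLUDE_FEATURE=0 -Wall -Werror` would be the way to check that nothing unused remains in the reduced configuration.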

-------------

PR Review: https://git.openjdk.org/jdk/pull/28917#pullrequestreview-3601700892
PR Review Comment: https://git.openjdk.org/jdk/pull/28917#discussion_r2637817002

From egahlin at openjdk.org  Sun Dec 21 17:15:10 2025
From: egahlin at openjdk.org (Erik Gahlin)
Date: Sun, 21 Dec 2025 17:15:10 GMT
Subject: RFR: 8367949: JFR: MethodTrace double-counts methods that catch their
 own exceptions
Message-ID: <HPjDOiRW6xzZhUzA3NxhdD_8y1lda5mz4KVKTHcxr20=.ab9d0033-729a-42ef-b071-821335cd2944@github.com>

Could I have a review of a PR that changes how the instrumentation of the MethodTrace and MethodTiming events is implemented, so they handle exceptions in a better way? 

For constructors, the current implementation is still used in certain corner cases. A proper implementation would require data-flow analysis, but for all practical purposes this code should work fine.

Testing: jdk/jdk/jfr

Thanks
Erik

-------------

Commit messages:
 - Trailing whitespace
 - Fix trailing whitespaec
 - Fix whitespaces
 - Initial

Changes: https://git.openjdk.org/jdk/pull/28947/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28947&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367949
  Stats: 361 lines in 5 files changed: 331 ins; 1 del; 29 mod
  Patch: https://git.openjdk.org/jdk/pull/28947.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28947/head:pull/28947

PR: https://git.openjdk.org/jdk/pull/28947

From duke at openjdk.org  Sun Dec 21 19:50:50 2025
From: duke at openjdk.org (khanbilal732)
Date: Sun, 21 Dec 2025 19:50:50 GMT
Subject: RFR: 8367949: JFR: MethodTrace double-counts methods that catch
 their own exceptions
In-Reply-To: <HPjDOiRW6xzZhUzA3NxhdD_8y1lda5mz4KVKTHcxr20=.ab9d0033-729a-42ef-b071-821335cd2944@github.com>
References: <HPjDOiRW6xzZhUzA3NxhdD_8y1lda5mz4KVKTHcxr20=.ab9d0033-729a-42ef-b071-821335cd2944@github.com>
Message-ID: <DSBqx90x8ZXunt0cEXgXVkRYnjQr7ANmY1KamI6O-v0=.bb1a4e83-4016-4a03-8218-f4894f90eb29@github.com>

On Sun, 21 Dec 2025 16:22:25 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

> Could I have a review of a PR that changes how the instrumentation of the MethodTrace and MethodTiming events is implemented, so they handle exceptions in a better way? 
> 
> For constructors, the current implementation is still used in certain corner cases. A proper implementation would require data-flow analysis, but for all practical purposes this code should work fine.
> 
> Testing: jdk/jdk/jfr
> 
> Thanks
> Erik

Marked as reviewed by khanbilal732 at github.com (no known OpenJDK username).

-------------

PR Review: https://git.openjdk.org/jdk/pull/28947#pullrequestreview-3601936872

From duke at openjdk.org  Sun Dec 21 19:56:53 2025
From: duke at openjdk.org (khanbilal732)
Date: Sun, 21 Dec 2025 19:56:53 GMT
Subject: RFR: 8367949: JFR: MethodTrace double-counts methods that catch
 their own exceptions
In-Reply-To: <HPjDOiRW6xzZhUzA3NxhdD_8y1lda5mz4KVKTHcxr20=.ab9d0033-729a-42ef-b071-821335cd2944@github.com>
References: <HPjDOiRW6xzZhUzA3NxhdD_8y1lda5mz4KVKTHcxr20=.ab9d0033-729a-42ef-b071-821335cd2944@github.com>
Message-ID: <WMeAHHk1c1FOnuRcQOrk9DuZP743ZJ0Fdidw-_eEjMg=.93caf8e6-1b76-40da-a687-b0dc49baeaf8@github.com>

On Sun, 21 Dec 2025 16:22:25 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

> Could I have a review of a PR that changes how the instrumentation of the MethodTrace and MethodTiming events is implemented, so they handle exceptions in a better way? 
> 
> For constructors, the current implementation is still used in certain corner cases. A proper implementation would require data-flow analysis, but for all practical purposes this code should work fine.
> 
> Testing: jdk/jdk/jfr
> 
> Thanks
> Erik

Marked as reviewed by khanbilal732 at github.com (no known OpenJDK username).

-------------

PR Review: https://git.openjdk.org/jdk/pull/28947#pullrequestreview-3601947729

From haosun at openjdk.org  Tue Dec 23 03:33:08 2025
From: haosun at openjdk.org (Hao Sun)
Date: Tue, 23 Dec 2025 03:33:08 GMT
Subject: RFR: 8373122: JFR build failure with CDS disabled due to
 -Werror=unused-function after JDK-8365400 [v2]
In-Reply-To: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com>
References: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com>
Message-ID: <fPW9KGv8fBiEHRl6nB-KHQUBoPyhc0k-9CbpyZV7no0=.7086a92f-1556-437c-9475-142e37440a03@github.com>

> `get_source(const AOTClassLocation* cl, JavaThread* jt)` is only used by `JfrClassDefineEvent::on_restoration()` within an `INCLUDE_CDS` guard.
> 
> Add the same `INCLUDE_CDS` guard to mitigate the GCC warning.
> 
> Tests: JDK build with CDS disabled passed on both x86_64 and aarch64.
>        tier1~3 passed on both x86_64 and aarch64.

Hao Sun has updated the pull request incrementally with one additional commit since the last revision:

  Move down the helper to avoid introducing new conditions
  
  Address mgronlun's comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28917/files
  - new: https://git.openjdk.org/jdk/pull/28917/files/c1cbc737..bccc9eac

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28917&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28917&range=00-01

  Stats: 30 lines in 1 file changed: 14 ins; 16 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28917.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28917/head:pull/28917

PR: https://git.openjdk.org/jdk/pull/28917

From haosun at openjdk.org  Tue Dec 23 03:33:10 2025
From: haosun at openjdk.org (Hao Sun)
Date: Tue, 23 Dec 2025 03:33:10 GMT
Subject: RFR: 8373122: JFR build failure with CDS disabled due to
 -Werror=unused-function after JDK-8365400 [v2]
In-Reply-To: <rkA3KyeES9ZPro54a1E6wLsBbWNbtfh0Ty03mBJAnU0=.957d79f1-81e9-4e7b-b83f-39e5c87e6b2b@github.com>
References: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com>
 <rkA3KyeES9ZPro54a1E6wLsBbWNbtfh0Ty03mBJAnU0=.957d79f1-81e9-4e7b-b83f-39e5c87e6b2b@github.com>
Message-ID: <g-ybLOVvv7qMvACguY946Avlch_hT-8wVeFs4jlhQNo=.955fcaf2-a136-48e9-b629-b63511030665@github.com>

On Sun, 21 Dec 2025 13:15:26 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:

>> Hao Sun has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Move down the helper to avoid introducing new conditions
>>   
>>   Address mgronlun's comment
>
> src/hotspot/share/jfr/support/jfrClassDefineEvent.cpp line 129:
> 
>> 127: 
>> 128: #if INCLUDE_CDS
>> 129: static traceid get_source(const AOTClassLocation* cl, JavaThread* jt) {
> 
> You can just move this static helper routine down and put it before line 177; then no new conditionals need to be introduced.

Thanks for your review.

Agree. Updated in the new commit.

Tests: JDK build with CDS disabled passed on both x86_64 and aarch64.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28917#discussion_r2641816038

From jiefu at openjdk.org  Tue Dec 23 03:51:52 2025
From: jiefu at openjdk.org (Jie Fu)
Date: Tue, 23 Dec 2025 03:51:52 GMT
Subject: RFR: 8373122: JFR build failure with CDS disabled due to
 -Werror=unused-function after JDK-8365400 [v2]
In-Reply-To: <fPW9KGv8fBiEHRl6nB-KHQUBoPyhc0k-9CbpyZV7no0=.7086a92f-1556-437c-9475-142e37440a03@github.com>
References: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com>
 <fPW9KGv8fBiEHRl6nB-KHQUBoPyhc0k-9CbpyZV7no0=.7086a92f-1556-437c-9475-142e37440a03@github.com>
Message-ID: <paIa87ZZfuOwajHwUdIdPp14vrpTGJPb9rKQF_1_nlA=.d65bafda-8fcb-48d6-ba33-a06eed92a9fa@github.com>

On Tue, 23 Dec 2025 03:33:08 GMT, Hao Sun <haosun at openjdk.org> wrote:

>> `get_source(const AOTClassLocation* cl, JavaThread* jt)` is only used by `JfrClassDefineEvent::on_restoration()` within an `INCLUDE_CDS` guard.
>> 
>> Add the same `INCLUDE_CDS` guard to mitigate the GCC warning.
>> 
>> Tests: JDK build with CDS disabled passed on both x86_64 and aarch64.
>>        tier1~3 passed on both x86_64 and aarch64.
>
> Hao Sun has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Move down the helper to avoid introducing new conditions
>   
>   Address mgronlun's comment

LGTM

-------------

Marked as reviewed by jiefu (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28917#pullrequestreview-3606435895

From mgronlun at openjdk.org  Tue Dec 23 07:39:56 2025
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Tue, 23 Dec 2025 07:39:56 GMT
Subject: RFR: 8373122: JFR build failure with CDS disabled due to
 -Werror=unused-function after JDK-8365400 [v2]
In-Reply-To: <fPW9KGv8fBiEHRl6nB-KHQUBoPyhc0k-9CbpyZV7no0=.7086a92f-1556-437c-9475-142e37440a03@github.com>
References: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com>
 <fPW9KGv8fBiEHRl6nB-KHQUBoPyhc0k-9CbpyZV7no0=.7086a92f-1556-437c-9475-142e37440a03@github.com>
Message-ID: <ZDoutTHGiCyF1Z9MZzk-wNEJ3CcOLLhDmVaxq_VsRYU=.181f9e69-d09c-4c05-933c-8b34067066d1@github.com>

On Tue, 23 Dec 2025 03:33:08 GMT, Hao Sun <haosun at openjdk.org> wrote:

>> `get_source(const AOTClassLocation* cl, JavaThread* jt)` is only used by `JfrClassDefineEvent::on_restoration()` within an `INCLUDE_CDS` guard.
>> 
>> Add the same `INCLUDE_CDS` guard to mitigate the GCC warning.
>> 
>> Tests: JDK build with CDS disabled passed on both x86_64 and aarch64.
>>        tier1~3 passed on both x86_64 and aarch64.
>
> Hao Sun has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Move down the helper to avoid introducing new conditions
>   
>   Address mgronlun's comment

Marked as reviewed by mgronlun (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/28917#pullrequestreview-3606957792

From haosun at openjdk.org  Tue Dec 23 08:12:03 2025
From: haosun at openjdk.org (Hao Sun)
Date: Tue, 23 Dec 2025 08:12:03 GMT
Subject: RFR: 8373122: JFR build failure with CDS disabled due to
 -Werror=unused-function after JDK-8365400 [v2]
In-Reply-To: <3__AkRGeel74eUCetVlLDTiL18A3RxM_OlTHuO-vTt0=.3296278a-44a1-4095-b390-1fc510659d25@github.com>
References: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com>
 <3__AkRGeel74eUCetVlLDTiL18A3RxM_OlTHuO-vTt0=.3296278a-44a1-4095-b390-1fc510659d25@github.com>
Message-ID: <71r5X0qqdlMsJQOvYKZUhBS6fh5Pm15_1wwFPdr-0kk=.84de0c94-5041-4a5b-a023-701d87085b7e@github.com>

On Fri, 19 Dec 2025 10:16:02 GMT, Francesco Andreuzzi <fandreuzzi at openjdk.org> wrote:

>> Hao Sun has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Move down the helper to avoid introducing new conditions
>>   
>>   Address mgronlun's comment
>
> Marked as reviewed by fandreuzzi (Committer).

Thanks for your review @fandreuz @DamonFool and @mgronlun 
All GHA tests passed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28917#issuecomment-3685643472

From haosun at openjdk.org  Tue Dec 23 08:12:04 2025
From: haosun at openjdk.org (Hao Sun)
Date: Tue, 23 Dec 2025 08:12:04 GMT
Subject: Integrated: 8373122: JFR build failure with CDS disabled due to
 -Werror=unused-function after JDK-8365400
In-Reply-To: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com>
References: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com>
Message-ID: <K2kvEfmtSry8NPADHgd7wJecVKqCr4dyYavQQA-8aHw=.7f38255f-85b5-4caa-bbdc-ea5cc914b62b@github.com>

On Fri, 19 Dec 2025 06:43:05 GMT, Hao Sun <haosun at openjdk.org> wrote:

> `get_source(const AOTClassLocation* cl, JavaThread* jt)` is only used by `JfrClassDefineEvent::on_restoration()` within an `INCLUDE_CDS` guard.
> 
> Add the same `INCLUDE_CDS` guard to mitigate the GCC warning.
> 
> Tests: JDK build with CDS disabled passed on both x86_64 and aarch64.
>        tier1~3 passed on both x86_64 and aarch64.

This pull request has now been integrated.

Changeset: e1d81c09
Author:    Hao Sun <haosun at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/e1d81c0946364a266a006481a8fbbac24c7e6c6a
Stats:     28 lines in 1 file changed: 14 ins; 14 del; 0 mod

8373122: JFR build failure with CDS disabled due to -Werror=unused-function after JDK-8365400

Reviewed-by: mgronlun, jiefu, fandreuzzi

-------------

PR: https://git.openjdk.org/jdk/pull/28917

From haosun at openjdk.org  Tue Dec 23 08:40:01 2025
From: haosun at openjdk.org (Hao Sun)
Date: Tue, 23 Dec 2025 08:40:01 GMT
Subject: RFR: 8373122: JFR build failure with CDS disabled due to
 -Werror=unused-function after JDK-8365400 [v2]
In-Reply-To: <fPW9KGv8fBiEHRl6nB-KHQUBoPyhc0k-9CbpyZV7no0=.7086a92f-1556-437c-9475-142e37440a03@github.com>
References: <9ZpxX7-3oYs3cIxGUVJP6Jxy1OfklS5W-FyaofTj32E=.adf7b4e3-3582-465d-b851-5479b1d7be7e@github.com>
 <fPW9KGv8fBiEHRl6nB-KHQUBoPyhc0k-9CbpyZV7no0=.7086a92f-1556-437c-9475-142e37440a03@github.com>
Message-ID: <hG0ZngCwMtomUWFvv6Dhdbj0uf2NY3VIn5guwxsMaNg=.8a09d588-791b-4d2a-9a40-19e461619084@github.com>

On Tue, 23 Dec 2025 03:33:08 GMT, Hao Sun <haosun at openjdk.org> wrote:

>> `get_source(const AOTClassLocation* cl, JavaThread* jt)` is only used by `JfrClassDefineEvent::on_restoration()` within an `INCLUDE_CDS` guard.
>> 
>> Add the same `INCLUDE_CDS` guard to mitigate the GCC warning.
>> 
>> Tests: JDK build with CDS disabled passed on both x86_64 and aarch64.
>>        tier1~3 passed on both x86_64 and aarch64.
>
> Hao Sun has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Move down the helper to avoid introducing new conditions
>   
>   Address mgronlun's comment

It seems I should backport this fix to jdk26, since this issue is a P3 bug and also affects jdk26.

Will create a backport PR soon.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28917#issuecomment-3685733710

From haosun at openjdk.org  Wed Dec 24 03:53:11 2025
From: haosun at openjdk.org (Hao Sun)
Date: Wed, 24 Dec 2025 03:53:11 GMT
Subject: [jdk26] RFR: 8373122: JFR build failure with CDS disabled due to
 -Werror=unused-function after JDK-8365400
Message-ID: <bf8PYT6GJNX_OWCrzKFb-7rDf_iulAKltERahN_Mz0c=.9db96d5c-b011-455f-94c0-a85b89610af5@github.com>

Hi all,

This pull request contains a backport of commit [e1d81c09](https://github.com/openjdk/jdk/commit/e1d81c0946364a266a006481a8fbbac24c7e6c6a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.

The commit being backported was authored by Hao Sun on 23 Dec 2025 and was reviewed by Markus Grönlund, Jie Fu and Francesco Andreuzzi.

Thanks!

-------------

Commit messages:
 - Backport e1d81c0946364a266a006481a8fbbac24c7e6c6a

Changes: https://git.openjdk.org/jdk/pull/28976/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28976&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8373122
  Stats: 28 lines in 1 file changed: 14 ins; 14 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28976.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28976/head:pull/28976

PR: https://git.openjdk.org/jdk/pull/28976

From haosun at openjdk.org  Wed Dec 24 09:00:53 2025
From: haosun at openjdk.org (Hao Sun)
Date: Wed, 24 Dec 2025 09:00:53 GMT
Subject: [jdk26] RFR: 8373122: JFR build failure with CDS disabled due to
 -Werror=unused-function after JDK-8365400
In-Reply-To: <bf8PYT6GJNX_OWCrzKFb-7rDf_iulAKltERahN_Mz0c=.9db96d5c-b011-455f-94c0-a85b89610af5@github.com>
References: <bf8PYT6GJNX_OWCrzKFb-7rDf_iulAKltERahN_Mz0c=.9db96d5c-b011-455f-94c0-a85b89610af5@github.com>
Message-ID: <mhHqEgRK9o7fOY4D6eVy_Ez8qfbwvo2rGkeFsiYd1pg=.d52b13ba-084e-48c2-9511-a3daec7275b1@github.com>

On Wed, 24 Dec 2025 03:46:49 GMT, Hao Sun <haosun at openjdk.org> wrote:

> Hi all,
> 
> This pull request contains a backport of commit [e1d81c09](https://github.com/openjdk/jdk/commit/e1d81c0946364a266a006481a8fbbac24c7e6c6a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
> 
> The commit being backported was authored by Hao Sun on 23 Dec 2025 and was reviewed by Markus Grönlund, Jie Fu and Francesco Andreuzzi.
> 
> Thanks!

GHA test **linux-x64/build** failed. I don't think it's related to this patch. I noticed that this GHA test failed in several other PRs.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28976#issuecomment-3689157673

From duke at openjdk.org  Wed Dec 24 18:51:06 2025
From: duke at openjdk.org (=?UTF-8?B?SmVhbi1Ob8OrbA==?= Rouvignac)
Date: Wed, 24 Dec 2025 18:51:06 GMT
Subject: RFR: 8367949: JFR: MethodTrace double-counts methods that catch
 their own exceptions
In-Reply-To: <HPjDOiRW6xzZhUzA3NxhdD_8y1lda5mz4KVKTHcxr20=.ab9d0033-729a-42ef-b071-821335cd2944@github.com>
References: <HPjDOiRW6xzZhUzA3NxhdD_8y1lda5mz4KVKTHcxr20=.ab9d0033-729a-42ef-b071-821335cd2944@github.com>
Message-ID: <rZsZQNvGnVUaBLBa684HBiEyf7PXxcJVi1pjq9_4en8=.e856ef52-2277-4623-bf06-00570dcc3755@github.com>

On Sun, 21 Dec 2025 16:22:25 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

> Could I have a review of a PR that changes how the instrumentation of the MethodTrace and MethodTiming events is implemented, so they handle exceptions in a better way? 
> 
> For constructors, the current implementation is still used in certain corner cases. A proper implementation would require data-flow analysis, but for all practical purposes this code should work fine.
> 
> Testing: jdk/jdk/jfr
> 
> Thanks
> Erik

src/jdk.jfr/share/classes/jdk/jfr/internal/tracing/Instrumentation.java line 80:

> 78:         byte[] generated = classFile.build(classModel.thisClass().asSymbol(), classBuilder -> {
> 79:             for (var ce : classModel) {
> 80:                 if (modifyClassElement(classModel,classBuilder, ce)) {

Suggestion:

                if (modifyClassElement(classModel, classBuilder, ce)) {

src/jdk.jfr/share/classes/jdk/jfr/internal/tracing/Instrumentation.java line 111:

> 109:                 Modification m = tm.modification();
> 110:                 if (m.tracing() || m.timing()) {
> 111:                     return modifyMethod(classModel,classBuilder, mm, tm);

Suggestion:

                    return modifyMethod(classModel, classBuilder, mm, tm);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28947#discussion_r2646154444
PR Review Comment: https://git.openjdk.org/jdk/pull/28947#discussion_r2646154577

From ozanctn at amazon.com  Wed Dec 24 09:10:28 2025
From: ozanctn at amazon.com (Cetin, Ozan)
Date: Wed, 24 Dec 2025 09:10:28 -0000
Subject: [JDK21u] JDK-8337994 REDO backport failure analysis - Missing
 prerequisite changes from JDK-8316241
References: <FEA81723-2445-4008-A5F2-5D40CACDFF86@amazon.com>
Message-ID: <0102019b4f9fb0f8-30618190-5828-49d8-ac08-a74d9cc8c2b7-000000@eu-west-1.amazonses.com>

Attachment protected by Amazon:

[JDK-8316241 Analysis.pdf]
https://eu-central-1.secure-attach.amazon.com/c7b94fc4-394c-4db0-82f2-8629a0139ea2/cf788121-393c-4adc-a634-540f090a3e1f

Amazon has replaced the attachment in this email with a download link. Downloads will be available until January 23, 2026, 09:10 (UTC+00:00).


Hi JFR team,

I've been investigating the test failures that caused JDK-8346108 (the revert of JDK-8337994 REDO in JDK21u). This is related to the native memory leak when not recording any JFR events (JDK-8335121).

Summary
Based on our investigation, we believe the JDK-8337994 (REDO) backport to JDK21 failed because it appears to depend on API changes introduced in the original JDK-8316241 fix that were never backported to JDK21. Our theory is that the REDO fix assumes the existence of infrastructure that only exists in later mainline releases.

Root Cause Analysis
The Missing Prerequisite

The original JDK-8316241 fix (commit b2a39c576706622b624314c89fa6d10d0b422f86) introduced several key changes to jfrTypeSetUtils.hpp/.cpp:

  1.  API Change: should_do_loader_klass(const Klass* k) -> should_do_cld_klass(const Klass* k, bool leakp)
  2.  New Data Structure: Added _klass_loader_leakp_set for separate tracking of leakp (leak profiler) path klasses
  3.  New Function: get_cld_klass(CldPtr cld, bool leakp) in jfrTypeSet.cpp that properly enqueues CLD klasses via JfrTraceId::load()

What Happens Without These Changes

The REDO fix attempts to use get_cld_klass() which calls should_do_cld_klass(klass, leakp), but in the JDK21 backport:

  *   JDK21 still has the old API: should_do_loader_klass(const Klass* k) (no leakp parameter)
  *   JDK21 lacks _klass_loader_leakp_set for separate tracking
  *   The get_cld_klass() function doesn't exist in the JDK21 codebase

This causes the assert(IS_SERIALIZED(class_loader_klass)) to fail in write_cld() because the CLD's class_loader_klass is never properly enqueued for serialization during the leakp path.


Test Failure Mechanism (TestChunkIntegrity.java)

1. TestClassLoader loads MyClass
2. Event commits with clazz = MyClass
3. JFR rotation writes MyClass to chunk
4. MyClass's CLD references TestClassLoader Klass
5. BUG: TestClassLoader Klass not serialized (leakp path broken)
6. Chunk written with broken reference
7. In slowdebug: assert(IS_SERIALIZED(class_loader_klass)) fails
8. In release: "Events don't match" when comparing chunks


The Fix
I've been able to get a local jdk21u build passing all tests (including slowdebug) by backporting JDK-8316241 and resolving the resulting conflicts. The key changes are:
1. jfrTypeSetUtils.hpp
// OLD (JDK21 current)
bool should_do_loader_klass(const Klass* k);

// NEW (with leakp support)
bool should_do_cld_klass(const Klass* k, bool leakp);

2. jfrTypeSetUtils.cpp
// Added _klass_loader_leakp_set member
GrowableArray<const Klass*>* _klass_loader_leakp_set;

// Updated implementation
bool JfrArtifactSet::should_do_cld_klass(const Klass* k, bool leakp) {
  assert(k != nullptr, "invariant");
  assert(_klass_loader_set != nullptr, "invariant");
  assert(_klass_loader_leakp_set != nullptr, "invariant");
  return not_in_set(leakp ? _klass_loader_leakp_set : _klass_loader_set, k);
}

3. jfrTypeSet.cpp - Added get_cld_klass()
static inline KlassPtr get_cld_klass(CldPtr cld, bool leakp) {
  if (cld == nullptr) {
    return nullptr;
  }
  assert(leakp ? IS_LEAKP(cld) : used(cld), "invariant");
  KlassPtr cld_klass = cld->class_loader_klass();
  if (cld_klass == nullptr) {
    return nullptr;
  }
  if (should_do_cld_klass(cld_klass, leakp)) {
    if (current_epoch()) {
      // KEY FIX: Enqueue the klass for serialization
      JfrTraceId::load(cld_klass);
    } else {
      artifact_tag(cld_klass, leakp);
    }
    return cld_klass;
  }
  return nullptr;
}
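
To make the effect of the separate leakp set concrete, here is a small stand-alone model of the bookkeeping. It is a sketch under assumptions, not HotSpot code: `Klass` is an empty placeholder, `ArtifactSetModel` is a hypothetical class, std::unordered_set replaces GrowableArray, and I am assuming that not_in_set() records the klass the first time it is seen.

```cpp
#include <cassert>
#include <unordered_set>

struct Klass {};  // hypothetical placeholder for the HotSpot type

// Models per-path tracking: one set for the regular path, one for leakp.
class ArtifactSetModel {
 public:
  // Returns true the first time 'k' is seen on the given path and
  // remembers it, so later queries on the same path return false.
  bool should_do_cld_klass(const Klass* k, bool leakp) {
    assert(k != nullptr);
    std::unordered_set<const Klass*>& set = leakp ? _leakp_set : _regular_set;
    return set.insert(k).second;
  }

 private:
  std::unordered_set<const Klass*> _regular_set;
  std::unordered_set<const Klass*> _leakp_set;
};
```

With two sets, a class loader klass that was already handled on the regular path is still reported once on the leakp path. A single shared set would suppress that second report.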


Proposed Action
Based on this, it appears that backporting JDK-8337994 (REDO) alone may not be sufficient, and that some or all of the prerequisite infrastructure changes from JDK-8316241 may also need to be backported.
Additionally, other upstream commits in JDK24 that were made on top of JDK-8316241 (such as 8323631<https://github.com/openjdk/jdk/commit/e2d6023cb9667dc9911e0af421d6dd0c78f6bf58>) may also be required so that the fix does not introduce further errors. We would appreciate guidance on identifying any additional changes that might need to be included in the backport.
If this direction makes sense, I'm happy to prepare a proper patch for review.

References

  *   JDK-8335121<https://bugs.openjdk.org/browse/JDK-8335121>: Native memory leak when JFR is enabled but no events are emitted
  *   JDK-8316241<https://github.com/openjdk/jdk/commit/b2a39c576706622b624314c89fa6d10d0b422f86#diff-e9a35c652aa2e65265e7027d3093298a6c59d2137cbf3fa6a5b25b895d77beb1L73>: Test jdk/jdk/jfr/jvm/TestChunkIntegrity.java failed (original fix)
  *   JDK-8337994<https://github.com/openjdk/jdk/commit/6a9a867d645b8fe86f4ca2b04a43bf5aa8f9d487>: [REDO] Native memory leak when not recording any events
  *   JDK-8346108<https://bugs.openjdk.org/browse/JDK-8346108>: Revert of REDO in JDK21u due to test failures


Best Regards,
Ozan

