From jwaters at openjdk.org  Mon Jul  1 04:21:23 2024
From: jwaters at openjdk.org (Julian Waters)
Date: Mon, 1 Jul 2024 04:21:23 GMT
Subject: RFR: 8335283: Build failure due to 'no_sanitize' attribute
 directive ignored
In-Reply-To: <AEE6fRQYsPzCnuCjDiT7JMmMIL9j2NleCZBs33Y9P7o=.65148c51-5815-458d-be42-eeed08ddfba7@github.com>
References: <AEE6fRQYsPzCnuCjDiT7JMmMIL9j2NleCZBs33Y9P7o=.65148c51-5815-458d-be42-eeed08ddfba7@github.com>
Message-ID: <kj0Pm-kX9z8Cdf7UnRFxzU0C2idwoYb9fTOzWEM3VQo=.283058fa-c75d-42f9-835e-8bf5747c2439@github.com>

On Fri, 28 Jun 2024 11:04:16 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> The following build error has been reported when an old gcc is used:
> installers/linux/universal/tar/corretto-build/buildRoot/src/hotspot/share/utilities/vmError.cpp:2068:44: error: 'no_sanitize' attribute directive ignored [-Werror=attributes]
>  static void ALWAYSINLINE crash_with_sigfpe() {
> 
> We can avoid it by not setting the mentioned attribute when ubsan is not enabled.

Marked as reviewed by jwaters (Committer).
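
For readers following the fix, the idea in the quoted description can be sketched roughly as below: the attribute macro expands to `no_sanitize` only when a ubsan build flag is defined, so a compiler that does not recognise the attribute never sees it in a regular build. The names `UBSAN_BUILD`, `NO_UBSAN_ATTRIBUTE` and `quotient` are assumptions made for illustration and are not taken from the changeset.

```cpp
#include <cassert>

// Hypothetical build flag: assumed to be defined by the build system only
// when the undefined-behavior sanitizer is enabled.
#if defined(UBSAN_BUILD) && (defined(__GNUC__) || defined(__clang__))
  // Only sanitizer builds carry the attribute.
  #define NO_UBSAN_ATTRIBUTE __attribute__((no_sanitize("undefined")))
#else
  // Regular builds expand to nothing, so older compilers that would report
  // "attribute directive ignored" under -Werror=attributes never see it.
  #define NO_UBSAN_ATTRIBUTE
#endif

// Hypothetical function standing in for code that must not be instrumented.
NO_UBSAN_ATTRIBUTE
inline int quotient(int a, int b) {
  assert(b != 0);  // keep this illustration free of undefined behavior
  return a / b;
}
```

With `UBSAN_BUILD` undefined the macro is empty and `quotient` compiles as an ordinary function.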

-------------

PR Review: https://git.openjdk.org/jdk/pull/19937#pullrequestreview-2150364871

From rehn at openjdk.org  Mon Jul  1 06:22:18 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 1 Jul 2024 06:22:18 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics
In-Reply-To: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
Message-ID: <RzKNgwshdkyAHmHaN6d32EqBHTSeIh6u8RKs7YhsuLc=.3cc0f5e1-fb1e-4c9d-ba23-1a7bc93d8ba3@github.com>

On Sun, 30 Jun 2024 14:02:00 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK.

Thank you, looks good!

-------------

Marked as reviewed by rehn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19960#pullrequestreview-2150512867

From mbaesken at openjdk.org  Mon Jul  1 06:39:30 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 1 Jul 2024 06:39:30 GMT
Subject: Integrated: 8335283: Build failure due to 'no_sanitize' attribute
 directive ignored
In-Reply-To: <AEE6fRQYsPzCnuCjDiT7JMmMIL9j2NleCZBs33Y9P7o=.65148c51-5815-458d-be42-eeed08ddfba7@github.com>
References: <AEE6fRQYsPzCnuCjDiT7JMmMIL9j2NleCZBs33Y9P7o=.65148c51-5815-458d-be42-eeed08ddfba7@github.com>
Message-ID: <NSOQCamAWJrvAKo4rnT1elhRksMwbhA4WWeZKaEoRKU=.10d03b3c-6a72-4e89-bccb-fc6c94f83e6f@github.com>

On Fri, 28 Jun 2024 11:04:16 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> The following build error has been reported when an old gcc is used:
> installers/linux/universal/tar/corretto-build/buildRoot/src/hotspot/share/utilities/vmError.cpp:2068:44: error: 'no_sanitize' attribute directive ignored [-Werror=attributes]
>  static void ALWAYSINLINE crash_with_sigfpe() {
> 
> We can avoid it by not setting the mentioned attribute when ubsan is not enabled.

This pull request has now been integrated.

Changeset: 53242cdf
Author:    Matthias Baesken <mbaesken at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/53242cdf9ef17c502ebd541e84370e7c158639c1
Stats:     2 lines in 1 file changed: 2 ins; 0 del; 0 mod

8335283: Build failure due to 'no_sanitize' attribute directive ignored

Reviewed-by: shade, tschatzl, kbarrett, jwaters

-------------

PR: https://git.openjdk.org/jdk/pull/19937

From mbaesken at openjdk.org  Mon Jul  1 06:39:30 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 1 Jul 2024 06:39:30 GMT
Subject: RFR: 8335283: Build failure due to 'no_sanitize' attribute
 directive ignored
In-Reply-To: <AEE6fRQYsPzCnuCjDiT7JMmMIL9j2NleCZBs33Y9P7o=.65148c51-5815-458d-be42-eeed08ddfba7@github.com>
References: <AEE6fRQYsPzCnuCjDiT7JMmMIL9j2NleCZBs33Y9P7o=.65148c51-5815-458d-be42-eeed08ddfba7@github.com>
Message-ID: <AMdyfaoRH6SfX4yU2FFq31zmaeAv1wzNSdd9bhsk9i4=.2407074b-782f-4ae4-b5db-d25c37552f6e@github.com>

On Fri, 28 Jun 2024 11:04:16 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> The following build error has been reported when an old gcc is used:
> installers/linux/universal/tar/corretto-build/buildRoot/src/hotspot/share/utilities/vmError.cpp:2068:44: error: 'no_sanitize' attribute directive ignored [-Werror=attributes]
>  static void ALWAYSINLINE crash_with_sigfpe() {
> 
> We can avoid it by not setting the mentioned attribute when ubsan is not enabled.

Thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19937#issuecomment-2199351560

From luhenry at openjdk.org  Mon Jul  1 06:43:25 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Mon, 1 Jul 2024 06:43:25 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics
In-Reply-To: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
Message-ID: <BWV1qtKhP0MV1SrYotttrc0LqUNWLVWjUqwF5ZQQPj0=.c586f3d6-d2f3-4325-a2c8-9de67f67b6ec@github.com>

On Sun, 30 Jun 2024 14:02:00 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2348:

> 2346:     __ lwu(keylen, Address(key, arrayOopDesc::length_offset_in_bytes() - arrayOopDesc::base_offset_in_bytes(T_INT)));
> 2347: 
> 2348:     __ vsetivli(temp1, 4, Assembler::e32, Assembler::m1);

`temp1` is not used afterwards; should we replace it with `x0`?

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2351:

> 2349:     __ vle32_v(res, from);
> 2350:     __ vmv_v_x(vzero, zr);
> 2351:     generate_vle32_pack4(key, vtmp1, vtmp2, vtmp3, vtmp4);

It would be great to add a quick comment mentioning the side effect on `key` of this function call. Same at https://github.com/openjdk/jdk/pull/19960/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R2355 and https://github.com/openjdk/jdk/pull/19960/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R2359

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2362:

> 2360:     generate_rev8_pack2(vtmp1, vtmp2);
> 2361: 
> 2362:     __ mv(temp2, 44);

You could replace `temp2` by `t0`/`t1`/`t2`

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2448:

> 2446:     __ lwu(keylen, Address(key, arrayOopDesc::length_offset_in_bytes() - arrayOopDesc::base_offset_in_bytes(T_INT)));
> 2447: 
> 2448:     __ vsetivli(temp1, 4, Assembler::e32, Assembler::m1);

Same as for encrypt: `temp1` is not used afterwards, could you replace it with `x0`?

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2459:

> 2457:     generate_aesdecrypt_round(res, vzero, vtmp1, vtmp2, vtmp3, vtmp4);
> 2458: 
> 2459:     generate_vle32_pack4(key, vtmp1, vtmp2, vtmp3, vtmp4);

Same as above, please add a comment on the side effect on `key`.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2466:

> 2464:     generate_rev8_pack2(vtmp1, vtmp2);
> 2465: 
> 2466:     __ mv(temp2, 44);

Same as above, could you use `t0`/`t1`/`t2` instead?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1660541398
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1660542460
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1660541850
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1660543748
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1660544520
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1660544242

From duke at openjdk.org  Mon Jul  1 07:47:20 2024
From: duke at openjdk.org (duke)
Date: Mon, 1 Jul 2024 07:47:20 GMT
Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts
 != 0 to 1 [v18]
In-Reply-To: <XHMQAMOt-SyDN5AtlV53ZI39o9T6PtwbiSnnNaw94S0=.778ad97c-0e9f-41e5-b5e8-1ee24cbfb26e@github.com>
References: <M5khiX3nenq64pSpGHB2ClK0pABiaZb0y7YwKYFhgK4=.602e9a58-6792-4bdb-87cf-a95a4346937f@github.com>
 <XHMQAMOt-SyDN5AtlV53ZI39o9T6PtwbiSnnNaw94S0=.778ad97c-0e9f-41e5-b5e8-1ee24cbfb26e@github.com>
Message-ID: <5dDjN6TS_mEh79bvo2LIOx6cRlyM0kO_Lwcj-AKPH94=.08d1abbe-c4d3-4f03-9805-99c28cd0e47d@github.com>

On Sat, 29 Jun 2024 06:47:52 GMT, Suchismith Roy <sroy at openjdk.org> wrote:

>> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732)
>> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for).
>> 
>> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN.
>> Power10 has the "setbc" / "setbcr" instruction.
>> 
>> A new function has been created for the conversion. It uses "setbcr" on Power10 (determined by VM_Version::has_brw()) and the branch-free implementation otherwise. We should have one function for 32-bit and one for 64-bit operations (or one which supports both).
>> 
>> The new code for MacroAssembler::verify_secondary_supers_table  also uses the new function.
>
> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision:
> 
>   default value correction

@suchismith1993 
Your change (at version 0de46f43f5f7fa233fdd2154edf971941b16ab4a) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19886#issuecomment-2199463949

From sroy at openjdk.org  Mon Jul  1 08:10:36 2024
From: sroy at openjdk.org (Suchismith Roy)
Date: Mon, 1 Jul 2024 08:10:36 GMT
Subject: Integrated: JDK-8331732 : [PPC64] Unify and optimize code which
 converts != 0 to 1
In-Reply-To: <M5khiX3nenq64pSpGHB2ClK0pABiaZb0y7YwKYFhgK4=.602e9a58-6792-4bdb-87cf-a95a4346937f@github.com>
References: <M5khiX3nenq64pSpGHB2ClK0pABiaZb0y7YwKYFhgK4=.602e9a58-6792-4bdb-87cf-a95a4346937f@github.com>
Message-ID: <I1xSNawOD5e5syxs03Rx5xvB6TPWnbKwHBaI2Oc86PE=.28e6fe71-1b01-4dcf-818d-64a440200620@github.com>

On Tue, 25 Jun 2024 15:35:43 GMT, Suchismith Roy <sroy at openjdk.org> wrote:

> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732)
> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for).
> 
> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN.
> Power10 has the "setbc" / "setbcr" instruction.
> 
> A new function has been created for the conversion. It uses "setbcr" on Power10 (determined by VM_Version::has_brw()) and the branch-free implementation otherwise. We should have one function for 32-bit and one for 64-bit operations (or one which supports both).
> 
> The new code for MacroAssembler::verify_secondary_supers_table  also uses the new function.
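
To make "converts != 0 to 1" concrete, here is a minimal branch-free sketch in portable C++. It relies on the fact that for any nonzero value, either the value or its two's-complement negation has the top bit set. The function names are hypothetical, and this is one common formulation, not necessarily the instruction sequence emitted by the PPC64 code in this change.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: map 0 to 0 and any nonzero input to 1 without a branch.
inline uint32_t normalize_bool32(uint32_t x) {
  // (x | -x) has its most significant bit set exactly when x != 0.
  return (x | (uint32_t{0} - x)) >> 31;
}

inline uint64_t normalize_bool64(uint64_t x) {
  return (x | (uint64_t{0} - x)) >> 63;
}

// Small usage example covering the edge cases.
inline void normalize_bool_examples() {
  assert(normalize_bool32(0u) == 0u);
  assert(normalize_bool32(0x80000000u) == 1u);  // only the top bit set
  assert(normalize_bool64(0x100000000ull) == 1u);  // nonzero only above bit 31
}
```

The last example shows why separate 32-bit and 64-bit variants matter: a value that is nonzero only in its upper half would be misclassified by a 32-bit conversion.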

This pull request has now been integrated.

Changeset: c7e9ebb4
Author:    Suchismith Roy <sroy at openjdk.org>
Committer: Martin Doerr <mdoerr at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/c7e9ebb4cfff56b7a977eb2942f563f96b3336bd
Stats:     54 lines in 7 files changed: 33 ins; 11 del; 10 mod

8331732: [PPC64] Unify and optimize code which converts != 0 to 1

Reviewed-by: mdoerr, amitkumar

-------------

PR: https://git.openjdk.org/jdk/pull/19886

From aph at openjdk.org  Mon Jul  1 08:49:25 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 1 Jul 2024 08:49:25 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well
 [v6]
In-Reply-To: <exEarPhFo6CWfoL6H8P8Ydrdy5XYSfop67bRl9yuzro=.2c5f2707-f786-4991-8595-f1fb04626b3a@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
 <NSOakR2gz_n5Nlsthxp9d3h-1PCriKR89bLyhsvOfJ4=.b3af7ba9-f3f2-47c0-98a9-6f106416a0c6@github.com>
 <CrgtP7a89N6V7kzI-0jMcIGPJCpRJXIBcKupHKIGSb0=.999fdf6e-8231-45e8-8d25-35dcfd8fa6df@github.com>
 <exEarPhFo6CWfoL6H8P8Ydrdy5XYSfop67bRl9yuzro=.2c5f2707-f786-4991-8595-f1fb04626b3a@github.com>
Message-ID: <YgXbsHRi1Zpf8q0vu1J-tricRWTkep-6Nz5pqxgWTj8=.1914f498-15ba-4380-93b8-57aed13cbe25@github.com>

On Sun, 30 Jun 2024 15:34:02 GMT, Amit Kumar <amitkumar at openjdk.org> wrote:

>> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3243:
>> 
>>> 3241:   // Get the first array index that can contain super_klass.
>>> 3242:   if (bit != 0) {
>>> 3243:     pop_count_long(r_array_index, r_array_index, Z_R1_scratch); // all the registers are hardcoded so should be fine
>> 
>> This comment is also rather baffling. You seem to be concerned about something, but what? `pop_count_long` doesn't cause any particular risk, does it?
>
> For machines older than `Z15`, `pop_count_long` clobbers `Z_R1_scratch` register.  That's why I added it there.

Better then just to say what matters: "NB: May clobber Z_R1_scratch" or "Clobbers Z_R1_scratch on older machines."
"Should be fine" is just confusing because "should" is tentative, where you need to be certain in comments.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660691384
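
As background for the `pop_count_long` discussion in this message, the sketch below shows how a population count can turn a 64-bit occupancy bitmap into a position in a packed array. It is a simplified model under stated assumptions: `first_array_index` and its parameters are hypothetical names, the portable `pop_count_long` here only shares its name with the assembler routine, and the register-level details of the s390 code are not reproduced.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (clears the lowest set bit each iteration).
inline int pop_count_long(uint64_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

// Hypothetical helper: `bitmap` has one bit per hash slot; `slot` is the slot
// a class hashes to. Returns -1 if the slot's bit is clear (definite miss),
// otherwise the first packed-array index that can hold an entry for `slot`,
// which equals the number of occupied slots strictly below it.
inline int first_array_index(uint64_t bitmap, unsigned slot) {
  assert(slot < 64);
  if (((bitmap >> slot) & 1) == 0) {
    return -1;
  }
  uint64_t below = bitmap & ((uint64_t{1} << slot) - 1);
  return pop_count_long(below);
}
```

For a bitmap with bits 0, 2, 3 and 5 set, slot 3 maps to index 2 because two lower slots are occupied, while slot 1 is reported as a miss.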

From aph at openjdk.org  Mon Jul  1 08:49:26 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 1 Jul 2024 08:49:26 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well
 [v4]
In-Reply-To: <M6NsxpbSTah5LZYXLqez-2Y0_rebkaoyIfp1W_if-Ko=.d47e435a-338f-463a-b810-29a63cd04510@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
 <VUjXfo6HMhmcuR_vkWvbTRGqlbcX5E5W37WrkUxN9L8=.48ff6757-8d99-44f1-98fd-ce7d29c3c7db@github.com>
 <M6NsxpbSTah5LZYXLqez-2Y0_rebkaoyIfp1W_if-Ko=.d47e435a-338f-463a-b810-29a63cd04510@github.com>
Message-ID: <OLkjLuRRnaThpE2TIXSckXwFKkx94sNYK5fDQylLvvU=.57f5aedc-3e6f-4967-9b07-9693a7ee417c@github.com>

On Mon, 24 Jun 2024 14:02:15 GMT, Lutz Schmidt <lucy at openjdk.org> wrote:

>> Amit Kumar has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp
>>  - rename: r_scratch to r_result in repne_scan method
>
> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3275:
> 
>> 3273:   call_stub(StubRoutines::lookup_secondary_supers_table_slow_path_stub());
>> 3274: 
>> 3275:   z_bru(L_done); // pass whatever result we got from a slow path
> 
> This one branch could be saved by using "load immediate on condition". But it's after slow path processing.

As @RealLucy says, this is after slow processing. It's not worth optimizing here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660693537

From sgehwolf at openjdk.org  Mon Jul  1 08:50:31 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Mon, 1 Jul 2024 08:50:31 GMT
Subject: Integrated: 8261242: [Linux] OSContainer::is_containerized() returns
 true when run outside a container
In-Reply-To: <ElCWVtphfdgi651yIzNsfJS3C5ewhpvOH9ZXjqt3PFE=.de11a3f2-c2fd-4cc4-8c6d-e20f9c8de03d@github.com>
References: <ElCWVtphfdgi651yIzNsfJS3C5ewhpvOH9ZXjqt3PFE=.de11a3f2-c2fd-4cc4-8c6d-e20f9c8de03d@github.com>
Message-ID: <0hEDyLmsRgW-GR23fKnixv3_5edApaB_eoEQ2D_28NU=.32f148cd-e8a2-4bef-b8bc-44ec7cc0dbd0@github.com>

On Mon, 11 Mar 2024 16:55:36 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions holds, the JVM runs in non-containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false` is indicated by the following trace log line:
> 
> 
> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present
> 
> 
> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example:
> 
> 
> java -XshowSettings:system --version
> Operating System Metrics:
>     Provider: cgroupv1
>     System not containerized.
> openjdk 23-internal 2024-09-17
> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk)
> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
> 
> 
> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed.
> 
> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system.
> 
> Testing:
> 
> - [x] GHA (risc-v failure seems infra related)
> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests)
> - [x] Some manual testing using cri-o
> 
> Thoughts?
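
The "mounted read only" observation in the quoted description can be illustrated with a small parser for one line of `/proc/self/mountinfo`, whose sixth whitespace-separated field holds the per-mount options. This is only a sketch of the heuristic: `mount_is_read_only` is a hypothetical name, the sample lines are made up, and the actual JDK code may examine different fields or apply further conditions.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if the per-mount options (sixth field of
// a /proc/self/mountinfo line) contain the "ro" option.
inline bool mount_is_read_only(const std::string& mountinfo_line) {
  std::istringstream fields(mountinfo_line);
  std::string field;
  // Fields: mount ID, parent ID, major:minor, root, mount point, options.
  for (int i = 0; i < 6; ++i) {
    if (!(fields >> field)) {
      return false;  // malformed or truncated line
    }
  }
  std::istringstream options(field);
  std::string option;
  while (std::getline(options, option, ',')) {
    if (option == "ro") {
      return true;
    }
  }
  return false;
}

// Usage example with made-up mountinfo lines.
inline void mount_is_read_only_examples() {
  assert(mount_is_read_only(
      "35 34 0:30 / /sys/fs/cgroup ro,nosuid,nodev,noexec,relatime - cgroup2 cgroup rw"));
  assert(!mount_is_read_only(
      "35 24 0:30 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:9 - cgroup2 cgroup2 rw"));
}
```

Splitting on commas and comparing whole options avoids false matches on options that merely contain the letters "ro".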

This pull request has now been integrated.

Changeset: 0a6ffa57
Author:    Severin Gehwolf <sgehwolf at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/0a6ffa57954ddf4f92205205a5a1bada813d127a
Stats:     411 lines in 20 files changed: 305 ins; 79 del; 27 mod

8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container

Reviewed-by: stuefe, iklam

-------------

PR: https://git.openjdk.org/jdk/pull/18201

From aph at openjdk.org  Mon Jul  1 08:56:22 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 1 Jul 2024 08:56:22 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well
 [v13]
In-Reply-To: <LeHv1fkxfIoNdN5KHLJzdI2RPwVd2ty54CdOVcr3Ahs=.789598fd-4c45-4bb3-b848-661ba1cfe5e6@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
 <LeHv1fkxfIoNdN5KHLJzdI2RPwVd2ty54CdOVcr3Ahs=.789598fd-4c45-4bb3-b848-661ba1cfe5e6@github.com>
Message-ID: <c7ni2RO1HmDd-LclUqpqkiuXzXRk3SZ9a8_ZH67DlVw=.4da466e8-916d-4af3-a6ae-e2de177029bf@github.com>

On Sun, 30 Jun 2024 15:56:48 GMT, Amit Kumar <amitkumar at openjdk.org> wrote:

>> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>> 
>> I ran the `tier1` tests with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in a fastdebug VM and didn't see any new failures, except the one I mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693), which is reproducible on every other architecture with these flags.
>> 
>> 
>> Without Patch: 
>> 
>> SecondarySuperCacheHits.test  avgt   15  0.929 ± 0.010  ns/op
>> 
>> SecondarySuperCacheInterContention.test     avgt   15  1.413 ± 0.007  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  1.415 ± 0.016  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  1.410 ± 0.017  ns/op
>> 
>> Benchmark                             Mode  Cnt   Score   Error  Units
>> SecondarySupersLookup.testNegative00  avgt   15   1.806 ± 0.325  ns/op
>> SecondarySupersLookup.testNegative01  avgt   15   2.364 ± 0.236  ns/op
>> SecondarySupersLookup.testNegative02  avgt   15   2.903 ± 0.215  ns/op
>> SecondarySupersLookup.testNegative03  avgt   15   3.417 ± 0.199  ns/op
>> SecondarySupersLookup.testNegative04  avgt   15   3.758 ± 0.102  ns/op
>> SecondarySupersLookup.testNegative05  avgt   15   4.352 ± 0.123  ns/op
>> SecondarySupersLookup.testNegative06  avgt   15   4.800 ± 0.099  ns/op
>> SecondarySupersLookup.testNegative07  avgt   15   5.365 ± 0.060  ns/op
>> SecondarySupersLookup.testNegative08  avgt   15   6.316 ± 0.092  ns/op
>> SecondarySupersLookup.testNegative09  avgt   15   6.669 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative10  avgt   15   7.041 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative16  avgt   15   9.336 ± 0.185  ns/op
>> SecondarySupersLookup.testNegative20  avgt   15  11.373 ± 0.029  ns/op
>> SecondarySupersLookup.testNegative30  avgt   15  15.236 ± 0.051  ns/op
>> SecondarySupersLookup.testNegative32  avgt   15  16.031 ± 0.091  ns/op
>> SecondarySupersLookup.testNegative40  avgt   15  19.197 ± 0.279  ns/op
>> SecondarySupersLookup.testNegative50  avgt   15  23.804 ± 2.387  ns/op
>> SecondarySupersLookup.testNegative55  avgt   15  25.610 ± 1.155  ns/op
>> SecondarySupersLookup.testNegative56  avgt   15  26.128 ± 2.203  ns/op
>> SecondarySupersLookup.testNegative57  avgt   15  26.126 ± 0.881  ns/op
>> SecondarySupersLookup.testNegative58  avgt   15  26.314 ± 0.521  ns/op
>> SecondarySupersLookup.testNegative59  avgt   15  26.750 ± 0.837  ns/op
>> SecondarySupersLookup.testNegative60  avgt   15  27.118 ± 0.557 ...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
> 
>   removed unnecessary checks

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3228:

> 3226: 
> 3227:   // load 0 in r_result by default. In case search fails, r_result will be loaded
> 3228:   // with value 1 (failure) at the end of this method.

Suggestion:

  // Initialize r_result with 0 (indicating success). If searching fails, r_result will be loaded
  // with 1 (failure) at the end of this method.

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3331:

> 3329:   // wrapped around the end of the array.
> 3330: 
> 3331:   { // This is conventional linear probing, but instead of terminating,

Suggestion:

  { // This is conventional linear probing, but instead of terminating

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3344:

> 3342: 
> 3343:     // We should only reach here after having found a bit in the bitmap.
> 3344:     // Invariant: array_length == popcount(bitmap)

It turns out this invariant isn't true. When `array_length` is >= 63 we set `SECONDARY_SUPERS_BITMAP_FULL`, for performance reasons.

Suggestion:

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660697339
PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660699570
PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660702697

From sgehwolf at openjdk.org  Mon Jul  1 08:50:29 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Mon, 1 Jul 2024 08:50:29 GMT
Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true
 when run outside a container [v8]
In-Reply-To: <qwrLCWWzsXHeQy6jM21G7MSXxKroMi-rpUHhk-KCgfc=.ff4e4746-72b3-496f-bd57-4526858e2e31@github.com>
References: <ElCWVtphfdgi651yIzNsfJS3C5ewhpvOH9ZXjqt3PFE=.de11a3f2-c2fd-4cc4-8c6d-e20f9c8de03d@github.com>
 <qwrLCWWzsXHeQy6jM21G7MSXxKroMi-rpUHhk-KCgfc=.ff4e4746-72b3-496f-bd57-4526858e2e31@github.com>
Message-ID: <x8LUWka1p6cXsWbPjJWo0OmtwtBBrQ0OOMx86EbgGNg=.9aaa99cb-1026-4eec-987a-f4683a7bb6ca@github.com>

On Fri, 28 Jun 2024 15:41:48 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions holds, the JVM runs in non-containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false` is indicated by the following trace log line:
>> 
>> 
>> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present
>> 
>> 
>> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example:
>> 
>> 
>> java -XshowSettings:system --version
>> Operating System Metrics:
>>     Provider: cgroupv1
>>     System not containerized.
>> openjdk 23-internal 2024-09-17
>> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk)
>> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
>> 
>> 
>> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed.
>> 
>> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system.
>> 
>> Testing:
>> 
>> - [x] GHA (risc-v failure seems infra related)
>> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests)
>> - [x] Some manual testing using cri-o
>> 
>> Thoughts?
>
> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits:
> 
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - Refactor mount info matching to helper function
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - Remove problem listing of PlainRead which is reworked here
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - Add doc for mountinfo scanning.
>  - Unify naming of variables
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>  - ... and 8 more: https://git.openjdk.org/jdk/compare/486aa11e...1017da35

Thank you for the review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18201#issuecomment-2199581201

From aboldtch at openjdk.org  Mon Jul  1 09:27:50 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 1 Jul 2024 09:27:50 GMT
Subject: RFR: 8335397: Improve reliability of TestRecursiveMonitorChurn.java
Message-ID: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>

TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation.

Change it to query the JVM for the exact number of inflations via the WhiteBox API instead. This allows us to be both more exact and less dependent on interactions with NMT.

-------------

Commit messages:
 - 8335397: Improve reliability of TestRecursiveMonitorChurn.java

Changes: https://git.openjdk.org/jdk/pull/19965/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19965&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335397
  Stats: 77 lines in 5 files changed: 28 ins; 31 del; 18 mod
  Patch: https://git.openjdk.org/jdk/pull/19965.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19965/head:pull/19965

PR: https://git.openjdk.org/jdk/pull/19965

From amitkumar at openjdk.org  Mon Jul  1 09:31:56 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 1 Jul 2024 09:31:56 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well
 [v14]
In-Reply-To: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
Message-ID: <PLAQttD3MSbh0TfJ-ae3bRziCW5anWeO_1Ajzy39mDs=.2068ea9a-e248-4a75-b0d8-a5486b069fe1@github.com>

> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
> 
> I ran the `tier1` tests with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in a fastdebug VM and didn't see any new failures, except the one I mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693), which is reproducible on every other architecture with these flags.
> 
> 
> Without Patch: 
> 
> SecondarySuperCacheHits.test  avgt   15  0.929 ± 0.010  ns/op
> 
> SecondarySuperCacheInterContention.test     avgt   15  1.413 ± 0.007  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15  1.415 ± 0.016  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15  1.410 ± 0.017  ns/op
> 
> Benchmark                             Mode  Cnt   Score   Error  Units
> SecondarySupersLookup.testNegative00  avgt   15   1.806 ± 0.325  ns/op
> SecondarySupersLookup.testNegative01  avgt   15   2.364 ± 0.236  ns/op
> SecondarySupersLookup.testNegative02  avgt   15   2.903 ± 0.215  ns/op
> SecondarySupersLookup.testNegative03  avgt   15   3.417 ± 0.199  ns/op
> SecondarySupersLookup.testNegative04  avgt   15   3.758 ± 0.102  ns/op
> SecondarySupersLookup.testNegative05  avgt   15   4.352 ± 0.123  ns/op
> SecondarySupersLookup.testNegative06  avgt   15   4.800 ± 0.099  ns/op
> SecondarySupersLookup.testNegative07  avgt   15   5.365 ± 0.060  ns/op
> SecondarySupersLookup.testNegative08  avgt   15   6.316 ± 0.092  ns/op
> SecondarySupersLookup.testNegative09  avgt   15   6.669 ± 0.164  ns/op
> SecondarySupersLookup.testNegative10  avgt   15   7.041 ± 0.164  ns/op
> SecondarySupersLookup.testNegative16  avgt   15   9.336 ± 0.185  ns/op
> SecondarySupersLookup.testNegative20  avgt   15  11.373 ± 0.029  ns/op
> SecondarySupersLookup.testNegative30  avgt   15  15.236 ± 0.051  ns/op
> SecondarySupersLookup.testNegative32  avgt   15  16.031 ± 0.091  ns/op
> SecondarySupersLookup.testNegative40  avgt   15  19.197 ± 0.279  ns/op
> SecondarySupersLookup.testNegative50  avgt   15  23.804 ± 2.387  ns/op
> SecondarySupersLookup.testNegative55  avgt   15  25.610 ± 1.155  ns/op
> SecondarySupersLookup.testNegative56  avgt   15  26.128 ± 2.203  ns/op
> SecondarySupersLookup.testNegative57  avgt   15  26.126 ± 0.881  ns/op
> SecondarySupersLookup.testNegative58  avgt   15  26.314 ± 0.521  ns/op
> SecondarySupersLookup.testNegative59  avgt   15  26.750 ± 0.837  ns/op
> SecondarySupersLookup.testNegative60  avgt   15  27.118 ± 0.557  ns/op
> SecondarySupersLookup.testNegative61  avgt   15  27.763 ± 1.628  ns...

Amit Kumar has updated the pull request incrementally with three additional commits since the last revision:

 - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp
   
   Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>
 - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp
   
   Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>
 - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp
   
   Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19544/files
  - new: https://git.openjdk.org/jdk/pull/19544/files/98a8f5ad..6df0922f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=12-13

  Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/19544.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544

PR: https://git.openjdk.org/jdk/pull/19544

From amitkumar at openjdk.org  Mon Jul  1 09:52:48 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 1 Jul 2024 09:52:48 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well
 [v15]
In-Reply-To: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
Message-ID: <B3QnMNkIaw59FI-SRMX1R31gCni7ibyWczmIKYZlD2Y=.c0001db2-1a44-497e-aa9f-4f62147b41f5@github.com>

> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
> 
> I ran the `tier1` tests with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in a fastdebug VM and didn't see any new failures, except the one I mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693), which is reproducible on every other architecture with these flags.
> 
> 
> Without Patch: 
> 
> SecondarySuperCacheHits.test  avgt   15  0.929 ± 0.010  ns/op
> 
> SecondarySuperCacheInterContention.test     avgt   15  1.413 ± 0.007  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15  1.415 ± 0.016  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15  1.410 ± 0.017  ns/op
> 
> Benchmark                             Mode  Cnt   Score   Error  Units
> SecondarySupersLookup.testNegative00  avgt   15   1.806 ± 0.325  ns/op
> SecondarySupersLookup.testNegative01  avgt   15   2.364 ± 0.236  ns/op
> SecondarySupersLookup.testNegative02  avgt   15   2.903 ± 0.215  ns/op
> SecondarySupersLookup.testNegative03  avgt   15   3.417 ± 0.199  ns/op
> SecondarySupersLookup.testNegative04  avgt   15   3.758 ± 0.102  ns/op
> SecondarySupersLookup.testNegative05  avgt   15   4.352 ± 0.123  ns/op
> SecondarySupersLookup.testNegative06  avgt   15   4.800 ± 0.099  ns/op
> SecondarySupersLookup.testNegative07  avgt   15   5.365 ± 0.060  ns/op
> SecondarySupersLookup.testNegative08  avgt   15   6.316 ± 0.092  ns/op
> SecondarySupersLookup.testNegative09  avgt   15   6.669 ± 0.164  ns/op
> SecondarySupersLookup.testNegative10  avgt   15   7.041 ± 0.164  ns/op
> SecondarySupersLookup.testNegative16  avgt   15   9.336 ± 0.185  ns/op
> SecondarySupersLookup.testNegative20  avgt   15  11.373 ± 0.029  ns/op
> SecondarySupersLookup.testNegative30  avgt   15  15.236 ± 0.051  ns/op
> SecondarySupersLookup.testNegative32  avgt   15  16.031 ± 0.091  ns/op
> SecondarySupersLookup.testNegative40  avgt   15  19.197 ± 0.279  ns/op
> SecondarySupersLookup.testNegative50  avgt   15  23.804 ± 2.387  ns/op
> SecondarySupersLookup.testNegative55  avgt   15  25.610 ± 1.155  ns/op
> SecondarySupersLookup.testNegative56  avgt   15  26.128 ± 2.203  ns/op
> SecondarySupersLookup.testNegative57  avgt   15  26.126 ± 0.881  ns/op
> SecondarySupersLookup.testNegative58  avgt   15  26.314 ± 0.521  ns/op
> SecondarySupersLookup.testNegative59  avgt   15  26.750 ± 0.837  ns/op
> SecondarySupersLookup.testNegative60  avgt   15  27.118 ± 0.557  ns/op
> SecondarySupersLookup.testNegative61  avgt   15  27.763 ± 1.628  ns...

Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:

  updates comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19544/files
  - new: https://git.openjdk.org/jdk/pull/19544/files/6df0922f..6d05364f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=13-14

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19544.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544

PR: https://git.openjdk.org/jdk/pull/19544

From aboldtch at openjdk.org  Mon Jul  1 10:27:23 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 1 Jul 2024 10:27:23 GMT
Subject: [jdk23] RFR: 8326820: Metadata artificially kept alive
In-Reply-To: <9401T6FMpCnxvfgCCxHR-7-wEcwchAqf_ETKFbQSXg0=.096ce771-f64a-4e41-bb3a-94a1b232965c@github.com>
References: <9401T6FMpCnxvfgCCxHR-7-wEcwchAqf_ETKFbQSXg0=.096ce771-f64a-4e41-bb3a-94a1b232965c@github.com>
Message-ID: <zwf_jzY4OB6ioEcaMEeNUv7XIVO6vHXpycxJuD7pkks=.e175bd1b-3475-4a7e-8f0b-774d81bd5c3d@github.com>

On Thu, 27 Jun 2024 14:30:43 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> Hi all,
> 
> This pull request contains a backport of commit [5909d541](https://github.com/openjdk/jdk/commit/5909d54147355dd7da5786ff39ead4c15816705c) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
> 
> The commit being backported was authored by Axel Boldt-Christmas on 27 Jun 2024 and was reviewed by Erik Österlund, Stefan Karlsson and Coleen Phillimore.
> 
> Thanks!

Thanks for the review. This has been running through the JDK 24 CI over the weekend. No issues found.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19929#issuecomment-2199778367

From aboldtch at openjdk.org  Mon Jul  1 10:27:24 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 1 Jul 2024 10:27:24 GMT
Subject: [jdk23] Integrated: 8326820: Metadata artificially kept alive
In-Reply-To: <9401T6FMpCnxvfgCCxHR-7-wEcwchAqf_ETKFbQSXg0=.096ce771-f64a-4e41-bb3a-94a1b232965c@github.com>
References: <9401T6FMpCnxvfgCCxHR-7-wEcwchAqf_ETKFbQSXg0=.096ce771-f64a-4e41-bb3a-94a1b232965c@github.com>
Message-ID: <ms7RUHs4I44jaHAR2lbrj4vU-wIoT65NDeFUYb1zZxw=.0cd04f0d-83cb-4bcd-af2f-2f43a481ad11@github.com>

On Thu, 27 Jun 2024 14:30:43 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> Hi all,
> 
> This pull request contains a backport of commit [5909d541](https://github.com/openjdk/jdk/commit/5909d54147355dd7da5786ff39ead4c15816705c) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
> 
> The commit being backported was authored by Axel Boldt-Christmas on 27 Jun 2024 and was reviewed by Erik Österlund, Stefan Karlsson and Coleen Phillimore.
> 
> Thanks!

This pull request has now been integrated.

Changeset: e5fbc631
Author:    Axel Boldt-Christmas <aboldtch at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/e5fbc631ca06b40a682149b0903221e190f592aa
Stats:     80 lines in 6 files changed: 31 ins; 25 del; 24 mod

8326820: Metadata artificially kept alive

Reviewed-by: stefank
Backport-of: 5909d54147355dd7da5786ff39ead4c15816705c

-------------

PR: https://git.openjdk.org/jdk/pull/19929

From rkennke at openjdk.org  Mon Jul  1 12:17:18 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 1 Jul 2024 12:17:18 GMT
Subject: RFR: 8335397: Improve reliability of
 TestRecursiveMonitorChurn.java
In-Reply-To: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
Message-ID: <Trc5NZuM5oyxoD_2DUhYNsGNudFbt-7m7A7jAD6FILQ=.9ffd6838-474c-4246-85f9-79769b59aecd@github.com>

On Mon, 1 Jul 2024 09:21:13 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation.
> 
> Change to instead query the JVM for the exact number of inflations via the Whitebox API. This allows us to both be more exact and less dependent on interactions with NMT.

Looks good to me! Thank you!

-------------

Marked as reviewed by rkennke (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19965#pullrequestreview-2151229563

From coleenp at openjdk.org  Mon Jul  1 12:19:21 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 1 Jul 2024 12:19:21 GMT
Subject: [jdk23] RFR: 8333542: Breakpoint in parallel code does not work
In-Reply-To: <PwvQso402YjRkww_XCi4EjEzV0YCe7WtVRnvCIRKpQo=.f8f95d7a-fa5c-4322-8375-17eff213e073@github.com>
References: <PwvQso402YjRkww_XCi4EjEzV0YCe7WtVRnvCIRKpQo=.f8f95d7a-fa5c-4322-8375-17eff213e073@github.com>
Message-ID: <ssQ1swZKVUBp1gwCQHH8rnp6OFZ5w8mHeTiU7BYeZk0=.5db1bfe3-448b-414e-abe1-56aa14659e1e@github.com>

On Fri, 28 Jun 2024 12:14:55 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> Clean backport of JDK-8333542.  After this, we need a backport for JDK-8335134 to fix the test.

Thank you Chris.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19938#issuecomment-2199990801

From coleenp at openjdk.org  Mon Jul  1 12:19:22 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 1 Jul 2024 12:19:22 GMT
Subject: [jdk23] Integrated: 8333542: Breakpoint in parallel code does not work
In-Reply-To: <PwvQso402YjRkww_XCi4EjEzV0YCe7WtVRnvCIRKpQo=.f8f95d7a-fa5c-4322-8375-17eff213e073@github.com>
References: <PwvQso402YjRkww_XCi4EjEzV0YCe7WtVRnvCIRKpQo=.f8f95d7a-fa5c-4322-8375-17eff213e073@github.com>
Message-ID: <cp0Fet3QgVnMzTlRFx1CXA5UkTDNomotKUIDfhjWDiQ=.835e462f-8651-467a-af48-084dfa3f3f1c@github.com>

On Fri, 28 Jun 2024 12:14:55 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> Clean backport of JDK-8333542.  After this, we need a backport for JDK-8335134 to fix the test.

This pull request has now been integrated.

Changeset: 7040de19
Author:    Coleen Phillimore <coleenp at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/7040de19bdb29a3abacf2a39b7c7c30a07c61135
Stats:     516 lines in 16 files changed: 339 ins; 129 del; 48 mod

8333542: Breakpoint in parallel code does not work

Reviewed-by: cjplummer
Backport-of: b3bf31a0a08da679ec2fd21613243fb17b1135a9

-------------

PR: https://git.openjdk.org/jdk/pull/19938

From ayang at openjdk.org  Mon Jul  1 12:33:20 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Mon, 1 Jul 2024 12:33:20 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v2]
In-Reply-To: <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
 <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com>
Message-ID: <_RvHrV4nbtsAGnJq9S_98XOXciT71gMb6DfbCr6WVC0=.1278784f-b6c5-4d5e-92e6-fa21db95bc51@github.com>

On Fri, 28 Jun 2024 19:22:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> See JBS issue.
>> 
>> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure.
>> 
>> The patch:
>> - exposes os::available_memory via Whitebox
>> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException`
>> 
>> I have some misgivings about this solution, though:
>> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. 
>> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions)
>> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException.
>> 
>> Despite my doubts, I think this is the best we can come up with if we want to have such a test.
>> 
>> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice.
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update TestAlwaysPreTouchBehavior.java

test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 101:

> 99:  * @build jdk.test.whitebox.WhiteBox
> 100:  * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox
> 101:  * @run main/othervm -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -Xmx64m gc.TestAlwaysPreTouchBehavior -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC

Why the explicit `-Xmx64m`? As I understand this is essentially the launcher, whose heap-size is of little importance.
Also, why does the launch require `WhiteBoxAPI`?

test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 161:

> 159:         System.out.println("RSS: " + rss + " available: " + avail + " committed " + committed);
> 160: 
> 161:         if (args[0].equals("run")) {

When will this branch be taken? I can't find where the `run` arg is specified.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1660980458
PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1660970079

From stuefe at openjdk.org  Mon Jul  1 12:39:20 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 1 Jul 2024 12:39:20 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v2]
In-Reply-To: <_RvHrV4nbtsAGnJq9S_98XOXciT71gMb6DfbCr6WVC0=.1278784f-b6c5-4d5e-92e6-fa21db95bc51@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
 <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com>
 <_RvHrV4nbtsAGnJq9S_98XOXciT71gMb6DfbCr6WVC0=.1278784f-b6c5-4d5e-92e6-fa21db95bc51@github.com>
Message-ID: <hxa4YcsrUexAD_cqjQV9kz556tbr62l25sEdAKJf9hk=.3c1f219b-582b-46af-89c1-0b3dbcdc2c16@github.com>

On Mon, 1 Jul 2024 12:30:31 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

> Why the explicit -Xmx64m? As I understand this is essentially the launcher, whose heap-size is of little importance.

No particular reason, just don't like launchers to use large heaps. I can remove it.

> Also, why does the launch require WhiteBoxAPI?

Because the launcher needs to access hostAvailableMemory in order to decide before starting the test whether it makes sense to start the test.

> test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 161:
> 
>> 159:         System.out.println("RSS: " + rss + " available: " + avail + " committed " + committed);
>> 160: 
>> 161:         if (args[0].equals("run")) {
> 
> When will this branch be taken? I can't find where the `run` arg is specified.

See line 141

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1660988240
PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1660985382

From pchilanomate at openjdk.org  Mon Jul  1 12:45:27 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 1 Jul 2024 12:45:27 GMT
Subject: RFR: 8329665: fatal error: memory leak: allocating without
 ResourceMark [v2]
In-Reply-To: <c7P6CcVzkWEXcWmPSaJs7_vigReuBsGop5J-Tr5AQDY=.ef8428fe-8951-4804-a81d-7e02d05564ca@github.com>
References: <riJhOUThyVVgaGs098dEOrDTvXnAOjxlK48aA-yE_2s=.a235716d-5221-46c0-bd0b-ef0cf7ab9ccf@github.com>
 <BytvXQLrd23w2e_K_YnaTbAI5-HpFSq_loPL3sA33NQ=.ffce43c4-d601-49e3-a013-d741baec9852@github.com>
 <c7P6CcVzkWEXcWmPSaJs7_vigReuBsGop5J-Tr5AQDY=.ef8428fe-8951-4804-a81d-7e02d05564ca@github.com>
Message-ID: <6Y8MH5mOmcBplADeqhAWSYRd3JA2Yul7UjA8D3cc-5Y=.144cefac-7c3b-4cfa-b40a-d7862282a614@github.com>

On Thu, 27 Jun 2024 13:56:15 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   take ResourceMark out of debug only
>
> I am a bit concerned about this fix. Introducing an RM into `frame::oops_interpreted_do` means we cannot assemble anything in RA in the closure code and keep the memory across the RM. But closure code is opaque to the iteration site. Do we have any safeguards against OopClosure using and retaining RA memory? (Because even if no closure does this today, this could sneak in easily)

@tstuefe I added checks and I found there is one case in JFR code where we can allocate and retain memory from the resource area across the closure (https://github.com/openjdk/jdk/blob/d9bcf061450ebfb7fe02b5a50c855db1d9178e5d/src/hotspot/share/jfr/leakprofiler/checkpoint/objectSampleWriter.cpp#L462). It can be triggered by running tests in jdk/jfr/event/oldobject.

There are a couple of options I can think of here:

1 - Fix this case and add a safeguard to prevent these allocations. Maybe have some ResetRM object before mask.iterate_oop to reset the nesting in the RA on construction and then restore it on destruction.
2 - Allocate the _bit_mask in the C heap instead. There is a comment in InterpreterOopMap::resource_copy() that this has a significant performance cost. But we are already allocating an OopMapCacheEntry from the C heap, and this allocation for the bit_mask should be a rare case so I doubt this.
3 - Have the callers of frame::oops_interpreted_do() declare the ResourceMark instead. I wanted to avoid this because this is an implementation detail of InterpreterOopMap.
4 - Restore the code as it was before this change and add an object before NEW_RESOURCE_ARRAY(...) in InterpreterOopMap::resource_copy() that increments the nesting in the RA on construction and decrements it on destruction, i.e basically mark this particular allocation as okay without a RM since we check for it in ~InterpreterOopMap.

What do you think? I don't actually see a particular reason to disallow these allocations so I'm now not convinced on 1.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18632#issuecomment-2200044049

From stuefe at openjdk.org  Mon Jul  1 13:08:26 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 1 Jul 2024 13:08:26 GMT
Subject: RFR: 8329665: fatal error: memory leak: allocating without
 ResourceMark [v2]
In-Reply-To: <6Y8MH5mOmcBplADeqhAWSYRd3JA2Yul7UjA8D3cc-5Y=.144cefac-7c3b-4cfa-b40a-d7862282a614@github.com>
References: <riJhOUThyVVgaGs098dEOrDTvXnAOjxlK48aA-yE_2s=.a235716d-5221-46c0-bd0b-ef0cf7ab9ccf@github.com>
 <BytvXQLrd23w2e_K_YnaTbAI5-HpFSq_loPL3sA33NQ=.ffce43c4-d601-49e3-a013-d741baec9852@github.com>
 <c7P6CcVzkWEXcWmPSaJs7_vigReuBsGop5J-Tr5AQDY=.ef8428fe-8951-4804-a81d-7e02d05564ca@github.com>
 <6Y8MH5mOmcBplADeqhAWSYRd3JA2Yul7UjA8D3cc-5Y=.144cefac-7c3b-4cfa-b40a-d7862282a614@github.com>
Message-ID: <NR6CysHCjWX2AB1k8ccdDh5kHyNqKexk-oU5z88cNDE=.6f10c851-936c-4482-aefc-fddaefd213c6@github.com>

On Mon, 1 Jul 2024 12:42:44 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> I am a bit concerned about this fix. Introducing an RM into `frame::oops_interpreted_do` means we cannot assemble anything in RA in the closure code and keep the memory across the RM. But closure code is opaque to the iteration site. Do we have any safeguards against OopClosure using and retaining RA memory? (Because even if no closure does this today, this could sneak in easily)
>
> @tstuefe I added checks and I found there is one case in JFR code where we can allocate and retain memory from the resource area across the closure (https://github.com/openjdk/jdk/blob/d9bcf061450ebfb7fe02b5a50c855db1d9178e5d/src/hotspot/share/jfr/leakprofiler/checkpoint/objectSampleWriter.cpp#L462). It can be triggered by running tests in jdk/jfr/event/oldobject.
> 
> There are a couple of options I can think of here:
> 
> 1 - Fix this case and add a safeguard to prevent these allocations. Maybe have some ResetRM object before mask.iterate_oop to reset the nesting in the RA on construction and then restore it on destruction.
> 2 - Allocate the _bit_mask in the C heap instead. There is a comment in InterpreterOopMap::resource_copy() that this has a significant performance cost. But we are already allocating an OopMapCacheEntry from the C heap, and this allocation for the bit_mask should be a rare case so I doubt this.
> 3 - Have the callers of frame::oops_interpreted_do() declare the ResourceMark instead. I wanted to avoid this because this is an implementation detail of InterpreterOopMap.
> 4 - Restore the code as it was before this change and add an object before NEW_RESOURCE_ARRAY(...) in InterpreterOopMap::resource_copy() that increments the nesting in the RA on construction and decrements it on destruction, i.e basically mark this particular allocation as okay without a RM since we check for it in ~InterpreterOopMap.
> 
> What do you think? I don't actually see a particular reason to disallow these allocations so I'm now not convinced on 1.

Hi @pchilano 

> @tstuefe I added checks and I found there is one case in JFR code where we can allocate and retain memory from the resource area across the closure (
> 
> https://github.com/openjdk/jdk/blob/d9bcf061450ebfb7fe02b5a50c855db1d9178e5d/src/hotspot/share/jfr/leakprofiler/checkpoint/objectSampleWriter.cpp#L462
> ). It can be triggered by running tests in jdk/jfr/event/oldobject.
> 

Yes, I was afraid of something like that. Please open a follow-up bug, and chain it to 8329665, since the fix has already been downported to JDK21. We should fix this before the October update. I added a little comment to 8329665 to ensure this does not get lost.

> There are a couple of options I can think of here:
> 
> 1 - Fix this case and add a safeguard to prevent these allocations. Maybe have some ResetRM object before mask.iterate_oop to reset the nesting in the RA on construction and then restore it on destruction.

Safeguarding is not so easy. You could observe the RA state at the end and compare it with the beginning, but the code may use RM in a benign way. 

> 2 - Allocate the _bit_mask in the C heap instead. There is a comment in InterpreterOopMap::resource_copy() that this has a significant performance cost. But we are already allocating an OopMapCacheEntry from the C heap, and this allocation for the bit_mask should be a rare case so I doubt this.

I was thinking along these lines.

The performance comment makes sense for lookup, but not so much here. Plus, I would wager this case is rare anyway since the InterpreterOopMap can hold what, (4*64)/2bit = 128 entries?
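
As a rough illustration of that arithmetic, here is a minimal Java sketch. It assumes, as the estimate above does, four inline 64-bit words and two bits per entry; the class and method names are hypothetical and do not mirror the HotSpot sources.

```java
// Hypothetical model of the inline bit mask capacity discussed above.
// The constants are assumptions taken from the "(4*64)/2bit" estimate.
final class InlineMaskCapacitySketch {
    static final int INLINE_WORDS = 4;    // assumed number of inline mask words
    static final int BITS_PER_WORD = 64;  // assumed 64-bit platform
    static final int BITS_PER_ENTRY = 2;  // assumed bits used per entry

    /** Number of entries that fit in the inline mask. */
    static int inlineCapacity() {
        return INLINE_WORDS * BITS_PER_WORD / BITS_PER_ENTRY;
    }

    /** True when the entries no longer fit inline and a separate allocation is needed. */
    static boolean needsSeparateAllocation(int entries) {
        return entries > inlineCapacity();
    }
}
```

Under these assumptions `inlineCapacity()` is 128, so only methods with more than 128 entries would reach the separately allocated bit mask.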

About the other comment (` // Due to the invariants above it's tricky to allocate a temporary OopMapCacheEntry on the stack`), that is weird. I don't get it. It is ages old (comes from initial commit), so you may be more able than I to dig out the history behind it. I tried to just allocate an InterpreterOopMap entry on the stack. I did a small test, nothing bad happened.

So, if this comment is not up to date, you may pay for the C-heap allocation by removing this allocation :)

> 3 - Have the callers of frame::oops_interpreted_do() declare the ResourceMark instead. I wanted to avoid this because this is an implementation detail of InterpreterOopMap. 

Yes, I don't like this either.

> 4 - Restore the code as it was before this change and add an object before NEW_RESOURCE_ARRAY(...) in InterpreterOopMap::resource_copy() that increments the nesting in the RA on construction and decrements it on destruction, i.e basically mark this particular allocation as okay without a RM since we check for it in ~InterpreterOopMap.

Not a fan, overly complicated.

> 
> What do you think? I don't actually see a particular reason to disallow these allocations so I'm now not convinced on 1.

I prefer 2.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18632#issuecomment-2200093475

From ayang at openjdk.org  Mon Jul  1 13:13:21 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Mon, 1 Jul 2024 13:13:21 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v2]
In-Reply-To: <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
 <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com>
Message-ID: <GXA1RAdCoZKi_8VnJ9qtOsqPCuuyhO-gfjfnlvW04d8=.17a66849-d08e-4e71-ba1d-0db0c742e691@github.com>

On Fri, 28 Jun 2024 19:22:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> See JBS issue.
>> 
>> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure.
>> 
>> The patch:
>> - exposes os::available_memory via Whitebox
>> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException`
>> 
>> I have some misgivings about this solution, though:
>> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. 
>> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions)
>> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException.
>> 
>> Despite my doubts, I think this is the best we can come up with if we want to have such a test.
>> 
>> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice.
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update TestAlwaysPreTouchBehavior.java

Some readability suggestions.

test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 164:

> 162:             if (rss < committed) {
> 163:                 if (avail < requiredAvailableDuring) {
> 164:                     throw new SkippedException("Not enough memory for this  test (" + avail + ")");

This is essentially an early-return; why is this inside the `rss < committed` comparison? Does it work if it's lifted up? The structure I have in mind is like:


if (avail < ....) {
  skip-test;
}
assert(rss >= committed, error-msg);
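
A minimal Java sketch of that structure, for illustration only. The class `PreTouchCheckSketch`, the method `check`, and the local `SkippedException` stand-in are hypothetical; only the names `rss`, `committed`, `avail` and `requiredAvailableDuring` come from the test under review.

```java
// Hypothetical sketch of the suggested check order; not the test's actual code.
final class PreTouchCheckSketch {
    /** Local stand-in for the test library's SkippedException so the sketch compiles on its own. */
    static final class SkippedException extends RuntimeException {
        SkippedException(String msg) { super(msg); }
    }

    /**
     * Skips first when available memory is too low for the measurement to be
     * meaningful, then fails if the resident set is smaller than the committed size.
     */
    static void check(long rss, long committed, long avail, long requiredAvailableDuring) {
        if (avail < requiredAvailableDuring) {
            throw new SkippedException("Not enough memory for this test (" + avail + ")");
        }
        if (rss < committed) {
            throw new AssertionError("RSS (" + rss + ") is less than committed (" + committed + ")");
        }
    }
}
```

One behavioral difference from the nested form: with the skip lifted up, a run with low available memory is skipped even when `rss >= committed` would have passed.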

-------------

Marked as reviewed by ayang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19803#pullrequestreview-2151337452
PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1661031216

From ayang at openjdk.org  Mon Jul  1 13:13:22 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Mon, 1 Jul 2024 13:13:22 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v2]
In-Reply-To: <hxa4YcsrUexAD_cqjQV9kz556tbr62l25sEdAKJf9hk=.3c1f219b-582b-46af-89c1-0b3dbcdc2c16@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
 <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com>
 <_RvHrV4nbtsAGnJq9S_98XOXciT71gMb6DfbCr6WVC0=.1278784f-b6c5-4d5e-92e6-fa21db95bc51@github.com>
 <hxa4YcsrUexAD_cqjQV9kz556tbr62l25sEdAKJf9hk=.3c1f219b-582b-46af-89c1-0b3dbcdc2c16@github.com>
Message-ID: <j9wD-xHe-y5DPpuECd_cRVryshOhzq602wuMqqpXZi0=.389f4f4c-845e-46f4-9f46-d187a534cdf9@github.com>

On Mon, 1 Jul 2024 12:37:03 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> just don't like launchers to use large heaps.

Could you add a comment at the start of this file explaining the test setup, launcher creating another VM + real test flags? There, the rationale for small heap (64M) can be covered as well.

Some text on these fields can also help understand this test.

    final static long expectedMaxNonHeapRSS = M * 256;
    final static  long requiredAvailableBefore = heapsize * 2 + expectedMaxNonHeapRSS;
    final static  long requiredAvailableDuring = expectedMaxNonHeapRSS;
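
To show the kind of explanation that would help, here is a small sketch of how these thresholds work out. It assumes `M` is one MiB and takes `heapsize` as a parameter; the class name and the 512 MiB value in the note below are illustrative assumptions, not values from the test.

```java
// Hypothetical worked example of the threshold fields quoted above.
final class PreTouchThresholdsSketch {
    static final long M = 1024 * 1024;                  // assumption: one MiB
    static final long expectedMaxNonHeapRSS = M * 256;  // as in the quoted fields

    /** Available memory required before the testee JVM is started. */
    static long requiredAvailableBefore(long heapsize) {
        return heapsize * 2 + expectedMaxNonHeapRSS;
    }

    /** Available memory required while the testee JVM is being measured. */
    static long requiredAvailableDuring() {
        return expectedMaxNonHeapRSS;
    }
}
```

With an assumed `heapsize` of 512 MiB, `requiredAvailableBefore` comes to 1280 MiB and `requiredAvailableDuring` to 256 MiB.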

>> test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 161:
>> 
>>> 159:         System.out.println("RSS: " + rss + " available: " + avail + " committed " + committed);
>>> 160: 
>>> 161:         if (args[0].equals("run")) {
>> 
>> When will this branch be taken? I can't find where the `run` arg is specified.
>
> See line 141

I see; thanks. Can you add a comment referencing `prepareOptions`?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1661021604
PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1661022484

From pchilanomate at openjdk.org  Mon Jul  1 13:54:28 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 1 Jul 2024 13:54:28 GMT
Subject: RFR: 8329665: fatal error: memory leak: allocating without
 ResourceMark [v2]
In-Reply-To: <NR6CysHCjWX2AB1k8ccdDh5kHyNqKexk-oU5z88cNDE=.6f10c851-936c-4482-aefc-fddaefd213c6@github.com>
References: <riJhOUThyVVgaGs098dEOrDTvXnAOjxlK48aA-yE_2s=.a235716d-5221-46c0-bd0b-ef0cf7ab9ccf@github.com>
 <BytvXQLrd23w2e_K_YnaTbAI5-HpFSq_loPL3sA33NQ=.ffce43c4-d601-49e3-a013-d741baec9852@github.com>
 <c7P6CcVzkWEXcWmPSaJs7_vigReuBsGop5J-Tr5AQDY=.ef8428fe-8951-4804-a81d-7e02d05564ca@github.com>
 <6Y8MH5mOmcBplADeqhAWSYRd3JA2Yul7UjA8D3cc-5Y=.144cefac-7c3b-4cfa-b40a-d7862282a614@github.com>
 <NR6CysHCjWX2AB1k8ccdDh5kHyNqKexk-oU5z88cNDE=.6f10c851-936c-4482-aefc-fddaefd213c6@github.com>
Message-ID: <0fgUhYcCEUgcOxiB3Ot_-sc86HQTVCgTbXknZH-_cL8=.8314c987-02c7-4541-a40f-c35c3abab3f3@github.com>

On Mon, 1 Jul 2024 13:05:36 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> Yes, I was afraid of something like that. Please open a follow-up bug, and chain it to 8329665, since the fix has already been downported to JDK21. We should fix this before the October update. I added a little comment to 8329665 to ensure this does not get lost.
> 
Filed https://bugs.openjdk.org/browse/JDK-8335409.

> I was thinking along these lines.
> 
> The performance comment makes sense for lookup, but not so much here. Plus, I would wager this case is rare anyway since the InterpreterOopMap can hold what, (4*64)/2bit = 128 entries?
> 
Yes. I actually ran tiers 1-6 and found very few methods (fewer than 10) that hit this code path, where 128 entries are not enough and we need to allocate. Most are tests that exercise this corner case. One of the only legitimate ones I found was TimeZoneNames_xx.java.getContents(). So it is a rare case.

> About the other comment (` // Due to the invariants above it's tricky to allocate a temporary OopMapCacheEntry on the stack`), that is weird. I don't get it. It is ages old (comes from initial commit), so you may be more able than I to dig out the history behind it. I tried to just allocate an InterpreterOopMap entry on the stack. I did a small test, nothing bad happened.
> 
> So, if this comment is not up to date, you may pay for the C-heap allocation by removing this allocation :)
> 
Sounds like we should be able to allocate on the stack. I'll check this.
 
> > What do you think? I don't actually see a particular reason to disallow these allocations so I'm now not convinced on 1.
> 
> I prefer 2.
>
Okay, I was thinking the same. I'll work on this approach.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18632#issuecomment-2200217314

From aph at openjdk.org  Mon Jul  1 14:11:33 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 1 Jul 2024 14:11:33 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well
 [v15]
In-Reply-To: <B3QnMNkIaw59FI-SRMX1R31gCni7ibyWczmIKYZlD2Y=.c0001db2-1a44-497e-aa9f-4f62147b41f5@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
 <B3QnMNkIaw59FI-SRMX1R31gCni7ibyWczmIKYZlD2Y=.c0001db2-1a44-497e-aa9f-4f62147b41f5@github.com>
Message-ID: <lGch3Yk2EtgNE8W8TTVqcc2oXzVMXzgfpL9rkLqtWsg=.d5b5d340-8de1-425d-b830-9b1be5e453b7@github.com>

On Mon, 1 Jul 2024 09:52:48 GMT, Amit Kumar <amitkumar at openjdk.org> wrote:

>> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>> 
>> I ran `tier1` test with  `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags. 
>> 
>> 
>> Without Patch: 
>> 
>> SecondarySuperCacheHits.test  avgt   15  0.929 ± 0.010  ns/op
>> 
>> SecondarySuperCacheInterContention.test     avgt   15  1.413 ± 0.007  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  1.415 ± 0.016  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  1.410 ± 0.017  ns/op
>> 
>> Benchmark                             Mode  Cnt   Score   Error  Units
>> SecondarySupersLookup.testNegative00  avgt   15   1.806 ± 0.325  ns/op
>> SecondarySupersLookup.testNegative01  avgt   15   2.364 ± 0.236  ns/op
>> SecondarySupersLookup.testNegative02  avgt   15   2.903 ± 0.215  ns/op
>> SecondarySupersLookup.testNegative03  avgt   15   3.417 ± 0.199  ns/op
>> SecondarySupersLookup.testNegative04  avgt   15   3.758 ± 0.102  ns/op
>> SecondarySupersLookup.testNegative05  avgt   15   4.352 ± 0.123  ns/op
>> SecondarySupersLookup.testNegative06  avgt   15   4.800 ± 0.099  ns/op
>> SecondarySupersLookup.testNegative07  avgt   15   5.365 ± 0.060  ns/op
>> SecondarySupersLookup.testNegative08  avgt   15   6.316 ± 0.092  ns/op
>> SecondarySupersLookup.testNegative09  avgt   15   6.669 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative10  avgt   15   7.041 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative16  avgt   15   9.336 ± 0.185  ns/op
>> SecondarySupersLookup.testNegative20  avgt   15  11.373 ± 0.029  ns/op
>> SecondarySupersLookup.testNegative30  avgt   15  15.236 ± 0.051  ns/op
>> SecondarySupersLookup.testNegative32  avgt   15  16.031 ± 0.091  ns/op
>> SecondarySupersLookup.testNegative40  avgt   15  19.197 ± 0.279  ns/op
>> SecondarySupersLookup.testNegative50  avgt   15  23.804 ± 2.387  ns/op
>> SecondarySupersLookup.testNegative55  avgt   15  25.610 ± 1.155  ns/op
>> SecondarySupersLookup.testNegative56  avgt   15  26.128 ± 2.203  ns/op
>> SecondarySupersLookup.testNegative57  avgt   15  26.126 ± 0.881  ns/op
>> SecondarySupersLookup.testNegative58  avgt   15  26.314 ± 0.521  ns/op
>> SecondarySupersLookup.testNegative59  avgt   15  26.750 ± 0.837  ns/op
>> SecondarySupersLookup.testNegative60  avgt   15  27.118 ± 0.557 ...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
> 
>   updates comment

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3177:

> 3175: 
> 3176:   // z_brct above doesn't change CC.
> 3177:   // If the search operation is unsuccessful, then it's a failure case.

Suggestion:

  // If we reach here, then the value in r_value is not present. Set r_result to 1.

"Failure" is not a good word to use here, because it's unclear what failed.

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3362:

> 3360:     z_bre(L_done); // success
> 3361: 
> 3362:     // look-ahead check (Bit 2), if bit-2 is also 0, we're done

Suggestion:

    // look-ahead check: if Bit 2 is 0, we're done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1661115340
PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1661108285

From amitkumar at openjdk.org  Mon Jul  1 14:11:32 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 1 Jul 2024 14:11:32 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well
 [v16]
In-Reply-To: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
Message-ID: <bqd-eFTpYldNYZ-qOu-R7ft6O_kI6kWnT-5YIUy762E=.bb6bf3cd-f223-4234-8f71-edad7822ba68@github.com>

Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/cpu/s390/macroAssembler_s390.cpp
  
  Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19544/files
  - new: https://git.openjdk.org/jdk/pull/19544/files/6d05364f..7b533b41

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=15
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=14-15

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19544.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544

PR: https://git.openjdk.org/jdk/pull/19544

From amitkumar at openjdk.org  Mon Jul  1 14:14:50 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 1 Jul 2024 14:14:50 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well
 [v17]
In-Reply-To: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
Message-ID: <NQ1QNuTBkNsmBReCpdhY1lrdIYz9s8UiNd1As1sLQ7M=.17c8f789-2bf1-4beb-891f-debccad29164@github.com>

Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/cpu/s390/macroAssembler_s390.cpp
  
  Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19544/files
  - new: https://git.openjdk.org/jdk/pull/19544/files/7b533b41..e935834b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=16
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=15-16

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19544.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544

PR: https://git.openjdk.org/jdk/pull/19544

From mli at openjdk.org  Mon Jul  1 14:17:44 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 1 Jul 2024 14:17:44 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding
Message-ID: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>

Hi,
Can you help to review the patch?

I'm also working on a Base64 decode intrinsic, but it shows a performance regression in some cases. Decode and encode are totally independent of each other, so I will send the decode out for review in a separate PR once I have fixed the regression.

Thanks.

## Test
Benchmarks were run on a CanMV-K230.

I tried several implementations, each with a different vector register group (LMUL) setting:
* m2+m1
* m2
* m1
The best one is the combination of m2+m1; it has the best performance across all source sizes.

### this implementation (m2+m1)
Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m2+m1 | Error | Units | -intrinsic/+intrinsic
-- | -- | -- | -- | -- | -- | -- | -- | --
Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876

### vector with only m2
Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m2 | Error | Units | -intrinsic/+intrinsic
-- | -- | -- | -- | -- | -- | -- | -- | --
Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.797 | 86.872 | 0.374 | ns/op | 0.9991366608
Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.971 | 94.203 | 1.918 | ns/op | 0.9975372334
Base64Encode.testBase64Encode | 3 | avgt | 10 | 122.074 | 123.978 | 1.009 | ns/op | 0.9846424366
Base64Encode.testBase64Encode | 6 | avgt | 10 | 138.999 | 138.344 | 2.175 | ns/op | 1.004734575
Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.857 | 157.494 | 1.036 | ns/op | 1.021353194
Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.511 | 154.998 | 1.727 | ns/op | 1.042019897
Base64Encode.testBase64Encode | 10 | avgt | 10 | 186.228 | 175.38 | 0.62 | ns/op | 1.061854259
Base64Encode.testBase64Encode | 48 | avgt | 10 | 408.461 | 349.558 | 15.377 | ns/op | 1.168507086
Base64Encode.testBase64Encode | 512 | avgt | 10 | 3679.283 | 1103.717 | 3.911 | ns/op | 3.333538398
Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7206.265 | 1988.927 | 224.732 | ns/op | 3.623192304
Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135695.875 | 33930.292 | 97.85 | ns/op | 3.99925456

### vector with only m1
Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1 | Error | Units | -intrinsic/+intrinsic
-- | -- | -- | -- | -- | -- | -- | -- | --
Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.837 | 87.137 | 0.527 | ns/op | 0.9965571456
Base64Encode.testBase64Encode | 2 | avgt | 10 | 94.723 | 94.125 | 5.122 | ns/op | 1.006353254
Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.51 | 123.082 | 0.854 | ns/op | 0.9872280268
Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.045 | 137.175 | 0.201 | ns/op | 1.013632222
Base64Encode.testBase64Encode | 7 | avgt | 10 | 161.216 | 159.387 | 2.385 | ns/op | 1.011475214
Base64Encode.testBase64Encode | 9 | avgt | 10 | 160.541 | 154.19 | 1.665 | ns/op | 1.041189442
Base64Encode.testBase64Encode | 10 | avgt | 10 | 184.874 | 174.766 | 5.569 | ns/op | 1.057837337
Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.124 | 199.333 | 1.584 | ns/op | 2.032398047
Base64Encode.testBase64Encode | 512 | avgt | 10 | 3659.335 | 1185.626 | 24.686 | ns/op | 3.086415952
Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7239.269 | 2164.709 | 1022.367 | ns/op | 3.344222711
Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135048.828 | 38248.645 | 319.978 | ns/op | 3.530813392

-------------

Commit messages:
 - clean code
 - Initial commit

Changes: https://git.openjdk.org/jdk/pull/19973/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8314125
  Stats: 245 lines in 3 files changed: 245 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19973.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19973/head:pull/19973

PR: https://git.openjdk.org/jdk/pull/19973

From fjiang at openjdk.org  Mon Jul  1 14:40:26 2024
From: fjiang at openjdk.org (Feilong Jiang)
Date: Mon, 1 Jul 2024 14:40:26 GMT
Subject: RFR: 8335411: RISC-V: Optimize encode_heap_oop when oop is not null
Message-ID: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com>

Hi, please review this enhancement that adds two more `encode_heap_oop_not_null` methods.

Currently, `encode_heap_oop` first checks whether the oop is `null`. When the oop is known to be non-null, we can skip that check and avoid an unnecessary branch instruction when encoding the oop into its compressed form.
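
As a conceptual sketch only (plain C++ on integers, not the RISC-V macro assembler code in this PR), the difference between the two variants looks roughly like this. The `NarrowOopConfig` struct and its fields are hypothetical names introduced for illustration; the model assumes the usual compressed-oop scheme in which null encodes to 0 and other oops encode as a shifted offset from the heap base.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical parameters, for illustration only.
struct NarrowOopConfig {
  std::uint64_t heap_base;  // address that encoded offsets are relative to
  unsigned shift;           // log2 of object alignment, e.g. 3 for 8 bytes
};

// General form: null must stay 0, so the value is tested first.
// In generated code this test is the branch the PR wants to avoid.
inline std::uint32_t encode_heap_oop(const NarrowOopConfig& c,
                                     std::uint64_t oop) {
  if (oop == 0) {
    return 0;
  }
  return static_cast<std::uint32_t>((oop - c.heap_base) >> c.shift);
}

// Not-null form: the caller guarantees oop != 0, so the test is dropped
// and only the subtract and shift remain.
inline std::uint32_t encode_heap_oop_not_null(const NarrowOopConfig& c,
                                              std::uint64_t oop) {
  assert(oop != 0);
  return static_cast<std::uint32_t>((oop - c.heap_base) >> c.shift);
}
```

Both functions return the same value for any non-null oop; the second one just does less work to get there.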


Testing:
- [x] Tier1~3 on linux-riscv64 with release build
- [x] renaissance & dacapo benchmark suites for functionality

-------------

Commit messages:
 - add encode_heap_oop_not_null for riscv

Changes: https://git.openjdk.org/jdk/pull/19974/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19974&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335411
  Stats: 62 lines in 3 files changed: 59 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/19974.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19974/head:pull/19974

PR: https://git.openjdk.org/jdk/pull/19974

From sgehwolf at openjdk.org  Mon Jul  1 14:43:58 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Mon, 1 Jul 2024 14:43:58 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v3]
In-Reply-To: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com>
References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com>
Message-ID: <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com>

> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest`, currently passes on cgroups v1 and fails on cgroups v2 because of the way [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented back in the JDK 13 time frame. It is therefore problem-listed immediately and should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges.
> 
> I'm adding those tests in order to not regress another time.
> 
> Testing:
> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1.
> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to  JDK-8322420)
> - [x] GHA

Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Merge branch 'master' into jdk-8333446-systemd-slice-tests
 - Merge branch 'master' into jdk-8333446-systemd-slice-tests
 - Fix comments
 - 8333446: Add tests for hierarchical container support

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19530/files
  - new: https://git.openjdk.org/jdk/pull/19530/files/00b528ae..22141a48

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=01-02

  Stats: 26334 lines in 522 files changed: 18610 ins; 5830 del; 1894 mod
  Patch: https://git.openjdk.org/jdk/pull/19530.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19530/head:pull/19530

PR: https://git.openjdk.org/jdk/pull/19530

From duke at openjdk.org  Mon Jul  1 14:45:24 2024
From: duke at openjdk.org (Yuri Gaevsky)
Date: Mon, 1 Jul 2024 14:45:24 GMT
Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic
In-Reply-To: <dxSBhJiLeVkLF8PvHW3MMg69vwXU0VshECCMz5HnhhI=.e0cbda8b-f7f6-44ff-806b-1f21496911be@github.com>
References: <dxSBhJiLeVkLF8PvHW3MMg69vwXU0VshECCMz5HnhhI=.e0cbda8b-f7f6-44ff-806b-1f21496911be@github.com>
Message-ID: <7Yi0Fbxcg9ZlXCiU6j12NsOGgNsn9Y8IA8ad9dUV3ko=.7b3f4f37-3ec9-4152-8bf6-0b57853e073d@github.com>

On Wed, 7 Feb 2024 14:35:55 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:

> Hello All,
> 
> Please review these changes to enable the __vectorizedMismatch_ intrinsic on RISC-V platform with RVV instructions supported.
> 
> Thank you,
> -Yuri Gaevsky
> 
> **Correctness checks:**
>   hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4.

.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2200351378

From duke at openjdk.org  Mon Jul  1 15:26:38 2024
From: duke at openjdk.org (duke)
Date: Mon, 1 Jul 2024 15:26:38 GMT
Subject: RFR: 8280481: Duplicated stubs to interpreter for static calls
 [v2]
In-Reply-To: <SEOOihMeBukAoQInq9Lt5xDYNB037oRdkZc8F9R4_FA=.793f18eb-4572-43d7-ad15-6cf09b27576a@github.com>
References: <9N1GcHDRvyX1bnPrRcyw96zWIgrrAm4mfrzp8dQ-BBk=.6d55c5fd-7d05-4058-99b6-7d40a92450bf@github.com>
 <SEOOihMeBukAoQInq9Lt5xDYNB037oRdkZc8F9R4_FA=.793f18eb-4572-43d7-ad15-6cf09b27576a@github.com>
Message-ID: <XgfLEbpFT_CgKaZwmScqPMcPyOLdEkl1pqachAXXTn4=.394d33c0-58b7-405f-ab52-fe3078351613@github.com>

On Wed, 29 Jun 2022 14:50:59 GMT, Evgeny Astigeevich <eastigeevich at openjdk.org> wrote:

>> ## Problem
>> Calls of Java methods have stubs to the interpreter for the cases when an invoked Java method is not compiled. Calls of static Java methods and final Java methods have statically bound information about a callee during compilation. Such calls can share stubs to the interpreter.
>> 
>> Each stub to the interpreter has a relocation record (accessed via `relocInfo`) which provides the address of the stub and the address of its owner. `relocInfo` has an offset which is an offset from the previously known relocatable address. The address of a stub is calculated as the address provided by the previous `relocInfo` plus the offset.
>> 
>> Each Java call has:
>> - A relocation for a call site.
>> - A relocation for a stub to the interpreter.
>> - A stub to the interpreter.
>> - If far jumps are used (arm64 case):
>>   - A trampoline relocation.
>>   - A trampoline.
>> 
>> We cannot avoid creating relocations. They are needed to support patching call sites.
>> With shared stubs there will be multiple relocations having the same stub address but different owners' addresses.
>> If we try to generate relocations as we go there will be a case which requires negative offsets:
>> 
>> reloc1  ---> 0x0: stub1
>> reloc2  ---> 0x4: stub2 (reloc2.addr = reloc1.addr + reloc2.offset = 0x0 + 4)
>> reloc3  ---> 0x0: stub1 (reloc3.addr = reloc2.addr + reloc3.offset = 0x4 - 4)
>> 
>> 
>> `CodeSection` does not support negative offsets. It [assumes](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.hpp#L195) that the addresses relocations point at grow upward.
>> Negative offsets reduce the offset range by half. This can increase the number of filler records, the empty `relocInfo` records used to reduce offset values. Also, negative offsets are only needed for `static_stub_type`; the other 13 types don't need them.
>> 
>> ## Solution
>> In this PR creation of stubs is done in two stages. First we collect requests for creating shared stubs: a callee `ciMethod*` and an offset of a call in `CodeBuffer` (see [src/hotspot/share/asm/codeBuffer.hpp](https://github.com/openjdk/jdk/pull/8816/files#diff-deb8ab083311ba60c0016dc34d6518579bbee4683c81e8d348982bac897fe8ae)). Then we have the finalisation phase (see [src/hotspot/share/ci/ciEnv.cpp](https://github.com/openjdk/jdk/pull/8816/files#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450)), where `CodeBuffer::finalize_stubs()` creates shared stubs in `CodeBuffer`: a stub and multiple relocations sharing it. The first relocation will ...
>
> Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8280481C
>  - Use call offset instead of caller pc
>  - Simplify test
>  - Fix x86 build failures
>  - Remove UseSharedStubs and clarify shared stub use cases
>  - Make SharedStubToInterpRequest ResourceObj and set initial size of SharedStubToInterpRequests to 8
>  - Update copyright year and add Unimplemented guards
>  - Set UseSharedStubs to true for X86
>  - Set UseSharedStubs to true for AArch64
>  - Fix x86 build failure
>  - ... and 10 more: https://git.openjdk.org/jdk/compare/073960fa...da3bfb5b
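
The negative-offset problem in the quoted description can be modeled with a small sketch. This is an illustration under simplifying assumptions, not HotSpot's `relocInfo` encoding: each relocation is reduced to the address it points at, and each offset is an unsigned delta from the previous address. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True when every target address is >= its predecessor, i.e. the sequence
// can be described with non-negative offsets from the previous address.
inline bool encodable_with_unsigned_offsets(
    const std::vector<std::uint32_t>& targets) {
  for (std::size_t i = 1; i < targets.size(); ++i) {
    if (targets[i] < targets[i - 1]) {
      return false;
    }
  }
  return true;
}

// Delta-encodes a non-decreasing sequence of target addresses,
// starting from address 0.
inline std::vector<std::uint32_t> delta_encode(
    const std::vector<std::uint32_t>& targets) {
  std::vector<std::uint32_t> deltas;
  std::uint32_t prev = 0;
  for (std::uint32_t t : targets) {
    assert(t >= prev);  // a negative offset would be needed otherwise
    deltas.push_back(t - prev);
    prev = t;
  }
  return deltas;
}
```

The order from the example, stub1 at 0x0, stub2 at 0x4, then stub1 again, is the sequence `{0, 4, 0}` and is not encodable. Emitting all relocations that share a stub together gives `{0, 0, 4}`, which is.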

@eastig 
Your change (at version da3bfb5b86dd272a0bf3919ea710e12b2fd66bcc) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/8816#issuecomment-1173782159

From eastigeevich at openjdk.org  Mon Jul  1 15:26:38 2024
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Mon, 1 Jul 2024 15:26:38 GMT
Subject: RFR: 8280481: Duplicated stubs to interpreter for static calls
In-Reply-To: <rl8hny_U1ThjuAiduYmmsoGqN343SqZv9v6b3M7Ugiw=.a03e5ac9-668b-4d94-a2d5-54dfdcc8eb3a@github.com>
References: <9N1GcHDRvyX1bnPrRcyw96zWIgrrAm4mfrzp8dQ-BBk=.6d55c5fd-7d05-4058-99b6-7d40a92450bf@github.com>
 <t1Gigc1XLRETSXriG2Bw9-zZGbaJxxIj-Hda3sRBGf8=.60774236-39cc-4f39-b757-076d33af675b@github.com>
 <B8WuOWl_39ppfyZuz8fAoCmlwSPvDzLFF1ikhF-N0S8=.fd498ddb-7854-4616-802c-a0675dbb031c@github.com>
 <ONVSGWta7auQsl98toVizl8e6L9USdeZNzwdrg48gmQ=.d71dd705-43cb-43b7-a3f1-a18e40ec103f@github.com>
 <eCO5C4rdv0svuNfSPaRfva15HuqEzUTzJ08Fc2gMjF0=.df9bf477-2fb1-4172-8559-4f6b8cf26a52@github.com>
 <GeRhcXEA5_PBba-cRJ5sWHFvrrjgGUGO0cVv82tPn5g=.d8372a44-afa6-44dc-87be-8a305ea7b9e7@github.com>
 <SaHwtSn1Sh0Q4fvFbbrDhWCvRIXbk2z7ka13ezI9PyQ=.7ab81aa0-678e-4108-838c-5757ad1aa2ac@github.com>
 <rl8hny_U1ThjuAiduYmmsoGqN343SqZv9v6b3M7Ugiw=.a03e5ac9-668b-4d94-a2d5-54dfdcc8eb3a@github.com>
Message-ID: <7L47Ho5J_hz_XkmE9M25CtLwtRdSJ0k3INrQQx6YZ0U=.b4fe9454-fdd5-4513-ab77-3eda4ad4ad09@github.com>

On Wed, 12 Jun 2024 07:17:25 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

>>> Hi @eastig ,
>>> I would like to reproduce your experimental data and I would be very grateful if you could provide a small patch to help me get the result of `Saved bytes` and `Nmethods with shared stubs`.
>>> Thank you!
>> 
>> 
>> diff --git a/src/hotspot/share/asm/codeBuffer.inline.hpp b/src/hotspot/share/asm/codeBuffer.inline.hpp
>> index 045cff13f25..9af26730cbd 100644
>> --- a/src/hotspot/share/asm/codeBuffer.inline.hpp
>> +++ b/src/hotspot/share/asm/codeBuffer.inline.hpp
>> @@ -45,6 +45,7 @@ bool emit_shared_stubs_to_interp(CodeBuffer* cb, SharedStubToInterpRequests* sha
>>    };
>>    shared_stub_to_interp_requests->sort(by_shared_method);
>>    MacroAssembler masm(cb);
>> +  bool has_shared = false;
>>    for (int i = 0; i < shared_stub_to_interp_requests->length();) {
>>      address stub = masm.start_a_stub(CompiledStaticCall::to_interp_stub_size());
>>      if (stub == NULL) {
>> @@ -53,13 +54,22 @@ bool emit_shared_stubs_to_interp(CodeBuffer* cb, SharedStubToInterpRequests* sha
>>      }
>> 
>>      ciMethod* method = shared_stub_to_interp_requests->at(i).shared_method();
>> +    int shared = 0;
>>      do {
>>        address caller_pc = cb->insts_begin() + shared_stub_to_interp_requests->at(i).call_offset();
>>        masm.relocate(static_stub_Relocation::spec(caller_pc), relocate_format);
>>        ++i;
>> +      ++shared;
>>      } while (i < shared_stub_to_interp_requests->length() && shared_stub_to_interp_requests->at(i).shared_method() == method);
>>      masm.emit_static_call_stub();
>>      masm.end_a_stub();
>> +    if (UseNewCode && shared > 1) {
>> +      has_shared = true;
>> +      tty->print_cr("Saved: %d", (shared - 1) * CompiledStaticCall::to_interp_stub_size());
>> +    }
>> +  }
>> +  if (has_shared) {
>> +    tty->print_cr("nm_has_shared");
>>    }
>>    return true;
>>  }
>> 
>> 
>> You will need to use `-XX:+UseNewCode` in your runs.
>> `grep nm_has_shared  run.log | wc -l` is the number of nmethods that have a shared stub.
>> `grep Saved: run.log | awk '{print $2}' | grep -o '[0-9]*' | paste -s -d+ - | bc` prints the number of saved bytes.
>
> @eastig as I understand, this optimization is about saving code cache memory. The sharing is within an nmethod, not across nmethods, correct?
> I'm trying to prioritize an effort to adopt this optimization in Graal. In addition to the numbers you present for code cache bytes saved in the benchmarks, can you say anything about how much that is relative to the code cache used in the benchmarks?
> 
> More context: https://github.com/openjdk/jdk/pull/19672

Hi @dougxc 
> @eastig as I understand, this optimization is about saving code cache memory. The sharing is within an nmethod, not across nmethods, correct?

Yes, you are correct. Sharing across nmethods would need non-trivial maintenance, maybe similar to the inline cache. I am not sure it is worth it.
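
For reference, the per-nmethod accounting done by the instrumentation patch quoted above can be sketched as follows. This is a simplified model: callees are plain integer ids standing in for `ciMethod*`, and `stub_size` stands in for `CompiledStaticCall::to_interp_stub_size()`; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Bytes saved inside ONE nmethod when all call sites with the same callee
// share a single stub: each callee with n call sites saves (n - 1) stubs.
inline std::size_t saved_stub_bytes(const std::vector<int>& callee_ids,
                                    std::size_t stub_size) {
  assert(stub_size > 0);
  std::map<int, std::size_t> calls_per_callee;
  for (int id : callee_ids) {
    ++calls_per_callee[id];
  }
  std::size_t saved = 0;
  for (const auto& entry : calls_per_callee) {
    saved += (entry.second - 1) * stub_size;
  }
  return saved;
}
```

Call sites in different nmethods are never merged in this model, which matches the within-nmethod sharing described above.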

> In addition to the numbers you present for code cache bytes saved in the benchmarks, can you say anything about how much that is relative to the code cache used in the benchmarks?

In a service application with more than 50 000 nmethods I saw a 1% to 2% reduction in CodeCache usage.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/8816#issuecomment-2200450633

From mli at openjdk.org  Mon Jul  1 15:33:29 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 1 Jul 2024 15:33:29 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2]
In-Reply-To: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
Message-ID: <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com>

> Hi,
> Can you help to review the patch?
> 
> I'm also working on a Base64 decode intrinsic, but it shows a performance regression in some cases. Decode and encode are totally independent of each other, so I will send the decode out for review in a separate PR once I have fixed the regression.
> 
> Thanks.
> 
> ## Test
> benchmarks run on CanVM-K230
> 
> I've tried several implementations, respectively with vector group
> * m2+m1
> * m2
> * m1
> The best one is the combination of m2+m1; it has the best performance across all source sizes.
> 
> ### this implementation (m2+m1)
> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
> -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
> 
> ### vector with only m2
> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -web...

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  use pure scalar version when rvv is not supported

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19973/files
  - new: https://git.openjdk.org/jdk/pull/19973/files/fc32d9fa..cf732984

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=00-01

  Stats: 13 lines in 2 files changed: 4 ins; 7 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/19973.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19973/head:pull/19973

PR: https://git.openjdk.org/jdk/pull/19973

From mli at openjdk.org  Mon Jul  1 15:38:18 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 1 Jul 2024 15:38:18 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2]
In-Reply-To: <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
 <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com>
Message-ID: <S-BpiX60ySY6FNDfcskTHuuDsQQIno54AaOvSFlm67c=.24e8cf29-de2c-4f8e-bcdb-7cd1c7927c30@github.com>

On Mon, 1 Jul 2024 15:33:29 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> 
>> I'm also working on a base64 decode intrinsic, but there is a performance regression in some cases. Decode and encode are totally independent of each other, so I will send out the decode review in a separate PR once I fix that regression.
>> 
>> Thanks.
>> 
>> ## Test
>> benchmarks run on CanVM-K230
>> 
>> I've tried several implementations, respectively with vector group
>> * m2+m1+scalar
>> * m2+scalar
>> * m1+scalar
>> * pure scalar
>> The best one is the combination of m2+m1; it has the best performance across all source sizes.
>> 
>> this implementation (m2+m1)
>> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
>> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
>> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
>> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
>> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
>> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
>> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
>> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
>> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
>> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
>> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>> 
>> vector with only m2
>> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: st...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   use pure scalar version when rvv is not supported

The pure scalar implementation also brings some performance improvement across source sizes, so the intrinsic is now also enabled when RVV is not supported.

performance data
Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, scalar | Error | Units | Perf opt
-- | -- | -- | -- | -- | -- | -- | -- | --
Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.75 | 0.38 | ns/op | 1
Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.71 | 93.824 | 1.954 | ns/op | 0.999
Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.824 | 123.487 | 0.559 | ns/op | 0.987
Base64Encode.testBase64Encode | 6 | avgt | 10 | 138.984 | 137.697 | 0.273 | ns/op | 1.009
Base64Encode.testBase64Encode | 7 | avgt | 10 | 161.243 | 157.696 | 0.875 | ns/op | 1.022
Base64Encode.testBase64Encode | 9 | avgt | 10 | 169.724 | 155.223 | 1.908 | ns/op | 1.093
Base64Encode.testBase64Encode | 10 | avgt | 10 | 185.92 | 176.339 | 5.875 | ns/op | 1.054
Base64Encode.testBase64Encode | 48 | avgt | 10 | 408.467 | 347.269 | 1.799 | ns/op | 1.176
Base64Encode.testBase64Encode | 512 | avgt | 10 | 3665.34 | 2718.442 | 26.954 | ns/op | 1.348
Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7022.025 | 5290.003 | 33.216 | ns/op | 1.327
Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135819.7 | 101988.94 | 2209.887 | ns/op | 1.332

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2200477845

From dnsimon at openjdk.org  Mon Jul  1 15:41:33 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Mon, 1 Jul 2024 15:41:33 GMT
Subject: RFR: 8280481: Duplicated stubs to interpreter for static calls
 [v2]
In-Reply-To: <SEOOihMeBukAoQInq9Lt5xDYNB037oRdkZc8F9R4_FA=.793f18eb-4572-43d7-ad15-6cf09b27576a@github.com>
References: <9N1GcHDRvyX1bnPrRcyw96zWIgrrAm4mfrzp8dQ-BBk=.6d55c5fd-7d05-4058-99b6-7d40a92450bf@github.com>
 <SEOOihMeBukAoQInq9Lt5xDYNB037oRdkZc8F9R4_FA=.793f18eb-4572-43d7-ad15-6cf09b27576a@github.com>
Message-ID: <vaLmP2lZkrQoRcSEfIHkELyRUoyFgSS7H9DtJ8uqf8c=.c4e50a71-602e-4f36-a3d7-b36ecf0a640c@github.com>

On Wed, 29 Jun 2022 14:50:59 GMT, Evgeny Astigeevich <eastigeevich at openjdk.org> wrote:

>> ## Problem
>> Calls of Java methods have stubs to the interpreter for the cases when an invoked Java method is not compiled. Calls of static Java methods and final Java methods have statically bound information about a callee during compilation. Such calls can share stubs to the interpreter.
>> 
>> Each stub to the interpreter has a relocation record (accessed via `relocInfo`) which provides the address of the stub and the address of its owner. `relocInfo` has an offset which is an offset from the previously known relocatable address. The address of a stub is calculated as the address provided by the previous `relocInfo` plus the offset.
>> 
>> Each Java call has:
>> - A relocation for a call site.
>> - A relocation for a stub to the interpreter.
>> - A stub to the interpreter.
>> - If far jumps are used (arm64 case):
>>   - A trampoline relocation.
>>   - A trampoline.
>> 
>> We cannot avoid creating relocations. They are needed to support patching call sites.
>> With shared stubs there will be multiple relocations having the same stub address but different owners' addresses.
>> If we try to generate relocations as we go there will be a case which requires negative offsets:
>> 
>> reloc1  ---> 0x0: stub1
>> reloc2  ---> 0x4: stub2 (reloc2.addr = reloc1.addr + reloc2.offset = 0x0 + 4)
>> reloc3  ---> 0x0: stub1 (reloc3.addr = reloc2.addr + reloc3.offset = 0x4 - 4)
>> 
>> 
>> `CodeSection` does not support negative offsets. It [assumes](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.hpp#L195) that the addresses relocations point at grow upward.
>> Negative offsets reduce the offset range by half. This can increase the number of filler records, the empty `relocInfo` records used to reduce offset values. Also, negative offsets are only needed for `static_stub_type`; the other 13 types don't need them.
>> 
>> ## Solution
>> In this PR creation of stubs is done in two stages. First we collect requests for creating shared stubs: a callee `ciMethod*` and an offset of a call in `CodeBuffer` (see [src/hotspot/share/asm/codeBuffer.hpp](https://github.com/openjdk/jdk/pull/8816/files#diff-deb8ab083311ba60c0016dc34d6518579bbee4683c81e8d348982bac897fe8ae)). Then we have the finalisation phase (see [src/hotspot/share/ci/ciEnv.cpp](https://github.com/openjdk/jdk/pull/8816/files#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450)), where `CodeBuffer::finalize_stubs()` creates shared stubs in `CodeBuffer`: a stub and multiple relocations sharing it. The first relocation will ...
>
> Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8280481C
>  - Use call offset instead of caller pc
>  - Simplify test
>  - Fix x86 build failures
>  - Remove UseSharedStubs and clarify shared stub use cases
>  - Make SharedStubToInterpRequest ResourceObj and set initial size of SharedStubToInterpRequests to 8
>  - Update copyright year and add Unimplemented guards
>  - Set UseSharedStubs to true for X86
>  - Set UseSharedStubs to true for AArch64
>  - Fix x86 build failure
>  - ... and 10 more: https://git.openjdk.org/jdk/compare/134ea4b0...da3bfb5b
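
The negative-offset case from the quoted Problem section can be illustrated with a small model. This is a minimal sketch, assuming a record stores only an unsigned distance from the previous target address; `RelocSketch`, `encode_deltas`, and `decode_deltas` are hypothetical names and do not reproduce HotSpot's actual `relocInfo` layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model (not HotSpot's real relocInfo layout):
// a record stores only an unsigned distance from the previous target address.
struct RelocSketch {
  uint32_t offset;
};

// Encodes target addresses as deltas from the previous target.
// Returns false as soon as a target lies below its predecessor, because
// that step would need a negative offset. `records` is only written on success.
bool encode_deltas(const std::vector<uint32_t>& targets,
                   std::vector<RelocSketch>& records) {
  std::vector<RelocSketch> encoded;
  uint32_t previous = 0;
  for (uint32_t address : targets) {
    if (address < previous) {
      return false;
    }
    encoded.push_back(RelocSketch{address - previous});
    previous = address;
  }
  records = encoded;
  return true;
}

// Rebuilds the target addresses by accumulating the offsets.
std::vector<uint32_t> decode_deltas(const std::vector<RelocSketch>& records) {
  std::vector<uint32_t> targets;
  uint32_t address = 0;
  for (const RelocSketch& record : records) {
    // In this model an offset must not wrap the 32-bit address.
    assert(address + record.offset >= address);
    address += record.offset;
    targets.push_back(address);
  }
  return targets;
}
```

In this model the order `0x0, 0x4, 0x0` from the quoted example cannot be encoded, while placing both relocations for the shared stub next to each other (`0x0, 0x0, 0x4`) can. That may be one way to read why the PR creates a stub and the relocations sharing it together in a finalisation phase.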

Ok, thanks for the numbers. Was there any noticeable increase in throughput (or any other interesting metrics)?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/8816#issuecomment-2200483943

From eastigeevich at openjdk.org  Mon Jul  1 15:45:32 2024
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Mon, 1 Jul 2024 15:45:32 GMT
Subject: RFR: 8280481: Duplicated stubs to interpreter for static calls
 [v2]
In-Reply-To: <vaLmP2lZkrQoRcSEfIHkELyRUoyFgSS7H9DtJ8uqf8c=.c4e50a71-602e-4f36-a3d7-b36ecf0a640c@github.com>
References: <9N1GcHDRvyX1bnPrRcyw96zWIgrrAm4mfrzp8dQ-BBk=.6d55c5fd-7d05-4058-99b6-7d40a92450bf@github.com>
 <SEOOihMeBukAoQInq9Lt5xDYNB037oRdkZc8F9R4_FA=.793f18eb-4572-43d7-ad15-6cf09b27576a@github.com>
 <vaLmP2lZkrQoRcSEfIHkELyRUoyFgSS7H9DtJ8uqf8c=.c4e50a71-602e-4f36-a3d7-b36ecf0a640c@github.com>
Message-ID: <JSLKkazCvQcLa9xigxiNCgM0gAAmlXjbcGhV9WMI8dQ=.79987e73-72b0-4ff5-99f4-52e2a30aa173@github.com>

On Mon, 1 Jul 2024 15:39:05 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

> Was there any noticeable increase in throughput (or any other interesting metrics)?

Nothing I can recall. I did not test it alone for performance.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/8816#issuecomment-2200492049

From mli at openjdk.org  Mon Jul  1 16:54:55 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 1 Jul 2024 16:54:55 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
Message-ID: <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>

> Hi,
> Can you help to review the patch?
> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
> 
> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external sleef artifacts (headers or libs) at build or run time.
> Besides this change, the previous changes are also modified accordingly, e.g. some unnecessary files or changes are removed, especially in the make dir of jdk.
> 
> Besides the code changes, one important task is to handle the legal process.
> 
> Thanks!
> 
> ## Performance
> NOTE: 
> * `Src` means the implementation in this pr, i.e. without dependency on external sleef.
> * `Disabled` means intrinsics are disabled by `-XX:-UseVectorStubs`
> * `system_sleef` means the implementation in [previous pr 18294](https://github.com/openjdk/jdk/pull/18294), i.e. build and run jdk with dependency on external sleef.
> 
> Basically, the perf data below shows that 
> * this implementation has better performance than previous version in [pr 18294](https://github.com/openjdk/jdk/pull/18294), 
> * and both sleef versions have much better performance compared with the non-sleef version.
> 
> |Benchmark                     |(size)|Src      |Units|system_sleef|(system_sleef-Src)/Src|Disabled |(Disable-Src)/Src|
> |------------------------------|------|---------|-----|------------|----------------------|---------|-----------------|
> |3472:Double128Vector.ACOS     |1024  |8546.842 |ns/op|8516.007    |-0.004                |16799.273|0.966            |
> |3473:Double128Vector.ASIN     |1024  |6864.656 |ns/op|6987.328    |0.018                 |16602.442|1.419            |
> |3474:Double128Vector.ATAN     |1024  |11489.255|ns/op|12261.800   |0.067                 |26329.320|1.292            |
> |3475:Double128Vector.ATAN2    |1024  |16661.170|ns/op|17234.472   |0.034                 |42084.100|1.526            |
> |3476:Double128Vector.CBRT     |1024  |18999.387|ns/op|20298.458   |0.068                 |35998.688|0.895            |
> |3477:Double128Vector.COS      |1024  |14081.857|ns/op|14846.117   |0.054                 |24420.692|0.734            |
> |3478:Double128Vector.COSH     |1024  |12202.306|ns/op|12237.772   |0.003                 |21343.863|0.749            |
> |3479:Double128Vector.EXP      |1024  |4553.108 |ns/op|4777.638    |0.049                 |20155.903|3.427            |
> |3480:D...

Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits:

 - Merge branch 'master' into sleef-aarch64-integrate-source
 - merge master
 - sleef 3.6.1 for riscv
 - sleef 3.6.1
 - update header files for arm
 - add inline header file for riscv64
 - remove notes about sleef changes
 - fix performance issue
 - disable unused-function warnings; add log msg
 - minor
 - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863

-------------

Changes: https://git.openjdk.org/jdk/pull/18605/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=08
  Stats: 21668 lines in 21 files changed: 21624 ins; 1 del; 43 mod
  Patch: https://git.openjdk.org/jdk/pull/18605.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18605/head:pull/18605

PR: https://git.openjdk.org/jdk/pull/18605

From luhenry at openjdk.org  Mon Jul  1 17:16:22 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Mon, 1 Jul 2024 17:16:22 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2]
In-Reply-To: <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
 <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com>
Message-ID: <hvqUkBLtcQL_zyScuU4YzgupWTLVXQDrgNGCtPRahQ8=.089771da-3a1b-4ff7-bf55-16b801b8ed11@github.com>

On Mon, 1 Jul 2024 15:33:29 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> 
>> I'm also working on a base64 decode intrinsic, but there is a performance regression in some cases. Decode and encode are totally independent of each other, so I will send out the decode review in a separate PR once I fix that regression.
>> 
>> Thanks.
>> 
>> ## Test
>> benchmarks run on CanVM-K230
>> 
>> I've tried several implementations, respectively with vector group
>> * m2+m1+scalar
>> * m2+scalar
>> * m1+scalar
>> * pure scalar
>> The best one is the combination of m2+m1; it has the best performance across all source sizes.
>> 
>> this implementation (m2+m1)
>> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
>> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
>> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
>> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
>> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
>> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
>> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
>> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
>> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
>> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
>> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>> 
>> vector with only m2
>> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: st...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   use pure scalar version when rvv is not supported

Changes requested by luhenry (Committer).

src/hotspot/cpu/riscv/assembler_riscv.hpp line 1828:

> 1826:   }
> 1827: 
> 1828:   // Vector Unit-Stride Instructions

Suggestion:

  // Vector Unit-Stride Load Instructions

src/hotspot/cpu/riscv/assembler_riscv.hpp line 1831:

> 1829:   INSN(vlseg3e8_v, 0b0000111, 0b000, 0b00000, 0b00, 0b0, g3);
> 1830: 
> 1831:   INSN(vsseg4e8_v, 0b0100111, 0b000, 0b00000, 0b00, 0b0, g4);

Suggestion:

  // Vector Unit-Stride Store Instructions
  INSN(vsseg4e8_v, 0b0100111, 0b000, 0b00000, 0b00, 0b0, g4);

src/hotspot/cpu/riscv/assembler_riscv.hpp line 1832:

> 1830: 
> 1831:   INSN(vsseg4e8_v, 0b0100111, 0b000, 0b00000, 0b00, 0b0, g4);
> 1832: #undef INSN

Blank line before the `#undef INSN`

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5115:

> 5113:    * NOTE: each field will occupy a vector register group
> 5114:    */
> 5115:   void encodeVector(Register src, Register dst, Register codec, Register step,

Suggestion:

  void generate_base64_encodeVector(Register src, Register dst, Register codec, Register step,

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5230:

> 5228: 
> 5229:     // vector version
> 5230:     {

You should not even generate the vectorized code if `UseRVV` is false. You can then remove https://github.com/openjdk/jdk/pull/19973/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R5225-R5227 

Suggestion:

    if (UseRVV) {

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5263:

> 5261: 
> 5262:     // scalar version
> 5263:     __ BIND(ProcessScalar);

You can move that in the previous block at https://github.com/openjdk/jdk/pull/19973/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R5260 as it's the only block where it's used.

-------------

PR Review: https://git.openjdk.org/jdk/pull/19973#pullrequestreview-2151861929
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1661333608
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1661333775
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1661333402
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1661334220
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1661335985
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1661336651

From tschatzl at openjdk.org  Tue Jul  2 07:47:58 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Tue, 2 Jul 2024 07:47:58 GMT
Subject: RFR: 8331385: G1: Prefix HeapRegion helper classes with G1
Message-ID: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com>

Hi all,

  after [JDK-8330694](https://bugs.openjdk.org/browse/JDK-8330694), which renamed HeapRegion to G1HeapRegion, there were a few related helper classes in this CR that were not renamed.

It's purely mechanical renaming without even further renaming of files etc.

This change updates them.

(Fwiw, the "Viewed" checkbox at the top right of the file change helps a lot when reviewing this change incrementally)

Testing: tier1, tier4, tier5

Thanks,
  Thomas

-------------

Commit messages:
 - 8331385

Changes: https://git.openjdk.org/jdk/pull/19967/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19967&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8331385
  Stats: 887 lines in 68 files changed: 163 ins; 165 del; 559 mod
  Patch: https://git.openjdk.org/jdk/pull/19967.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19967/head:pull/19967

PR: https://git.openjdk.org/jdk/pull/19967

From fyang at openjdk.org  Tue Jul  2 08:08:17 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 2 Jul 2024 08:08:17 GMT
Subject: RFR: 8335411: RISC-V: Optimize encode_heap_oop when oop is not
 null
In-Reply-To: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com>
References: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com>
Message-ID: <lIVaw53Tr_xZacCfEuFiDJdmau6xpqBPQ33vqcAMlWg=.94875ebb-c758-4cd3-b2f6-2d6c9b5602cf@github.com>

On Mon, 1 Jul 2024 14:32:03 GMT, Feilong Jiang <fjiang at openjdk.org> wrote:

> Hi, please review this enhancement that adds two more `encode_heap_oop_not_null` methods.
> 
> Currently, `encode_heap_oop` first checks whether the oop pointer is `null`. When encoding an oop that is known to be non-null into compressed form, we can skip that check and avoid an unnecessary branch instruction.
> 
> 
> Testing:
> - [x] Tier1~3 on linux-riscv64 with release build
> - [x] renaissance & dacapo benchmark suites for functionality
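
The difference between the two encoding variants described in the quoted text can be sketched in plain C++. This is a simplified model, assuming the common base-plus-shift form of compressed oops; `NarrowOopConfigSketch` and both function names are hypothetical, and the real RISC-V code emits machine instructions through the macro assembler rather than running C++ like this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified compressed-oop parameters (not HotSpot's real types).
struct NarrowOopConfigSketch {
  uint64_t heap_base;  // start of the address range being compressed
  unsigned shift;      // log2 of the object alignment
};

// General form: null must stay null, so the value is tested first.
uint32_t encode_heap_oop_sketch(const NarrowOopConfigSketch& cfg, uint64_t oop) {
  if (oop == 0) {
    return 0;
  }
  return static_cast<uint32_t>((oop - cfg.heap_base) >> cfg.shift);
}

// Non-null form: the caller guarantees oop != 0, so the test and its
// branch are not needed on the encoding path.
uint32_t encode_heap_oop_not_null_sketch(const NarrowOopConfigSketch& cfg, uint64_t oop) {
  assert(oop != 0 && "caller must pass a non-null oop");
  return static_cast<uint32_t>((oop - cfg.heap_base) >> cfg.shift);
}
```

For a non-null input both functions return the same value; the second one only drops the null test, which in this model stands in for the branch instruction the PR avoids.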

Looks good. Thanks.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19974#pullrequestreview-2153023485

From ayang at openjdk.org  Tue Jul  2 10:24:18 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Tue, 2 Jul 2024 10:24:18 GMT
Subject: RFR: 8331385: G1: Prefix HeapRegion helper classes with G1
In-Reply-To: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com>
References: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com>
Message-ID: <BbCLtLUIqyaA9lNeheVeZJV2fb49kWP2p5t8vRAJ1Uw=.6f72af7b-c5dd-4814-95b4-04e91f32b2c7@github.com>

On Mon, 1 Jul 2024 09:35:00 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

> Hi all,
> 
>   after [JDK-8330694](https://bugs.openjdk.org/browse/JDK-8330694), which renamed HeapRegion to G1HeapRegion, there were a few related helper classes in this CR that were not renamed.
> 
> It's purely mechanical renaming without even further renaming of files etc.
> 
> This change updates them.
> 
> (Fwiw, the "Viewed" checkbox at the top right of the file change helps a lot when reviewing this change incrementally)
> 
> Testing: tier1, tier4, tier5
> 
> Thanks,
>   Thomas

Marked as reviewed by ayang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19967#pullrequestreview-2153390622

From mli at openjdk.org  Tue Jul  2 13:53:33 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 2 Jul 2024 13:53:33 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v3]
In-Reply-To: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
Message-ID: <xvE-5_bUzUxvsTKVf5g470H8_CDVwIIla5D8hrU3vBI=.0e946b91-eb68-4239-b99a-4f9d3a9282c6@github.com>

> Hi,
> Can you help to review the patch?
> 
> I'm also working on a base64 decode intrinsic, but there is a performance regression in some cases. Decode and encode are totally independent of each other, so I will send out the decode review in a separate PR once I fix that regression.
> 
> Thanks.
> 
> ## Test
> benchmarks run on CanVM-K230
> 
> I've tried several implementations, respectively with vector group
> * m2+m1+scalar
> * m2+scalar
> * m1+scalar
> * pure scalar
> The best one is the combination of m2+m1; it has the best performance across all source sizes.
> 
> this implementation (m2+m1)
> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
> -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
> 
> vector with only m2
> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: ...

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  refine code

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19973/files
  - new: https://git.openjdk.org/jdk/pull/19973/files/cf732984..264b354b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=01-02

  Stats: 11 lines in 2 files changed: 2 ins; 4 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/19973.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19973/head:pull/19973

PR: https://git.openjdk.org/jdk/pull/19973

From mli at openjdk.org  Tue Jul  2 13:53:34 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 2 Jul 2024 13:53:34 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2]
In-Reply-To: <hvqUkBLtcQL_zyScuU4YzgupWTLVXQDrgNGCtPRahQ8=.089771da-3a1b-4ff7-bf55-16b801b8ed11@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
 <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com>
 <hvqUkBLtcQL_zyScuU4YzgupWTLVXQDrgNGCtPRahQ8=.089771da-3a1b-4ff7-bf55-16b801b8ed11@github.com>
Message-ID: <J4UUOh8pLl6AFLHJk-ee4yt6NDUCpLJ6S17vBeJbkDk=.385a1bc1-c432-4a72-9f7f-467ce446481a@github.com>

On Mon, 1 Jul 2024 17:13:07 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

>> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   use pure scalar version when rvv is not supported
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5230:
> 
>> 5228: 
>> 5229:     // vector version
>> 5230:     {
> 
> You should not even generate the vectorized code if `UseRVV` is false. You can then remove https://github.com/openjdk/jdk/pull/19973/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R5225-R5227 
> 
> Suggestion:
> 
>     if (UseRVV) {

good catch, thanks!

> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5263:
> 
>> 5261: 
>> 5262:     // scalar version
>> 5263:     __ BIND(ProcessScalar);
> 
> You can move that in the previous block at https://github.com/openjdk/jdk/pull/19973/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R5260 as it's the only block where it's used.

I think that block is for the vector version only. Or maybe I misunderstood you?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1662575854
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1662577899

From luhenry at openjdk.org  Tue Jul  2 13:57:23 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Tue, 2 Jul 2024 13:57:23 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2]
In-Reply-To: <J4UUOh8pLl6AFLHJk-ee4yt6NDUCpLJ6S17vBeJbkDk=.385a1bc1-c432-4a72-9f7f-467ce446481a@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
 <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com>
 <hvqUkBLtcQL_zyScuU4YzgupWTLVXQDrgNGCtPRahQ8=.089771da-3a1b-4ff7-bf55-16b801b8ed11@github.com>
 <J4UUOh8pLl6AFLHJk-ee4yt6NDUCpLJ6S17vBeJbkDk=.385a1bc1-c432-4a72-9f7f-467ce446481a@github.com>
Message-ID: <ZdaJ73eas0VRALPh57TZi61e7gS1V6nzsMiXZCRKOmk=.f62e2422-15a7-4f81-a86f-04262d7ae3c3@github.com>

On Tue, 2 Jul 2024 13:51:02 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5263:
>> 
>>> 5261: 
>>> 5262:     // scalar version
>>> 5263:     __ BIND(ProcessScalar);
>> 
>> You can move that in the previous block at https://github.com/openjdk/jdk/pull/19973/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R5260 as it's the only block where it's used.
>
> I think that block is for the vector version only. Or maybe I misunderstood you?

That `ProcessScalar` Label is only ever jumped to if we're in the `UseRVV` block, so please move that `__ BIND(ProcessScalar)` to L5256, inside the `if (UseRVV) { ... }` block.
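
The restructuring requested here can be sketched with a mock assembler. This is only an illustration under assumptions: `MockAssembler`, `Label`, `branch_if_short` and `generate_encode_stub` are hypothetical stand-ins, not the HotSpot `MacroAssembler` API, and the emitted "instructions" are plain strings.

```cpp
#include <string>
#include <vector>

// Hypothetical label type; the real assembler tracks code offsets instead.
struct Label {
  bool bound = false;
};

// Hypothetical mock that records what would be emitted, in order.
struct MockAssembler {
  std::vector<std::string> ops;

  void emit(const std::string& op) { ops.push_back(op); }

  // Records a forward branch to `name`, taken when the input is too short
  // for the vector loop (assumed condition, for illustration only).
  void branch_if_short(Label&, const std::string& name) {
    ops.push_back("branch-if-short " + name);
  }

  void bind(Label& l, const std::string& name) {
    l.bound = true;
    ops.push_back(name + ":");
  }
};

// The label is declared, branched to, and bound inside the use_rvv block,
// so nothing vector-related is generated when use_rvv is false.
std::vector<std::string> generate_encode_stub(bool use_rvv) {
  MockAssembler masm;
  if (use_rvv) {
    Label process_scalar;
    masm.branch_if_short(process_scalar, "ProcessScalar");
    masm.emit("vector loop");
    masm.bind(process_scalar, "ProcessScalar");
  }
  masm.emit("scalar loop");
  return masm.ops;
}
```

Keeping the label's whole lifetime inside the block means the scalar-only stub contains no unused bind at all.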

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1662583734

From mli at openjdk.org  Tue Jul  2 14:11:20 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 2 Jul 2024 14:11:20 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2]
In-Reply-To: <ZdaJ73eas0VRALPh57TZi61e7gS1V6nzsMiXZCRKOmk=.f62e2422-15a7-4f81-a86f-04262d7ae3c3@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
 <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com>
 <hvqUkBLtcQL_zyScuU4YzgupWTLVXQDrgNGCtPRahQ8=.089771da-3a1b-4ff7-bf55-16b801b8ed11@github.com>
 <J4UUOh8pLl6AFLHJk-ee4yt6NDUCpLJ6S17vBeJbkDk=.385a1bc1-c432-4a72-9f7f-467ce446481a@github.com>
 <ZdaJ73eas0VRALPh57TZi61e7gS1V6nzsMiXZCRKOmk=.f62e2422-15a7-4f81-a86f-04262d7ae3c3@github.com>
Message-ID: <aS4fTbrCKE8KdlaRhcxzlrtcXb4-tTIOiHuVgppdz4s=.a231d455-5295-4e8c-8b54-04b9708ee981@github.com>

On Tue, 2 Jul 2024 13:54:43 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

>> I think that block is for the vector version only. Or maybe I misunderstood you?
>
> That `ProcessScalar` Label is only ever jumped to if we're in the `UseRVV` block, so please move that `__ BIND(ProcessScalar)` to L5256, inside the `if (UseRVV) { ... }` block.

I see, you're right!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1662607335

From mli at openjdk.org  Tue Jul  2 14:16:35 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 2 Jul 2024 14:16:35 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v4]
In-Reply-To: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
Message-ID: <FZMjsZWO9NKx4v5svo8qQPE5HKqvoiM-lc0oiDCah80=.2d250429-524a-4e93-a453-bf1db0238626@github.com>

> Hi,
> Can you help to review the patch?
> 
> I'm also working on a Base64 decode intrinsic, but it shows a performance regression in some cases. Since decode and encode are totally independent of each other, I will send out the decode review in a separate PR once I have fixed that regression.
> 
> Thanks.
> 
> ## Test
> Benchmarks were run on a CanMV-K230.
> 
> I've tried several implementations, with the following vector register group settings:
> * m2+m1+scalar
> * m2+scalar
> * m1+scalar
> * pure scalar
> The best one is the m2+m1 combination; it has the best performance across all source sizes.
> 
> this implementation (m2+m1)
> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
> -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
> 
> vector with only m2
> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: ...

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  move label

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19973/files
  - new: https://git.openjdk.org/jdk/pull/19973/files/264b354b..8645a6a1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=02-03

  Stats: 6 lines in 1 file changed: 3 ins; 3 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19973.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19973/head:pull/19973

PR: https://git.openjdk.org/jdk/pull/19973

From coleenp at openjdk.org  Tue Jul  2 14:52:19 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 2 Jul 2024 14:52:19 GMT
Subject: RFR: 8335397: Improve reliability of
 TestRecursiveMonitorChurn.java
In-Reply-To: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
Message-ID: <ckYjpnygwhRaEcnSITi5UXcp2vi4OUPriVJ0VzSDYfk=.f10b36bd-33e4-428b-8cd4-22506455701c@github.com>

On Mon, 1 Jul 2024 09:21:13 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation.
> 
> Change the test to instead query the JVM for the exact number of inflations via the WhiteBox API. This allows us to be both more exact and less dependent on interactions with NMT.

This is an improvement.

test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java line 58:

> 56:     public static void main(String[] args) {
> 57:         if (WB.getIntVMFlag("LockingMode") == LM_MONITOR) {
> 58:             throw new SkippedException("LM_MONITOR always infaltes. Invalid test.");

typo: inflates

test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java line 85:

> 83:                     long reserved = Long.parseLong(m.group(1));
> 84:                     long committed = Long.parseLong(m.group(2));
> 85:                     System.out.println(">>>>> " + line + ": " + reserved + " - " + committed);

Oh, so it just measures how much memory we use for ObjectMonitors? Yes, this doesn't seem very reliable.

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19965#pullrequestreview-2154070059
PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1662670355
PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1662683507

From coleenp at openjdk.org  Tue Jul  2 15:02:18 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 2 Jul 2024 15:02:18 GMT
Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450
In-Reply-To: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
Message-ID: <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com>

On Sat, 29 Jun 2024 19:58:23 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> Hi all, 
>     This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by an alignment issue introduced by the 1-byte ```_hash_slot```.
> 
> 
> (gdb) ptype /ox Klass
> /* offset      |    size */  type = class Klass : public Metadata {
>                              public:
>                                static const uint KLASS_KIND_COUNT;
>                              protected:
> /* 0x000c      |  0x0004 */    jint _layout_helper;
> /* 0x0010      |  0x0004 */    const enum Klass::KlassKind _kind;
> /* 0x0014      |  0x0004 */    jint _modifier_flags;
> /* 0x0018      |  0x0004 */    juint _super_check_offset;
> /* XXX  4-byte hole      */
> /* 0x0020      |  0x0008 */    class Symbol *_name;
> /* 0x0028      |  0x0008 */    class Klass *_secondary_super_cache;
> /* 0x0030      |  0x0008 */    class Array<Klass*> *_secondary_supers;
> /* 0x0038      |  0x0040 */    class Klass *_primary_supers[8];
> /* 0x0078      |  0x0008 */    class OopHandle {
>                                  private:
> /* 0x0078      |  0x0008 */        class oop *_obj;
> 
>                                    /* total size (bytes):    8 */
>                                } _java_mirror;
> /* 0x0080      |  0x0008 */    class Klass *_super;
> /* 0x0088      |  0x0008 */    class Klass * volatile _subklass;
> /* 0x0090      |  0x0008 */    class Klass * volatile _next_sibling;
> /* 0x0098      |  0x0008 */    class Klass *_next_link;
> /* 0x00a0      |  0x0008 */    class ClassLoaderData *_class_loader_data;
> /* 0x00a8      |  0x0008 */    uintx _bitmap;
> /* 0x00b0      |  0x0001 */    uint8_t _hash_slot;
> /* XXX  3-byte hole      */
> /* 0x00b4      |  0x0004 */    int _vtable_len;
> /* 0x00b8      |  0x0004 */    class AccessFlags {
>                                  private:
> /* 0x00b8      |  0x0004 */        jint _flags;
> 
>                                    /* total size (bytes):    4 */
>                                } _access_flags;
> /* XXX  4-byte hole      */
> /* 0x00c0      |  0x0008 */    traceid _trace_id;
>                              private:
> /* 0x00c8      |  0x0002 */    s2 _shared_class_path_index;
> /* 0x00ca      |  0x0002 */    u2 _shared_class_flags;
> /* 0x00cc      |  0x0004 */    int _archived_mirror_index;
>                              public:
>                                static const int SECONDARY_SUPERS_TABLE_SIZE;
>                                static const int SECONDARY_SUPERS_TABLE_MASK;
>                                static const...

I have a couple of questions about this and a request.  Thanks.

src/hotspot/share/oops/klass.hpp line 166:

> 164:   uintx    _bitmap;
> 165: 
> 166:   static uint8_t compute_hash_slot(Symbol* s);

We don't usually put functions in with nonstatic member declarations.  Since you moved hash_slot, can you move this function to the first private section where hash_insert is?

Where you moved hash_slot looks fine.  Doesn't look like it will have negative cache effects, unless it needs to be on a cache line with _bitmap?

src/hotspot/share/oops/klass.hpp line 176:

> 174:   JFR_ONLY(DEFINE_TRACE_ID_FIELD;)
> 175:   uint8_t  _hash_slot;
> 176:   DEFINE_PAD_MINUS_SIZE(1, 4, sizeof(uint8_t)); //3 bytes padding after 1 byte _hash_slot for better layout

How does this help? Doesn't the compiler add this padding?

-------------

Changes requested by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19958#pullrequestreview-2154109976
PR Review Comment: https://git.openjdk.org/jdk/pull/19958#discussion_r1662698438
PR Review Comment: https://git.openjdk.org/jdk/pull/19958#discussion_r1662701347

From sgehwolf at openjdk.org  Tue Jul  2 15:17:28 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Tue, 2 Jul 2024 15:17:28 GMT
Subject: RFR: 8322475: Extend printing for System.map [v6]
In-Reply-To: <-Qkoj2CJIqS0pNR-3JxXULeaty66oPIAJZgFx7IskTA=.9e679c42-24e4-4fb2-a3fd-d27be65aeac0@github.com>
References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com>
 <-Qkoj2CJIqS0pNR-3JxXULeaty66oPIAJZgFx7IskTA=.9e679c42-24e4-4fb2-a3fd-d27be65aeac0@github.com>
Message-ID: <I2AjfniaetQsEpBX3ir4NBP1Ja1de1NZDbVmmkuAPwc=.5b2bb425-659f-4cd8-a5f3-360c55020d0b@github.com>

On Thu, 20 Jun 2024 09:31:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This is an expansion on the new `System.map` command introduced with JDK-8318636.
>> 
>> We now print valuable information per memory region, such as:
>> 
>> - the actual resident set size
>> - the actual number of huge pages
>> - the actual used page size
>> - the THP state of the region (was advised, is eligible, uses THP, ...)
>> - whether the region is shared
>> - whether the region had been committed (backed by swap)
>> - whether the region has been swapped out.
>> 
>> Example output:
>> 
>> [system-map-thp1.txt](https://github.com/user-attachments/files/15587748/system-map-thp1.txt)
>> 
>> 
>> from                 to                       size        rss    hugetlb pgsz prot notes            vm info/file                                                                                                                                                                    
>> 0x00000000c0000000 - 0x00000000ffe00000 1071644672          0    4194304 2M   rw-p huge             JAVAHEAP /anon_hugepage                                                                                                                                                         
>> 0x00000000ffe00000 - 0x0000000100000000    2097152          0          0 2M   rw-p huge             JAVAHEAP /anon_hugepage                                                                                                                                                         
>> 0x0000558016b67000 - 0x0000558016b68000       4096       4096          0 4K   r--p                  /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java
>> 0x0000558016b68000 - 0x0000558016b69000       4096       4096          0 4K   r-xp                  /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java
>> 0x00007f3a749f2000 - 0x00007f3a74c62000    2555904    2555904          0 4K   rwxp                  CODE(CodeHeap 'profiled nmethods')                               
>> 0x00007f3a74c62000 - 0x00007f3a7be51000  119468032          0          0 4K   ---p nores            CODE(CodeHeap 'profiled nmethods')                               
>> 0x00007f3a7be51000 - 0x00007f3a7c1c1000    3604480    3604480          0 4K   rwxp                  CODE(CodeHeap 'profiled nmethods')                               
>> 0x00007f3a7c1c1000 - 0x00007f3a7c592000    4001792          0          0 4K   ---p nores            CODE(CodeHeap 'non-nmethods')                                    
>> 0x00007f3a7c592000 - 0x00007f3a7c802000    2555904    2...
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits:
> 
>  - feedback johan
>  - fix merge errors
>  - Merge branch 'master' into System.maps-more-info
>  - copyrights
>  - Merge branch 'master' into System.maps-more-info
>  - fix merge issue
>  - Merge branch 'master' into System.maps-more-info
>  - fix whitespace issue
>  - wip
>  - exhuming
>  - ... and 13 more: https://git.openjdk.org/jdk/compare/c6f3bf4b...940199de

This seems fine. Mostly nits.

src/hotspot/os/linux/procMapsParser.hpp line 66:

> 64:     from = to = nullptr;
> 65:     prot[0] = filename[0] = '\0';
> 66:     kernelpagesize = rss = private_hugetlb = anonhugepages = swap = 0;

`private_hugetlb` and `shared_hugetlb` missing in reset. Intentional?

src/hotspot/share/nmt/memMapPrinter.cpp line 262:

> 260:             print_thread_details_for_supposed_stack_address(vma_from, vma_to, _out);
> 261:           }
> 262:           num_printed ++;

Style: No space before `++`.

test/hotspot/jtreg/serviceability/dcmd/vm/SystemDumpMapTest.java line 31:

> 29: 
> 30: import java.io.*;
> 31: import java.lang.StringBuilder;

Nit: `java.lang.*` is imported by default. I don't see it used, so maybe a leftover?

test/hotspot/jtreg/serviceability/dcmd/vm/SystemMapTestBase.java line 53:

> 51:         regexBase_committed + "\\[stack\\]",
> 52:         // we should see the hs-perf data file, and it should appear as shared as well as committed
> 53:         regexBase_shared_and_committed + "hsperfdata_.*"

Suggestion: Should the test run with `-XX:+UsePerfData`, since it's expecting this file? It's on by default, but that might change.

-------------

PR Review: https://git.openjdk.org/jdk/pull/17158#pullrequestreview-2154058332
PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1662723988
PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1662661856
PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1662705054
PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1662700550

From eosterlund at openjdk.org  Tue Jul  2 15:48:45 2024
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Tue, 2 Jul 2024 15:48:45 GMT
Subject: RFR: 8334890: Missing unconditional cross modifying fence in nmethod
 entry barriers
Message-ID: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>

On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code.

We use asynchronous code modification in the fast path of nmethod entry barriers. If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the instruction fetcher observes the updated "correct" immediate, it also observes any code modification made to the nmethod on another thread prior to disarming it.

However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data. If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions, without enforcing a cross modifying code fence. This is synchronous code modification.

There is a questionable optimization in the stub slow path entry (which we only reached because the instruction fetcher observed the nmethod to be armed). It checks "just one more time" whether the nmethod concurrently got disarmed, and if so exits without a cross modification fence. This opportunistic optimization is very unlikely to be useful, since we got into the slow path because the nmethod was armed a couple of instructions ago. It also breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully.

This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly.
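
The difference between the removed shortcut and the stricter slow path can be sketched as follows. This is a simplified model under stated assumptions, not the HotSpot code: `NMethodModel`, `FenceCounter`, `cross_modify_fence_stub` and both `slow_path_*` functions are hypothetical names, the guard is modelled as a plain atomic integer (1 = armed, 0 = disarmed), and the "fence" only counts calls instead of serializing the instruction stream.

```cpp
#include <atomic>

// Hypothetical model of an nmethod guard value read as data.
struct NMethodModel {
  std::atomic<int> guard{1};  // 1 = armed, 0 = disarmed
};

// Hypothetical stand-in for an instruction cross modification fence.
// It only counts invocations so the two variants can be compared.
struct FenceCounter {
  int fences = 0;
  void cross_modify_fence_stub() { ++fences; }
};

// Pattern described as problematic: if the guard already reads as disarmed,
// return to the nmethod without any fence.
void slow_path_with_shortcut(NMethodModel& nm, FenceCounter& cpu) {
  if (nm.guard.load(std::memory_order_acquire) == 0) {
    return;  // data said "disarmed", but no fence was issued
  }
  nm.guard.store(0, std::memory_order_release);  // disarm after barrier work
  cpu.cross_modify_fence_stub();
}

// Stricter pattern: the fence is unconditional, so it always follows the
// data load that may have signalled completed cross modification.
void slow_path_always_fence(NMethodModel& nm, FenceCounter& cpu) {
  if (nm.guard.load(std::memory_order_acquire) != 0) {
    nm.guard.store(0, std::memory_order_release);  // disarm after barrier work
  }
  cpu.cross_modify_fence_stub();
}
```

With a guard that another thread has already disarmed, the shortcut variant returns with zero fences issued, while the unconditional variant issues exactly one.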

-------------

Commit messages:
 - 8334890: Missing unconditional cross modifying fence in nmethod entry barriers

Changes: https://git.openjdk.org/jdk/pull/19990/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19990&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8334890
  Stats: 21 lines in 1 file changed: 1 ins; 17 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/19990.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19990/head:pull/19990

PR: https://git.openjdk.org/jdk/pull/19990

From xpeng at openjdk.org  Tue Jul  2 16:53:22 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Tue, 2 Jul 2024 16:53:22 GMT
Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450
In-Reply-To: <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com>
References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
 <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com>
Message-ID: <19BMqaE29h4XHAYBm2QkjfdUQpInAxgrdWFHnP_JcmA=.505b9844-9656-4c14-83d5-fa9acf6efa2d@github.com>

On Tue, 2 Jul 2024 14:58:34 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Hi all, 
>>     This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by an alignment issue introduced by the 1-byte ```_hash_slot```.
>> 
>> 
>> (gdb) ptype /ox Klass
>> /* offset      |    size */  type = class Klass : public Metadata {
>>                              public:
>>                                static const uint KLASS_KIND_COUNT;
>>                              protected:
>> /* 0x000c      |  0x0004 */    jint _layout_helper;
>> /* 0x0010      |  0x0004 */    const enum Klass::KlassKind _kind;
>> /* 0x0014      |  0x0004 */    jint _modifier_flags;
>> /* 0x0018      |  0x0004 */    juint _super_check_offset;
>> /* XXX  4-byte hole      */
>> /* 0x0020      |  0x0008 */    class Symbol *_name;
>> /* 0x0028      |  0x0008 */    class Klass *_secondary_super_cache;
>> /* 0x0030      |  0x0008 */    class Array<Klass*> *_secondary_supers;
>> /* 0x0038      |  0x0040 */    class Klass *_primary_supers[8];
>> /* 0x0078      |  0x0008 */    class OopHandle {
>>                                  private:
>> /* 0x0078      |  0x0008 */        class oop *_obj;
>> 
>>                                    /* total size (bytes):    8 */
>>                                } _java_mirror;
>> /* 0x0080      |  0x0008 */    class Klass *_super;
>> /* 0x0088      |  0x0008 */    class Klass * volatile _subklass;
>> /* 0x0090      |  0x0008 */    class Klass * volatile _next_sibling;
>> /* 0x0098      |  0x0008 */    class Klass *_next_link;
>> /* 0x00a0      |  0x0008 */    class ClassLoaderData *_class_loader_data;
>> /* 0x00a8      |  0x0008 */    uintx _bitmap;
>> /* 0x00b0      |  0x0001 */    uint8_t _hash_slot;
>> /* XXX  3-byte hole      */
>> /* 0x00b4      |  0x0004 */    int _vtable_len;
>> /* 0x00b8      |  0x0004 */    class AccessFlags {
>>                                  private:
>> /* 0x00b8      |  0x0004 */        jint _flags;
>> 
>>                                    /* total size (bytes):    4 */
>>                                } _access_flags;
>> /* XXX  4-byte hole      */
>> /* 0x00c0      |  0x0008 */    traceid _trace_id;
>>                              private:
>> /* 0x00c8      |  0x0002 */    s2 _shared_class_path_index;
>> /* 0x00ca      |  0x0002 */    u2 _shared_class_flags;
>> /* 0x00cc      |  0x0004 */    int _archived_mirror_index;
>>                              public:
>>                                static const int SECONDARY_SUPERS_TABLE_SIZE;
>>                         ...
>
> src/hotspot/share/oops/klass.hpp line 176:
> 
>> 174:   JFR_ONLY(DEFINE_TRACE_ID_FIELD;)
>> 175:   uint8_t  _hash_slot;
>> 176:   DEFINE_PAD_MINUS_SIZE(1, 4, sizeof(uint8_t)); //3 bytes padding after 1 byte _hash_slot for better layout
> 
> How does this help? Doesn't the compiler add this padding?

The compiler doesn't seem to add the same padding; instead there will be two smaller holes after _hash_slot. Here is what I got from gdb:

/* 0x00b8      |  0x0001 */    uint8_t _hash_slot;
                             private:
/* XXX  1-byte hole      */
/* 0x00ba      |  0x0002 */    s2 _shared_class_path_index;
/* 0x00bc      |  0x0002 */    u2 _shared_class_flags;
/* XXX  2-byte hole      */
/* 0x00c0      |  0x0004 */    int _archived_mirror_index;


I think we can remove it to avoid confusion; it doesn't really change the cache line layout. Here is the output from ```pahole``` with cacheline boundaries:
With padding:

protected:

	jint                       _layout_helper;       /*     8     4 */
	const enum KlassKind       _kind;                /*    12     4 */
	jint                       _modifier_flags;      /*    16     4 */
	juint                      _super_check_offset;  /*    20     4 */
	class Symbol *             _name;                /*    24     8 */
	class Klass *              _secondary_super_cache; /*    32     8 */
	class Array<Klass*> *      _secondary_supers;    /*    40     8 */
	class Klass *              _primary_supers[8];   /*    48    64 */
	/* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
	class OopHandle           _java_mirror;          /*   112     8 */
	class Klass *              _super;               /*   120     8 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	volatile class Klass *     _subklass;            /*   128     8 */
	volatile class Klass *     _next_sibling;        /*   136     8 */
	class Klass *              _next_link;           /*   144     8 */
	class ClassLoaderData *    _class_loader_data;   /*   152     8 */
	uintx                      _bitmap;              /*   160     8 */
	int                        _vtable_len;          /*   168     4 */
	class AccessFlags         _access_flags;         /*   172     4 */
	traceid                    _trace_id;            /*   176     8 */
	uint8_t                    _hash_slot;           /*   184     1 */
	char                       _pad_buf1[3];         /*   185     3 */
	s2                         _shared_class_path_index; /*   188     2 */
	u2                         _shared_class_flags;  /*   190     2 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	int                        _archived_mirror_index; /*   192     4 */


w/o padding:

protected:

	jint                       _layout_helper;       /*     8     4 */
	const enum KlassKind       _kind;                /*    12     4 */
	jint                       _modifier_flags;      /*    16     4 */
	juint                      _super_check_offset;  /*    20     4 */
	class Symbol *             _name;                /*    24     8 */
	class Klass *              _secondary_super_cache; /*    32     8 */
	class Array<Klass*> *      _secondary_supers;    /*    40     8 */
	class Klass *              _primary_supers[8];   /*    48    64 */
	/* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
	class OopHandle           _java_mirror;          /*   112     8 */
	class Klass *              _super;               /*   120     8 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	volatile class Klass *     _subklass;            /*   128     8 */
	volatile class Klass *     _next_sibling;        /*   136     8 */
	class Klass *              _next_link;           /*   144     8 */
	class ClassLoaderData *    _class_loader_data;   /*   152     8 */
	uintx                      _bitmap;              /*   160     8 */
	int                        _vtable_len;          /*   168     4 */
	class AccessFlags         _access_flags;         /*   172     4 */
	traceid                    _trace_id;            /*   176     8 */
	uint8_t                    _hash_slot;           /*   184     1 */

	/* XXX 1 byte hole, try to pack */

	s2                         _shared_class_path_index; /*   186     2 */
	u2                         _shared_class_flags;  /*   188     2 */

	/* XXX 2 bytes hole, try to pack */

	/* --- cacheline 3 boundary (192 bytes) --- */
	int                        _archived_mirror_index; /*   192     4 */
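
The two layouts above can be reproduced in isolation with a small sketch. The structs below are hypothetical stand-ins for the tail of `Klass` (fixed-width types replace `traceid`, `s2` and `u2`), and the offsets mentioned afterwards assume a typical LP64 ABI with natural alignment, such as Linux on x86-64 or AArch64.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the tail of Klass without explicit padding.
// The compiler inserts a 1-byte hole after hash_slot and a 2-byte hole
// before archived_mirror_index to satisfy alignment.
struct TailImplicit {
  std::uint64_t trace_id;
  std::uint8_t  hash_slot;
  std::int16_t  shared_class_path_index;
  std::uint16_t shared_class_flags;
  std::int32_t  archived_mirror_index;
};

// Same fields with an explicit 3-byte pad after hash_slot, mirroring the
// _pad_buf1[3] entry in the pahole output.
struct TailExplicit {
  std::uint64_t trace_id;
  std::uint8_t  hash_slot;
  char          pad_buf1[3];
  std::int16_t  shared_class_path_index;
  std::uint16_t shared_class_flags;
  std::int32_t  archived_mirror_index;
};

// Size difference between the two variants, in bytes.
constexpr std::size_t size_difference() {
  return sizeof(TailExplicit) - sizeof(TailImplicit);
}
```

Under those assumptions both structs have the same size and place `archived_mirror_index` at the same offset; the explicit pad only merges the two small holes into one and shifts the two 16-bit fields by two bytes, which matches the observation that the cache line boundaries do not move.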

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19958#discussion_r1662866957

From xpeng at openjdk.org  Tue Jul  2 17:07:19 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Tue, 2 Jul 2024 17:07:19 GMT
Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450
In-Reply-To: <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com>
References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
 <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com>
Message-ID: <IUdBaJ9KRr0pUDrYzKzQUDRqn1IxiLeJKuFqPEIuJKY=.1de59e3e-4e95-44ae-ba8c-9792a2e71632@github.com>

On Tue, 2 Jul 2024 14:56:56 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Hi all, 
>>     This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by an alignment issue introduced by the 1-byte ```_hash_slot```.
>> 
>> 
>> (gdb) ptype /ox Klass
>> /* offset      |    size */  type = class Klass : public Metadata {
>>                              public:
>>                                static const uint KLASS_KIND_COUNT;
>>                              protected:
>> /* 0x000c      |  0x0004 */    jint _layout_helper;
>> /* 0x0010      |  0x0004 */    const enum Klass::KlassKind _kind;
>> /* 0x0014      |  0x0004 */    jint _modifier_flags;
>> /* 0x0018      |  0x0004 */    juint _super_check_offset;
>> /* XXX  4-byte hole      */
>> /* 0x0020      |  0x0008 */    class Symbol *_name;
>> /* 0x0028      |  0x0008 */    class Klass *_secondary_super_cache;
>> /* 0x0030      |  0x0008 */    class Array<Klass*> *_secondary_supers;
>> /* 0x0038      |  0x0040 */    class Klass *_primary_supers[8];
>> /* 0x0078      |  0x0008 */    class OopHandle {
>>                                  private:
>> /* 0x0078      |  0x0008 */        class oop *_obj;
>> 
>>                                    /* total size (bytes):    8 */
>>                                } _java_mirror;
>> /* 0x0080      |  0x0008 */    class Klass *_super;
>> /* 0x0088      |  0x0008 */    class Klass * volatile _subklass;
>> /* 0x0090      |  0x0008 */    class Klass * volatile _next_sibling;
>> /* 0x0098      |  0x0008 */    class Klass *_next_link;
>> /* 0x00a0      |  0x0008 */    class ClassLoaderData *_class_loader_data;
>> /* 0x00a8      |  0x0008 */    uintx _bitmap;
>> /* 0x00b0      |  0x0001 */    uint8_t _hash_slot;
>> /* XXX  3-byte hole      */
>> /* 0x00b4      |  0x0004 */    int _vtable_len;
>> /* 0x00b8      |  0x0004 */    class AccessFlags {
>>                                  private:
>> /* 0x00b8      |  0x0004 */        jint _flags;
>> 
>>                                    /* total size (bytes):    4 */
>>                                } _access_flags;
>> /* XXX  4-byte hole      */
>> /* 0x00c0      |  0x0008 */    traceid _trace_id;
>>                              private:
>> /* 0x00c8      |  0x0002 */    s2 _shared_class_path_index;
>> /* 0x00ca      |  0x0002 */    u2 _shared_class_flags;
>> /* 0x00cc      |  0x0004 */    int _archived_mirror_index;
>>                              public:
>>                                static const int SECONDARY_SUPERS_TABLE_SIZE;
>>                         ...
>
> src/hotspot/share/oops/klass.hpp line 166:
> 
>> 164:   uintx    _bitmap;
>> 165: 
>> 166:   static uint8_t compute_hash_slot(Symbol* s);
> 
> We don't usually put functions in with nonstatic member declarations.  Since you moved hash_slot, can you move this function to the first private section where hash_insert is?
> 
> Where you moved hash_slot looks fine.  Doesn't look like it will have negative cache effects, unless it needs to be on a cache line with _bitmap?

Thanks Coleen! Good catch! 
I have checked the cacheline boundaries with pahole: both are in the same cache line after moving, so that shouldn't be a concern.
But yes, they are related fields and should stay together. I also checked the layout after moving both _bitmap and _hash_slot, and it looks good. I'll update the PR:

protected:

	jint                       _layout_helper;       /*     8     4 */
	const enum KlassKind       _kind;                /*    12     4 */
	jint                       _modifier_flags;      /*    16     4 */
	juint                      _super_check_offset;  /*    20     4 */
	class Symbol *             _name;                /*    24     8 */
	class Klass *              _secondary_super_cache; /*    32     8 */
	class Array<Klass*> *      _secondary_supers;    /*    40     8 */
	class Klass *              _primary_supers[8];   /*    48    64 */
	/* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
	class OopHandle           _java_mirror;          /*   112     8 */
	class Klass *              _super;               /*   120     8 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	volatile class Klass *     _subklass;            /*   128     8 */
	volatile class Klass *     _next_sibling;        /*   136     8 */
	class Klass *              _next_link;           /*   144     8 */
	class ClassLoaderData *    _class_loader_data;   /*   152     8 */
	int                        _vtable_len;          /*   160     4 */
	class AccessFlags         _access_flags;         /*   164     4 */
	traceid                    _trace_id;            /*   168     8 */
	uintx                      _bitmap;              /*   176     8 */
	uint8_t                    _hash_slot;           /*   184     1 */

	/* XXX 1 byte hole, try to pack */

	s2                         _shared_class_path_index; /*   186     2 */
	u2                         _shared_class_flags;  /*   188     2 */

	/* XXX 2 bytes hole, try to pack */

	/* --- cacheline 3 boundary (192 bytes) --- */
	int                        _archived_mirror_index; /*   192     4 */
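
To illustrate why moving the 1-byte field helps, here is a minimal C++ sketch. The structs `Before` and `After` are hypothetical stand-ins, not the real Klass, and the offsets assume a typical LP64 ABI where `uint64_t` is 8-byte aligned. `_archived_mirror_index` is left out to keep the example small.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the old field order: the 1-byte field sits
// between wider fields, so the compiler must insert holes around it.
struct Before {
  uint64_t bitmap;        // offset  0, size 8
  uint8_t  hash_slot;     // offset  8, size 1, followed by a 3-byte hole
  int32_t  vtable_len;    // offset 12, size 4
  int32_t  access_flags;  // offset 16, size 4, followed by a 4-byte hole
  uint64_t trace_id;      // offset 24, size 8
  int16_t  path_index;    // offset 32, size 2
  uint16_t class_flags;   // offset 34, size 2, then 4 bytes of tail padding
};

// Hypothetical stand-in for the new order: wide fields first, then the
// 1-byte field next to the 2-byte fields, leaving only a 1-byte hole.
struct After {
  int32_t  vtable_len;    // offset  0, size 4
  int32_t  access_flags;  // offset  4, size 4
  uint64_t trace_id;      // offset  8, size 8
  uint64_t bitmap;        // offset 16, size 8
  uint8_t  hash_slot;     // offset 24, size 1, followed by a 1-byte hole
  int16_t  path_index;    // offset 26, size 2
  uint16_t class_flags;   // offset 28, size 2, then 2 bytes of tail padding
};

// Compile-time checks of the layout claims above (LP64 assumption).
static_assert(offsetof(Before, vtable_len) == 12, "3-byte hole after hash_slot");
static_assert(offsetof(Before, trace_id) == 24, "4-byte hole after access_flags");
static_assert(offsetof(After, path_index) == 26, "1-byte hole after hash_slot");
static_assert(sizeof(Before) == 40, "old order needs 40 bytes");
static_assert(sizeof(After) == 32, "new order needs 32 bytes");
```

The same 29 bytes of payload take 40 bytes in the old order and 32 in the new one. The real Klass has more fields, so the exact numbers there come from the pahole output above, not from this sketch.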

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19958#discussion_r1662884573

From xpeng at openjdk.org  Tue Jul  2 17:25:53 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Tue, 2 Jul 2024 17:25:53 GMT
Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2]
In-Reply-To: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
Message-ID: <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com>

> Hi all, 
>     This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by alignment issues introduced by the 1-byte ```_hash_slot```. 
> 
> 
> (gdb) ptype /ox Klass
> /* offset      |    size */  type = class Klass : public Metadata {
>                              public:
>                                static const uint KLASS_KIND_COUNT;
>                              protected:
> /* 0x000c      |  0x0004 */    jint _layout_helper;
> /* 0x0010      |  0x0004 */    const enum Klass::KlassKind _kind;
> /* 0x0014      |  0x0004 */    jint _modifier_flags;
> /* 0x0018      |  0x0004 */    juint _super_check_offset;
> /* XXX  4-byte hole      */
> /* 0x0020      |  0x0008 */    class Symbol *_name;
> /* 0x0028      |  0x0008 */    class Klass *_secondary_super_cache;
> /* 0x0030      |  0x0008 */    class Array<Klass*> *_secondary_supers;
> /* 0x0038      |  0x0040 */    class Klass *_primary_supers[8];
> /* 0x0078      |  0x0008 */    class OopHandle {
>                                  private:
> /* 0x0078      |  0x0008 */        class oop *_obj;
> 
>                                    /* total size (bytes):    8 */
>                                } _java_mirror;
> /* 0x0080      |  0x0008 */    class Klass *_super;
> /* 0x0088      |  0x0008 */    class Klass * volatile _subklass;
> /* 0x0090      |  0x0008 */    class Klass * volatile _next_sibling;
> /* 0x0098      |  0x0008 */    class Klass *_next_link;
> /* 0x00a0      |  0x0008 */    class ClassLoaderData *_class_loader_data;
> /* 0x00a8      |  0x0008 */    uintx _bitmap;
> /* 0x00b0      |  0x0001 */    uint8_t _hash_slot;
> /* XXX  3-byte hole      */
> /* 0x00b4      |  0x0004 */    int _vtable_len;
> /* 0x00b8      |  0x0004 */    class AccessFlags {
>                                  private:
> /* 0x00b8      |  0x0004 */        jint _flags;
> 
>                                    /* total size (bytes):    4 */
>                                } _access_flags;
> /* XXX  4-byte hole      */
> /* 0x00c0      |  0x0008 */    traceid _trace_id;
>                              private:
> /* 0x00c8      |  0x0002 */    s2 _shared_class_path_index;
> /* 0x00ca      |  0x0002 */    u2 _shared_class_flags;
> /* 0x00cc      |  0x0004 */    int _archived_mirror_index;
>                              public:
>                                static const int SECONDARY_SUPERS_TABLE_SIZE;
>                                static const int SECONDARY_SUPERS_TABLE_MASK;
>                                static const...

Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:

  Move both _bitmap and _hash_slot together

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19958/files
  - new: https://git.openjdk.org/jdk/pull/19958/files/00fbff70..ce1560c1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19958&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19958&range=00-01

  Stats: 10 lines in 1 file changed: 4 ins; 6 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19958.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19958/head:pull/19958

PR: https://git.openjdk.org/jdk/pull/19958

From xpeng at openjdk.org  Tue Jul  2 17:25:53 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Tue, 2 Jul 2024 17:25:53 GMT
Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2]
In-Reply-To: <IUdBaJ9KRr0pUDrYzKzQUDRqn1IxiLeJKuFqPEIuJKY=.1de59e3e-4e95-44ae-ba8c-9792a2e71632@github.com>
References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
 <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com>
 <IUdBaJ9KRr0pUDrYzKzQUDRqn1IxiLeJKuFqPEIuJKY=.1de59e3e-4e95-44ae-ba8c-9792a2e71632@github.com>
Message-ID: <WbcFyXAITuG0372YiVLYfb5qaln_c89apoobqzP04Co=.1aa6e5f1-2022-4b3e-81c7-4332982b3e3b@github.com>

On Tue, 2 Jul 2024 17:04:40 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> src/hotspot/share/oops/klass.hpp line 166:
>> 
>>> 164:   uintx    _bitmap;
>>> 165: 
>>> 166:   static uint8_t compute_hash_slot(Symbol* s);
>> 
>> We don't usually put functions in with nonstatic member declarations.  Since you moved hash_slot, can you move this function to the first private section where hash_insert is?
>> 
>> Where you moved hash_slot looks fine.  Doesn't look like it will have negative cache effects, unless it needs to be on a cache line with _bitmap?
>
> Thanks Coleen! Good catch! 
> I have checked the cacheline boundaries with pahole: both are in the same cache line after moving, so that shouldn't be a concern.
> But yes, they are related fields and should stay together. I also checked the layout after moving both _bitmap and _hash_slot, and it looks good. I'll update the PR:
> 
> protected:
> 
> 	jint                       _layout_helper;       /*     8     4 */
> 	const enum KlassKind       _kind;                /*    12     4 */
> 	jint                       _modifier_flags;      /*    16     4 */
> 	juint                      _super_check_offset;  /*    20     4 */
> 	class Symbol *             _name;                /*    24     8 */
> 	class Klass *              _secondary_super_cache; /*    32     8 */
> 	class Array<Klass*> *      _secondary_supers;    /*    40     8 */
> 	class Klass *              _primary_supers[8];   /*    48    64 */
> 	/* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
> 	class OopHandle           _java_mirror;          /*   112     8 */
> 	class Klass *              _super;               /*   120     8 */
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	volatile class Klass *     _subklass;            /*   128     8 */
> 	volatile class Klass *     _next_sibling;        /*   136     8 */
> 	class Klass *              _next_link;           /*   144     8 */
> 	class ClassLoaderData *    _class_loader_data;   /*   152     8 */
> 	int                        _vtable_len;          /*   160     4 */
> 	class AccessFlags         _access_flags;         /*   164     4 */
> 	traceid                    _trace_id;            /*   168     8 */
> 	uintx                      _bitmap;              /*   176     8 */
> 	uint8_t                    _hash_slot;           /*   184     1 */
> 
> 	/* XXX 1 byte hole, try to pack */
> 
> 	s2                         _shared_class_path_index; /*   186     2 */
> 	u2                         _shared_class_flags;  /*   188     2 */
> 
> 	/* XXX 2 bytes hole, try to pack */
> 
> 	/* --- cacheline 3 boundary (192 bytes) --- */
> 	int                        _archived_mirror_index; /*   192     4 */

I have updated the PR to reflect what we have discussed, thanks!

>> src/hotspot/share/oops/klass.hpp line 176:
>> 
>>> 174:   JFR_ONLY(DEFINE_TRACE_ID_FIELD;)
>>> 175:   uint8_t  _hash_slot;
>>> 176:   DEFINE_PAD_MINUS_SIZE(1, 4, sizeof(uint8_t)); //3 bytes padding after 1 byte _hash_slot for better layout
>> 
>> How does this help? Doesn't the compiler add this padding?
>
> The compiler doesn't seem to add that padding; instead there will be two smaller holes after _hash_slot. Here is what I got from gdb:
> 
> /* 0x00b8      |  0x0001 */    uint8_t _hash_slot;
>                              private:
> /* XXX  1-byte hole      */
> /* 0x00ba      |  0x0002 */    s2 _shared_class_path_index;
> /* 0x00bc      |  0x0002 */    u2 _shared_class_flags;
> /* XXX  2-byte hole      */
> /* 0x00c0      |  0x0004 */    int _archived_mirror_index;
> 
> 
> I think we can remove it to avoid confusion; it doesn't really change the cache line layout. Here is the output from ```pahole``` with cacheline boundaries:
> With padding:
> 
> protected:
> 
> 	jint                       _layout_helper;       /*     8     4 */
> 	const enum KlassKind       _kind;                /*    12     4 */
> 	jint                       _modifier_flags;      /*    16     4 */
> 	juint                      _super_check_offset;  /*    20     4 */
> 	class Symbol *             _name;                /*    24     8 */
> 	class Klass *              _secondary_super_cache; /*    32     8 */
> 	class Array<Klass*> *      _secondary_supers;    /*    40     8 */
> 	class Klass *              _primary_supers[8];   /*    48    64 */
> 	/* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
> 	class OopHandle           _java_mirror;          /*   112     8 */
> 	class Klass *              _super;               /*   120     8 */
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	volatile class Klass *     _subklass;            /*   128     8 */
> 	volatile class Klass *     _next_sibling;        /*   136     8 */
> 	class Klass *              _next_link;           /*   144     8 */
> 	class ClassLoaderData *    _class_loader_data;   /*   152     8 */
> 	uintx                      _bitmap;              /*   160     8 */
> 	int                        _vtable_len;          /*   168     4 */
> 	class AccessFlags         _access_flags;         /*   172     4 */
> 	traceid                    _trace_id;            /*   176     8 */
> 	uint8_t                    _hash_slot;           /*   184     1 */
> 	char                       _pad_buf1[3];         /*   185     3 */
> 	s2                         _shared_class_path_index; /*   188     2 */
> 	u2                         _shared_class_flags;  /*   190     2 */
> 	/* --- cacheline 3 boundary (192 bytes) --- */
> 	int                        _archived_mirror_index; /*   192     4 */
> 
> 
> w/o padding:
> 
> protected:
> 
> 	jint                       _layout_helper;       /*     8     4 */
> 	const enum KlassKind       _kind;                /*    12   ...

I have removed the padding from the code, thanks.
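
For reference, the with/without-padding comparison quoted above can be sketched in a few lines of C++. The struct names and the reduced field set are hypothetical, the explicit `char` array only stands in for what the quoted `_pad_buf1[3]` field does, and the offsets assume a typical LP64 ABI.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical reduced tail of the class with an explicit 3-byte pad,
// similar in effect to the _pad_buf1[3] field in the quoted pahole output.
struct WithExplicitPad {
  uint64_t trace_id;      // offset  0
  uint8_t  hash_slot;     // offset  8
  char     pad[3];        // offset  9, explicit padding
  int16_t  path_index;    // offset 12
  uint16_t class_flags;   // offset 14
  int32_t  mirror_index;  // offset 16
};

// Same fields without the explicit pad: the compiler inserts a 1-byte hole
// before path_index and a 2-byte hole before mirror_index instead.
struct WithoutPad {
  uint64_t trace_id;      // offset  0
  uint8_t  hash_slot;     // offset  8, followed by a 1-byte hole
  int16_t  path_index;    // offset 10
  uint16_t class_flags;   // offset 12, followed by a 2-byte hole
  int32_t  mirror_index;  // offset 16
};

// The total size and the offset of the last field are the same either way,
// so the explicit pad only moves the two 2-byte fields.
static_assert(sizeof(WithExplicitPad) == sizeof(WithoutPad), "same size");
static_assert(offsetof(WithExplicitPad, mirror_index) ==
              offsetof(WithoutPad, mirror_index), "same end offset");
static_assert(offsetof(WithExplicitPad, path_index) == 12, "pad shifts field");
static_assert(offsetof(WithoutPad, path_index) == 10, "1-byte hole only");
```

Under these assumptions the explicit pad saves nothing, which is consistent with dropping it.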

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19958#discussion_r1662905230
PR Review Comment: https://git.openjdk.org/jdk/pull/19958#discussion_r1662907038

From stuefe at openjdk.org  Tue Jul  2 17:57:20 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 2 Jul 2024 17:57:20 GMT
Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2]
In-Reply-To: <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com>
References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
 <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com>
Message-ID: <_kYyW9f7DYRGXmfBruv4r5ywF8jzc4wu3qRHRljEuwQ=.6a594a8b-4f9a-431b-aeea-9bcc639c61ac@github.com>

On Tue, 2 Jul 2024 17:25:53 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> Hi all, 
>>     This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by alignment issues introduced by the 1-byte ```_hash_slot```. 
>> 
>> 
>> (gdb) ptype /ox Klass
>> /* offset      |    size */  type = class Klass : public Metadata {
>>                              public:
>>                                static const uint KLASS_KIND_COUNT;
>>                              protected:
>> /* 0x000c      |  0x0004 */    jint _layout_helper;
>> /* 0x0010      |  0x0004 */    const enum Klass::KlassKind _kind;
>> /* 0x0014      |  0x0004 */    jint _modifier_flags;
>> /* 0x0018      |  0x0004 */    juint _super_check_offset;
>> /* XXX  4-byte hole      */
>> /* 0x0020      |  0x0008 */    class Symbol *_name;
>> /* 0x0028      |  0x0008 */    class Klass *_secondary_super_cache;
>> /* 0x0030      |  0x0008 */    class Array<Klass*> *_secondary_supers;
>> /* 0x0038      |  0x0040 */    class Klass *_primary_supers[8];
>> /* 0x0078      |  0x0008 */    class OopHandle {
>>                                  private:
>> /* 0x0078      |  0x0008 */        class oop *_obj;
>> 
>>                                    /* total size (bytes):    8 */
>>                                } _java_mirror;
>> /* 0x0080      |  0x0008 */    class Klass *_super;
>> /* 0x0088      |  0x0008 */    class Klass * volatile _subklass;
>> /* 0x0090      |  0x0008 */    class Klass * volatile _next_sibling;
>> /* 0x0098      |  0x0008 */    class Klass *_next_link;
>> /* 0x00a0      |  0x0008 */    class ClassLoaderData *_class_loader_data;
>> /* 0x00a8      |  0x0008 */    uintx _bitmap;
>> /* 0x00b0      |  0x0001 */    uint8_t _hash_slot;
>> /* XXX  3-byte hole      */
>> /* 0x00b4      |  0x0004 */    int _vtable_len;
>> /* 0x00b8      |  0x0004 */    class AccessFlags {
>>                                  private:
>> /* 0x00b8      |  0x0004 */        jint _flags;
>> 
>>                                    /* total size (bytes):    4 */
>>                                } _access_flags;
>> /* XXX  4-byte hole      */
>> /* 0x00c0      |  0x0008 */    traceid _trace_id;
>>                              private:
>> /* 0x00c8      |  0x0002 */    s2 _shared_class_path_index;
>> /* 0x00ca      |  0x0002 */    u2 _shared_class_flags;
>> /* 0x00cc      |  0x0004 */    int _archived_mirror_index;
>>                              public:
>>                                static const int SECONDARY_SUPERS_TABLE_SIZE;
>>                         ...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Move both _bitmap and _hash_slot together

Ok.

> I have checked the cacheline boundaries with pahole: both are in the same cache line after moving, so that shouldn't be a concern.

Note that in current class space, Klass does usually not start at a cache line boundary.
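
A small sketch of what that means in practice, assuming a 64-byte cache line and taking the offsets 176 (`_bitmap`) and 184 (`_hash_slot`) from the pahole output quoted earlier in this thread. The helper names and the example base addresses are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size; pahole's "cacheline N boundary" markers in the
// thread use the same 64-byte granularity.
constexpr std::size_t kCacheLineSize = 64;

// Index of the cache line that holds the first byte of a field, given the
// address of the object and the field's offset inside it.
constexpr std::uintptr_t cache_line_of(std::uintptr_t base, std::size_t offset) {
  return (base + offset) / kCacheLineSize;
}

// True if the first bytes of two fields fall into the same cache line.
constexpr bool share_cache_line(std::uintptr_t base,
                                std::size_t off_a, std::size_t off_b) {
  return cache_line_of(base, off_a) == cache_line_of(base, off_b);
}

// Field offsets taken from the pahole listing in the thread.
constexpr std::size_t kBitmapOffset = 176;
constexpr std::size_t kHashSlotOffset = 184;

// Object starts on a cache line boundary: both fields land in line 2.
static_assert(share_cache_line(0, kBitmapOffset, kHashSlotOffset),
              "same line when the base is 64-byte aligned");

// Object starts 8 bytes past a boundary (hypothetical placement):
// _bitmap ends line 2 and _hash_slot begins line 3.
static_assert(!share_cache_line(8, kBitmapOffset, kHashSlotOffset),
              "different lines when the base is shifted by 8");
```

So the boundaries pahole prints only describe the case where the object itself starts on a 64-byte boundary.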

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19958#pullrequestreview-2154515609

From aboldtch at openjdk.org  Tue Jul  2 19:18:47 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Tue, 2 Jul 2024 19:18:47 GMT
Subject: RFR: 8335397: Improve reliability of
 TestRecursiveMonitorChurn.java [v2]
In-Reply-To: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
Message-ID: <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com>

> TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation.
> 
> Change the test to instead query the JVM for the exact number of inflations via the Whitebox API. This allows us to be both more exact and less dependent on interactions with NMT.

Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:

  Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19965/files
  - new: https://git.openjdk.org/jdk/pull/19965/files/81fd7c07..44279523

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19965&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19965&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19965.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19965/head:pull/19965

PR: https://git.openjdk.org/jdk/pull/19965

From aboldtch at openjdk.org  Tue Jul  2 19:18:47 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Tue, 2 Jul 2024 19:18:47 GMT
Subject: RFR: 8335397: Improve reliability of
 TestRecursiveMonitorChurn.java [v2]
In-Reply-To: <ckYjpnygwhRaEcnSITi5UXcp2vi4OUPriVJ0VzSDYfk=.f10b36bd-33e4-428b-8cd4-22506455701c@github.com>
References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
 <ckYjpnygwhRaEcnSITi5UXcp2vi4OUPriVJ0VzSDYfk=.f10b36bd-33e4-428b-8cd4-22506455701c@github.com>
Message-ID: <_kIYNEJtKJMbKPqcKTEaBqhgM2BFTfJygHgLtOSm6FQ=.e0f441ae-48c7-4399-a3a9-b9e61cdb7851@github.com>

On Tue, 2 Jul 2024 14:42:44 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java
>
> test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java line 58:
> 
>> 56:     public static void main(String[] args) {
>> 57:         if (WB.getIntVMFlag("LockingMode") == LM_MONITOR) {
>> 58:             throw new SkippedException("LM_MONITOR always infaltes. Invalid test.");
> 
> typo: inflates

Suggestion:

            throw new SkippedException("LM_MONITOR always inflates. Invalid test.");

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1663037132

From coleenp at openjdk.org  Tue Jul  2 19:38:19 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 2 Jul 2024 19:38:19 GMT
Subject: RFR: 8335397: Improve reliability of
 TestRecursiveMonitorChurn.java [v2]
In-Reply-To: <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com>
References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
 <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com>
Message-ID: <n4hjYeBK9hcom7ZZ22E4dcLnI0-lJKYBC9JgPHukoM0=.e2bc985f-9b34-4bf4-a9a1-2417cae95258@github.com>

On Tue, 2 Jul 2024 19:18:47 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation.
>> 
>> Change the test to instead query the JVM for the exact number of inflations via the Whitebox API. This allows us to be both more exact and less dependent on interactions with NMT.
>
> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java

Marked as reviewed by coleenp (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19965#pullrequestreview-2154700442

From xpeng at openjdk.org  Tue Jul  2 20:01:18 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Tue, 2 Jul 2024 20:01:18 GMT
Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2]
In-Reply-To: <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com>
References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
 <UAwVP6aRgZKWnjeI3CktNd1atCRK0JCuYFGQU7PsZ_w=.b4f21ee3-000e-441d-b215-804d96206865@github.com>
Message-ID: <g7-ZlhxQqYHzjnC-ow9H1yNFQjJUJ_lAh7_dVPCAsqQ=.288fa0ce-060d-480f-9476-d3f0dd0a9a96@github.com>

On Tue, 2 Jul 2024 14:59:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> I have a couple of questions about this and a request. Thanks.

Thanks so much for the suggestions! Do you have any other concerns about the new revision?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19958#issuecomment-2204279047

From xpeng at openjdk.org  Tue Jul  2 20:07:19 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Tue, 2 Jul 2024 20:07:19 GMT
Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2]
In-Reply-To: <_kYyW9f7DYRGXmfBruv4r5ywF8jzc4wu3qRHRljEuwQ=.6a594a8b-4f9a-431b-aeea-9bcc639c61ac@github.com>
References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
 <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com>
 <_kYyW9f7DYRGXmfBruv4r5ywF8jzc4wu3qRHRljEuwQ=.6a594a8b-4f9a-431b-aeea-9bcc639c61ac@github.com>
Message-ID: <hNRmjIpcI6y_332uNIdN3t6RGbTyDCGxzq4abLJkvDM=.6051c3ad-bfb7-43be-b65a-d0972419beb2@github.com>

On Tue, 2 Jul 2024 17:55:03 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> Ok.
> 
> > I have checked the cacheline boundaries with pahole: both are in the same cache line after moving, so that shouldn't be a concern.
> 
> Note that in current class space, Klass does usually not start at a cache line boundary.

Thank you, Thomas, for the review and the reminder! The main point is to compact the layout of Klass, which is instantiated often at runtime. A more compact layout reduces the cache-line footprint, although the improvement won't be significant.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19958#issuecomment-2204288859

From coleenp at openjdk.org  Tue Jul  2 20:17:18 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 2 Jul 2024 20:17:18 GMT
Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2]
In-Reply-To: <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com>
References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
 <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com>
Message-ID: <nthulxS9SuMBePGssqh-woC7odTQaU-mxNSg_ceycEU=.2c456585-b528-4131-b68a-fa4c049304ed@github.com>

On Tue, 2 Jul 2024 17:25:53 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> Hi all, 
>>     This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by alignment issues introduced by the 1-byte ```_hash_slot```. 
>> 
>> 
>> (gdb) ptype /ox Klass
>> /* offset      |    size */  type = class Klass : public Metadata {
>>                              public:
>>                                static const uint KLASS_KIND_COUNT;
>>                              protected:
>> /* 0x000c      |  0x0004 */    jint _layout_helper;
>> /* 0x0010      |  0x0004 */    const enum Klass::KlassKind _kind;
>> /* 0x0014      |  0x0004 */    jint _modifier_flags;
>> /* 0x0018      |  0x0004 */    juint _super_check_offset;
>> /* XXX  4-byte hole      */
>> /* 0x0020      |  0x0008 */    class Symbol *_name;
>> /* 0x0028      |  0x0008 */    class Klass *_secondary_super_cache;
>> /* 0x0030      |  0x0008 */    class Array<Klass*> *_secondary_supers;
>> /* 0x0038      |  0x0040 */    class Klass *_primary_supers[8];
>> /* 0x0078      |  0x0008 */    class OopHandle {
>>                                  private:
>> /* 0x0078      |  0x0008 */        class oop *_obj;
>> 
>>                                    /* total size (bytes):    8 */
>>                                } _java_mirror;
>> /* 0x0080      |  0x0008 */    class Klass *_super;
>> /* 0x0088      |  0x0008 */    class Klass * volatile _subklass;
>> /* 0x0090      |  0x0008 */    class Klass * volatile _next_sibling;
>> /* 0x0098      |  0x0008 */    class Klass *_next_link;
>> /* 0x00a0      |  0x0008 */    class ClassLoaderData *_class_loader_data;
>> /* 0x00a8      |  0x0008 */    uintx _bitmap;
>> /* 0x00b0      |  0x0001 */    uint8_t _hash_slot;
>> /* XXX  3-byte hole      */
>> /* 0x00b4      |  0x0004 */    int _vtable_len;
>> /* 0x00b8      |  0x0004 */    class AccessFlags {
>>                                  private:
>> /* 0x00b8      |  0x0004 */        jint _flags;
>> 
>>                                    /* total size (bytes):    4 */
>>                                } _access_flags;
>> /* XXX  4-byte hole      */
>> /* 0x00c0      |  0x0008 */    traceid _trace_id;
>>                              private:
>> /* 0x00c8      |  0x0002 */    s2 _shared_class_path_index;
>> /* 0x00ca      |  0x0002 */    u2 _shared_class_flags;
>> /* 0x00cc      |  0x0004 */    int _archived_mirror_index;
>>                              public:
>>                                static const int SECONDARY_SUPERS_TABLE_SIZE;
>>                         ...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Move both _bitmap and _hash_slot together

Yes, this looks good.

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19958#pullrequestreview-2154783065

From xpeng at openjdk.org  Tue Jul  2 20:30:20 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Tue, 2 Jul 2024 20:30:20 GMT
Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2]
In-Reply-To: <nthulxS9SuMBePGssqh-woC7odTQaU-mxNSg_ceycEU=.2c456585-b528-4131-b68a-fa4c049304ed@github.com>
References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
 <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com>
 <nthulxS9SuMBePGssqh-woC7odTQaU-mxNSg_ceycEU=.2c456585-b528-4131-b68a-fa4c049304ed@github.com>
Message-ID: <SeA81tLNOKSwArcddDfEu629HcJ5akwsPDAytrIWkHI=.31268549-d6c0-4615-931e-af96135b2513@github.com>

On Tue, 2 Jul 2024 20:14:25 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> Yes, this looks good.

Thank you, appreciate it!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19958#issuecomment-2204339711

From duke at openjdk.org  Tue Jul  2 20:30:21 2024
From: duke at openjdk.org (duke)
Date: Tue, 2 Jul 2024 20:30:21 GMT
Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2]
In-Reply-To: <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com>
References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
 <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com>
Message-ID: <-3uGfF3w8n4Tb0_C15jtfdLDD1DB6Z0rF_pu_zA2g4I=.fd01fa84-607f-4705-86f1-e24f91df3378@github.com>

On Tue, 2 Jul 2024 17:25:53 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> Hi all, 
>>     This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by alignment issues introduced by the 1-byte ```_hash_slot```. 
>> 
>> 
>> (gdb) ptype /ox Klass
>> /* offset      |    size */  type = class Klass : public Metadata {
>>                              public:
>>                                static const uint KLASS_KIND_COUNT;
>>                              protected:
>> /* 0x000c      |  0x0004 */    jint _layout_helper;
>> /* 0x0010      |  0x0004 */    const enum Klass::KlassKind _kind;
>> /* 0x0014      |  0x0004 */    jint _modifier_flags;
>> /* 0x0018      |  0x0004 */    juint _super_check_offset;
>> /* XXX  4-byte hole      */
>> /* 0x0020      |  0x0008 */    class Symbol *_name;
>> /* 0x0028      |  0x0008 */    class Klass *_secondary_super_cache;
>> /* 0x0030      |  0x0008 */    class Array<Klass*> *_secondary_supers;
>> /* 0x0038      |  0x0040 */    class Klass *_primary_supers[8];
>> /* 0x0078      |  0x0008 */    class OopHandle {
>>                                  private:
>> /* 0x0078      |  0x0008 */        class oop *_obj;
>> 
>>                                    /* total size (bytes):    8 */
>>                                } _java_mirror;
>> /* 0x0080      |  0x0008 */    class Klass *_super;
>> /* 0x0088      |  0x0008 */    class Klass * volatile _subklass;
>> /* 0x0090      |  0x0008 */    class Klass * volatile _next_sibling;
>> /* 0x0098      |  0x0008 */    class Klass *_next_link;
>> /* 0x00a0      |  0x0008 */    class ClassLoaderData *_class_loader_data;
>> /* 0x00a8      |  0x0008 */    uintx _bitmap;
>> /* 0x00b0      |  0x0001 */    uint8_t _hash_slot;
>> /* XXX  3-byte hole      */
>> /* 0x00b4      |  0x0004 */    int _vtable_len;
>> /* 0x00b8      |  0x0004 */    class AccessFlags {
>>                                  private:
>> /* 0x00b8      |  0x0004 */        jint _flags;
>> 
>>                                    /* total size (bytes):    4 */
>>                                } _access_flags;
>> /* XXX  4-byte hole      */
>> /* 0x00c0      |  0x0008 */    traceid _trace_id;
>>                              private:
>> /* 0x00c8      |  0x0002 */    s2 _shared_class_path_index;
>> /* 0x00ca      |  0x0002 */    u2 _shared_class_flags;
>> /* 0x00cc      |  0x0004 */    int _archived_mirror_index;
>>                              public:
>>                                static const int SECONDARY_SUPERS_TABLE_SIZE;
>>                         ...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Move both _bitmap and _hash_slot together

@pengxiaolong 
Your change (at version ce1560c1fcdca237d594cdbd9e0ea59572964509) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19958#issuecomment-2204347799

From dholmes at openjdk.org  Wed Jul  3 02:21:21 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 3 Jul 2024 02:21:21 GMT
Subject: RFR: 8331385: G1: Prefix HeapRegion helper classes with G1
In-Reply-To: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com>
References: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com>
Message-ID: <SPBNsHY3noBQGgTadY2dtKkM29h59fcloTH9mIMRmzs=.f6dadafb-e5f5-455d-b2a3-04be5554d1d4@github.com>

On Mon, 1 Jul 2024 09:35:00 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

> Hi all,
> 
>   after [JDK-8330694](https://bugs.openjdk.org/browse/JDK-8330694), which renamed HeapRegion to G1HeapRegion, there were a few related helper classes in this CR that were not renamed.
> 
> It's purely mechanical renaming without even further renaming of files etc.
> 
> This change updates them.
> 
> (Fwiw, the "Viewed" checkbox at the top right of the file change helps a lot when reviewing this change incrementally)
> 
> Testing: tier1, tier4, tier5
> 
> Thanks,
>   Thomas

Seems okay. Thanks

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19967#pullrequestreview-2155206936

From dholmes at openjdk.org  Wed Jul  3 02:58:30 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 3 Jul 2024 02:58:30 GMT
Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 [v2]
In-Reply-To: <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com>
References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
 <HmIHMdh599RCc4lq7jPiveOQphvPaRS-BlN3XHaE74E=.f145682f-5cbc-4b1c-b460-4a2bf323430b@github.com>
Message-ID: <tJD6UEGEi8QQfFNbG1dJSZOv0ZYCaRp0up08CXFsW2w=.8783a858-2be5-4b1b-82e3-0ed8e25a418c@github.com>

On Tue, 2 Jul 2024 17:25:53 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> Hi all, 
>>     This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by an alignment issue introduced by the 1-byte `_hash_slot`.
>> 
>> 
>> (gdb) ptype /ox Klass
>> /* offset      |    size */  type = class Klass : public Metadata {
>>                              public:
>>                                static const uint KLASS_KIND_COUNT;
>>                              protected:
>> /* 0x000c      |  0x0004 */    jint _layout_helper;
>> /* 0x0010      |  0x0004 */    const enum Klass::KlassKind _kind;
>> /* 0x0014      |  0x0004 */    jint _modifier_flags;
>> /* 0x0018      |  0x0004 */    juint _super_check_offset;
>> /* XXX  4-byte hole      */
>> /* 0x0020      |  0x0008 */    class Symbol *_name;
>> /* 0x0028      |  0x0008 */    class Klass *_secondary_super_cache;
>> /* 0x0030      |  0x0008 */    class Array<Klass*> *_secondary_supers;
>> /* 0x0038      |  0x0040 */    class Klass *_primary_supers[8];
>> /* 0x0078      |  0x0008 */    class OopHandle {
>>                                  private:
>> /* 0x0078      |  0x0008 */        class oop *_obj;
>> 
>>                                    /* total size (bytes):    8 */
>>                                } _java_mirror;
>> /* 0x0080      |  0x0008 */    class Klass *_super;
>> /* 0x0088      |  0x0008 */    class Klass * volatile _subklass;
>> /* 0x0090      |  0x0008 */    class Klass * volatile _next_sibling;
>> /* 0x0098      |  0x0008 */    class Klass *_next_link;
>> /* 0x00a0      |  0x0008 */    class ClassLoaderData *_class_loader_data;
>> /* 0x00a8      |  0x0008 */    uintx _bitmap;
>> /* 0x00b0      |  0x0001 */    uint8_t _hash_slot;
>> /* XXX  3-byte hole      */
>> /* 0x00b4      |  0x0004 */    int _vtable_len;
>> /* 0x00b8      |  0x0004 */    class AccessFlags {
>>                                  private:
>> /* 0x00b8      |  0x0004 */        jint _flags;
>> 
>>                                    /* total size (bytes):    4 */
>>                                } _access_flags;
>> /* XXX  4-byte hole      */
>> /* 0x00c0      |  0x0008 */    traceid _trace_id;
>>                              private:
>> /* 0x00c8      |  0x0002 */    s2 _shared_class_path_index;
>> /* 0x00ca      |  0x0002 */    u2 _shared_class_flags;
>> /* 0x00cc      |  0x0004 */    int _archived_mirror_index;
>>                              public:
>>                                static const int SECONDARY_SUPERS_TABLE_SIZE;
>>                         ...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Move both _bitmap and _hash_slot together

Marked as reviewed by dholmes (Reviewer).
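
For readers following the layout dump quoted above, here is a minimal sketch of how a 1-byte member creates holes and how grouping members by alignment removes them. The structs and field names are hypothetical stand-ins, not the real Klass members, and the sizes in the comments assume a typical 64-bit ABI where uint64_t is 8-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fields declared in "historical" order.
struct Scattered {
  uint32_t check_offset;  // 4 bytes, then a hole so 'name' is 8-byte aligned
  uint64_t name;
  uint8_t  hash_slot;     // 1 byte, then a hole so 'vtable_len' is 4-byte aligned
  int32_t  vtable_len;
  int32_t  flags;         // 4 bytes, then a hole so 'trace_id' is 8-byte aligned
  uint64_t trace_id;
};                        // typically 40 bytes

// Same fields, widest first, so the small ones share one slot.
struct Packed {
  uint64_t name;
  uint64_t trace_id;
  uint32_t check_offset;
  int32_t  vtable_len;
  int32_t  flags;
  uint8_t  hash_slot;     // only tail padding remains after this member
};                        // typically 32 bytes

// Bytes lost to padding: total size minus the sum of the member sizes.
constexpr std::size_t kPayload = 8 + 8 + 4 + 4 + 4 + 1;
constexpr std::size_t scattered_padding() { return sizeof(Scattered) - kPayload; }
constexpr std::size_t packed_padding()    { return sizeof(Packed) - kPayload; }
```

Running `ptype /o` in gdb on both structs should show the holes disappear in the second one; the exact byte counts depend on the target ABI.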

-------------

PR Review: https://git.openjdk.org/jdk/pull/19958#pullrequestreview-2155233872

From xpeng at openjdk.org  Wed Jul  3 02:58:31 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Wed, 3 Jul 2024 02:58:31 GMT
Subject: Integrated: 8334220: Optimize Klass layout after JDK-8180450
In-Reply-To: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
References: <u4OmBM1C_MiwuD8TOlAyJhukY4ZWxHeKx7DPmXymXrQ=.939dd0b2-06e1-4b2d-b567-0c5c3c91255e@github.com>
Message-ID: <T1pobsG9ZcZgm696zN1p5ibrItIG4w8jOz4ntvucjMQ=.98c761df-fc19-4bc4-9510-8214f57ed18d@github.com>

On Sat, 29 Jun 2024 19:58:23 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> Hi all, 
>     This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by an alignment issue introduced by the 1-byte `_hash_slot`.
> 
> 
> (gdb) ptype /ox Klass
> /* offset      |    size */  type = class Klass : public Metadata {
>                              public:
>                                static const uint KLASS_KIND_COUNT;
>                              protected:
> /* 0x000c      |  0x0004 */    jint _layout_helper;
> /* 0x0010      |  0x0004 */    const enum Klass::KlassKind _kind;
> /* 0x0014      |  0x0004 */    jint _modifier_flags;
> /* 0x0018      |  0x0004 */    juint _super_check_offset;
> /* XXX  4-byte hole      */
> /* 0x0020      |  0x0008 */    class Symbol *_name;
> /* 0x0028      |  0x0008 */    class Klass *_secondary_super_cache;
> /* 0x0030      |  0x0008 */    class Array<Klass*> *_secondary_supers;
> /* 0x0038      |  0x0040 */    class Klass *_primary_supers[8];
> /* 0x0078      |  0x0008 */    class OopHandle {
>                                  private:
> /* 0x0078      |  0x0008 */        class oop *_obj;
> 
>                                    /* total size (bytes):    8 */
>                                } _java_mirror;
> /* 0x0080      |  0x0008 */    class Klass *_super;
> /* 0x0088      |  0x0008 */    class Klass * volatile _subklass;
> /* 0x0090      |  0x0008 */    class Klass * volatile _next_sibling;
> /* 0x0098      |  0x0008 */    class Klass *_next_link;
> /* 0x00a0      |  0x0008 */    class ClassLoaderData *_class_loader_data;
> /* 0x00a8      |  0x0008 */    uintx _bitmap;
> /* 0x00b0      |  0x0001 */    uint8_t _hash_slot;
> /* XXX  3-byte hole      */
> /* 0x00b4      |  0x0004 */    int _vtable_len;
> /* 0x00b8      |  0x0004 */    class AccessFlags {
>                                  private:
> /* 0x00b8      |  0x0004 */        jint _flags;
> 
>                                    /* total size (bytes):    4 */
>                                } _access_flags;
> /* XXX  4-byte hole      */
> /* 0x00c0      |  0x0008 */    traceid _trace_id;
>                              private:
> /* 0x00c8      |  0x0002 */    s2 _shared_class_path_index;
> /* 0x00ca      |  0x0002 */    u2 _shared_class_flags;
> /* 0x00cc      |  0x0004 */    int _archived_mirror_index;
>                              public:
>                                static const int SECONDARY_SUPERS_TABLE_SIZE;
>                                static const int SECONDARY_SUPERS_TABLE_MASK;
>                                static const...

This pull request has now been integrated.

Changeset: f9b4ea13
Author:    Xiaolong Peng <xpeng at openjdk.org>
Committer: David Holmes <dholmes at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/f9b4ea13e693da268c9aee27dee49f9c7f798bb1
Stats:     10 lines in 1 file changed: 5 ins; 5 del; 0 mod

8334220: Optimize Klass layout after JDK-8180450

Reviewed-by: coleenp, stuefe, dholmes

-------------

PR: https://git.openjdk.org/jdk/pull/19958

From dholmes at openjdk.org  Wed Jul  3 03:11:20 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 3 Jul 2024 03:11:20 GMT
Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII
 [v3]
In-Reply-To: <nyLYOhw7-wSPlKjeWi3FyuLY0UzFwWJdj-19ijEInU4=.6f539aaf-0cff-4ab8-8ca0-3acd3b44d071@github.com>
References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com>
 <nyLYOhw7-wSPlKjeWi3FyuLY0UzFwWJdj-19ijEInU4=.6f539aaf-0cff-4ab8-8ca0-3acd3b44d071@github.com>
Message-ID: <ZM5LyIYOqAkOZ_wDJS12HFBR_ZsHqE4QLzQN2ygLiQk=.d4680604-5eaa-4edb-b786-5a45b5c09eb4@github.com>

On Thu, 27 Jun 2024 08:05:42 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Motivated by analyzing CDS dump differences for reproducible builds, I found an optional ASCII printout to be valuable. As usual with hex dumps, the ASCII follows the hex printout.
>> 
>> Example:
>> 
>> 
>> 
>>    118 0x00000000000001c0:   204b444a6e65704f 53207469422d3436 4d56207265767265 6564747361662820   OpenJDK 64-Bit Server VM (fastde
>>    119 0x00000000000001e0:   692d343220677562 2d6c616e7265746e 68742e636f686461 756f732e73616d6f   bug 24-internal-adhoc.thomas.sou
>>    120 0x0000000000000200:   726f662029656372 612d78756e696c20 45524a203436646d 746e692d34322820   rce) for linux-amd64 JRE (24-int
>>    121 0x0000000000000220:   64612d6c616e7265 6d6f68742e636f68 6372756f732e7361 6c697562202c2965   ernal-adhoc.thomas.source), buil
>>    122 0x0000000000000240:   323032206e6f2074 5430322d36302d34 32313a35343a3031 672068746977205a   t on 2024-06-20T10:45:12Z with g
>>    123 0x0000000000000260:   2e352e3031206363 0000000000000030 0000000000000000 0000000000000000   cc 10.5.0_______________________
>>    124 0x0000000000000280:   0000000000000000 0000000000000000 0000000000000000 0000000000000000   ________________________________
>>    125 0x00000000000002a0:   0000000000000000 0000000000000000 0000000000000000 0000000000000000   ________________________________
>> 
>> 
>> The patch does that.
>> 
>> Small unrelated changes:
>> 
>> - I rewrote and extended the gtests, testing now a real-life printout containing a mixture of readable and non-readable pages, and printable and non-printable characters. I re-enabled tests on Windows, since https://bugs.openjdk.org/browse/JDK-8185734 is long solved.
>> 
>> - The new test uncovered an issue on 32-bit when printing giant words. We shift a signed value by 32 bits upwards, which can result in -1 resp. ffffffff in the upper half of the giant word. One of the pitfalls of intptr_t vs uintptr_t (I think most uses of intptr_t should probably use uintptr_t).
>> 
>> - I got tired of casting constness away from to-be-printed memory range just to be able to feed an address to os::print_hex_dump. The content printed is usually const. os::print_hex_dump does not need non-constness, but since we use address, and address is typedef char*, and one cannot declare a typedef'ed pointer target-const, the issue is there. I therefore changed the input to const uint8_t*. Maybe we need a const_address or something similar.
>> 
>> ----
>> 
>> Ran tests on Linux x64 and x86, Windows x86 and Mac aarch64. Fixed all issues I found. Only little-endian, I don't have big-e...
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   exclude test for AIX

Marked as reviewed by dholmes (Reviewer).
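
The intptr_t vs uintptr_t pitfall mentioned in the quoted description can be sketched as follows. This is only an illustration of the general sign-extension effect, assuming the giant word is assembled from two 32-bit halves; the helper names are hypothetical and this is not the code from os.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Halves held in signed 32-bit values (as intptr_t is on a 32-bit platform).
inline uint64_t combine_signed(int32_t hi, int32_t lo) {
  // Widening a negative 'lo' sign-extends it, so the OR below sets all
  // upper 32 bits to 1 and the 'hi' half is lost.
  return (static_cast<uint64_t>(hi) << 32) | static_cast<uint64_t>(lo);
}

// Halves held in unsigned 32-bit values: widening zero-extends.
inline uint64_t combine_unsigned(uint32_t hi, uint32_t lo) {
  return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

With hi = 1 and a low half of 0x80000000, the signed variant yields 0xffffffff80000000 while the unsigned variant yields 0x0000000180000000.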

-------------

PR Review: https://git.openjdk.org/jdk/pull/19835#pullrequestreview-2155244941

From dholmes at openjdk.org  Wed Jul  3 03:11:21 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 3 Jul 2024 03:11:21 GMT
Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII
 [v2]
In-Reply-To: <nnF0zgFpLnwB7QpwiUq39YhjDQN6J9HiV4Yb6B-vhAo=.b746b5b7-81c7-4cd5-a6d1-d29c8dd0137b@github.com>
References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com>
 <Is5Y-4I-mPUfwsGgPw81goRaHtXMORucAqKXdxnufD0=.c13bf441-4678-4801-b874-7a610286822e@github.com>
 <cRYxoxtDjMenRgLdZEDcYtEzxv6JliX_sTBjbmNsw3U=.abf947a0-b1a9-4846-b06f-b62796aa04c4@github.com>
 <nnF0zgFpLnwB7QpwiUq39YhjDQN6J9HiV4Yb6B-vhAo=.b746b5b7-81c7-4cd5-a6d1-d29c8dd0137b@github.com>
Message-ID: <l--dAczzsXMRswqo_9XmPlWtjR_TMK_LuLzcE0e2QAI=.7d3c03a8-eddb-4d10-9e04-15159e339623@github.com>

On Thu, 27 Jun 2024 07:27:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/runtime/os.cpp line 949:
>> 
>>> 947:   uintptr_t i = (uintptr_t)SafeFetchN((intptr_t*)p, errval);
>>> 948:   if (i == errval) {
>>> 949:     i = (uintptr_t)SafeFetchN((intptr_t*)p, ~errval);
>> 
>> Pre-existing but if the initial fetch fails why do we think the second one can succeed ???
>
> There is a one-in-2^(32|64) chance that the errval numerical value happens to be in memory. By reading twice, with a different errval each time, we diminish the chance of mistaking a successful read for an error.

Ouch! So all the other places where we use SafeFetchN only once are potentially broken? Or is this a special case where any value in memory could theoretically be valid?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1663414798

From kbarrett at openjdk.org  Wed Jul  3 05:00:26 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 3 Jul 2024 05:00:26 GMT
Subject: RFR: 8335591: Fix -Wzero-as-null-pointer-constant warnings in
 ConcurrentHashTable
Message-ID: <CIgX0UWlbnyNubbrvO9B3IE5Ilm9QP7f8v0Jho752tc=.03073f61-304d-4d4d-b85a-72c5f8ffcdbf@github.com>

Please review this trivial change to ConcurrentHashTable.  Initialization of
and assignments to the _invisible_epoch member are changed from a value of 0
to nullptr.  This removes some -Wzero-as-null-pointer-constant warnings when
building with that enabled.

Testing: mach5 tier1
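
A minimal sketch of the kind of change, assuming a pointer-typed member. The `Epoch` type and the surrounding struct are hypothetical; only the member name comes from the description above.

```cpp
#include <cassert>

struct Epoch {};  // hypothetical pointee type, for illustration only

struct TableSketch {
  Epoch* _invisible_epoch;

  // Before: _invisible_epoch(0), which -Wzero-as-null-pointer-constant flags.
  TableSketch() : _invisible_epoch(nullptr) {}

  // Before: _invisible_epoch = 0;
  void reset_epoch() { _invisible_epoch = nullptr; }

  bool has_epoch() const { return _invisible_epoch != nullptr; }
};
```

Behavior is unchanged; nullptr only makes the pointer intent explicit to the compiler and the reader.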

-------------

Commit messages:
 - fix _invisible_epoch usage in ConcurrentHashTable

Changes: https://git.openjdk.org/jdk/pull/19996/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19996&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335591
  Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/19996.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19996/head:pull/19996

PR: https://git.openjdk.org/jdk/pull/19996

From dholmes at openjdk.org  Wed Jul  3 05:21:21 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 3 Jul 2024 05:21:21 GMT
Subject: RFR: 8335108: Build error after JDK-8333658 due to class templates
In-Reply-To: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com>
References: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com>
Message-ID: <wng_QvnzLUne1f_oobHsr4DaaPLhq7jcptPWBcAJHUo=.2eac54ed-49f7-4564-91c7-6f0ec20d0237@github.com>

On Tue, 25 Jun 2024 18:52:20 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

> Hi all, 
> 
> This PR addresses [8335108](https://bugs.openjdk.org/browse/JDK-8335108). 
> 
> The error arises as template-id is not allowed for constructor/destructor in C++20. 
> 
> Testing: 
> - [x] Compilation succeeds with g++ 14.1.1.
> 
> Thanks, 
> Sonia

The 24-hour rule is mentioned here:

https://openjdk.org/guide/#life-of-a-pr

6. Allow enough time for review

In general all PRs should be open for at least 24 hours to allow for reviewers in all time zones to get a chance to see it. It may actually happen that even 24 hours isn't enough. Take into account weekends, holidays, and vacation times throughout the world and you'll realize that a change that requires more than just a trivial review may have to be open for a while. In some areas [trivial] changes are allowed to be pushed without the 24 hour delay. Ask your reviewers if you think this applies to your change.
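
As background for the build error in the quoted description, here is a small sketch of the C++20 rule about template-ids on constructors and destructors. `Box` is a hypothetical class template, not the HotSpot code touched by the PR.

```cpp
#include <cassert>

template <typename T>
struct Box {
  T value;

  // Ill-formed in C++20 (accepted by older standards):
  //   Box<T>(T v) : value(v) {}
  //   ~Box<T>() {}
  // The declarator must use the injected-class-name instead:
  explicit Box(T v) : value(v) {}
  ~Box() {}
};
```

Dropping the template argument list from the constructor and destructor names is valid in every language version, so the fix does not depend on the standard level used for the build.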

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19890#issuecomment-2205119100

From stuefe at openjdk.org  Wed Jul  3 05:23:19 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 3 Jul 2024 05:23:19 GMT
Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII
 [v2]
In-Reply-To: <l--dAczzsXMRswqo_9XmPlWtjR_TMK_LuLzcE0e2QAI=.7d3c03a8-eddb-4d10-9e04-15159e339623@github.com>
References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com>
 <Is5Y-4I-mPUfwsGgPw81goRaHtXMORucAqKXdxnufD0=.c13bf441-4678-4801-b874-7a610286822e@github.com>
 <cRYxoxtDjMenRgLdZEDcYtEzxv6JliX_sTBjbmNsw3U=.abf947a0-b1a9-4846-b06f-b62796aa04c4@github.com>
 <nnF0zgFpLnwB7QpwiUq39YhjDQN6J9HiV4Yb6B-vhAo=.b746b5b7-81c7-4cd5-a6d1-d29c8dd0137b@github.com>
 <l--dAczzsXMRswqo_9XmPlWtjR_TMK_LuLzcE0e2QAI=.7d3c03a8-eddb-4d10-9e04-15159e339623@github.com>
Message-ID: <YWfftjtRvw97c5a5c7pSAcwX-Ck369idnO2J58_lzOs=.e97cbd26-de71-4eb4-844c-8d06b14974f8@github.com>

On Wed, 3 Jul 2024 03:07:23 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> There is a one-in-2^(32|64) chance that the errval numerical value happens to be in memory. By reading twice, with a different errval each time, we diminish the chance of mistaking a successful read for an error.
>
> Ouch! So all the other places where we use SafeFetchN only once are potentially broken? Or is this a special case where any value in memory could theoretically be valid?

Well, maybe I am just overly careful. After all, the chance is infinitesimally small. You are probably in the clear unless you use 0 or some other frequent pattern.
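
The double-read idea discussed above can be sketched like this. `fake_safe_fetch` is a hypothetical stand-in that treats a null pointer as the only unreadable address; the real SafeFetchN works through fault handling, which is not modelled here, and the sentinel value is arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a fault-tolerant read: returns *p, or errval if p is "unreadable".
inline intptr_t fake_safe_fetch(const intptr_t* p, intptr_t errval) {
  return p == nullptr ? errval : *p;
}

// Returns true and stores the value in 'out' only if the location was readable.
inline bool read_word(const intptr_t* p, intptr_t& out) {
  const intptr_t errval = 0x1717;  // arbitrary sentinel
  intptr_t v = fake_safe_fetch(p, errval);
  if (v == errval) {
    // Ambiguous: either the read failed, or memory really holds errval.
    // A second read with a different sentinel tells the two cases apart.
    v = fake_safe_fetch(p, ~errval);
    if (v == ~errval) {
      return false;  // both sentinels came back: treat as unreadable
    }
  }
  out = v;
  return true;
}
```

A single read is ambiguous only when memory happens to contain the sentinel, which is why a rare bit pattern is a safer errval than 0.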

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1663506578

From duke at openjdk.org  Wed Jul  3 06:11:19 2024
From: duke at openjdk.org (Liming Liu)
Date: Wed, 3 Jul 2024 06:11:19 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v2]
In-Reply-To: <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
 <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com>
Message-ID: <rnPsj7tJBhXHpNZ_Cubn3LdzsXi_u8vjFjWIlTiE9-s=.acda4cb0-7fe3-40ce-aac4-7fb5d7deef1a@github.com>

On Fri, 28 Jun 2024 19:22:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> See JBS issue.
>> 
>> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure.
>> 
>> The patch:
>> - exposes os::available_memory via Whitebox
>> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException`
>> 
>> I have some misgivings about this solution, though:
>> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. 
>> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions)
>> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException.
>> 
>> Despite my doubts, I think this is the best we can come up with if we want to have such a test.
>> 
>> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice.
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update TestAlwaysPreTouchBehavior.java

Could you please confirm whether it is related to JDK-8335167?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19803#issuecomment-2205168688

From rehn at openjdk.org  Wed Jul  3 06:34:20 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 3 Jul 2024 06:34:20 GMT
Subject: RFR: 8335411: RISC-V: Optimize encode_heap_oop when oop is not
 null
In-Reply-To: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com>
References: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com>
Message-ID: <Twjo9tLARYiTFf4DlqQzJuwAvbxBDsUJGQ-XYZndDrU=.a9291118-99bb-4430-8d2b-8fcff9f4a003@github.com>

On Mon, 1 Jul 2024 14:32:03 GMT, Feilong Jiang <fjiang at openjdk.org> wrote:

> Hi, please review this enhancement that adds two more `encode_heap_oop_not_null` methods.
> 
> Currently, `encode_heap_oop` first checks whether the oop is `null`. When the oop is known to be non-null, we can skip that check and avoid an unnecessary branch instruction when encoding the oop into compressed form.
> 
> 
> Testing:
> - [x] Tier1~3 on linux-riscv64 with release build
> - [x] renaissance & dacapo benchmark suites for functionality

Nice, thanks!
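
For readers less familiar with compressed oops, here is a simplified model of the two encoders in plain C++. The heap base and shift are hypothetical constants, and this is not the RISC-V MacroAssembler code from the PR; it only shows which work the not-null variant skips.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kHeapBase = 0x0000000800000000ULL;  // hypothetical heap base
constexpr unsigned kShift    = 3;                      // hypothetical object alignment shift

// General encoder: null must map to narrow null, which costs a compare and branch.
inline uint32_t encode_heap_oop(uint64_t oop) {
  if (oop == 0) {
    return 0;
  }
  return static_cast<uint32_t>((oop - kHeapBase) >> kShift);
}

// Encoder for oops known to be non-null: subtract and shift only.
inline uint32_t encode_heap_oop_not_null(uint64_t oop) {
  return static_cast<uint32_t>((oop - kHeapBase) >> kShift);
}

// Inverse of the not-null encoder.
inline uint64_t decode_heap_oop_not_null(uint32_t narrow) {
  return kHeapBase + (static_cast<uint64_t>(narrow) << kShift);
}
```

The two encoders agree for every non-null oop inside the heap, so callers that can prove non-nullness lose nothing by using the branch-free one.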

-------------

Marked as reviewed by rehn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19974#pullrequestreview-2155566884

From rehn at openjdk.org  Wed Jul  3 06:37:51 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 3 Jul 2024 06:37:51 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v20]
In-Reply-To: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
Message-ID: <FMu4_SfI8mQvWR4HQwIssjRHArOwX1MQf7qDHiSAb2w=.807d5c7d-fd9c-4fdd-b7c2-f8c6323f440a@github.com>

> Hi all, please consider!
> 
> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
> Using a very small application or running for a very short time we have fast patchable calls.
> But any normal application running longer will increase the code size and code churn/fragmentation.
> So whether or not you get hot fast calls relies on luck.
> 
> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
> This would be the common case for a patchable call.
> 
> Code stream:
> JAL <trampo>
> Stubs:
> AUIPC
> LD
> JALR
> <DEST>
> 
> 
> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
> Even if you don't have that problem, having a call to a jump is not the fastest way.
> Loading the address avoids the pitfalls of cmodx.
> 
> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
> and instead do by default:
> 
> Code stream:
> AUIPC
> LD
> JALR
> Stubs:
> <DEST>
> 
> An experimental option for turning trampolines back on exists.
> 
> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
> 
> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
> 
> fop                                        (msec)    2239       |  2128       =  0.950424
> h2                                         (msec)    18660      |  16594      =  0.889282
> jython                                     (msec)    22022      |  21925      =  0.995595
> luindex                                    (msec)    2866       |  2842       =  0.991626
> lusearch                                   (msec)    4108       |  4311       =  1.04942
> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
> pmd                                        (msec)    5976       |  5897       =  0.98678
> jython                                     (msec)    22022      |  21925      =  0.995595
> Avg:                                       0.974112                              
> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
> h2(xcomp)                                  (msec)    37719      |  38004      =  1.00756
> jython(xcomp)        ...

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits:

 - Merge branch 'master' into 8332689
 - Rename lc
 - Merge branch 'master' into 8332689
 - Merge branch 'master' into 8332689
 - Comments
 - Missed in merge-fixes, minor revert
 - Merge branch 'master' into 8332689
 - Minor review comments
 - Merge branch 'master' into 8332689
 - To be pushed
 - ... and 19 more: https://git.openjdk.org/jdk/compare/77a7078b...6fd73a66
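
As a side note on the "in reach (+/- 1 MB)" condition from the description, here is a minimal sketch of the check. The function name is hypothetical and this is not the HotSpot implementation; it only encodes the JAL immediate range (a signed, 2-byte-aligned offset from -2^20 to 2^20 - 2).

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kJalMinOffset = -(int64_t(1) << 20);     // -1 MiB
constexpr int64_t kJalMaxOffset = (int64_t(1) << 20) - 2;  // +1 MiB - 2

// True if a JAL located at 'pc' can encode a direct jump to 'dest'.
inline bool in_jal_reach(uint64_t pc, uint64_t dest) {
  const int64_t offset = static_cast<int64_t>(dest - pc);
  return offset >= kJalMinOffset &&
         offset <= kJalMaxOffset &&
         (offset & 1) == 0;  // targets are at least 2-byte aligned
}
```

Once the code cache grows beyond this window, fewer call sites pass the check, which is the "luck" the description refers to.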

-------------

Changes: https://git.openjdk.org/jdk/pull/19453/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=19
  Stats: 897 lines in 16 files changed: 622 ins; 177 del; 98 mod
  Patch: https://git.openjdk.org/jdk/pull/19453.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453

PR: https://git.openjdk.org/jdk/pull/19453

From chagedorn at openjdk.org  Wed Jul  3 06:59:18 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 3 Jul 2024 06:59:18 GMT
Subject: RFR: 8335591: Fix -Wzero-as-null-pointer-constant warnings in
 ConcurrentHashTable
In-Reply-To: <CIgX0UWlbnyNubbrvO9B3IE5Ilm9QP7f8v0Jho752tc=.03073f61-304d-4d4d-b85a-72c5f8ffcdbf@github.com>
References: <CIgX0UWlbnyNubbrvO9B3IE5Ilm9QP7f8v0Jho752tc=.03073f61-304d-4d4d-b85a-72c5f8ffcdbf@github.com>
Message-ID: <aYI3OWb3zWuLPK8CJHq20UWxO55mtnqhxnsAYNIBpE8=.d2df1e20-52f5-4900-9026-064e3bd8d6e0@github.com>

On Wed, 3 Jul 2024 04:54:59 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this trivial change to ConcurrentHashTable.  Initialization of
> and assignments to the _invisible_epoch member are changed from a value of 0
> to nullptr.  This removes some -Wzero-as-null-pointer-constant warnings when
> building with that enabled.
> 
> Testing: mach5 tier1

Looks good and trivial.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19996#pullrequestreview-2155609135

From dholmes at openjdk.org  Wed Jul  3 07:14:19 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 3 Jul 2024 07:14:19 GMT
Subject: RFR: 8335397: Improve reliability of
 TestRecursiveMonitorChurn.java [v2]
In-Reply-To: <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com>
References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
 <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com>
Message-ID: <AB5lGlWSK4IcD98ADnYCTvbipSgv8aGzxbhrx0utaWc=.dfa6b91e-7a81-449e-8f7e-100a932e99f5@github.com>

On Tue, 2 Jul 2024 19:18:47 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation.
>> 
>> Change to instead query the JVM for exact number of inflations via the Whitebox API. This allows us to both be more exact and less dependent on interactions with NMT.
>
> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java

This seems much more reliable if we are testing the absence of excessive inflation.

One query below and one typo, but approved.

Thanks

src/hotspot/share/prims/whitebox.cpp line 1858:

> 1856: 
> 1857: WB_ENTRY(jlong, WB_getInUseMonitorCount(JNIEnv* env, jobject wb))
> 1858:   return (jlong) WhiteBox::get_in_use_monitor_count();

Why the indirection?

test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java line 72:

> 70:         if (pre_monitor_count != post_monitor_count) {
> 71:             final long monitor_count_change = post_monitor_count - pre_monitor_count;
> 72:             System.out.println("Unexpected change in mointor count: " + monitor_count_change);

typo: mointor

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19965#pullrequestreview-2155627165
PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1663630628
PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1663636766

From dholmes at openjdk.org  Wed Jul  3 07:18:20 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 3 Jul 2024 07:18:20 GMT
Subject: RFR: 8335397: Improve reliability of
 TestRecursiveMonitorChurn.java [v2]
In-Reply-To: <AB5lGlWSK4IcD98ADnYCTvbipSgv8aGzxbhrx0utaWc=.dfa6b91e-7a81-449e-8f7e-100a932e99f5@github.com>
References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
 <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com>
 <AB5lGlWSK4IcD98ADnYCTvbipSgv8aGzxbhrx0utaWc=.dfa6b91e-7a81-449e-8f7e-100a932e99f5@github.com>
Message-ID: <x-cPdnQy6lxMx3JNPo4F6Yc6c9XJ4RSxzK3D-XmDcas=.e34c0733-4353-4ca1-9fae-15f42bcf1905@github.com>

On Wed, 3 Jul 2024 07:06:17 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java
>
> src/hotspot/share/prims/whitebox.cpp line 1858:
> 
>> 1856: 
>> 1857: WB_ENTRY(jlong, WB_getInUseMonitorCount(JNIEnv* env, jobject wb))
>> 1858:   return (jlong) WhiteBox::get_in_use_monitor_count();
> 
> Why the indirection?

Ah now I see. We need the member function to make it a friend.
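
A minimal sketch of that pattern, with hypothetical class names: friendship is granted to a class, so only its member functions can read the private counter, and the entry point has to go through one of them.

```cpp
#include <cassert>

// Hypothetical owner of a private counter.
class MonitorListSketch {
  static long _in_use_count;
  friend class WhiteBoxSketch;  // friendship is granted to the class
 public:
  static void inflate_one() { ++_in_use_count; }
};
long MonitorListSketch::_in_use_count = 0;

// Member function of the friend class: allowed to read the private field.
class WhiteBoxSketch {
 public:
  static long get_in_use_monitor_count() {
    return MonitorListSketch::_in_use_count;
  }
};

// Free-function entry point (standing in for the WB_ENTRY body): it is not a
// friend itself, so it has to call through the member function.
long wb_get_in_use_monitor_count() {
  return WhiteBoxSketch::get_in_use_monitor_count();
}
```

Reading `MonitorListSketch::_in_use_count` directly inside `wb_get_in_use_monitor_count` would not compile, which is the reason for the extra hop.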

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1663646717

From aboldtch at openjdk.org  Wed Jul  3 07:25:48 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 3 Jul 2024 07:25:48 GMT
Subject: RFR: 8335397: Improve reliability of
 TestRecursiveMonitorChurn.java [v3]
In-Reply-To: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
Message-ID: <HiqUmFciHGmdjRTa9B3RfxRxstbd6BA-QfZck-y5wBE=.98e4d5f3-4d47-406a-8de4-c712ca48d24f@github.com>

> TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation.
> 
> Instead, query the JVM for the exact number of inflations via the Whitebox API. This allows us to be both more exact and less dependent on interactions with NMT.

Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:

  Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19965/files
  - new: https://git.openjdk.org/jdk/pull/19965/files/44279523..071869f3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19965&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19965&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19965.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19965/head:pull/19965

PR: https://git.openjdk.org/jdk/pull/19965

From aboldtch at openjdk.org  Wed Jul  3 07:25:48 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 3 Jul 2024 07:25:48 GMT
Subject: RFR: 8335397: Improve reliability of
 TestRecursiveMonitorChurn.java [v2]
In-Reply-To: <x-cPdnQy6lxMx3JNPo4F6Yc6c9XJ4RSxzK3D-XmDcas=.e34c0733-4353-4ca1-9fae-15f42bcf1905@github.com>
References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
 <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com>
 <AB5lGlWSK4IcD98ADnYCTvbipSgv8aGzxbhrx0utaWc=.dfa6b91e-7a81-449e-8f7e-100a932e99f5@github.com>
 <x-cPdnQy6lxMx3JNPo4F6Yc6c9XJ4RSxzK3D-XmDcas=.e34c0733-4353-4ca1-9fae-15f42bcf1905@github.com>
Message-ID: <jTghiE5P2NvKjeHntSSZssb7LDAlNZrmyEvfCxFmDiA=.93f464dd-a3ca-4e12-b414-65212925e43d@github.com>

On Wed, 3 Jul 2024 07:16:08 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> src/hotspot/share/prims/whitebox.cpp line 1858:
>> 
>>> 1856: 
>>> 1857: WB_ENTRY(jlong, WB_getInUseMonitorCount(JNIEnv* env, jobject wb))
>>> 1858:   return (jlong) WhiteBox::get_in_use_monitor_count();
>> 
>> Why the indirection?
>
> Ah now I see. We need the member function to make it a friend.

Yeah. But your comment made me realise that I could have made the `jlong WB_getInUseMonitorCount(JNIEnv* env, jobject wb)` function a friend directly. But I think this is fine as well.
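
For readers following the thread, the two options discussed here can be sketched roughly as below. This is a minimal, self-contained illustration and not the HotSpot code: `MonitorList`, `WhiteBoxLike`, and the `wb_*` functions are hypothetical stand-ins, and the real `WB_ENTRY` macro adds linkage and thread-state handling that a direct friend declaration would have to match.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a VM-internal class with a private counter.
class MonitorList {
  std::int64_t _count = 0;

  // Option A: befriend a class; its static member function may read _count.
  friend class WhiteBoxLike;
  // Option B: befriend one free function directly.
  friend std::int64_t wb_get_in_use_monitor_count(const MonitorList& list);

public:
  void add() { ++_count; }
};

// Option A: the accessor lives in the befriended class.
class WhiteBoxLike {
public:
  static std::int64_t get_in_use_monitor_count(const MonitorList& list) {
    return list._count;
  }
};

// Entry point that goes through the class, i.e. the indirection asked about.
std::int64_t wb_entry_via_member(const MonitorList& list) {
  return WhiteBoxLike::get_in_use_monitor_count(list);
}

// Option B: the entry point itself is the friend, so no indirection is needed.
std::int64_t wb_get_in_use_monitor_count(const MonitorList& list) {
  return list._count;
}
```

Both variants return the same value; the difference is only which declaration has to be named in the `friend` list.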

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1663656789

From aboldtch at openjdk.org  Wed Jul  3 07:25:49 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 3 Jul 2024 07:25:49 GMT
Subject: RFR: 8335397: Improve reliability of
 TestRecursiveMonitorChurn.java [v2]
In-Reply-To: <AB5lGlWSK4IcD98ADnYCTvbipSgv8aGzxbhrx0utaWc=.dfa6b91e-7a81-449e-8f7e-100a932e99f5@github.com>
References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
 <U2hHhqSXfrNJ5ZqJollgUAwBsv8jUbXNcqQ8A_D7Wh4=.e58c75e1-44a6-4afd-86e1-d65582fc6580@github.com>
 <AB5lGlWSK4IcD98ADnYCTvbipSgv8aGzxbhrx0utaWc=.dfa6b91e-7a81-449e-8f7e-100a932e99f5@github.com>
Message-ID: <vYAJxtUCQUW-rWQymIj7Zw_vjdEzBCLNfQnIeT83vrc=.04726dc5-c88d-41bb-9070-c0a9949c34fa@github.com>

On Wed, 3 Jul 2024 07:09:55 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java
>
> test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java line 72:
> 
>> 70:         if (pre_monitor_count != post_monitor_count) {
>> 71:             final long monitor_count_change = post_monitor_count - pre_monitor_count;
>> 72:             System.out.println("Unexpected change in mointor count: " + monitor_count_change);
> 
> typo: mointor

Suggestion:

            System.out.println("Unexpected change in monitor count: " + monitor_count_change);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19965#discussion_r1663653892

From stuefe at openjdk.org  Wed Jul  3 07:31:21 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 3 Jul 2024 07:31:21 GMT
Subject: RFR: 8322475: Extend printing for System.map [v6]
In-Reply-To: <I2AjfniaetQsEpBX3ir4NBP1Ja1de1NZDbVmmkuAPwc=.5b2bb425-659f-4cd8-a5f3-360c55020d0b@github.com>
References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com>
 <-Qkoj2CJIqS0pNR-3JxXULeaty66oPIAJZgFx7IskTA=.9e679c42-24e4-4fb2-a3fd-d27be65aeac0@github.com>
 <I2AjfniaetQsEpBX3ir4NBP1Ja1de1NZDbVmmkuAPwc=.5b2bb425-659f-4cd8-a5f3-360c55020d0b@github.com>
Message-ID: <b_GyqOzspgbgo4cCtunljWBJT-bI7-BDG3YajeXv7W8=.3c3a5c61-8317-40e3-b139-1145ad3532c4@github.com>

On Tue, 2 Jul 2024 15:11:13 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits:
>> 
>>  - feedback johan
>>  - fix merge errors
>>  - Merge branch 'master' into System.maps-more-info
>>  - copyrights
>>  - Merge branch 'master' into System.maps-more-info
>>  - fix merge issue
>>  - Merge branch 'master' into System.maps-more-info
>>  - fix whitespace issue
>>  - wip
>>  - exhuming
>>  - ... and 13 more: https://git.openjdk.org/jdk/compare/c6f3bf4b...940199de
>
> src/hotspot/os/linux/procMapsParser.hpp line 66:
> 
>> 64:     from = to = nullptr;
>> 65:     prot[0] = filename[0] = '\0';
>> 66:     kernelpagesize = rss = private_hugetlb = anonhugepages = swap = 0;
> 
> `private_hugetlb` and `shared_hugetlb` missing in reset. Intentional?

No :( good catch. That is why I prefer memsetting, but folks don't like that.
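
As an aside on the reset question, one way to avoid forgetting a field without calling `memset` is to assign a value-initialized object. The sketch below is only an illustration under assumptions: `MapInfo` is a hypothetical, simplified record whose field names follow the quoted snippet, and it is not the actual struct in `procMapsParser.hpp`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified record; not the real HotSpot type.
struct MapInfo {
  const void* from = nullptr;
  const void* to = nullptr;
  char prot[5] = {};
  char filename[64] = {};
  std::size_t kernelpagesize = 0;
  std::size_t rss = 0;
  std::size_t private_hugetlb = 0;
  std::size_t shared_hugetlb = 0;
  std::size_t anonhugepages = 0;
  std::size_t swap = 0;

  // Every member is restored from its default initializer, so a newly
  // added field cannot be left out of the reset by accident.
  void reset() { *this = MapInfo{}; }
};
```

The trade-off is that the whole object is rewritten, including the filename buffer, which a hand-written reset can skip.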

> test/hotspot/jtreg/serviceability/dcmd/vm/SystemDumpMapTest.java line 31:
> 
>> 29: 
>> 30: import java.io.*;
>> 31: import java.lang.StringBuilder;
> 
> Nit: `java.lang.*` is imported by default. I don't see it used, so maybe a leftover?

Yep, a bunch of those are not needed anymore.

> test/hotspot/jtreg/serviceability/dcmd/vm/SystemMapTestBase.java line 53:
> 
>> 51:         regexBase_committed + "\\[stack\\]",
>> 52:         // we should see the hs-perf data file, and it should appear as shared as well as committed
>> 53:         regexBase_shared_and_committed + "hsperfdata_.*"
> 
> Suggestion: Should the test run with `-XX:+UsePerfData`, since it's expecting this file? It's on by default, but that might change.

That is a good point.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1663668311
PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1663664444
PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1663662914

From stuefe at openjdk.org  Wed Jul  3 07:55:48 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 3 Jul 2024 07:55:48 GMT
Subject: RFR: 8322475: Extend printing for System.map [v7]
In-Reply-To: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com>
References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com>
Message-ID: <x8LrKZVgWp9X2bwTSnV3KppIRabvSzfvnRLuLDgvy84=.7b71527c-95a7-49e1-a0c6-e78c81f644c1@github.com>

> This is an expansion on the new `System.map` command introduced with JDK-8318636.
> 
> We now print valuable information per memory region, such as:
> 
> - the actual resident set size
> - the actual number of huge pages
> - the actual used page size
> - the THP state of the region (was advised, is eligible, uses THP, ...)
> - whether the region is shared
> - whether the region had been committed (backed by swap)
> - whether the region has been swapped out.
> 
> Example output:
> 
> [system-map-thp1.txt](https://github.com/user-attachments/files/15587748/system-map-thp1.txt)
> 
> 
> from                 to                       size        rss    hugetlb pgsz prot notes            vm info/file                                                                                                                                                                    
> 0x00000000c0000000 - 0x00000000ffe00000 1071644672          0    4194304 2M   rw-p huge             JAVAHEAP /anon_hugepage                                                                                                                                                         
> 0x00000000ffe00000 - 0x0000000100000000    2097152          0          0 2M   rw-p huge             JAVAHEAP /anon_hugepage                                                                                                                                                         
> 0x0000558016b67000 - 0x0000558016b68000       4096       4096          0 4K   r--p                  /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java
> 0x0000558016b68000 - 0x0000558016b69000       4096       4096          0 4K   r-xp                  /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java
> 0x00007f3a749f2000 - 0x00007f3a74c62000    2555904    2555904          0 4K   rwxp                  CODE(CodeHeap 'profiled nmethods')                               
> 0x00007f3a74c62000 - 0x00007f3a7be51000  119468032          0          0 4K   ---p nores            CODE(CodeHeap 'profiled nmethods')                               
> 0x00007f3a7be51000 - 0x00007f3a7c1c1000    3604480    3604480          0 4K   rwxp                  CODE(CodeHeap 'profiled nmethods')                               
> 0x00007f3a7c1c1000 - 0x00007f3a7c592000    4001792          0          0 4K   ---p nores            CODE(CodeHeap 'non-nmethods')                                    
> 0x00007f3a7c592000 - 0x00007f3a7c802000    2555904    2555904          0 4K   rwxp                  CODE(Code...

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits:

 - no stack on 32-bit, scan for vDSO lib instead
 - feedback severin
 - Merge branch 'master' into System.maps-more-info
 - feedback johan
 - fix merge errors
 - Merge branch 'master' into System.maps-more-info
 - copyrights
 - Merge branch 'master' into System.maps-more-info
 - fix merge issue
 - Merge branch 'master' into System.maps-more-info
 - ... and 16 more: https://git.openjdk.org/jdk/compare/0db9bc57...3cc5943d

-------------

Changes: https://git.openjdk.org/jdk/pull/17158/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17158&range=06
  Stats: 624 lines in 14 files changed: 425 ins; 111 del; 88 mod
  Patch: https://git.openjdk.org/jdk/pull/17158.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17158/head:pull/17158

PR: https://git.openjdk.org/jdk/pull/17158

From stuefe at openjdk.org  Wed Jul  3 07:55:49 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 3 Jul 2024 07:55:49 GMT
Subject: RFR: 8322475: Extend printing for System.map [v6]
In-Reply-To: <I2AjfniaetQsEpBX3ir4NBP1Ja1de1NZDbVmmkuAPwc=.5b2bb425-659f-4cd8-a5f3-360c55020d0b@github.com>
References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com>
 <-Qkoj2CJIqS0pNR-3JxXULeaty66oPIAJZgFx7IskTA=.9e679c42-24e4-4fb2-a3fd-d27be65aeac0@github.com>
 <I2AjfniaetQsEpBX3ir4NBP1Ja1de1NZDbVmmkuAPwc=.5b2bb425-659f-4cd8-a5f3-360c55020d0b@github.com>
Message-ID: <RFTaag5x2gqKf-BoDkIovVZvUG0qw54T541JwnbMvvY=.ffb56e15-9314-4647-a199-a7392cfd39e6@github.com>

On Tue, 2 Jul 2024 15:15:00 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits:
>> 
>>  - feedback johan
>>  - fix merge errors
>>  - Merge branch 'master' into System.maps-more-info
>>  - copyrights
>>  - Merge branch 'master' into System.maps-more-info
>>  - fix merge issue
>>  - Merge branch 'master' into System.maps-more-info
>>  - fix whitespace issue
>>  - wip
>>  - exhuming
>>  - ... and 13 more: https://git.openjdk.org/jdk/compare/c6f3bf4b...940199de
>
> This seems fine. Mostly nits.

Many thanks, @jerboaa !

I addressed your remarks, and replaced scanning for the primordial thread VMA with scanning for the vDSO library, which should be loaded on all Linuxes and archs.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17158#issuecomment-2205333212

From sgehwolf at openjdk.org  Wed Jul  3 10:15:21 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Wed, 3 Jul 2024 10:15:21 GMT
Subject: RFR: 8322475: Extend printing for System.map [v7]
In-Reply-To: <x8LrKZVgWp9X2bwTSnV3KppIRabvSzfvnRLuLDgvy84=.7b71527c-95a7-49e1-a0c6-e78c81f644c1@github.com>
References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com>
 <x8LrKZVgWp9X2bwTSnV3KppIRabvSzfvnRLuLDgvy84=.7b71527c-95a7-49e1-a0c6-e78c81f644c1@github.com>
Message-ID: <or-sjCNsgmoehxnd5HyNFp_4jFGzuYbYr_h9KOsmOx8=.8ed899a0-3b85-4ebf-85db-5dd654cb61d1@github.com>

On Wed, 3 Jul 2024 07:55:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This is an expansion on the new `System.map` command introduced with JDK-8318636.
>> 
>> We now print valuable information per memory region, such as:
>> 
>> - the actual resident set size
>> - the actual number of huge pages
>> - the actual used page size
>> - the THP state of the region (was advised, is eligible, uses THP, ...)
>> - whether the region is shared
>> - whether the region had been committed (backed by swap)
>> - whether the region has been swapped out.
>> 
>> Example output:
>> 
>> [system-map-thp1.txt](https://github.com/user-attachments/files/15587748/system-map-thp1.txt)
>> 
>> 
>> from                 to                       size        rss    hugetlb pgsz prot notes            vm info/file                                                                                                                                                                    
>> 0x00000000c0000000 - 0x00000000ffe00000 1071644672          0    4194304 2M   rw-p huge             JAVAHEAP /anon_hugepage                                                                                                                                                         
>> 0x00000000ffe00000 - 0x0000000100000000    2097152          0          0 2M   rw-p huge             JAVAHEAP /anon_hugepage                                                                                                                                                         
>> 0x0000558016b67000 - 0x0000558016b68000       4096       4096          0 4K   r--p                  /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java
>> 0x0000558016b68000 - 0x0000558016b69000       4096       4096          0 4K   r-xp                  /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java
>> 0x00007f3a749f2000 - 0x00007f3a74c62000    2555904    2555904          0 4K   rwxp                  CODE(CodeHeap 'profiled nmethods')                               
>> 0x00007f3a74c62000 - 0x00007f3a7be51000  119468032          0          0 4K   ---p nores            CODE(CodeHeap 'profiled nmethods')                               
>> 0x00007f3a7be51000 - 0x00007f3a7c1c1000    3604480    3604480          0 4K   rwxp                  CODE(CodeHeap 'profiled nmethods')                               
>> 0x00007f3a7c1c1000 - 0x00007f3a7c592000    4001792          0          0 4K   ---p nores            CODE(CodeHeap 'non-nmethods')                                    
>> 0x00007f3a7c592000 - 0x00007f3a7c802000    2555904    2...
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits:
> 
>  - no stack on 32-bit, scan for vDSO lib instead
>  - feedback severin
>  - Merge branch 'master' into System.maps-more-info
>  - feedback johan
>  - fix merge errors
>  - Merge branch 'master' into System.maps-more-info
>  - copyrights
>  - Merge branch 'master' into System.maps-more-info
>  - fix merge issue
>  - Merge branch 'master' into System.maps-more-info
>  - ... and 16 more: https://git.openjdk.org/jdk/compare/0db9bc57...3cc5943d

Looks OK to me.

-------------

Marked as reviewed by sgehwolf (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17158#pullrequestreview-2156041166

From kbarrett at openjdk.org  Wed Jul  3 11:14:22 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 3 Jul 2024 11:14:22 GMT
Subject: RFR: 8335591: Fix -Wzero-as-null-pointer-constant warnings in
 ConcurrentHashTable
In-Reply-To: <aYI3OWb3zWuLPK8CJHq20UWxO55mtnqhxnsAYNIBpE8=.d2df1e20-52f5-4900-9026-064e3bd8d6e0@github.com>
References: <CIgX0UWlbnyNubbrvO9B3IE5Ilm9QP7f8v0Jho752tc=.03073f61-304d-4d4d-b85a-72c5f8ffcdbf@github.com>
 <aYI3OWb3zWuLPK8CJHq20UWxO55mtnqhxnsAYNIBpE8=.d2df1e20-52f5-4900-9026-064e3bd8d6e0@github.com>
Message-ID: <1dXlcjPoisEhSe50WkJGe47to7wdbuAGaOsifGKmrak=.51c72d76-afd2-463e-918d-400b1cca370f@github.com>

On Wed, 3 Jul 2024 06:56:12 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Please review this trivial change to ConcurrentHashTable.  Initialization of
>> and assignments to the _invisible_epoch member are changed from a value of 0
>> to nullptr.  This removes some -Wzero-as-null-pointer-constant warnings when
>> building with that enabled.
>> 
>> Testing: mach5 tier1
>
> Looks good and trivial.

Thanks for review @chhagedorn

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19996#issuecomment-2205826875

From kbarrett at openjdk.org  Wed Jul  3 11:14:23 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 3 Jul 2024 11:14:23 GMT
Subject: Integrated: 8335591: Fix -Wzero-as-null-pointer-constant warnings in
 ConcurrentHashTable
In-Reply-To: <CIgX0UWlbnyNubbrvO9B3IE5Ilm9QP7f8v0Jho752tc=.03073f61-304d-4d4d-b85a-72c5f8ffcdbf@github.com>
References: <CIgX0UWlbnyNubbrvO9B3IE5Ilm9QP7f8v0Jho752tc=.03073f61-304d-4d4d-b85a-72c5f8ffcdbf@github.com>
Message-ID: <o1wSdNWF7BdFsVQXAyEO8F00DkjWkRKzpYG2cM7z7PM=.952dff23-7721-43f7-af9d-ac0cb01dae22@github.com>

On Wed, 3 Jul 2024 04:54:59 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this trivial change to ConcurrentHashTable.  Initialization of
> and assignments to the _invisible_epoch member are changed from a value of 0
> to nullptr.  This removes some -Wzero-as-null-pointer-constant warnings when
> building with that enabled.
> 
> Testing: mach5 tier1
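
For context, a small sketch of why a literal `0` and `nullptr` are not interchangeable for a pointer member. The names `Thread`, `Table`, and `which` are hypothetical and exist only to make the difference observable; the member name is borrowed from the description above, and its real type in ConcurrentHashTable is not assumed here.

```cpp
#include <cassert>

struct Thread;  // hypothetical opaque type, only used through pointers

// Two overloads: a literal 0 is an int first, while nullptr is never an int.
inline int which(int) { return 1; }
inline int which(Thread*) { return 2; }

struct Table {
  // Writing "= 0" here compiles too, but triggers
  // -Wzero-as-null-pointer-constant when that warning is enabled.
  Thread* _invisible_epoch = nullptr;

  void clear() { _invisible_epoch = nullptr; }
};
```

With these overloads, `which(0)` selects the `int` version and `which(nullptr)` selects the pointer version, which is the kind of ambiguity the warning is meant to flag.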

This pull request has now been integrated.

Changeset: c06b75ff
Author:    Kim Barrett <kbarrett at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/c06b75ff88babf57bdcd0919ea177ff363fd858b
Stats:     4 lines in 1 file changed: 0 ins; 0 del; 4 mod

8335591: Fix -Wzero-as-null-pointer-constant warnings in ConcurrentHashTable

Reviewed-by: chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/19996

From fjiang at openjdk.org  Wed Jul  3 12:14:25 2024
From: fjiang at openjdk.org (Feilong Jiang)
Date: Wed, 3 Jul 2024 12:14:25 GMT
Subject: RFR: 8335411: RISC-V: Optimize encode_heap_oop when oop is not
 null
In-Reply-To: <Twjo9tLARYiTFf4DlqQzJuwAvbxBDsUJGQ-XYZndDrU=.a9291118-99bb-4430-8d2b-8fcff9f4a003@github.com>
References: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com>
 <Twjo9tLARYiTFf4DlqQzJuwAvbxBDsUJGQ-XYZndDrU=.a9291118-99bb-4430-8d2b-8fcff9f4a003@github.com>
Message-ID: <FX7RhxK2he69hMb8nzE2yMd5WJZLw0j9uJ2_dxDidZk=.6992ce14-b903-4e30-b94a-47a3c0c86d2a@github.com>

On Wed, 3 Jul 2024 06:31:38 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi, please review this enhancement that adds two more `encode_heap_oop_not_null` methods.
>> 
>> Currently, `encode_heap_oop` first checks whether the oop pointer is `null`. We can skip that null check, and with it an unnecessary branch instruction, when encoding a known non-null oop pointer into compressed form.
>> 
>> 
>> Testing:
>> - [x] Tier1~3 on linux-riscv64 with release build
>> - [x] renaissance & dacapo benchmark suites for functionality
>
> Nice, thanks!

@robehn @RealFYang -- Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19974#issuecomment-2205930449

From fjiang at openjdk.org  Wed Jul  3 12:14:27 2024
From: fjiang at openjdk.org (Feilong Jiang)
Date: Wed, 3 Jul 2024 12:14:27 GMT
Subject: Integrated: 8335411: RISC-V: Optimize encode_heap_oop when oop is not
 null
In-Reply-To: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com>
References: <oc-oKUicWVvFjZKiZdhlKYw9nQv9kq2zABpj-beTyxA=.79a98f53-bd18-4bdc-b08d-f21494b949a0@github.com>
Message-ID: <mx_vmNTC-AXMc9qr0utCZgNQp0ybfYOl34wlp3HdL50=.73344f6e-3602-4eea-8f3e-3efa0e253aac@github.com>

On Mon, 1 Jul 2024 14:32:03 GMT, Feilong Jiang <fjiang at openjdk.org> wrote:

> Hi, please review this enhancement that adds two more `encode_heap_oop_not_null` methods.
> 
> Currently, `encode_heap_oop` first checks whether the oop pointer is `null`. We can skip that null check, and with it an unnecessary branch instruction, when encoding a known non-null oop pointer into compressed form.
> 
> 
> Testing:
> - [x] Tier1~3 on linux-riscv64 with release build
> - [x] renaissance & dacapo benchmark suites for functionality
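
The idea in the description can be illustrated in plain C++ rather than RISC-V assembly. This is a rough sketch under assumptions: the heap base and shift are made-up constants, oops are modelled as 64-bit integers, and the functions only mirror the names from the description, not the macro-assembler code in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding parameters, chosen only for the example.
constexpr std::uint64_t kHeapBase = 0x800000000ULL;
constexpr unsigned kShift = 3;  // 8-byte object alignment

// General form: null must map to narrow 0, which costs a compare and branch.
inline std::uint32_t encode_heap_oop(std::uint64_t oop) {
  if (oop == 0) {
    return 0;
  }
  return static_cast<std::uint32_t>((oop - kHeapBase) >> kShift);
}

// Not-null form: the caller guarantees oop != 0, so the branch is dropped.
inline std::uint32_t encode_heap_oop_not_null(std::uint64_t oop) {
  return static_cast<std::uint32_t>((oop - kHeapBase) >> kShift);
}
```

For any non-null oop inside the heap both functions agree; only the general form is safe to call with null.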

This pull request has now been integrated.

Changeset: 5866b16d
Author:    Feilong Jiang <fjiang at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/5866b16dbca3f63770c8792d204dabdf49b59839
Stats:     62 lines in 3 files changed: 59 ins; 0 del; 3 mod

8335411: RISC-V: Optimize encode_heap_oop when oop is not null

Reviewed-by: fyang, rehn

-------------

PR: https://git.openjdk.org/jdk/pull/19974

From duke at openjdk.org  Wed Jul  3 12:33:29 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Wed, 3 Jul 2024 12:33:29 GMT
Subject: RFR: 8335615: Clean up left-overs from 8317721
Message-ID: <8mKp_7GIrYq0ncBiLKo7UK0B110YLom7xHiJJLs6YY8=.80719e65-de2f-49a6-96e5-3af42dfd20a9@github.com>

The `compiler/intrinsics/zip/TestCRC32.java` is ok. Note about performance: https://github.com/openjdk/jdk/pull/17046#discussion_r1548163980

-------------

Commit messages:
 - JDK-8335615: Clean up left-overs from 8317721

Changes: https://git.openjdk.org/jdk/pull/20004/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20004&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335615
  Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20004.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20004/head:pull/20004

PR: https://git.openjdk.org/jdk/pull/20004

From rehn at openjdk.org  Wed Jul  3 12:53:54 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 3 Jul 2024 12:53:54 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v21]
In-Reply-To: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
Message-ID: <tdkglfMOAm2Mg_Qj_TvnOro8oIqlOV8LKgfqgKTYFIw=.654dd861-9959-4b17-9c5a-6628f2782e3b@github.com>

> Hi all, please consider!
> 
> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
> With a very small application, or one that runs for a very short time, we have fast patchable calls.
> But any normal application running longer will increase the code size and code churn/fragmentation.
> So whether or not you get fast hot calls relies on luck.
> 
> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
> This would be the common case for a patchable call.
> 
> Code stream:
> JAL <trampo>
> Stubs:
> AUIPC
> LD
> JALR
> <DEST>
> 
> 
> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
> Even if you don't have that problem, having a call to a jump is not the fastest way.
> Loading the address avoids the pitfalls of cmodx.
> 
> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
> and instead do by default:
> 
> Code stream:
> AUIPC
> LD
> JALR
> Stubs:
> <DEST>
> 
> An experimental option for turning trampolines back on exists.
> 
> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
> 
> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
> 
> fop                                        (msec)    2239       |  2128       =  0.950424
> h2                                         (msec)    18660      |  16594      =  0.889282
> jython                                     (msec)    22022      |  21925      =  0.995595
> luindex                                    (msec)    2866       |  2842       =  0.991626
> lusearch                                   (msec)    4108       |  4311       =  1.04942
> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
> pmd                                        (msec)    5976       |  5897       =  0.98678
> jython                                     (msec)    22022      |  21925      =  0.995595
> Avg:                                       0.974112                              
> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
> h2(xcomp)                                  (msec)    37719      |  38004      =  1.00756
> jython(xcomp)        ...
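
To make the "in reach (+/- 1 MB)" condition from the description concrete: JAL encodes a signed 21-bit, 2-byte-aligned, pc-relative offset. The helper below is a minimal sketch with a hypothetical name (`in_jal_reach`); it is not the check used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if dest can be encoded as a JAL target from pc.
// The offset must be even and lie in [-2^20, 2^20 - 2].
inline bool in_jal_reach(std::uint64_t pc, std::uint64_t dest) {
  const std::int64_t offset = static_cast<std::int64_t>(dest - pc);
  const std::int64_t lo = -(std::int64_t{1} << 20);
  const std::int64_t hi = (std::int64_t{1} << 20) - 2;
  return (offset & 1) == 0 && offset >= lo && offset <= hi;
}
```

Anything outside that window needs an indirect sequence such as the AUIPC/LD/JALR streams shown in the description.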

Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:

  Rename to reloc_call

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19453/files
  - new: https://git.openjdk.org/jdk/pull/19453/files/6fd73a66..7337a2fc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=20
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=19-20

  Stats: 26 lines in 10 files changed: 0 ins; 0 del; 26 mod
  Patch: https://git.openjdk.org/jdk/pull/19453.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453

PR: https://git.openjdk.org/jdk/pull/19453

From rehn at openjdk.org  Wed Jul  3 12:53:54 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 3 Jul 2024 12:53:54 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v19]
In-Reply-To: <Om10ZojNaTukbhvpMkDBFrScxrRrbNad3UgtsIWpX1k=.fc472d60-670b-4388-b601-3d7075b2b241@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <iV5H4rQ0puc5_3boriBFg2XOqhdeuaNkGhXM6ExImYo=.107e0dfd-5096-49f0-8184-d019fd933918@github.com>
 <-SCwSVP6zTNao7X4VlIPRor8OU9vDg79oGVDWTf4XCM=.b72cbf6f-66e7-4fb3-b387-e00ae0537ac6@github.com>
 <C0jgPKg6jBtEMdY6ExLPLigaYdAw4ilbi0g4AnoCDUE=.25fccd4a-e53c-48a4-8c7d-cc8ac1e65b73@github.com>
 <Om10ZojNaTukbhvpMkDBFrScxrRrbNad3UgtsIWpX1k=.fc472d60-670b-4388-b601-3d7075b2b241@github.com>
Message-ID: <j10U_JSyhnDVNiIFEjVeMeVstb0MAG_4t-5HhjY7hRs=.2b8e0606-eef6-4c95-bae5-f27831430ced@github.com>

On Sat, 29 Jun 2024 01:55:56 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Yes. My thinking was, the site is still patchable, even if some sites don't need that capability.
>> This patch ignores near calls because of the short reach of JAL, +/- 1 MB (so normally only a few stubs can be reached from a few nmethods).
>> But it is on the enhancement list.
>> 
>> I don't mind changing the name, feel free to suggest something!
>
> The key thing seems to be that they are typed with a relocInfo, so maybe `reloc_call`?

Ok, fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1664141580

From coleenp at openjdk.org  Wed Jul  3 12:55:19 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 3 Jul 2024 12:55:19 GMT
Subject: RFR: 8335397: Improve reliability of
 TestRecursiveMonitorChurn.java [v3]
In-Reply-To: <HiqUmFciHGmdjRTa9B3RfxRxstbd6BA-QfZck-y5wBE=.98e4d5f3-4d47-406a-8de4-c712ca48d24f@github.com>
References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
 <HiqUmFciHGmdjRTa9B3RfxRxstbd6BA-QfZck-y5wBE=.98e4d5f3-4d47-406a-8de4-c712ca48d24f@github.com>
Message-ID: <CH2MObaRHB7xOBoeF78t8nSurjHARVdmzvGK1Uc9kgQ=.61fe6ca0-1e37-4d71-9b1d-367509564300@github.com>

On Wed, 3 Jul 2024 07:25:48 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation.
>> 
>> Instead, query the JVM for the exact number of inflations via the Whitebox API. This allows us to be both more exact and less dependent on interactions with NMT.
>
> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java

Marked as reviewed by coleenp (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19965#pullrequestreview-2156368077

From fyang at openjdk.org  Wed Jul  3 13:46:18 2024
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 3 Jul 2024 13:46:18 GMT
Subject: RFR: 8335615: Clean up left-overs from 8317721
In-Reply-To: <8mKp_7GIrYq0ncBiLKo7UK0B110YLom7xHiJJLs6YY8=.80719e65-de2f-49a6-96e5-3af42dfd20a9@github.com>
References: <8mKp_7GIrYq0ncBiLKo7UK0B110YLom7xHiJJLs6YY8=.80719e65-de2f-49a6-96e5-3af42dfd20a9@github.com>
Message-ID: <XdLKabSGwtSwsscso3HOcsOsPh_xTyvcgEL54KQ0l5I=.cbf4b879-90eb-4066-8b98-fbb2bbf68258@github.com>

On Wed, 3 Jul 2024 12:28:42 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

> The `compiler/intrinsics/zip/TestCRC32.java` is ok. Note about performance: https://github.com/openjdk/jdk/pull/17046#discussion_r1548163980

Thanks!

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20004#pullrequestreview-2156497002

From duke at openjdk.org  Wed Jul  3 13:54:19 2024
From: duke at openjdk.org (duke)
Date: Wed, 3 Jul 2024 13:54:19 GMT
Subject: RFR: 8335615: Clean up left-overs from 8317721
In-Reply-To: <8mKp_7GIrYq0ncBiLKo7UK0B110YLom7xHiJJLs6YY8=.80719e65-de2f-49a6-96e5-3af42dfd20a9@github.com>
References: <8mKp_7GIrYq0ncBiLKo7UK0B110YLom7xHiJJLs6YY8=.80719e65-de2f-49a6-96e5-3af42dfd20a9@github.com>
Message-ID: <ZxM_7zxGnjwuCVWpvH91ZX8mTNji4STS5loz3o0w4OY=.d21774e1-6ea7-43cd-9fd3-3cae3ae11b34@github.com>

On Wed, 3 Jul 2024 12:28:42 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

> The `compiler/intrinsics/zip/TestCRC32.java` is ok. Note about performance: https://github.com/openjdk/jdk/pull/17046#discussion_r1548163980

@ArsenyBochkarev 
Your change (at version fd2db6da165aa26923537b93128b4dfc7a62ab3a) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20004#issuecomment-2206135362

From duke at openjdk.org  Wed Jul  3 14:12:23 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Wed, 3 Jul 2024 14:12:23 GMT
Subject: Integrated: 8335615: Clean up left-overs from 8317721
In-Reply-To: <8mKp_7GIrYq0ncBiLKo7UK0B110YLom7xHiJJLs6YY8=.80719e65-de2f-49a6-96e5-3af42dfd20a9@github.com>
References: <8mKp_7GIrYq0ncBiLKo7UK0B110YLom7xHiJJLs6YY8=.80719e65-de2f-49a6-96e5-3af42dfd20a9@github.com>
Message-ID: <dE1Bo5ledkwRLSbRspakcga5Sj4yfmIxiOSGKQogMpI=.7853515f-1f60-4c1e-b079-5cf4c68788f2@github.com>

On Wed, 3 Jul 2024 12:28:42 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

> The `compiler/intrinsics/zip/TestCRC32.java` is ok. Note about performance: https://github.com/openjdk/jdk/pull/17046#discussion_r1548163980

This pull request has now been integrated.

Changeset: 5a8af2b8
Author:    Arseny Bochkarev <arseny.bochkarev at syntacore.com>
Committer: Vladimir Kempik <vkempik at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/5a8af2b8b93672de9b3a3e73e6984506980da932
Stats:     3 lines in 1 file changed: 0 ins; 1 del; 2 mod

8335615: Clean up left-overs from 8317721

Reviewed-by: fyang

-------------

PR: https://git.openjdk.org/jdk/pull/20004

From stuefe at openjdk.org  Wed Jul  3 16:11:31 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 3 Jul 2024 16:11:31 GMT
Subject: RFR: 8322475: Extend printing for System.map [v7]
In-Reply-To: <x8LrKZVgWp9X2bwTSnV3KppIRabvSzfvnRLuLDgvy84=.7b71527c-95a7-49e1-a0c6-e78c81f644c1@github.com>
References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com>
 <x8LrKZVgWp9X2bwTSnV3KppIRabvSzfvnRLuLDgvy84=.7b71527c-95a7-49e1-a0c6-e78c81f644c1@github.com>
Message-ID: <DjtiwDJV9LMKe0g7Lfsb96MhmtcGatw-TXaCC8uhOvg=.ea5afc08-dc09-4d2e-b5df-6c040c00b82d@github.com>

On Wed, 3 Jul 2024 07:55:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This is an expansion on the new `System.map` command introduced with JDK-8318636.
>> 
>> We now print valuable information per memory region, such as:
>> 
>> - the actual resident set size
>> - the actual number of huge pages
>> - the actual used page size
>> - the THP state of the region (was advised, is eligible, uses THP, ...)
>> - whether the region is shared
>> - whether the region had been committed (backed by swap)
>> - whether the region has been swapped out.
>> 
>> Example output:
>> 
>> [system-map-thp1.txt](https://github.com/user-attachments/files/15587748/system-map-thp1.txt)
>> 
>> 
>> from                 to                       size        rss    hugetlb pgsz prot notes            vm info/file                                                                                                                                                                    
>> 0x00000000c0000000 - 0x00000000ffe00000 1071644672          0    4194304 2M   rw-p huge             JAVAHEAP /anon_hugepage                                                                                                                                                         
>> 0x00000000ffe00000 - 0x0000000100000000    2097152          0          0 2M   rw-p huge             JAVAHEAP /anon_hugepage                                                                                                                                                         
>> 0x0000558016b67000 - 0x0000558016b68000       4096       4096          0 4K   r--p                  /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java
>> 0x0000558016b68000 - 0x0000558016b69000       4096       4096          0 4K   r-xp                  /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java
>> 0x00007f3a749f2000 - 0x00007f3a74c62000    2555904    2555904          0 4K   rwxp                  CODE(CodeHeap 'profiled nmethods')                               
>> 0x00007f3a74c62000 - 0x00007f3a7be51000  119468032          0          0 4K   ---p nores            CODE(CodeHeap 'profiled nmethods')                               
>> 0x00007f3a7be51000 - 0x00007f3a7c1c1000    3604480    3604480          0 4K   rwxp                  CODE(CodeHeap 'profiled nmethods')                               
>> 0x00007f3a7c1c1000 - 0x00007f3a7c592000    4001792          0          0 4K   ---p nores            CODE(CodeHeap 'non-nmethods')                                    
>> 0x00007f3a7c592000 - 0x00007f3a7c802000    2555904    2...
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits:
> 
>  - no stack on 32-bit, scan for vDSO lib instead
>  - feedback severin
>  - Merge branch 'master' into System.maps-more-info
>  - feedback johan
>  - fix merge errors
>  - Merge branch 'master' into System.maps-more-info
>  - copyrights
>  - Merge branch 'master' into System.maps-more-info
>  - fix merge issue
>  - Merge branch 'master' into System.maps-more-info
>  - ... and 16 more: https://git.openjdk.org/jdk/compare/0db9bc57...3cc5943d

x64 fastdebug build error unrelated. I locally built and tested on linux x64.

Thanks @jerboaa !

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17158#issuecomment-2206708662

From stuefe at openjdk.org  Wed Jul  3 16:11:33 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 3 Jul 2024 16:11:33 GMT
Subject: Integrated: 8322475: Extend printing for System.map
In-Reply-To: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com>
References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com>
Message-ID: <Jz3SMNKWUmZHMAbauun0xkOW89W80eGA2PIr-oxOkJU=.356c4e73-b209-43a6-a460-280b48caf56a@github.com>

On Tue, 19 Dec 2023 15:48:58 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> This is an expansion on the new `System.map` command introduced with JDK-8318636.
> 
> We now print valuable information per memory region, such as:
> 
> - the actual resident set size
> - the actual number of huge pages
> - the actual used page size
> - the THP state of the region (was advised, is eligible, uses THP, ...)
> - whether the region is shared
> - whether the region had been committed (backed by swap)
> - whether the region has been swapped out.
> 
> Example output:
> 
> [system-map-thp1.txt](https://github.com/user-attachments/files/15587748/system-map-thp1.txt)
> 
> 
> from                 to                       size        rss    hugetlb pgsz prot notes            vm info/file                                                                                                                                                                    
> 0x00000000c0000000 - 0x00000000ffe00000 1071644672          0    4194304 2M   rw-p huge             JAVAHEAP /anon_hugepage                                                                                                                                                         
> 0x00000000ffe00000 - 0x0000000100000000    2097152          0          0 2M   rw-p huge             JAVAHEAP /anon_hugepage                                                                                                                                                         
> 0x0000558016b67000 - 0x0000558016b68000       4096       4096          0 4K   r--p                  /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java
> 0x0000558016b68000 - 0x0000558016b69000       4096       4096          0 4K   r-xp                  /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java
> 0x00007f3a749f2000 - 0x00007f3a74c62000    2555904    2555904          0 4K   rwxp                  CODE(CodeHeap 'profiled nmethods')                               
> 0x00007f3a74c62000 - 0x00007f3a7be51000  119468032          0          0 4K   ---p nores            CODE(CodeHeap 'profiled nmethods')                               
> 0x00007f3a7be51000 - 0x00007f3a7c1c1000    3604480    3604480          0 4K   rwxp                  CODE(CodeHeap 'profiled nmethods')                               
> 0x00007f3a7c1c1000 - 0x00007f3a7c592000    4001792          0          0 4K   ---p nores            CODE(CodeHeap 'non-nmethods')                                    
> 0x00007f3a7c592000 - 0x00007f3a7c802000    2555904    2555904          0 4K   rwxp                  CODE(Code...

This pull request has now been integrated.

Changeset: 8aaec37a
Author:    Thomas Stuefe <stuefe at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/8aaec37ace102b55ee1387cfd1967ec3ab662083
Stats:     624 lines in 14 files changed: 425 ins; 111 del; 88 mod

8322475: Extend printing for System.map

Reviewed-by: sgehwolf, jsjolen

-------------

PR: https://git.openjdk.org/jdk/pull/17158

From pchilanomate at openjdk.org  Wed Jul  3 16:43:26 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 3 Jul 2024 16:43:26 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area in
 frame::oops_interpreted_do oop closure after 8329665
Message-ID: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>

The ResourceMark added in 8329665, which addresses the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating memory from the resource area and retaining it across the closure while relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.

The number of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty, contrary to what the comment says. This comment dates back to 2002, when instead of 128 entries we could have only 32, considering 32-bit CPUs were still in main use (see bug for more history details).
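The 4*64/2 = 128 figure above can be checked with a small sketch. The constant and function names below are illustrative assumptions, not copied from the HotSpot sources; the only facts taken from the text are four inline words, 64 bits per word, and two bits per entry.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only.
constexpr std::size_t kInlineWords  = 4;   // words reserved inline for the bit mask
constexpr std::size_t kBitsPerWord  = 64;  // 64-bit platform assumed
constexpr std::size_t kBitsPerEntry = 2;   // bits used per local / expression stack slot

// Number of entries that fit in the inline mask without extra allocation.
constexpr std::size_t inline_capacity() {
  return kInlineWords * kBitsPerWord / kBitsPerEntry;
}

// True if a method with this many locals + stack entries needs extra memory.
constexpr bool needs_extra_allocation(std::size_t entries) {
  return entries > inline_capacity();
}

static_assert(inline_capacity() == 128, "4 * 64 / 2 entries fit inline");
```

Under these assumptions only methods with more than 128 entries reach the allocating path, which is consistent with the observation that allocation is rare.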

The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used once.

I tested the patch by running it through mach5 tiers 1-6.

Thanks,
Patricio

-------------

Commit messages:
 - v1

Changes: https://git.openjdk.org/jdk/pull/20012/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20012&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335409
  Stats: 38 lines in 3 files changed: 11 ins; 15 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/20012.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20012/head:pull/20012

PR: https://git.openjdk.org/jdk/pull/20012

From iklam at openjdk.org  Wed Jul  3 17:04:41 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Wed, 3 Jul 2024 17:04:41 GMT
Subject: RFR: 8312125: Refactor CDS enum class handling
Message-ID: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>

Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change.

-------------

Commit messages:
 - 8312125: Refactor CDS enum class handling

Changes: https://git.openjdk.org/jdk/pull/20013/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20013&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8312125
  Stats: 285 lines in 5 files changed: 190 ins; 93 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20013.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20013/head:pull/20013

PR: https://git.openjdk.org/jdk/pull/20013

From sgehwolf at openjdk.org  Wed Jul  3 17:37:20 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Wed, 3 Jul 2024 17:37:20 GMT
Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII
 [v3]
In-Reply-To: <nyLYOhw7-wSPlKjeWi3FyuLY0UzFwWJdj-19ijEInU4=.6f539aaf-0cff-4ab8-8ca0-3acd3b44d071@github.com>
References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com>
 <nyLYOhw7-wSPlKjeWi3FyuLY0UzFwWJdj-19ijEInU4=.6f539aaf-0cff-4ab8-8ca0-3acd3b44d071@github.com>
Message-ID: <EliUQk2e0HZE3BQ3BKOGvF81KROy_lLp4OgK-hRWazA=.79466db9-87df-403c-a928-15e1dea8bbd5@github.com>

On Thu, 27 Jun 2024 08:05:42 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Motivated by analyzing CDS dump differences for reproducible builds, I found an optional ASCII printout to be valuable. As usual with hex dumps, ascii follows hex printout
>> 
>> Example:
>> 
>> 
>> 
>>    118 0x00000000000001c0:   204b444a6e65704f 53207469422d3436 4d56207265767265 6564747361662820   OpenJDK 64-Bit Server VM (fastde
>>    119 0x00000000000001e0:   692d343220677562 2d6c616e7265746e 68742e636f686461 756f732e73616d6f   bug 24-internal-adhoc.thomas.sou
>>    120 0x0000000000000200:   726f662029656372 612d78756e696c20 45524a203436646d 746e692d34322820   rce) for linux-amd64 JRE (24-int
>>    121 0x0000000000000220:   64612d6c616e7265 6d6f68742e636f68 6372756f732e7361 6c697562202c2965   ernal-adhoc.thomas.source), buil
>>    122 0x0000000000000240:   323032206e6f2074 5430322d36302d34 32313a35343a3031 672068746977205a   t on 2024-06-20T10:45:12Z with g
>>    123 0x0000000000000260:   2e352e3031206363 0000000000000030 0000000000000000 0000000000000000   cc 10.5.0_______________________
>>    124 0x0000000000000280:   0000000000000000 0000000000000000 0000000000000000 0000000000000000   ________________________________
>>    125 0x00000000000002a0:   0000000000000000 0000000000000000 0000000000000000 0000000000000000   ________________________________
>> 
>> 
>> The patch does that.
>> 
>> Small unrelated changes:
>> 
>> - I rewrote and extended the gtests, testing now a real-life printout containing a mixture of readable and non-readable pages, and printable and non-printable characters. I re-enabled tests on Windows, since https://bugs.openjdk.org/browse/JDK-8185734 is long solved.
>> 
>> - The new test uncovered an issue on 32-bit when printing giant words. We shift a signed value by 32 bits upwards, which can result in -1 resp. ffffffff in the upper half of the giant word. One of the pitfalls of intptr_t vs uintptr_t (I think most uses of intptr_t should probably use uintptr_t).
>> 
>> - I got tired of casting constness away from to-be-printed memory range just to be able to feed an address to os::print_hex_dump. The content printed is usually const. os::print_hex_dump does not need non-constness, but since we use address, and address is typedef char*, and one cannot declare a typedef'ed pointer target-const, the issue is there. I therefore changed the input to const uint8_t*. Maybe we need a const_address or something similar.
>> 
>> ----
>> 
>> Ran tests on Linux x64 and x86, Windows x86 and Mac aarch64. Fixed all issues I found. Only little-endian, I don't have big-e...
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   exclude test for AIX

This isn't really the area of my expertise, but the patch seems reasonable to me.
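For readers unfamiliar with the 32-bit giant-word pitfall mentioned in the quoted description, here is a minimal sketch of one way such an effect can arise. It is not the code from the patch; the function names are hypothetical, and `int32_t` stands in for `intptr_t` on a 32-bit platform.

```cpp
#include <cassert>
#include <cstdint>

// Buggy variant: the halves are held in a signed 32-bit type. Widening a
// negative low half to 64 bits sign-extends it, so the upper half of the
// result becomes ffffffff regardless of the real high word.
inline uint64_t combine_signed(int32_t hi, int32_t lo) {
  return (static_cast<uint64_t>(hi) << 32) | static_cast<uint64_t>(lo);
}

// Fixed variant: unsigned halves are zero-extended, so the high word survives.
inline uint64_t combine_unsigned(uint32_t hi, uint32_t lo) {
  return (static_cast<uint64_t>(hi) << 32) | static_cast<uint64_t>(lo);
}
```

The two functions differ only in the signedness of their parameters, which is the general point about preferring uintptr_t over intptr_t for raw memory words.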

src/hotspot/share/runtime/os.cpp line 945:

> 943: 
> 944: ATTRIBUTE_NO_ASAN static bool read_safely_from(const uintptr_t* p, uintptr_t* result) {
> 945:   DEBUG_ONLY(*result = 0xAAAA;)

It's not clear why this was added. Left-over?

-------------

Marked as reviewed by sgehwolf (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19835#pullrequestreview-2156999448
PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1664527314

From matsaave at openjdk.org  Wed Jul  3 17:51:20 2024
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Wed, 3 Jul 2024 17:51:20 GMT
Subject: RFR: 8312125: Refactor CDS enum class handling
In-Reply-To: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
Message-ID: <ZHucB6ROGjR9owwiQyZpm-k25GRwPdkmpLoaXglBZ7M=.e04beda9-56d1-4425-9297-5cac89ee7f57@github.com>

On Wed, 3 Jul 2024 17:00:30 GMT, Ioi Lam <iklam at openjdk.org> wrote:

> Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change.

I have two comments but overall this looks good!

src/hotspot/share/cds/cdsEnumKlass.cpp line 2:

> 1: /*
> 2:  * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.

Please update the copyrights on the new files.

src/hotspot/share/cds/cdsEnumKlass.cpp line 68:

> 66:                                    oop orig_obj) {
> 67:   assert(level > 1, "must never be called at the first (outermost) level");
> 68:   assert(is_enum_obj(orig_obj), "must be");

Is this assert redundant? You check for this before you make the call to `handle_enum_obj()` below. I think you can move the if statement in here.

-------------

Changes requested by matsaave (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20013#pullrequestreview-2157034791
PR Review Comment: https://git.openjdk.org/jdk/pull/20013#discussion_r1664548371
PR Review Comment: https://git.openjdk.org/jdk/pull/20013#discussion_r1664553655

From hgreule at openjdk.org  Wed Jul  3 19:47:39 2024
From: hgreule at openjdk.org (Hannes Greule)
Date: Wed, 3 Jul 2024 19:47:39 GMT
Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively
 throws wrong exception
Message-ID: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com>

Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods.

The exception message is less exact, but I think that's acceptable.

-------------

Commit messages:
 - add test (and find missing method)
 - make reflective calls to signature polymorphic methods in VarHandle throw UOE

Changes: https://git.openjdk.org/jdk/pull/20015/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20015&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335638
  Stats: 75 lines in 2 files changed: 71 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/20015.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20015/head:pull/20015

PR: https://git.openjdk.org/jdk/pull/20015

From iklam at openjdk.org  Wed Jul  3 19:57:51 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Wed, 3 Jul 2024 19:57:51 GMT
Subject: RFR: 8312125: Refactor CDS enum class handling [v2]
In-Reply-To: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
Message-ID: <xxU06cCiROZP1kPcY6pWxomBLPTGPPnxrbc22c-K08E=.4c6d7a88-29d0-43eb-a917-4e90766ddfc9@github.com>

> Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change.

Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:

  fixed copyright

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20013/files
  - new: https://git.openjdk.org/jdk/pull/20013/files/64b77ecb..49dc109e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20013&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20013&range=00-01

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20013.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20013/head:pull/20013

PR: https://git.openjdk.org/jdk/pull/20013

From iklam at openjdk.org  Wed Jul  3 19:57:52 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Wed, 3 Jul 2024 19:57:52 GMT
Subject: RFR: 8312125: Refactor CDS enum class handling [v2]
In-Reply-To: <ZHucB6ROGjR9owwiQyZpm-k25GRwPdkmpLoaXglBZ7M=.e04beda9-56d1-4425-9297-5cac89ee7f57@github.com>
References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
 <ZHucB6ROGjR9owwiQyZpm-k25GRwPdkmpLoaXglBZ7M=.e04beda9-56d1-4425-9297-5cac89ee7f57@github.com>
Message-ID: <7LNBOLBhkwk0BpAWFiJv-i0PFhX6M54IwpVM6w0a5uc=.59137b5a-8810-4d7b-81af-ac1d931cc80b@github.com>

On Wed, 3 Jul 2024 17:42:07 GMT, Matias Saavedra Silva <matsaave at openjdk.org> wrote:

>> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fixed copyright
>
> src/hotspot/share/cds/cdsEnumKlass.cpp line 2:
> 
>> 1: /*
>> 2:  * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.
> 
> Please update the copyrights on the new files.

I changed the dates to `2023, 2024` as these files were initially published in the leyden repo last year.

> src/hotspot/share/cds/cdsEnumKlass.cpp line 68:
> 
>> 66:                                    oop orig_obj) {
>> 67:   assert(level > 1, "must never be called at the first (outermost) level");
>> 68:   assert(is_enum_obj(orig_obj), "must be");
> 
> Is this assert redundant? You check for this before you make the call to `handle_enum_obj()` below. I think you can move the if statement in here.

The assert declares the fact that the caller must check that orig_obj is an enum before calling this function. This guards against inadvertent changes that might accidentally drop the "if" check in the caller.
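A minimal sketch of that pattern, using hypothetical stand-in types rather than the real `oop` and CDS code: the caller performs the `if` check, and the callee asserts the same condition as a documented precondition.

```cpp
#include <cassert>

// Hypothetical stand-in for an object reference; not the HotSpot oop type.
struct Obj { bool is_enum; };

static bool is_enum_obj(const Obj& o) { return o.is_enum; }

// Callee: states its preconditions instead of re-checking and silently returning.
static void handle_enum_obj(int level, const Obj& o) {
  assert(level > 1 && "must never be called at the first (outermost) level");
  assert(is_enum_obj(o) && "caller must check is_enum_obj() first");
  // ... enum-specific handling would go here ...
}

// Caller: owns the check. Returns true if the object was handled as an enum.
static bool visit(int level, const Obj& o) {
  if (is_enum_obj(o)) {
    handle_enum_obj(level, o);
    return true;
  }
  return false;
}
```

If a later change dropped the `if` in `visit`, a debug build would fail in `handle_enum_obj` instead of quietly processing a non-enum object.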

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20013#discussion_r1664690497
PR Review Comment: https://git.openjdk.org/jdk/pull/20013#discussion_r1664687592

From matsaave at openjdk.org  Wed Jul  3 20:08:17 2024
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Wed, 3 Jul 2024 20:08:17 GMT
Subject: RFR: 8312125: Refactor CDS enum class handling [v2]
In-Reply-To: <xxU06cCiROZP1kPcY6pWxomBLPTGPPnxrbc22c-K08E=.4c6d7a88-29d0-43eb-a917-4e90766ddfc9@github.com>
References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
 <xxU06cCiROZP1kPcY6pWxomBLPTGPPnxrbc22c-K08E=.4c6d7a88-29d0-43eb-a917-4e90766ddfc9@github.com>
Message-ID: <uy6CKlyFbVZ-yLe6Mklejpa6AmToFoMFuV_tL6VJ-f4=.0d123829-80e2-43f3-8d2c-c7ff8973cb0a@github.com>

On Wed, 3 Jul 2024 19:57:51 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>> Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change.
>
> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fixed copyright

Thanks for the changes and clarification!

-------------

Marked as reviewed by matsaave (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20013#pullrequestreview-2157295122

From liach at openjdk.org  Wed Jul  3 20:36:22 2024
From: liach at openjdk.org (Chen Liang)
Date: Wed, 3 Jul 2024 20:36:22 GMT
Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively
 throws wrong exception
In-Reply-To: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com>
References: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com>
Message-ID: <wehVWk76N-IIsKi8LfdOchbLJGxUfxsW8Y9VqiDd-2k=.1b3d5923-9ba1-4eec-b926-164c7423cc87@github.com>

On Wed, 3 Jul 2024 19:43:05 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

> Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods.
> 
> The exception message is less exact, but I think that's acceptable.

Great work!

src/hotspot/share/prims/methodHandles.cpp line 1372:

> 1370:  */
> 1371: JVM_ENTRY(jobject, VH_UOE(JNIEnv* env, jobject mh, jobjectArray args)) {
> 1372:   THROW_MSG_NULL(vmSymbols::java_lang_UnsupportedOperationException(), "VarHandle access mode method a cannot be invoked reflectively");

Suggestion:

  THROW_MSG_NULL(vmSymbols::java_lang_UnsupportedOperationException(), "VarHandle access mode methods cannot be invoked reflectively");

Looks like a typo to me.

src/hotspot/share/prims/methodHandles.cpp line 1419:

> 1417: static JNINativeMethod VH_methods[] = {
> 1418:   // UnsupportedOperationException throwers
> 1419:   {CC "compareAndExchange",         CC "([" OBJ ")" OBJ,    FN_PTR(VH_UOE)},

I recommend ordering these by the order in `AccessMode`, which is also the declaration order in `VarHandle`; that way, if we add a new access mode, it's easier for us to maintain this list.

src/hotspot/share/prims/methodHandles.cpp line 1457:

> 1455: JVM_ENTRY(void, JVM_RegisterMethodHandleMethods(JNIEnv *env, jclass MHN_class)) {
> 1456:   assert(!MethodHandles::enabled(), "must not be enabled");
> 1457:   assert(vmClasses::MethodHandle_klass() != nullptr, "should be present");

Should we duplicate this assert for `vmClasses::VarHandle_klass()` too?

test/jdk/java/lang/invoke/VarHandles/VarHandleTestReflection.java line 1:

> 1: /*

The copyright header's year needs an update.

test/jdk/java/lang/invoke/VarHandles/VarHandleTestReflection.java line 69:

> 67:         VarHandle v = handle();
> 68: 
> 69:         // Try a reflective invoke using a Method, with an array of 0 arguments

Suggestion:

        // Try a reflective invoke using a Method, with the minimal required argument

test/jdk/java/lang/invoke/VarHandles/VarHandleTestReflection.java line 72:

> 70: 
> 71:         Method vhm = VarHandle.class.getMethod(accessMode.methodName(), Object[].class);
> 72:         Object args = new Object[0];

I recommend naming this `arg`, as this is the single arg to the reflected method. Had you inlined this, you would have called `vhm.invoke(v, (Object) new Object[0]);`

-------------

PR Review: https://git.openjdk.org/jdk/pull/20015#pullrequestreview-2157341254
PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664744641
PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664741601
PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664737631
PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664753008
PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664751627
PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664751688

From psandoz at openjdk.org  Wed Jul  3 21:34:18 2024
From: psandoz at openjdk.org (Paul Sandoz)
Date: Wed, 3 Jul 2024 21:34:18 GMT
Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively
 throws wrong exception
In-Reply-To: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com>
References: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com>
Message-ID: <xAUJsRxtQTGRtlpHChBG17bxbhbFbIE0iK_3YYz5z2Y=.e63e4147-7197-4438-9ffa-3f2deead0088@github.com>

On Wed, 3 Jul 2024 19:43:05 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

> Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods.
> 
> The exception message is less exact, but I think that's acceptable.

src/hotspot/share/prims/methodHandles.cpp line 1371:

> 1369:  * invoked directly.
> 1370:  */
> 1371: JVM_ENTRY(jobject, VH_UOE(JNIEnv* env, jobject mh, jobjectArray args)) {

Suggestion:

JVM_ENTRY(jobject, VH_UOE(JNIEnv* env, jobject vh, jobjectArray args)) {

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664836522

From dholmes at openjdk.org  Wed Jul  3 22:03:28 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 3 Jul 2024 22:03:28 GMT
Subject: RFR: 8322475: Extend printing for System.map [v7]
In-Reply-To: <x8LrKZVgWp9X2bwTSnV3KppIRabvSzfvnRLuLDgvy84=.7b71527c-95a7-49e1-a0c6-e78c81f644c1@github.com>
References: <xXLpEw01_OAADNe6SFsw8sBYqjShMROIKQH3IflvgAM=.facb614e-cc97-441f-873f-e7453bd4338d@github.com>
 <x8LrKZVgWp9X2bwTSnV3KppIRabvSzfvnRLuLDgvy84=.7b71527c-95a7-49e1-a0c6-e78c81f644c1@github.com>
Message-ID: <7_EFxaouVZu-UhsgaEPKb771nYu0H9bLkyywlxgLhWM=.2145753f-7397-45c0-8a2b-8c5a1fafd74e@github.com>

On Wed, 3 Jul 2024 07:55:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This is an expansion on the new `System.map` command introduced with JDK-8318636.
>> 
>> We now print valuable information per memory region, such as:
>> 
>> - the actual resident set size
>> - the actual number of huge pages
>> - the actual used page size
>> - the THP state of the region (was advised, is eligible, uses THP, ...)
>> - whether the region is shared
>> - whether the region had been committed (backed by swap)
>> - whether the region has been swapped out.
>> 
>> Example output:
>> 
>> [system-map-thp1.txt](https://github.com/user-attachments/files/15587748/system-map-thp1.txt)
>> 
>> 
>> from                 to                       size        rss    hugetlb pgsz prot notes            vm info/file                                                                                                                                                                    
>> 0x00000000c0000000 - 0x00000000ffe00000 1071644672          0    4194304 2M   rw-p huge             JAVAHEAP /anon_hugepage                                                                                                                                                         
>> 0x00000000ffe00000 - 0x0000000100000000    2097152          0          0 2M   rw-p huge             JAVAHEAP /anon_hugepage                                                                                                                                                         
>> 0x0000558016b67000 - 0x0000558016b68000       4096       4096          0 4K   r--p                  /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java
>> 0x0000558016b68000 - 0x0000558016b69000       4096       4096          0 4K   r-xp                  /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java
>> 0x00007f3a749f2000 - 0x00007f3a74c62000    2555904    2555904          0 4K   rwxp                  CODE(CodeHeap 'profiled nmethods')                               
>> 0x00007f3a74c62000 - 0x00007f3a7be51000  119468032          0          0 4K   ---p nores            CODE(CodeHeap 'profiled nmethods')                               
>> 0x00007f3a7be51000 - 0x00007f3a7c1c1000    3604480    3604480          0 4K   rwxp                  CODE(CodeHeap 'profiled nmethods')                               
>> 0x00007f3a7c1c1000 - 0x00007f3a7c592000    4001792          0          0 4K   ---p nores            CODE(CodeHeap 'non-nmethods')                                    
>> 0x00007f3a7c592000 - 0x00007f3a7c802000    2555904    2...
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits:
> 
>  - no stack on 32-bit, scan for vDSO lib instead
>  - feedback severin
>  - Merge branch 'master' into System.maps-more-info
>  - feedback johan
>  - fix merge errors
>  - Merge branch 'master' into System.maps-more-info
>  - copyrights
>  - Merge branch 'master' into System.maps-more-info
>  - fix merge issue
>  - Merge branch 'master' into System.maps-more-info
>  - ... and 16 more: https://git.openjdk.org/jdk/compare/0db9bc57...3cc5943d

The modified tests are failing in our CI on Linux x64 and Aarch64 - see https://bugs.openjdk.org/browse/JDK-8335643

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17158#issuecomment-2207383900

From Matthew.Carter at microsoft.com  Wed Jul  3 23:13:43 2024
From: Matthew.Carter at microsoft.com (Mat Carter)
Date: Wed, 3 Jul 2024 23:13:43 +0000
Subject: Proposal for small experimental change to compiler thread calculation.
Message-ID: <SJ0PR21MB2040BDBA4389364FBC5D39B58ADD2@SJ0PR21MB2040.namprd21.prod.outlook.com>

We've been looking at compiler queue load, both under dynamic and static configurations and added jdk.CompilerQueueUtilization to the JFR logging to help with this.

There's much discussion in JBS ([1], [2] and [3] amongst others) and on the mailing lists regarding 1 vs 2 queues, shared threads vs dedicated and the benefits/tradeoffs of dynamic compiler threads.

Our proposal is to allow the 1:2 ratio (c1:c2) to be overridden on the command line, with the goal of allowing experimentation that might help either solidify the rationale around the current settings or set us on a new path to make some changes.  Further, it could allow developers to fine-tune the ratio for their workload-specific needs.

Something like -XX:CICompilerThreadRatio="2:3" (default is "1:2" which matches the current settings)

Note that the math to calculate the allocation of CICompilerCount to C1 and C2 would remain integer math, ensuring that the default ratio of 1:2 allocates the same number of threads to C1 and C2 as it does today.
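
As a rough illustration of the integer math, here is a minimal C++ sketch. The names `ThreadSplit` and `split_compiler_threads`, and the clamping of each side to at least one thread, are assumptions made for this example; this is not the code in compilationPolicy.cpp.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type: threads assigned to each compiler.
struct ThreadSplit {
  int c1;
  int c2;
};

// Split `total` compiler threads between C1 and C2 using the ratio
// c1_part:c2_part. Integer division is used throughout, so "1:2" gives
// C1 total/3 threads and C2 the remainder.
// Clamping each side to at least 1 is an assumption of this sketch.
ThreadSplit split_compiler_threads(int total, int c1_part, int c2_part) {
  assert(total >= 1 && c1_part >= 1 && c2_part >= 1);
  int c1 = std::max(total * c1_part / (c1_part + c2_part), 1);
  int c2 = std::max(total - c1, 1);
  return ThreadSplit{c1, c2};
}
```

With 10 threads, "1:2" yields 3 for C1 and 7 for C2, while "2:3" yields 4 and 6. Because of the clamp, a very small total can make the sum exceed the requested count in this sketch.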

Other than adding a new command line option, the only change would be in the initialize method in src/hotspot/share/compiler/compilationPolicy.cpp

There's also a thought on setting the compiler threads explicitly that we're happy to table until later: -XX:CICompilerThreadCounts="3:4"; in this case we'd compute CICompilerCount as the sum of C1 and C2 threads.

Thoughts and questions appreciated, thanks in advance
Mat

[1] https://bugs.openjdk.org/browse/JDK-8134507
[2] https://bugs.openjdk.org/browse/JDK-8198756
[3] https://bugs.openjdk.org/browse/JDK-8302264

From xpeng at openjdk.org  Thu Jul  4 00:36:33 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Thu, 4 Jul 2024 00:36:33 GMT
Subject: RFR: 8334231: Optimize MethodData layout
Message-ID: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>

Hi all,
     This PR is part of https://bugs.openjdk.org/browse/JDK-8334227, which optimizes HotSpot C++ class layouts; this one covers the layout of MethodData. Here is the original layout from `pahole`:

class MethodData : public Metadata {
public:

	/* class Metadata            <ancestor>; */      /*     0     0 */

	/* XXX 8 bytes hole, try to pack */

	class Method *             _method;              /*     8     8 */
	int                        _size;                /*    16     4 */
	int                        _hint_di;             /*    20     4 */
	class Mutex               _extra_data_lock;      /*    24   104 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	class CompilerCounters    _compiler_counters;    /*   128    80 */
	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
	intx                       _eflags;              /*   208     8 */
	intx                       _arg_local;           /*   216     8 */
	intx                       _arg_stack;           /*   224     8 */
	intx                       _arg_returned;        /*   232     8 */
	int                        _creation_mileage;    /*   240     4 */
	class InvocationCounter   _invocation_counter;   /*   244     4 */
	class InvocationCounter   _backedge_counter;     /*   248     4 */
	int                        _invocation_counter_start; /*   252     4 */
	/* --- cacheline 4 boundary (256 bytes) --- */
	int                        _backedge_counter_start; /*   256     4 */
	uint                       _tenure_traps;        /*   260     4 */
	int                        _invoke_mask;         /*   264     4 */
	int                        _backedge_mask;       /*   268     4 */
	short int                  _num_loops;           /*   272     2 */
	short int                  _num_blocks;          /*   274     2 */
	enum WouldProfile          _would_profile;       /*   276     4 */
	int                        _jvmci_ir_size;       /*   280     4 */

	/* XXX 4 bytes hole, try to pack */

	class FailedSpeculation *  _failed_speculations; /*   288     8 */
	int                        _data_size;           /*   296     4 */
	int                        _parameters_type_data_di; /*   300     4 */
	int                        _exception_handler_data_di; /*   304     4 */

	/* XXX 4 bytes hole, try to pack */

	intptr_t                   _data[1];             /*   312     8 */

	/* size: 320, cachelines: 5, members: 27 */
	/* sum members: 304, holes: 3, sum holes: 16 */
};


There are 3 holes in the layout. The first, 8-byte hole seems related to the ancestor Metadata, which actually has a size of 8 bytes, so we may not be able to do anything about it:

class Metadata : public MetaspaceObj {
public:

	/* class MetaspaceObj        <ancestor>; */      /*     0     0 */

	/* XXX last struct has 1 byte of padding */

	int ()(void) * *           _vptr.Metadata;       /*     0     8 */

	/* size: 8, cachelines: 1, members: 2 */
	/* paddings: 1, sum paddings: 1 */
	/* last cacheline: 8 bytes */
};


The two 4-byte holes should be easy to fix: we can simply swap the positions of _jvmci_ir_size and _failed_speculations for better alignment. Here is the new layout after the change:

class MethodData : public Metadata {
public:

	/* class Metadata            <ancestor>; */      /*     0     0 */

	/* XXX 8 bytes hole, try to pack */

	class Method *             _method;              /*     8     8 */
	int                        _size;                /*    16     4 */
	int                        _hint_di;             /*    20     4 */
	class Mutex               _extra_data_lock;      /*    24   104 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	class CompilerCounters    _compiler_counters;    /*   128    80 */
	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
	intx                       _eflags;              /*   208     8 */
	intx                       _arg_local;           /*   216     8 */
	intx                       _arg_stack;           /*   224     8 */
	intx                       _arg_returned;        /*   232     8 */
	int                        _creation_mileage;    /*   240     4 */
	class InvocationCounter   _invocation_counter;   /*   244     4 */
	class InvocationCounter   _backedge_counter;     /*   248     4 */
	int                        _invocation_counter_start; /*   252     4 */
	/* --- cacheline 4 boundary (256 bytes) --- */
	int                        _backedge_counter_start; /*   256     4 */
	uint                       _tenure_traps;        /*   260     4 */
	int                        _invoke_mask;         /*   264     4 */
	int                        _backedge_mask;       /*   268     4 */
	short int                  _num_loops;           /*   272     2 */
	short int                  _num_blocks;          /*   274     2 */
	enum WouldProfile          _would_profile;       /*   276     4 */
	class FailedSpeculation *  _failed_speculations; /*   280     8 */
	int                        _jvmci_ir_size;       /*   288     4 */
	int                        _data_size;           /*   292     4 */
	int                        _parameters_type_data_di; /*   296     4 */
	int                        _exception_handler_data_di; /*   300     4 */
	intptr_t                   _data[1];             /*   304     8 */

	/* size: 312, cachelines: 5, members: 27 */
	/* sum members: 304, holes: 1, sum holes: 8 */
	/* last cacheline: 56 bytes */
};


The two 4-byte holes are removed, saving 8 bytes. 
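
To illustrate why the swap closes both holes, here is a minimal C++ sketch of just the tail of the class. The struct names `TailBefore` and `TailAfter` and the helper `bytes_saved` are made up for this example, and the sizes in the comments assume a typical LP64 target (8-byte pointers with 8-byte alignment).

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for the tail of MethodData before the change.
struct TailBefore {
  int           jvmci_ir_size;              // 4 bytes, followed by a 4-byte hole
  void*         failed_speculations;        // needs 8-byte alignment on LP64
  int           data_size;
  int           parameters_type_data_di;
  int           exception_handler_data_di;  // followed by another 4-byte hole
  std::intptr_t data[1];
};

// Same members with the pointer moved ahead of jvmci_ir_size.
struct TailAfter {
  void*         failed_speculations;        // pointer first, no hole before it
  int           jvmci_ir_size;
  int           data_size;
  int           parameters_type_data_di;
  int           exception_handler_data_di;  // four ints fill 16 bytes exactly
  std::intptr_t data[1];
};

// Bytes saved by the reordering (expected to be 8 on an LP64 target).
constexpr std::size_t bytes_saved() {
  return sizeof(TailBefore) - sizeof(TailAfter);
}
```

The four `int` members now sit back to back after the pointer, so the trailing `intptr_t` array is already 8-byte aligned without padding.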

Also removed an unnecessary `private:` access specifier.

Additional test:
- [x] CONF=linux-x86_64-server-fastdebug CONF_CHECK=ignore make clean test TEST=tier2

Best,
Xiaolong.

-------------

Commit messages:
 - Swap position of _jvmci_ir_size and _failed_speculations for better alignment
 - Optimize MethodData layout

Changes: https://git.openjdk.org/jdk/pull/20019/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20019&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8334231
  Stats: 3 lines in 1 file changed: 1 ins; 2 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20019.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20019/head:pull/20019

PR: https://git.openjdk.org/jdk/pull/20019

From duke at openjdk.org  Thu Jul  4 01:53:29 2024
From: duke at openjdk.org (duke)
Date: Thu, 4 Jul 2024 01:53:29 GMT
Subject: Withdrawn: 8325316: Enable -pedantic -Wpedantic for gcc
In-Reply-To: <5-fL8vy-065EhMR2SzE21o7UFrOOoDVfnhDG06G-kkE=.0c2dab10-c458-4178-89f7-004f8f921a3f@github.com>
References: <5-fL8vy-065EhMR2SzE21o7UFrOOoDVfnhDG06G-kkE=.0c2dab10-c458-4178-89f7-004f8f921a3f@github.com>
Message-ID: <27AIDm3yKW2W4MwsZCVPsxPdmQLc5LCIvFpT5P32ruA=.ab7c27fd-064e-43c1-8abd-77f8a052592c@github.com>

On Tue, 6 Feb 2024 09:45:07 GMT, Julian Waters <jwaters at openjdk.org> wrote:

> Similarly to [JDK-8325163](https://bugs.openjdk.org/browse/JDK-8325163), this enables pedantic mode for gcc, ensuring stricter Standard conformance and allowing for buggy and broken code previously undetectable by gcc to be caught

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/17727

From dholmes at openjdk.org  Thu Jul  4 04:18:17 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 4 Jul 2024 04:18:17 GMT
Subject: RFR: 8334231: Optimize MethodData layout
In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
Message-ID: <clMXAOykJ1YFhkWUsoEbupm32URSyEnFJ5hcYRSB6iM=.04eaf1c4-3baa-46f0-b1c8-87ccbc802298@github.com>

On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> Hi all,
>      This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of  MethodData. Here is the original layout from `pahole`:
> 
> class MethodData : public Metadata {
> public:
> 
> 	/* class Metadata            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX 8 bytes hole, try to pack */
> 
> 	class Method *             _method;              /*     8     8 */
> 	int                        _size;                /*    16     4 */
> 	int                        _hint_di;             /*    20     4 */
> 	class Mutex               _extra_data_lock;      /*    24   104 */
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class CompilerCounters    _compiler_counters;    /*   128    80 */
> 	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
> 	intx                       _eflags;              /*   208     8 */
> 	intx                       _arg_local;           /*   216     8 */
> 	intx                       _arg_stack;           /*   224     8 */
> 	intx                       _arg_returned;        /*   232     8 */
> 	int                        _creation_mileage;    /*   240     4 */
> 	class InvocationCounter   _invocation_counter;   /*   244     4 */
> 	class InvocationCounter   _backedge_counter;     /*   248     4 */
> 	int                        _invocation_counter_start; /*   252     4 */
> 	/* --- cacheline 4 boundary (256 bytes) --- */
> 	int                        _backedge_counter_start; /*   256     4 */
> 	uint                       _tenure_traps;        /*   260     4 */
> 	int                        _invoke_mask;         /*   264     4 */
> 	int                        _backedge_mask;       /*   268     4 */
> 	short int                  _num_loops;           /*   272     2 */
> 	short int                  _num_blocks;          /*   274     2 */
> 	enum WouldProfile          _would_profile;       /*   276     4 */
> 	int                        _jvmci_ir_size;       /*   280     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	class FailedSpeculation *  _failed_speculations; /*   288     8 */
> 	int                        _data_size;           /*   296     4 */
> 	int                        _parameters_type_data_di; /*   300     4 */
> 	int                        _exception_handler_data_di; /*   304     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	intptr_t                   _data[1];             /*   312     8 */
> 
> 	/* size: 320, cachelines: 5, members: 27 */
> 	/* sum members: 304, holes: 3, sum holes: 16 */
> };
> 
> 
> There are 3 holes ...

Seems reasonable.

Thanks

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20019#pullrequestreview-2157913550

From dholmes at openjdk.org  Thu Jul  4 05:02:25 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 4 Jul 2024 05:02:25 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665
In-Reply-To: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
Message-ID: <iZb_AvCGeJYQ51-UTqMhkxRKQwt0F6UgdM6nppalaEo=.d3c5ad91-9342-42a6-83c9-03a9e4a104bb@github.com>

On Wed, 3 Jul 2024 16:24:20 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.
> 
> The number of locals and expression stack entries a method can have before extra memory must be allocated for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty, contrary to what the comment says. This comment dates back to 2002, when instead of 128 entries we could have only 32, considering 32-bit CPUs were still in main use (see bug for more history details).
> 
> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used once.
> 
> I tested the patch by running it through mach5 tiers 1-6.
> 
> Thanks,
> Patricio
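
The 4*64/2 = 128 figure quoted above can be written out as a small C++ sketch. The constant and function names are hypothetical and chosen only to label the three factors of that expression; they are not the identifiers used in oopMapCache.hpp.

```cpp
#include <cstddef>

// Illustrative constants read off the "4*64/2" expression; names are hypothetical.
constexpr std::size_t kInlineWords  = 4;  // words available in-line for the bit mask
constexpr std::size_t kBitsPerEntry = 2;  // bits used per local or expression stack entry

// Number of entries that fit in-line before extra memory has to be allocated.
constexpr std::size_t max_inline_entries(std::size_t bits_per_word) {
  return kInlineWords * bits_per_word / kBitsPerEntry;
}

static_assert(max_inline_entries(64) == 128, "matches the 4*64/2 = 128 figure");
```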

Thanks for the detailed explanations. A couple of minor (pre-existing) nits but changes are good.

Thanks

src/hotspot/share/interpreter/oopMapCache.cpp line 179:

> 177: #ifdef ASSERT
> 178:   _used = false;
> 179: #endif

Nit pre-existing: use of DEBUG_ONLY would be more consistent with later setting of `_used`.

src/hotspot/share/interpreter/oopMapCache.cpp line 408:

> 406: 
> 407: void InterpreterOopMap::resource_copy(OopMapCacheEntry* from) {
> 408:   // The expectation is that this InterpreterOopMap is a recently created

s/is a recently/is recently/

src/hotspot/share/interpreter/oopMapCache.hpp line 136:

> 134:   // Copy the OopMapCacheEntry in parameter "from" into this
> 135:   // InterpreterOopMap.  If the _bit_mask[0] in "from" points to
> 136:   // allocated space (i.e., the bit mask was to large to hold

Nit pre-existing: s/to/too/

src/hotspot/share/interpreter/oopMapCache.hpp line 138:

> 136:   // allocated space (i.e., the bit mask was to large to hold
> 137:   // in-line), allocate the space from the C heap.
> 138:   void resource_copy(OopMapCacheEntry* from);

The name `resource_copy` seems somewhat of a misnomer given it may be C heap. Is it worth changing?

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2157946377
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1665114778
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1665112766
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1665116975
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1665117670

From hgreule at openjdk.org  Thu Jul  4 06:22:31 2024
From: hgreule at openjdk.org (Hannes Greule)
Date: Thu, 4 Jul 2024 06:22:31 GMT
Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively
 throws wrong exception [v2]
In-Reply-To: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com>
References: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com>
Message-ID: <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com>

> Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods.
> 
> The exception message is less exact, but I think that's acceptable.

Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:

  address comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20015/files
  - new: https://git.openjdk.org/jdk/pull/20015/files/fe43b749..e329ceb2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20015&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20015&range=00-01

  Stats: 43 lines in 2 files changed: 17 ins; 16 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/20015.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20015/head:pull/20015

PR: https://git.openjdk.org/jdk/pull/20015

From stuefe at openjdk.org  Thu Jul  4 06:24:32 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Jul 2024 06:24:32 GMT
Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII
 [v3]
In-Reply-To: <EliUQk2e0HZE3BQ3BKOGvF81KROy_lLp4OgK-hRWazA=.79466db9-87df-403c-a928-15e1dea8bbd5@github.com>
References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com>
 <nyLYOhw7-wSPlKjeWi3FyuLY0UzFwWJdj-19ijEInU4=.6f539aaf-0cff-4ab8-8ca0-3acd3b44d071@github.com>
 <EliUQk2e0HZE3BQ3BKOGvF81KROy_lLp4OgK-hRWazA=.79466db9-87df-403c-a928-15e1dea8bbd5@github.com>
Message-ID: <b4NrQs2S9jYAEddRJvmJelnXOXo8tGRulqW7b9Q_RO8=.0a33e434-6096-427d-940b-6f87facc3db6@github.com>

On Wed, 3 Jul 2024 17:34:14 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   exclude test for AIX
>
> This isn't really the area of my expertise, but the patch seems reasonable to me.

Many thanks, @jerboaa !

> src/hotspot/share/runtime/os.cpp line 945:
> 
>> 943: 
>> 944: ATTRIBUTE_NO_ASAN static bool read_safely_from(const uintptr_t* p, uintptr_t* result) {
>> 945:   DEBUG_ONLY(*result = 0xAAAA;)
> 
> It's not clear why this was added. Left-over?

It's intentional, to have an indication for a failing SafeFetch that we in turn fail to recognise.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19835#issuecomment-2208195396
PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1665183201

From stuefe at openjdk.org  Thu Jul  4 06:24:33 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Jul 2024 06:24:33 GMT
Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII
 [v3]
In-Reply-To: <Aj6UFZiCaVQTWxFWc3SvhVTFINYza_C0NruMDH6auPU=.0ae0896a-7eaf-49a5-96e5-672ff241cfef@github.com>
References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com>
 <Aj6UFZiCaVQTWxFWc3SvhVTFINYza_C0NruMDH6auPU=.0ae0896a-7eaf-49a5-96e5-672ff241cfef@github.com>
Message-ID: <1Tis955GRVGvI0HJ7G80tvnN_wzsPZlU7v72CtbBE2s=.f6633a9f-3fe8-4a50-9c86-bf1b673d88c7@github.com>

On Tue, 25 Jun 2024 06:53:39 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   exclude test for AIX
>
> src/hotspot/share/runtime/os.cpp line 961:
> 
>> 959:   union {
>> 960:     uint64_t v;
>> 961:     uint8_t c[sizeof(v)];
> 
> Why `uint8_t` instead of `unsigned char`?

@dholmes-ora I missed your question, sorry. Both are the same, obviously, but I vaguely prefer uint8_t since it's a bit shorter and has the size in its name.
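
For readers following along, here is a minimal C++ sketch of how the bytes of a 64-bit word can be turned into an ASCII column. The helpers `ascii_column` and `ascii_of_word` are hypothetical and are not the code in os.cpp; the sketch uses `memcpy` rather than the union, and the `_` placeholder is taken from the sample printout in the PR description, so the integrated code may well use a different character.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Render `len` bytes as text: printable ASCII is kept, everything else
// becomes '_' (placeholder choice is an assumption of this sketch).
std::string ascii_column(const std::uint8_t* p, std::size_t len) {
  std::string out;
  out.reserve(len);
  for (std::size_t i = 0; i < len; i++) {
    const std::uint8_t c = p[i];
    out += (c >= 0x20 && c <= 0x7e) ? static_cast<char>(c) : '_';
  }
  return out;
}

// View a 64-bit word as its bytes in memory order and render those.
// The character order therefore depends on the host's endianness.
std::string ascii_of_word(std::uint64_t v) {
  std::uint8_t c[sizeof(v)];
  std::memcpy(c, &v, sizeof(v));
  return ascii_column(c, sizeof(c));
}
```

On a little-endian host the bytes of a word come out in the order they appear in memory, which is why the ASCII column reads left to right even though the hex words look reversed.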

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1665186434

From stuefe at openjdk.org  Thu Jul  4 06:24:33 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Jul 2024 06:24:33 GMT
Subject: Integrated: 8334738: os::print_hex_dump should optionally print ASCII
In-Reply-To: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com>
References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com>
Message-ID: <E62kiGYd4JUlaxg6YHZh9dzoAl2zLN8WhMmp44E5VZU=.b6e793f5-42d5-494e-b1c1-e93e824692c7@github.com>

On Fri, 21 Jun 2024 16:17:43 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> Motivated by analyzing CDS dump differences for reproducible builds, I found an optional ASCII printout to be valuable. As usual with hex dumps, ascii follows hex printout
> 
> Example:
> 
> 
> 
>    118 0x00000000000001c0:   204b444a6e65704f 53207469422d3436 4d56207265767265 6564747361662820   OpenJDK 64-Bit Server VM (fastde
>    119 0x00000000000001e0:   692d343220677562 2d6c616e7265746e 68742e636f686461 756f732e73616d6f   bug 24-internal-adhoc.thomas.sou
>    120 0x0000000000000200:   726f662029656372 612d78756e696c20 45524a203436646d 746e692d34322820   rce) for linux-amd64 JRE (24-int
>    121 0x0000000000000220:   64612d6c616e7265 6d6f68742e636f68 6372756f732e7361 6c697562202c2965   ernal-adhoc.thomas.source), buil
>    122 0x0000000000000240:   323032206e6f2074 5430322d36302d34 32313a35343a3031 672068746977205a   t on 2024-06-20T10:45:12Z with g
>    123 0x0000000000000260:   2e352e3031206363 0000000000000030 0000000000000000 0000000000000000   cc 10.5.0_______________________
>    124 0x0000000000000280:   0000000000000000 0000000000000000 0000000000000000 0000000000000000   ________________________________
>    125 0x00000000000002a0:   0000000000000000 0000000000000000 0000000000000000 0000000000000000   ________________________________
> 
> 
> The patch does that.
> 
> Small unrelated changes:
> 
> - I rewrote and extended the gtests, now testing a real-life printout containing a mixture of readable and non-readable pages, and printable and non-printable characters. I re-enabled tests on Windows, since https://bugs.openjdk.org/browse/JDK-8185734 is long solved.
> 
> - The new test uncovered an issue on 32-bit when printing giant words. We shift a signed value by 32 bits upwards, which can result in -1 resp. ffffffff in the upper half of the giant word. One of the pitfalls of intptr_t vs uintptr_t (I think most uses of intptr_t should probably use uintptr_t).
> 
> - I got tired of casting constness away from to-be-printed memory range just to be able to feed an address to os::print_hex_dump. The content printed is usually const. os::print_hex_dump does not need non-constness, but since we use address, and address is typedef char*, and one cannot declare a typedef'ed pointer target-const, the issue is there. I therefore changed the input to const uint8_t*. Maybe we need a const_address or something similar.
> 
> ----
> 
> Ran tests on Linux x64 and x86, Windows x86 and Mac aarch64. Fixed all issues I found. Only little-endian, I don't have big-endian machines and therefore  made those changes blindly. ...

This pull request has now been integrated.

Changeset: 38a578d5
Author:    Thomas Stuefe <stuefe at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/38a578d547f39c3637d97f5e0242f4a69f3bbb31
Stats:     159 lines in 7 files changed: 67 ins; 15 del; 77 mod

8334738: os::print_hex_dump should optionally print ASCII

Reviewed-by: dholmes, sgehwolf

-------------

PR: https://git.openjdk.org/jdk/pull/19835

From rehn at openjdk.org  Thu Jul  4 06:58:23 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 4 Jul 2024 06:58:23 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v21]
In-Reply-To: <tdkglfMOAm2Mg_Qj_TvnOro8oIqlOV8LKgfqgKTYFIw=.654dd861-9959-4b17-9c5a-6628f2782e3b@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <tdkglfMOAm2Mg_Qj_TvnOro8oIqlOV8LKgfqgKTYFIw=.654dd861-9959-4b17-9c5a-6628f2782e3b@github.com>
Message-ID: <lbWK_Y3hEdKr4vm76gKheXLMBYyLGvusC2svp0BKTSo=.cd4f881b-8217-46ad-8384-b95129db851d@github.com>

On Wed, 3 Jul 2024 12:53:54 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi all, please consider!
>> 
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> With a very small application, or one that runs only a very short time, we have fast patchable calls.
>> But any normal application running longer will increase the code size and code churn/fragmentation.
>> So whether or not you get hot fast calls relies on luck.
>> 
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>> 
>> Code stream:
>> JAL <trampo>
>> Stubs:
>> AUIPC
>> LD
>> JALR
>> <DEST>
>> 
>> 
>> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem, having a call to a jump is not the fastest way.
>> Loading the address avoids the pitfalls of cmodx.
>> 
>> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>> 
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>> <DEST>
>> 
>> An experimental option for turning trampolines back on exists.
>> 
>> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of the issues which arm has)) when in reach, and vice versa.
>> 
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>> 
>> fop                                        (msec)    2239       |  2128       =  0.950424
>> h2                                         (msec)    18660      |  16594      =  0.889282
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> luindex                                    (msec)    2866       |  2842       =  0.991626
>> lusearch                                   (msec)    4108       |  4311       =  1.04942
>> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
>> pmd                                        (msec)    5976       |  5897       =  0.98678
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> Avg:                                       0.974112                              
>> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename to reloc_call

If there are no major issues, I suggest we consider shipping now.
As it is early in the cycle, this will get a lot of bake time, and there will be plenty of time to make additional changes; it is even possible to change the default to trampoline calls.

I'll re-start all testing once again :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2208250089

From stuefe at openjdk.org  Thu Jul  4 07:19:19 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Jul 2024 07:19:19 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v2]
In-Reply-To: <j9wD-xHe-y5DPpuECd_cRVryshOhzq602wuMqqpXZi0=.389f4f4c-845e-46f4-9f46-d187a534cdf9@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
 <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com>
 <_RvHrV4nbtsAGnJq9S_98XOXciT71gMb6DfbCr6WVC0=.1278784f-b6c5-4d5e-92e6-fa21db95bc51@github.com>
 <hxa4YcsrUexAD_cqjQV9kz556tbr62l25sEdAKJf9hk=.3c1f219b-582b-46af-89c1-0b3dbcdc2c16@github.com>
 <j9wD-xHe-y5DPpuECd_cRVryshOhzq602wuMqqpXZi0=.389f4f4c-845e-46f4-9f46-d187a534cdf9@github.com>
Message-ID: <1yfgRA2s391Nqf1ONq_8MNblW3S2U_ARUncWsA4T83w=.845c926d-de90-4f4b-ba6d-bcf929d5d924@github.com>

On Mon, 1 Jul 2024 13:02:54 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> See line 141
>
> I see; thanks. Can you add a comment referencing `prepareOptions`?

Sure.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1665245390

From rehn at openjdk.org  Thu Jul  4 07:28:34 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 4 Jul 2024 07:28:34 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22]
In-Reply-To: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
Message-ID: <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com>

> Hi all, please consider!
> 
> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
> With a very small application, or one that runs only a very short time, we have fast patchable calls.
> But any normal application running longer will increase the code size and code churn/fragmentation.
> So whether or not you get hot fast calls relies on luck.
> 
> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
> This would be the common case for a patchable call.
> 
> Code stream:
> JAL <trampo>
> Stubs:
> AUIPC
> LD
> JALR
> <DEST>
> 
> 
> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
> Even if you don't have that problem, having a call to a jump is not the fastest way.
> Loading the address avoids the pitfalls of cmodx.
> 
> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
> and instead do by default:
> 
> Code stream:
> AUIPC
> LD
> JALR
> Stubs:
> <DEST>
> 
> An experimental option for turning trampolines back on exists.
> 
> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of the issues which arm has)) when in reach, and vice versa.
> 
> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
> 
> fop                                        (msec)    2239       |  2128       =  0.950424
> h2                                         (msec)    18660      |  16594      =  0.889282
> jython                                     (msec)    22022      |  21925      =  0.995595
> luindex                                    (msec)    2866       |  2842       =  0.991626
> lusearch                                   (msec)    4108       |  4311       =  1.04942
> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
> pmd                                        (msec)    5976       |  5897       =  0.98678
> jython                                     (msec)    22022      |  21925      =  0.995595
> Avg:                                       0.974112                              
> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
> h2(xcomp)                                  (msec)    37719      |  38004      =  1.00756
> jython(xcomp)        ...

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits:

 - Merge branch 'master' into 8332689
 - Rename to reloc_call
 - Merge branch 'master' into 8332689
 - Rename lc
 - Merge branch 'master' into 8332689
 - Merge branch 'master' into 8332689
 - Comments
 - Missed in merge-fixes, minor revert
 - Merge branch 'master' into 8332689
 - Minor review comments
 - ... and 21 more: https://git.openjdk.org/jdk/compare/38a578d5...9eabb5fa

-------------

Changes: https://git.openjdk.org/jdk/pull/19453/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=21
  Stats: 897 lines in 16 files changed: 622 ins; 177 del; 98 mod
  Patch: https://git.openjdk.org/jdk/pull/19453.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453

PR: https://git.openjdk.org/jdk/pull/19453

From stuefe at openjdk.org  Thu Jul  4 07:30:18 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Jul 2024 07:30:18 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v2]
In-Reply-To: <j9wD-xHe-y5DPpuECd_cRVryshOhzq602wuMqqpXZi0=.389f4f4c-845e-46f4-9f46-d187a534cdf9@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
 <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com>
 <_RvHrV4nbtsAGnJq9S_98XOXciT71gMb6DfbCr6WVC0=.1278784f-b6c5-4d5e-92e6-fa21db95bc51@github.com>
 <hxa4YcsrUexAD_cqjQV9kz556tbr62l25sEdAKJf9hk=.3c1f219b-582b-46af-89c1-0b3dbcdc2c16@github.com>
 <j9wD-xHe-y5DPpuECd_cRVryshOhzq602wuMqqpXZi0=.389f4f4c-845e-46f4-9f46-d187a534cdf9@github.com>
Message-ID: <tATLRKVSTICcHm46o6uumcpF_RWHC_hwAXnyXzKpofU=.b3a77778-6a8c-4ddd-9eda-724e3e4cd9ae@github.com>

On Mon, 1 Jul 2024 13:02:09 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>>> Why the explicit -Xmx64m? As I understand this is essentially the launcher, whose heap-size is of little importance.
>> 
>> No particular reason, just don't like launchers to use large heaps. I can remove it.
>> 
>>> Also, why does the launch require WhiteBoxAPI?
>> 
>> Because the launcher needs to access hostAvailableMemory in order to decide before starting the test whether it makes sense to start the test.
>
>> just don't like launchers to use large heaps.
> 
> Could you add a comment at the start of this file explaining the test setup, launcher creating another VM + real test flags? There, the rationale for small heap (64M) can be covered as well.
> 
> Some text on these fields can also help understand this test.
> 
>     final static long expectedMaxNonHeapRSS = M * 256;
>     final static  long requiredAvailableBefore = heapsize * 2 + expectedMaxNonHeapRSS;
>     final static  long requiredAvailableDuring = expectedMaxNonHeapRSS;

Sure.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1665258517

From stuefe at openjdk.org  Thu Jul  4 07:33:19 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Jul 2024 07:33:19 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v2]
In-Reply-To: <GXA1RAdCoZKi_8VnJ9qtOsqPCuuyhO-gfjfnlvW04d8=.17a66849-d08e-4e71-ba1d-0db0c742e691@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
 <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com>
 <GXA1RAdCoZKi_8VnJ9qtOsqPCuuyhO-gfjfnlvW04d8=.17a66849-d08e-4e71-ba1d-0db0c742e691@github.com>
Message-ID: <ZnAE3KOtCPDj0viMtN2g3tU_i0-XvkrRHNnNMBIbYhQ=.66cae7cb-831a-4674-8477-1ea04c3d773a@github.com>

On Mon, 1 Jul 2024 13:09:51 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update TestAlwaysPreTouchBehavior.java
>
> test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 164:
> 
>> 162:             if (rss < committed) {
>> 163:                 if (avail < requiredAvailableDuring) {
>> 164:                     throw new SkippedException("Not enough memory for this  test (" + avail + ")");
> 
> This is essentially an early-return; why is this inside the `rss < committed` comparison? Does it work if it's lifted up? The structure I have in mind is like:
> 
> 
> if (avail < ....) {
>   skip-test;
> }
> assert(rss >= committed, error-msg);

Well, the test may have succeeded despite what we recognize as low memory conditions. The "low-memory-condition-recognition" is necessarily over-generous - we count even vaguely lowish conditions as "low." We do this to prevent false negatives, which tests like these are often plagued with.
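
The ordering described here can be sketched as follows. This is a simplified illustration, not the test's actual code: the `Outcome` enum and the `classify` method are hypothetical stand-ins (the real test throws `SkippedException` or fails), while `rss`, `committed`, `avail` and `requiredAvailableDuring` follow the names in the quoted snippet.

```java
// Hypothetical sketch of the decision order; names other than rss, committed,
// avail and requiredAvailableDuring are made up for illustration.
final class PreTouchOutcome {
    enum Outcome { PASS, SKIP, FAIL }

    static Outcome classify(long rss, long committed, long avail, long requiredAvailableDuring) {
        if (rss >= committed) {
            // The heap is fully resident: the test succeeded, even if memory looked low.
            return Outcome.PASS;
        }
        if (avail < requiredAvailableDuring) {
            // RSS fell short, but the machine was (generously) judged low on memory,
            // so the result is inconclusive rather than a failure.
            return Outcome.SKIP;
        }
        return Outcome.FAIL;
    }

    public static void main(String[] args) {
        long M = 1024 * 1024;
        long required = 256 * M; // mirrors expectedMaxNonHeapRSS = M * 256 from the review thread
        System.out.println(classify(512 * M, 512 * M, 100 * M, required));  // low memory, but passed
        System.out.println(classify(300 * M, 512 * M, 100 * M, required));  // low memory, inconclusive
        System.out.println(classify(300 * M, 512 * M, 1024 * M, required)); // enough memory, real failure
    }
}
```

Lifting the availability check above the RSS comparison would turn the first case into a skip, discarding a valid pass.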

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1665262024

From stuefe at openjdk.org  Thu Jul  4 07:46:35 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Jul 2024 07:46:35 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v2]
In-Reply-To: <GXA1RAdCoZKi_8VnJ9qtOsqPCuuyhO-gfjfnlvW04d8=.17a66849-d08e-4e71-ba1d-0db0c742e691@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
 <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com>
 <GXA1RAdCoZKi_8VnJ9qtOsqPCuuyhO-gfjfnlvW04d8=.17a66849-d08e-4e71-ba1d-0db0c742e691@github.com>
Message-ID: <uyUvbKGKPu8Y_SL77V20x0s8wTNu-sJI3DNOXYAFkzM=.6ae7c214-0d28-4c84-8e8e-bb3fd3f5a977@github.com>

On Mon, 1 Jul 2024 13:10:39 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update TestAlwaysPreTouchBehavior.java
>
> Some readability suggestions.

Hi @albertnetymk, thanks a lot for your review. I added a comment explaining the test rationale and setup.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19803#issuecomment-2208322749

From stuefe at openjdk.org  Thu Jul  4 07:46:35 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Jul 2024 07:46:35 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v3]
In-Reply-To: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
Message-ID: <ri-bl1TeDnmiHjJX4EDIElQZl_ff5vIl_Al7RrbFCUg=.9b2952ee-9c14-42ab-aa50-eb5dd60fcce8@github.com>

> See JBS issue.
> 
> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure.
> 
> The patch:
> - exposes os::available_memory via Whitebox
> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException`
> 
> I have some misgivings about this solution, though:
> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. 
> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions)
> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException.
> 
> Despite my doubts, I think this is the best we can come up with if we want to have such a test.
> 
> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice.

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:

 - Merge branch 'master' into JDK-8334513-New-test-gc-TestAlwaysPreTouchBehavior-java-is-failing
 - comments for albert
 - Update TestAlwaysPreTouchBehavior.java
 - tweaks
 - fixes
 - Merge branch 'master' into JDK-8334513-New-test-gc-TestAlwaysPreTouchBehavior-java-is-failing
 - fix

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19803/files
  - new: https://git.openjdk.org/jdk/pull/19803/files/19ed5833..a25bfca4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=01-02

  Stats: 14736 lines in 498 files changed: 9695 ins; 3004 del; 2037 mod
  Patch: https://git.openjdk.org/jdk/pull/19803.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19803/head:pull/19803

PR: https://git.openjdk.org/jdk/pull/19803

From stuefe at openjdk.org  Thu Jul  4 07:49:32 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Jul 2024 07:49:32 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v4]
In-Reply-To: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
Message-ID: <aiYvWQf9AqVWcE_I4yy5e4l1CL3pN9KjZyYEMa0t0N8=.67276cf4-29f0-41d7-8ef2-a1eb1d4dc68e@github.com>

> See JBS issue.
> 
> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure.
> 
> The patch:
> - exposes os::available_memory via Whitebox
> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException`
> 
> I have some misgivings about this solution, though:
> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. 
> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions)
> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException.
> 
> Despite my doubts, I think this is the best we can come up with if we want to have such a test.
> 
> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice.

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  comma

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19803/files
  - new: https://git.openjdk.org/jdk/pull/19803/files/a25bfca4..eba72ed9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19803.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19803/head:pull/19803

PR: https://git.openjdk.org/jdk/pull/19803

From stuefe at openjdk.org  Thu Jul  4 07:53:01 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Jul 2024 07:53:01 GMT
Subject: RFR: 8330174: Establish no-access zone at the start of Klass
 encoding range [v4]
In-Reply-To: <9RShpjQGr5MI3aqK6VqpYgDiUJS3q_Q6Bdo4jWmtJ5g=.764b3747-69be-4a70-a599-d6cb9a02bddd@github.com>
References: <9RShpjQGr5MI3aqK6VqpYgDiUJS3q_Q6Bdo4jWmtJ5g=.764b3747-69be-4a70-a599-d6cb9a02bddd@github.com>
Message-ID: <ZTQ7TB_8UHD8r9W96IsXtV2TO2X2DLwRaf4tgVHVpuA=.d93ccc2a-ab86-47f4-8c8c-a38040f84eef@github.com>

> After having reserved an address range for the Klass encoding range, we either:
> a) Place CDS, then class space, into that address range
> b) Place only class space in that range (if CDS is off).
> 
> For an nKlass of 0, the decoded Klasspointer points to the beginning of the encoding range. Since nKlass=0 is a special value, both CDS (a) and Metaspace (b) ensure that no Klass is placed right at the start of the Klass range.
> 
> However, it would also be good to establish a no-access zone at the range's start. Dereferencing an nKlass=0 would then result in an immediate, obvious crash instead of in reading invalid data.
> 
> This would closely mimic what we do in the compressed-oops-enabled java heap (albeit there we do it for fault-based null checks, too) and what Operating Systems do with low-address ranges.
> 
> ---
> 
> The patch:
> 
> We can neither move the encoding base down one page (the encoding base is carefully chosen to fit the platform's decoding). Nor can we move CDS archive space up one page (since CDS relies on the archive being placed exactly at the encoding base address). Nor do we want to move class space up (since class space start has a high alignment requirement of 16MB, protection zone would need to be 16MB large, which is a waste of address space).
> 
> Instead, as before, we just let Metaspace and CDS handle the protection zone internally. For Metaspace, this is very simple. We just protect the first page of class space. 
> 
> For CDS, it is a tiny bit more complex since we need to leave a "protection-zone-shaped hole" in the first region of the archive when we dump it. We do just that and then give that region a new property, "has protection zone". At runtime, we protect the underlying memory if a mapped region has a protection zone.
> 
> With CDS, because the page size can differ between dump- and runtime, the protection zone is the size of CDS core region alignment, not page-sized (e.g. dumping on Linux aarch64 with 4KB pages shall generate an archive that can be used in Docker on MacOS with 16KB pages).
> 
> ----
> 
> Tests:
> - ran CDS and AppCDS jtreg tests manually on Mac m1
> - manually tested that decoding, then dereferencing an nKlass=0 gives us the new "Fault address is narrow Klass base - dereferencing a zero nKlass?" output in the hs-err file
> - GHAs (which include the new regression test)
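
To illustrate why nKlass=0 lands in the protection zone: decoding is, in essence, base plus the shifted narrow value, so a zero nKlass decodes to the encoding base itself. The sketch below is a toy model under stated assumptions; the class name, base address, shift and zone size are all hypothetical and not taken from the patch.

```java
// Hypothetical model of narrow Klass decoding; the base, shift and zone size
// used in main() are illustrative assumptions, not values from the patch.
final class NarrowKlassSketch {
    private final long base;     // start of the Klass encoding range
    private final int shift;     // encoding shift
    private final long zoneSize; // size of the no-access zone at the range start

    NarrowKlassSketch(long base, int shift, long zoneSize) {
        this.base = base;
        this.shift = shift;
        this.zoneSize = zoneSize;
    }

    // Decode a narrow Klass value into an address within the encoding range.
    long decode(int nKlass) {
        return base + (Integer.toUnsignedLong(nKlass) << shift);
    }

    // True if the address falls inside the protected prefix of the range.
    boolean inProtectionZone(long address) {
        return address >= base && address < base + zoneSize;
    }

    public static void main(String[] args) {
        NarrowKlassSketch enc = new NarrowKlassSketch(0x800000000L, 3, 4096);
        long zero = enc.decode(0);
        System.out.printf("nKlass=0 -> 0x%x, protected: %b%n", zero, enc.inProtectionZone(zero));
        long beyond = enc.decode(4096 >> 3);
        System.out.printf("nKlass=%d -> 0x%x, protected: %b%n", 4096 >> 3, beyond, enc.inProtectionZone(beyond));
    }
}
```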

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:

 - Merge branch 'openjdk:master' into cds-metaspace-prot-prefix
 - Merge branch 'openjdk:master' into cds-metaspace-prot-prefix
 - Merge branch 'openjdk:master' into cds-metaspace-prot-prefix
 - Merge branch 'openjdk:master' into cds-metaspace-prot-prefix
 - Update metaspace.cpp
 - cds-metaspace-prot-prefix

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19290/files
  - new: https://git.openjdk.org/jdk/pull/19290/files/2ccd527d..ee869fc6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19290&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19290&range=02-03

  Stats: 35871 lines in 820 files changed: 23818 ins; 8212 del; 3841 mod
  Patch: https://git.openjdk.org/jdk/pull/19290.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19290/head:pull/19290

PR: https://git.openjdk.org/jdk/pull/19290

From eosterlund at openjdk.org  Thu Jul  4 07:54:22 2024
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Thu, 4 Jul 2024 07:54:22 GMT
Subject: RFR: 8334890: Missing unconditional cross modifying fence in
 nmethod entry barriers
In-Reply-To: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
References: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
Message-ID: <yjmiVgrkuSxz7OHkp_Pg5lhOlu6mGoA4L46ZcmXhgxk=.8e6ed7af-6bef-4e43-8161-79156f76afaf@github.com>

On Tue, 2 Jul 2024 15:43:08 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code.
> 
> We use asynchronous code modification in the fast path of nmethod entry barriers. If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, then any code modification to the nmethod prior to disarming it on another thread, is guaranteed to also be observed by the instruction fetcher.
> 
> However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data. If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions, without enforcing a cross modifying code fence. This is synchronous code modification.
> 
> There is a questionable optimization in the stub slow path entry (which we just got to because the nmethod was observed to be armed by the instruction fetcher). It checks "just one more time" if the nmethod concurrently got disarmed, and then exits without a cross modification fence. This is an opportunistic optimization that is very unlikely to be useful, since we got into the slow path because the nmethod was armed a couple of instructions ago. This opportunistic optimization breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully.
> 
> This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly.

Performance results are neutral, as expected. Tier1-5 passed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19990#issuecomment-2208337400

From aboldtch at openjdk.org  Thu Jul  4 08:01:18 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 4 Jul 2024 08:01:18 GMT
Subject: RFR: 8334890: Missing unconditional cross modifying fence in
 nmethod entry barriers
In-Reply-To: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
References: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
Message-ID: <AJ60YmU3wkp4rz9Q8noIiOCF6BuAaSicDOEGmziLn4s=.f4532050-f64f-4725-b825-9cc36cec2813@github.com>

On Tue, 2 Jul 2024 15:43:08 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code.
> 
> We use asynchronous code modification in the fast path of nmethod entry barriers. If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, then any code modification to the nmethod prior to disarming it on another thread, is guaranteed to also be observed by the instruction fetcher.
> 
> However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data. If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions, without enforcing a cross modifying code fence. This is synchronous code modification.
> 
> There is a questionable optimization in the stub slow path entry (which we just got to because the nmethod was observed to be armed by the instruction fetcher). It checks "just one more time" if the nmethod concurrently got disarmed, and then exits without a cross modification fence. This is an opportunistic optimization that is very unlikely to be useful, since we got into the slow path because the nmethod was armed a couple of instructions ago. This opportunistic optimization breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully.
> 
> This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly.

The always-fence changes look good.

The effects w.r.t. `DeoptimizeNMethodBarriersALot` and our testing is less clear to me.

src/hotspot/share/gc/shared/barrierSetNMethod.cpp line 197:

> 195:   if (DeoptimizeNMethodBarriersALot && !nm->is_osr_method()) {
> 196:     static volatile uint32_t counter=0;
> 197:     if (Atomic::add(&counter, 1u) % 10 == 0) {

I do not have a good intuition about the frequency of this, or how it affects things, so I have a hard time commenting on this change.

An alternative would be to just fence when the `nmethod` is already disarmed and not call `bs_nm->nmethod_entry_barrier(nm);`. This is what effectively already happens in all implementations of `nmethod_entry_barrier` after [JDK-8331911](https://bugs.openjdk.org/browse/JDK-8331911) / #19285. (Maintaining the current behaviour with respect to `DeoptimizeNMethodBarriersALot`). 

But as long as this new magic constant seems to have similar testing coverage when running our tests that use `DeoptimizeNMethodBarriersALot` this seems like a sensible solution as well.
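
As a rough guide to the frequency: a shared counter taken modulo 10 fires on every tenth call through this path, regardless of which thread makes the call. The Java sketch below mirrors the quoted C++ condition, assuming `Atomic::add` returns the updated value; `EveryTenthSketch` and `shouldTrigger` are hypothetical names, and the sketch says nothing about how often the barrier slow path itself is reached.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Java analogue of the quoted counter check; not HotSpot code.
final class EveryTenthSketch {
    private final AtomicInteger counter = new AtomicInteger();

    // Atomically increments the shared counter and reports true on every
    // tenth call. remainderUnsigned treats the int as an unsigned 32-bit
    // value, like the uint32_t counter in the C++ snippet.
    boolean shouldTrigger() {
        return Integer.remainderUnsigned(counter.incrementAndGet(), 10) == 0;
    }

    public static void main(String[] args) {
        EveryTenthSketch sketch = new EveryTenthSketch();
        int triggered = 0;
        for (int i = 0; i < 1000; i++) {
            if (sketch.shouldTrigger()) {
                triggered++;
            }
        }
        System.out.println("triggered " + triggered + " of 1000 calls");
    }
}
```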

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19990#pullrequestreview-2158224060
PR Review Comment: https://git.openjdk.org/jdk/pull/19990#discussion_r1665284317

From kbarrett at openjdk.org  Thu Jul  4 08:08:24 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Thu, 4 Jul 2024 08:08:24 GMT
Subject: RFR: 8334890: Missing unconditional cross modifying fence in
 nmethod entry barriers
In-Reply-To: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
References: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
Message-ID: <GPFw-GRnWeKbjJILxoF3yPuXHTk-PNP5Rs2tQidkjIo=.00fa8b12-2e54-4304-ab8d-d19e04a5d249@github.com>

On Tue, 2 Jul 2024 15:43:08 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code.
> 
> We use asynchronous code modification in the fast path of nmethod entry barriers. If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, then any code modification to the nmethod prior to disarming it on another thread, is guaranteed to also be observed by the instruction fetcher.
> 
> However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data. If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions, without enforcing a cross modifying code fence. This is synchronous code modification.
> 
> There is a questionable optimization in the stub slow path entry (which we just got to because the nmethod was observed to be armed by the instruction fetcher). It checks "just one more time" if the nmethod concurrently got disarmed, and then exits without a cross modification fence. This is an opportunistic optimization that is very unlikely to be useful, since we got into the slow path because the nmethod was armed a couple of instructions ago. This opportunistic optimization breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully.
> 
> This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly.

Looks good.

src/hotspot/share/gc/shared/barrierSetNMethod.cpp line 195:

> 193:   // Diagnostic option to force deoptimization 1 in 10 times. It is otherwise
> 194:   // a very rare event.
> 195:   if (DeoptimizeNMethodBarriersALot && !nm->is_osr_method()) {

This could also include may_enter in the conditions, since the effect of this bit of code is to set it false.
But maybe that's a rare thing and not worth checking for here.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19990#pullrequestreview-2158259609
PR Review Comment: https://git.openjdk.org/jdk/pull/19990#discussion_r1665305892

From sgehwolf at openjdk.org  Thu Jul  4 08:10:23 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Thu, 4 Jul 2024 08:10:23 GMT
Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII
 [v3]
In-Reply-To: <b4NrQs2S9jYAEddRJvmJelnXOXo8tGRulqW7b9Q_RO8=.0a33e434-6096-427d-940b-6f87facc3db6@github.com>
References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com>
 <nyLYOhw7-wSPlKjeWi3FyuLY0UzFwWJdj-19ijEInU4=.6f539aaf-0cff-4ab8-8ca0-3acd3b44d071@github.com>
 <EliUQk2e0HZE3BQ3BKOGvF81KROy_lLp4OgK-hRWazA=.79466db9-87df-403c-a928-15e1dea8bbd5@github.com>
 <b4NrQs2S9jYAEddRJvmJelnXOXo8tGRulqW7b9Q_RO8=.0a33e434-6096-427d-940b-6f87facc3db6@github.com>
Message-ID: <6CdB1rRz9eL_2DXfwAGmWbNOW8QAx66YaZyPFQ53RCo=.fad303a5-e94b-478a-ab6f-a28e2e67e7a8@github.com>

On Thu, 4 Jul 2024 06:17:45 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/runtime/os.cpp line 945:
>> 
>>> 943: 
>>> 944: ATTRIBUTE_NO_ASAN static bool read_safely_from(const uintptr_t* p, uintptr_t* result) {
>>> 945:   DEBUG_ONLY(*result = 0xAAAA;)
>> 
>> It's not clear why this was added. Left-over?
>
> It's intentional, to have an indication for a failing SafeFetch that we in turn fail to recognise.

OK, thanks for the explanation.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1665309552

From stuefe at openjdk.org  Thu Jul  4 08:23:21 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Jul 2024 08:23:21 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v2]
In-Reply-To: <rnPsj7tJBhXHpNZ_Cubn3LdzsXi_u8vjFjWIlTiE9-s=.acda4cb0-7fe3-40ce-aac4-7fb5d7deef1a@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
 <C7DmVK87y-6S5Ljq58--YbyaFND0i03LQEGzpn0FBrY=.fd0e1e6f-86bb-44fa-b61d-1853a48e2fd7@github.com>
 <rnPsj7tJBhXHpNZ_Cubn3LdzsXi_u8vjFjWIlTiE9-s=.acda4cb0-7fe3-40ce-aac4-7fb5d7deef1a@github.com>
Message-ID: <ZhyduzWKIbuw3PC9eJb2IXpmrEOuihapCp9MUubHiuI=.74c0787f-6a48-45d9-82d1-c109e704d19e@github.com>

On Wed, 3 Jul 2024 06:08:34 GMT, Liming Liu <duke at openjdk.org> wrote:

> Could you please confirm whether it is related to JDK-8335167?

Both failures may be symptoms of the same issue. See my comment in JBS.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19803#issuecomment-2208388841

From aboldtch at openjdk.org  Thu Jul  4 08:23:26 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 4 Jul 2024 08:23:26 GMT
Subject: Integrated: 8335397: Improve reliability of
 TestRecursiveMonitorChurn.java
In-Reply-To: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
Message-ID: <TP3HJj8byl2IagJ4KFHtqUs5xaISnHC0yL8jcCZbx10=.bf3da5de-6385-4353-b6fd-d574ca2aac31@github.com>

On Mon, 1 Jul 2024 09:21:13 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation.
> 
> Change to instead query the JVM for the exact number of inflations via the Whitebox API. This allows us to both be more exact and less dependent on interactions with NMT.

This pull request has now been integrated.

Changeset: b20e8c8e
Author:    Axel Boldt-Christmas <aboldtch at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/b20e8c8e85e0a0e96ae648f42ff803f1c83f6291
Stats:     77 lines in 5 files changed: 28 ins; 31 del; 18 mod

8335397: Improve reliability of TestRecursiveMonitorChurn.java

Reviewed-by: coleenp, rkennke, dholmes

-------------

PR: https://git.openjdk.org/jdk/pull/19965

From aboldtch at openjdk.org  Thu Jul  4 08:23:25 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 4 Jul 2024 08:23:25 GMT
Subject: RFR: 8335397: Improve reliability of
 TestRecursiveMonitorChurn.java [v3]
In-Reply-To: <HiqUmFciHGmdjRTa9B3RfxRxstbd6BA-QfZck-y5wBE=.98e4d5f3-4d47-406a-8de4-c712ca48d24f@github.com>
References: <T8MKz8vkeTMpY_mF99GXLNRdMmECDSQVj0TT7u9LVpU=.34c46d26-dd1d-443a-8d96-92796d8a0b5c@github.com>
 <HiqUmFciHGmdjRTa9B3RfxRxstbd6BA-QfZck-y5wBE=.98e4d5f3-4d47-406a-8de4-c712ca48d24f@github.com>
Message-ID: <hCTzMAa1TsGw7xRGynwyawGuk_R4hiie-rNWkGycItQ=.56f8bd10-060c-4f7b-8e9d-20f47a44a5da@github.com>

On Wed, 3 Jul 2024 07:25:48 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> TestRecursiveMonitorChurn.java currently uses NMT to try and correlate the native memory increase with unwanted inflation.
>> 
>> Change to instead query the JVM for the exact number of inflations via the Whitebox API. This allows us to both be more exact and less dependent on interactions with NMT.
>
> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update test/hotspot/jtreg/runtime/locking/TestRecursiveMonitorChurn.java

Thanks for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19965#issuecomment-2208388161

From fyang at openjdk.org  Thu Jul  4 09:28:24 2024
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 4 Jul 2024 09:28:24 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22]
In-Reply-To: <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com>
Message-ID: <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com>

On Thu, 4 Jul 2024 07:28:34 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi all, please consider!
>> 
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> With a very small application, or one that runs for a very short time, we have fast patchable calls.
>> But any normal application running longer will increase the code size and code churn/fragmentation.
>> So whether or not you get hot fast calls relies on luck.
>> 
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>> 
>> Code stream:
>> JAL <trampo>
>> Stubs:
>> AUIPC
>> LD
>> JALR
>> <DEST>
>> 
>> 
>> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem, having a call to a jump is not the fastest way.
>> Loading the address avoids the pitfalls of cmodx.
>> 
>> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>> 
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>> <DEST>
>> 
>> An experimental option for turning trampolines back on exists.
>> 
>> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
>> 
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>> 
>> fop                                        (msec)    2239       |  2128       =  0.950424
>> h2                                         (msec)    18660      |  16594      =  0.889282
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> luindex                                    (msec)    2866       |  2842       =  0.991626
>> lusearch                                   (msec)    4108       |  4311       =  1.04942
>> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
>> pmd                                        (msec)    5976       |  5897       =  0.98678
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> Avg:                                       0.974112                              
>> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits:
> 
>  - Merge branch 'master' into 8332689
>  - Rename to reloc_call
>  - Merge branch 'master' into 8332689
>  - Rename lc
>  - Merge branch 'master' into 8332689
>  - Merge branch 'master' into 8332689
>  - Comments
>  - Missed in merge-fixes, minor revert
>  - Merge branch 'master' into 8332689
>  - Minor review comments
>  - ... and 21 more: https://git.openjdk.org/jdk/compare/38a578d5...9eabb5fa

All right. I think this has been thoroughly checked. Looks good modulo one small question.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 987:

> 985:   assert(is_simm32(distance), "Must be");
> 986:   Assembler::auipc(temp, (int32_t)distance + 0x800);
> 987:   Assembler::_ld(temp, temp, ((int32_t)distance << 20) >> 20);

Question: Why would you use this low-level `Assembler::_ld` here?

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19453#pullrequestreview-2158438628
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1665415011
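
For readers following the quoted lines 985-987: the `+ 0x800` on the auipc operand and the `<< 20 >> 20` on the ld offset together split a signed 32-bit pc-relative distance into an upper part and a sign-extended 12-bit lower part. A minimal standalone sketch of that arithmetic follows. `PcRelParts` and `split_pc_relative` are hypothetical names for illustration only and are not HotSpot code; the sketch uses a mask-and-xor form instead of shifts so that it does not depend on how the compiler shifts negative values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration, not HotSpot code.
struct PcRelParts {
  int64_t hi20;  // upper part, the multiple of 4096 that AUIPC would add to pc
  int32_t lo12;  // signed 12-bit load offset, always in [-2048, 2047]
};

inline PcRelParts split_pc_relative(int32_t distance) {
  // Sign-extend the low 12 bits. On compilers with arithmetic right shift
  // this yields the same value as ((int32_t)distance << 20) >> 20.
  const int32_t lo12 = ((distance & 0xfff) ^ 0x800) - 0x800;
  // Because lo12 may be negative, the upper part must compensate; this is
  // the rounding that adding 0x800 before taking the upper bits achieves.
  const int64_t hi20 = (static_cast<int64_t>(distance) - lo12) / 4096;
  // The two parts always recombine to the original distance.
  assert(hi20 * 4096 + lo12 == distance);
  return {hi20, lo12};
}
```

For example, a distance of 2048 does not fit in a signed 12-bit offset, so it splits into an upper part of 1 (that is, 4096) and a lower part of -2048.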

From rehn at openjdk.org  Thu Jul  4 10:00:23 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 4 Jul 2024 10:00:23 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22]
In-Reply-To: <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com>
 <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com>
Message-ID: <RpC7ys1vEQoTjqXMQ68o68ZhWEdJcJi9msPTqz63Xl4=.d798c00c-38a2-48d4-aab5-ebd124441ef4@github.com>

On Thu, 4 Jul 2024 09:22:46 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits:
>> 
>>  - Merge branch 'master' into 8332689
>>  - Rename to reloc_call
>>  - Merge branch 'master' into 8332689
>>  - Rename lc
>>  - Merge branch 'master' into 8332689
>>  - Merge branch 'master' into 8332689
>>  - Comments
>>  - Missed in merge-fixes, minor revert
>>  - Merge branch 'master' into 8332689
>>  - Minor review comments
>>  - ... and 21 more: https://git.openjdk.org/jdk/compare/38a578d5...9eabb5fa
>
> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 987:
> 
>> 985:   assert(is_simm32(distance), "Must be");
>> 986:   Assembler::auipc(temp, (int32_t)distance + 0x800);
>> 987:   Assembler::_ld(temp, temp, ((int32_t)distance << 20) >> 20);
> 
> Question: Why would you use this low-level `Assembler::_ld` here?

I used it to be explicit that this is the normal **ld**.
But I see I did not do the same for **jalr** => **_jalr** (as we are in MASM we can use the 'private' method).
Shall I change to the normal ld and add an assert that we are in an incompressible region?

(I would like to revert that at some point, so that users of reloc_call don't need to know that it requires an incompressible region.)

Suggested:

@@ -982,0 +983 @@ void MacroAssembler::load_link_jump(const address source, Register temp) {
+  assert(!in_compressible_region(), "Must be");
@@ -987 +988 @@ void MacroAssembler::load_link_jump(const address source, Register temp) {
-  Assembler::_ld(temp, temp, ((int32_t)distance << 20) >> 20);
+  Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1665465771

From eosterlund at openjdk.org  Thu Jul  4 10:08:24 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Thu, 4 Jul 2024 10:08:24 GMT
Subject: RFR: 8334890: Missing unconditional cross modifying fence in
 nmethod entry barriers
In-Reply-To: <GPFw-GRnWeKbjJILxoF3yPuXHTk-PNP5Rs2tQidkjIo=.00fa8b12-2e54-4304-ab8d-d19e04a5d249@github.com>
References: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
 <GPFw-GRnWeKbjJILxoF3yPuXHTk-PNP5Rs2tQidkjIo=.00fa8b12-2e54-4304-ab8d-d19e04a5d249@github.com>
Message-ID: <wtWJVsoy68A6CH-hZDosoR8bprqOcJLXfGOEFbQGBEU=.c895fab0-1de8-4ffd-9991-1ed4bba6b637@github.com>

On Thu, 4 Jul 2024 08:05:15 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code.
>> 
>> We use asynchronous code modification in the fast path of nmethod entry barriers. If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, then any code modification to the nmethod prior to disarming it on another thread, is guaranteed to also be observed by the instruction fetcher.
>> 
>> However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data. If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions, without enforcing a cross modifying code fence. This is synchronous code modification.
>> 
>> There is a questionable optimization in the stub slow path entry (which we just got to because the nmethod was observed to be armed by the instruction fetcher). It checks "just one more time" whether the nmethod concurrently got disarmed, and if so exits without a cross modification fence. This is an opportunistic optimization that is very unlikely to be useful, since we got into the slow path because the nmethod was armed a couple of instructions ago. This opportunistic optimization breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully.
>> 
>> This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly.
>
> Looks good.

Thanks for the reviews @kimbarrett and @xmas92!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19990#issuecomment-2208600511

From eosterlund at openjdk.org  Thu Jul  4 10:08:25 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Thu, 4 Jul 2024 10:08:25 GMT
Subject: Integrated: 8334890: Missing unconditional cross modifying fence in
 nmethod entry barriers
In-Reply-To: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
References: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
Message-ID: <08nCRUZWk-JXV63xf5M1oU2_5fVS5hRsHe75eaSOrSs=.b7c41e14-211f-4bbb-bf9b-88cf454704d2@github.com>

On Tue, 2 Jul 2024 15:43:08 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code.
> 
> We use asynchronous code modification in the fast path of nmethod entry barriers. If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, then any code modification to the nmethod prior to disarming it on another thread, is guaranteed to also be observed by the instruction fetcher.
> 
> However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data. If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions, without enforcing a cross modifying code fence. This is synchronous code modification.
> 
> There is a questionable optimization in the stub slow path entry (which we just got to because the nmethod was observed to be armed by the instruction fetcher). It checks "just one more time" whether the nmethod concurrently got disarmed, and if so exits without a cross modification fence. This is an opportunistic optimization that is very unlikely to be useful, since we got into the slow path because the nmethod was armed a couple of instructions ago. This opportunistic optimization breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully.
> 
> This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly.

This pull request has now been integrated.

Changeset: c0604fb8
Author:    Erik Österlund <eosterlund at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/c0604fb823d9f3b2e347a9857b11606b223ad8ec
Stats:     21 lines in 1 file changed: 1 ins; 17 del; 3 mod

8334890: Missing unconditional cross modifying fence in nmethod entry barriers

Reviewed-by: aboldtch, kbarrett

-------------

PR: https://git.openjdk.org/jdk/pull/19990
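
The slow-path contract described in the message above can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyNMethod`, `kDisarmed`, `slow_path_entry` and `cross_modify_fence_placeholder` are hypothetical names, the guard is modelled as a plain atomic integer rather than a cmp immediate, and the fence is a counter standing in for a real instruction-stream serializing operation. It is not the code from the changeset.

```cpp
#include <atomic>
#include <cassert>

// Toy model only; none of these names come from HotSpot.
struct ToyNMethod {
  std::atomic<int> guard{1};  // stands in for the immediate of the cmp
};

constexpr int kDisarmed = 0;  // the "correct" guard value in this model

int g_fence_count = 0;

// Placeholder for a cross-modifying-code fence. Real code would serialize
// the instruction stream here; this model only counts the calls.
void cross_modify_fence_placeholder() { ++g_fence_count; }

// Slow path in the spirit of the description: the guard is read as data,
// so the fence is issued on every exit, including the case where another
// thread disarmed the nmethod concurrently.
void slow_path_entry(ToyNMethod& nm) {
  const bool armed = nm.guard.load(std::memory_order_acquire) != kDisarmed;
  if (armed) {
    // Fix-up work on the nmethod would happen here (omitted), then disarm.
    nm.guard.store(kDisarmed, std::memory_order_release);
  }
  // No early return above skips this fence.
  cross_modify_fence_placeholder();
}
```

The point of the sketch is the shape of the control flow: the "already disarmed" case falls through to the same fence as the armed case instead of returning early.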

From fyang at openjdk.org  Thu Jul  4 11:07:21 2024
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 4 Jul 2024 11:07:21 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22]
In-Reply-To: <RpC7ys1vEQoTjqXMQ68o68ZhWEdJcJi9msPTqz63Xl4=.d798c00c-38a2-48d4-aab5-ebd124441ef4@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com>
 <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com>
 <RpC7ys1vEQoTjqXMQ68o68ZhWEdJcJi9msPTqz63Xl4=.d798c00c-38a2-48d4-aab5-ebd124441ef4@github.com>
Message-ID: <dSl5ihkZZiE0ZM_a69HgNUOwBPs-Dnd3rmwYOBmaCLc=.15750165-6919-4f8c-9177-08f0dfee8e18@github.com>

On Thu, 4 Jul 2024 09:57:44 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 987:
>> 
>>> 985:   assert(is_simm32(distance), "Must be");
>>> 986:   Assembler::auipc(temp, (int32_t)distance + 0x800);
>>> 987:   Assembler::_ld(temp, temp, ((int32_t)distance << 20) >> 20);
>> 
>> Question: Why would you use this low-level `Assembler::_ld` here?
>
> I used it to be explicit that this is the normal **ld**.
> But I see I did not do the same for **jalr** => **_jalr** (as we are in MASM we can use the 'private' method).
> Shall I change to the normal ld and add an assert that we are in an incompressible region?
> 
> (I would like to revert that at some point, so that users of reloc_call don't need to know that it requires an incompressible region.)
> 
> Suggested:
> 
> @@ -982,0 +983 @@ void MacroAssembler::load_link_jump(const address source, Register temp) {
> +  assert(!in_compressible_region(), "Must be");
> @@ -987 +988 @@ void MacroAssembler::load_link_jump(const address source, Register temp) {
> -  Assembler::_ld(temp, temp, ((int32_t)distance << 20) >> 20);
> +  Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20);

Seems no need to worry about that? I think it's up to the caller to decide if it wants compressed instructions. Here in this case, the only call site is within a relocate() which will disable compressed instructions for you[1].

  relocate(entry.rspec(), [&] {
    load_link_jump(target);
  });

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/assembler_riscv.hpp#L2130

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1665546545

From xpeng at openjdk.org  Thu Jul  4 11:16:19 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Thu, 4 Jul 2024 11:16:19 GMT
Subject: RFR: 8334231: Optimize MethodData layout
In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
Message-ID: <P0oXQu4hhYp8JVZOsPJVL6erv1hvzNOnBYpRyjmg82k=.c4af4468-ba02-40ab-a1c7-727bc3690287@github.com>

On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> Hi all,
>      This PR is part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize HotSpot C++ class layouts; this one covers the layout of MethodData. Here is the original layout from `pahole`:
> 
> class MethodData : public Metadata {
> public:
> 
> 	/* class Metadata            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX 8 bytes hole, try to pack */
> 
> 	class Method *             _method;              /*     8     8 */
> 	int                        _size;                /*    16     4 */
> 	int                        _hint_di;             /*    20     4 */
> 	class Mutex               _extra_data_lock;      /*    24   104 */
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class CompilerCounters    _compiler_counters;    /*   128    80 */
> 	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
> 	intx                       _eflags;              /*   208     8 */
> 	intx                       _arg_local;           /*   216     8 */
> 	intx                       _arg_stack;           /*   224     8 */
> 	intx                       _arg_returned;        /*   232     8 */
> 	int                        _creation_mileage;    /*   240     4 */
> 	class InvocationCounter   _invocation_counter;   /*   244     4 */
> 	class InvocationCounter   _backedge_counter;     /*   248     4 */
> 	int                        _invocation_counter_start; /*   252     4 */
> 	/* --- cacheline 4 boundary (256 bytes) --- */
> 	int                        _backedge_counter_start; /*   256     4 */
> 	uint                       _tenure_traps;        /*   260     4 */
> 	int                        _invoke_mask;         /*   264     4 */
> 	int                        _backedge_mask;       /*   268     4 */
> 	short int                  _num_loops;           /*   272     2 */
> 	short int                  _num_blocks;          /*   274     2 */
> 	enum WouldProfile          _would_profile;       /*   276     4 */
> 	int                        _jvmci_ir_size;       /*   280     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	class FailedSpeculation *  _failed_speculations; /*   288     8 */
> 	int                        _data_size;           /*   296     4 */
> 	int                        _parameters_type_data_di; /*   300     4 */
> 	int                        _exception_handler_data_di; /*   304     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	intptr_t                   _data[1];             /*   312     8 */
> 
> 	/* size: 320, cachelines: 5, members: 27 */
> 	/* sum members: 304, holes: 3, sum holes: 16 */
> };
> 
> 
> There are 3 holes ...

Thanks David! 

I assume it is trivial, what do you think? @dholmes-ora

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20019#issuecomment-2208714610
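
As background for the `pahole` output quoted above: holes appear when a member with stricter alignment follows a smaller one, and grouping members by size typically removes them. The sketch below illustrates the effect on two made-up structs. `Holey`, `Packed`, `kMemberBytes` and `padding_bytes` are hypothetical names, not the MethodData fields or the actual change in the PR, and the exact sizes depend on the platform ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical structs for illustration; not the MethodData layout.
struct Holey {
  int32_t a;  // on a typical 64-bit ABI, 4 bytes of padding follow
  int64_t b;
  int32_t c;  // and another 4 bytes of padding follow here
  int64_t d;
};

// Same members, largest first, so the two int32_t fields share one slot.
struct Packed {
  int64_t b;
  int64_t d;
  int32_t a;
  int32_t c;
};

// Sum of the member sizes, the figure pahole reports as "sum members".
constexpr std::size_t kMemberBytes = 2 * sizeof(int32_t) + 2 * sizeof(int64_t);

// Bytes of the struct not occupied by any member.
template <typename T>
constexpr std::size_t padding_bytes() {
  return sizeof(T) - kMemberBytes;
}
```

On a typical LP64 target `Holey` is expected to carry 8 bytes of padding while `Packed` carries none, which is the kind of saving a reordering like the one proposed for MethodData aims for.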

From mli at openjdk.org  Thu Jul  4 11:21:28 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 4 Jul 2024 11:21:28 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22]
In-Reply-To: <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com>
Message-ID: <d--581tMPd5ifCiz-9Y4aSkO4aTz1xsjHymjTTBmeyU=.8663438a-15f6-4069-becc-b6d3db91c27b@github.com>

On Thu, 4 Jul 2024 07:28:34 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi all, please consider!
>> 
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> With a very small application, or one that runs for a very short time, we have fast patchable calls.
>> But any normal application running longer will increase the code size and code churn/fragmentation.
>> So whether or not you get hot fast calls relies on luck.
>> 
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>> 
>> Code stream:
>> JAL <trampo>
>> Stubs:
>> AUIPC
>> LD
>> JALR
>> <DEST>
>> 
>> 
>> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem, having a call to a jump is not the fastest way.
>> Loading the address avoids the pitfalls of cmodx.
>> 
>> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>> 
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>> <DEST>
>> 
>> An experimental option for turning trampolines back on exists.
>> 
>> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
>> 
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>> 
>> fop                                        (msec)    2239       |  2128       =  0.950424
>> h2                                         (msec)    18660      |  16594      =  0.889282
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> luindex                                    (msec)    2866       |  2842       =  0.991626
>> lusearch                                   (msec)    4108       |  4311       =  1.04942
>> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
>> pmd                                        (msec)    5976       |  5897       =  0.98678
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> Avg:                                       0.974112                              
>> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits:
> 
>  - Merge branch 'master' into 8332689
>  - Rename to reloc_call
>  - Merge branch 'master' into 8332689
>  - Rename lc
>  - Merge branch 'master' into 8332689
>  - Merge branch 'master' into 8332689
>  - Comments
>  - Missed in merge-fixes, minor revert
>  - Merge branch 'master' into 8332689
>  - Minor review comments
>  - ... and 21 more: https://git.openjdk.org/jdk/compare/38a578d5...9eabb5fa

I agree, let's move forward, and remove the old one if possible.

-------------

Marked as reviewed by mli (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19453#pullrequestreview-2158687339

From chagedorn at openjdk.org  Thu Jul  4 11:30:19 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 4 Jul 2024 11:30:19 GMT
Subject: RFR: 8334231: Optimize MethodData layout
In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
Message-ID: <tqL2MTD8hE0Dq2mpMhT80f3kscrUPkzY8NNMyJbZzFo=.c9857e46-7d65-4124-960f-5d6878b03a91@github.com>

On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> Hi all,
>      This PR is part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize HotSpot C++ class layouts; this one covers the layout of MethodData. Here is the original layout from `pahole`:
> 
> class MethodData : public Metadata {
> public:
> 
> 	/* class Metadata            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX 8 bytes hole, try to pack */
> 
> 	class Method *             _method;              /*     8     8 */
> 	int                        _size;                /*    16     4 */
> 	int                        _hint_di;             /*    20     4 */
> 	class Mutex               _extra_data_lock;      /*    24   104 */
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class CompilerCounters    _compiler_counters;    /*   128    80 */
> 	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
> 	intx                       _eflags;              /*   208     8 */
> 	intx                       _arg_local;           /*   216     8 */
> 	intx                       _arg_stack;           /*   224     8 */
> 	intx                       _arg_returned;        /*   232     8 */
> 	int                        _creation_mileage;    /*   240     4 */
> 	class InvocationCounter   _invocation_counter;   /*   244     4 */
> 	class InvocationCounter   _backedge_counter;     /*   248     4 */
> 	int                        _invocation_counter_start; /*   252     4 */
> 	/* --- cacheline 4 boundary (256 bytes) --- */
> 	int                        _backedge_counter_start; /*   256     4 */
> 	uint                       _tenure_traps;        /*   260     4 */
> 	int                        _invoke_mask;         /*   264     4 */
> 	int                        _backedge_mask;       /*   268     4 */
> 	short int                  _num_loops;           /*   272     2 */
> 	short int                  _num_blocks;          /*   274     2 */
> 	enum WouldProfile          _would_profile;       /*   276     4 */
> 	int                        _jvmci_ir_size;       /*   280     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	class FailedSpeculation *  _failed_speculations; /*   288     8 */
> 	int                        _data_size;           /*   296     4 */
> 	int                        _parameters_type_data_di; /*   300     4 */
> 	int                        _exception_handler_data_di; /*   304     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	intptr_t                   _data[1];             /*   312     8 */
> 
> 	/* size: 320, cachelines: 5, members: 27 */
> 	/* sum members: 304, holes: 3, sum holes: 16 */
> };
> 
> 
> There are 3 holes ...

Looks good to me, too. Yes, I think this is trivial.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20019#pullrequestreview-2158700920

From xpeng at openjdk.org  Thu Jul  4 11:30:19 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Thu, 4 Jul 2024 11:30:19 GMT
Subject: RFR: 8334231: Optimize MethodData layout
In-Reply-To: <tqL2MTD8hE0Dq2mpMhT80f3kscrUPkzY8NNMyJbZzFo=.c9857e46-7d65-4124-960f-5d6878b03a91@github.com>
References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
 <tqL2MTD8hE0Dq2mpMhT80f3kscrUPkzY8NNMyJbZzFo=.c9857e46-7d65-4124-960f-5d6878b03a91@github.com>
Message-ID: <Ty-mRJRi5dDG3ybiCqNj1xT3cDTOABdMZZ9aseGchvw=.c4cd3af8-1ded-4f76-9591-5e2e7663ed1b@github.com>

On Thu, 4 Jul 2024 11:25:39 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Hi all,
>>      This PR is part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize HotSpot C++ class layouts; this one covers the layout of MethodData. Here is the original layout from `pahole`:
>> 
>> class MethodData : public Metadata {
>> public:
>> 
>> 	/* class Metadata            <ancestor>; */      /*     0     0 */
>> 
>> 	/* XXX 8 bytes hole, try to pack */
>> 
>> 	class Method *             _method;              /*     8     8 */
>> 	int                        _size;                /*    16     4 */
>> 	int                        _hint_di;             /*    20     4 */
>> 	class Mutex               _extra_data_lock;      /*    24   104 */
>> 	/* --- cacheline 2 boundary (128 bytes) --- */
>> 	class CompilerCounters    _compiler_counters;    /*   128    80 */
>> 	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
>> 	intx                       _eflags;              /*   208     8 */
>> 	intx                       _arg_local;           /*   216     8 */
>> 	intx                       _arg_stack;           /*   224     8 */
>> 	intx                       _arg_returned;        /*   232     8 */
>> 	int                        _creation_mileage;    /*   240     4 */
>> 	class InvocationCounter   _invocation_counter;   /*   244     4 */
>> 	class InvocationCounter   _backedge_counter;     /*   248     4 */
>> 	int                        _invocation_counter_start; /*   252     4 */
>> 	/* --- cacheline 4 boundary (256 bytes) --- */
>> 	int                        _backedge_counter_start; /*   256     4 */
>> 	uint                       _tenure_traps;        /*   260     4 */
>> 	int                        _invoke_mask;         /*   264     4 */
>> 	int                        _backedge_mask;       /*   268     4 */
>> 	short int                  _num_loops;           /*   272     2 */
>> 	short int                  _num_blocks;          /*   274     2 */
>> 	enum WouldProfile          _would_profile;       /*   276     4 */
>> 	int                        _jvmci_ir_size;       /*   280     4 */
>> 
>> 	/* XXX 4 bytes hole, try to pack */
>> 
>> 	class FailedSpeculation *  _failed_speculations; /*   288     8 */
>> 	int                        _data_size;           /*   296     4 */
>> 	int                        _parameters_type_data_di; /*   300     4 */
>> 	int                        _exception_handler_data_di; /*   304     4 */
>> 
>> 	/* XXX 4 bytes hole, try to pack */
>> 
>> 	intptr_t                   _data[1];             /*   312     8 */
>> 
>> 	/* size: 320, cachelin...
>
> Looks good to me, too. Yes, I think this is trivial.

Thank you for the review! @chhagedorn

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20019#issuecomment-2208737190

From duke at openjdk.org  Thu Jul  4 11:30:19 2024
From: duke at openjdk.org (duke)
Date: Thu, 4 Jul 2024 11:30:19 GMT
Subject: RFR: 8334231: Optimize MethodData layout
In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
Message-ID: <3eechMaFiFrPJ2ak8xsLpFhwvXtT8caB3UxZZ_EWI6Y=.2fbe9e2c-5b3f-4f78-8a94-64b616e31e4d@github.com>

On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> Hi all,
>      This PR is part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize HotSpot C++ class layouts; this one covers the layout of MethodData. Here is the original layout from `pahole`:
> 
> class MethodData : public Metadata {
> public:
> 
> 	/* class Metadata            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX 8 bytes hole, try to pack */
> 
> 	class Method *             _method;              /*     8     8 */
> 	int                        _size;                /*    16     4 */
> 	int                        _hint_di;             /*    20     4 */
> 	class Mutex               _extra_data_lock;      /*    24   104 */
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class CompilerCounters    _compiler_counters;    /*   128    80 */
> 	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
> 	intx                       _eflags;              /*   208     8 */
> 	intx                       _arg_local;           /*   216     8 */
> 	intx                       _arg_stack;           /*   224     8 */
> 	intx                       _arg_returned;        /*   232     8 */
> 	int                        _creation_mileage;    /*   240     4 */
> 	class InvocationCounter   _invocation_counter;   /*   244     4 */
> 	class InvocationCounter   _backedge_counter;     /*   248     4 */
> 	int                        _invocation_counter_start; /*   252     4 */
> 	/* --- cacheline 4 boundary (256 bytes) --- */
> 	int                        _backedge_counter_start; /*   256     4 */
> 	uint                       _tenure_traps;        /*   260     4 */
> 	int                        _invoke_mask;         /*   264     4 */
> 	int                        _backedge_mask;       /*   268     4 */
> 	short int                  _num_loops;           /*   272     2 */
> 	short int                  _num_blocks;          /*   274     2 */
> 	enum WouldProfile          _would_profile;       /*   276     4 */
> 	int                        _jvmci_ir_size;       /*   280     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	class FailedSpeculation *  _failed_speculations; /*   288     8 */
> 	int                        _data_size;           /*   296     4 */
> 	int                        _parameters_type_data_di; /*   300     4 */
> 	int                        _exception_handler_data_di; /*   304     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	intptr_t                   _data[1];             /*   312     8 */
> 
> 	/* size: 320, cachelines: 5, members: 27 */
> 	/* sum members: 304, holes: 3, sum holes: 16 */
> };
> 
> 
> There are 3 holes ...

@pengxiaolong 
Your change (at version 12e7aaf5c0a4ed529337d773e78a35a893f6cb44) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20019#issuecomment-2208737573

From liach at openjdk.org  Thu Jul  4 11:51:19 2024
From: liach at openjdk.org (Chen Liang)
Date: Thu, 4 Jul 2024 11:51:19 GMT
Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively
 throws wrong exception [v2]
In-Reply-To: <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com>
References: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com>
 <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com>
Message-ID: <MJxYuFBXEn0tEnHqV9bLaOTxjPRtxU7Lb1wumWvVR6g=.4eada565-e293-4b7f-b55a-270c82a3ea79@github.com>

On Thu, 4 Jul 2024 06:22:31 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods.
>> 
>> The exception message is less exact, but I think that's acceptable.
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   address comments

Marked as reviewed by liach (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20015#pullrequestreview-2158742384

From stuefe at openjdk.org  Thu Jul  4 12:44:19 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Jul 2024 12:44:19 GMT
Subject: RFR: 8334231: Optimize MethodData layout
In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
Message-ID: <qBONEcrgJyYqSsBdiDRbA9NeV8sC8uXKRY2zbpDE8Fc=.1dfd2cbc-5982-4958-b7cb-313d0c52139a@github.com>

On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> Hi all,
>      This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of  MethodData. Here is the original layout from `pahole`:
> 
> class MethodData : public Metadata {
> public:
> 
> 	/* class Metadata            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX 8 bytes hole, try to pack */
> 
> 	class Method *             _method;              /*     8     8 */
> 	int                        _size;                /*    16     4 */
> 	int                        _hint_di;             /*    20     4 */
> 	class Mutex               _extra_data_lock;      /*    24   104 */
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class CompilerCounters    _compiler_counters;    /*   128    80 */
> 	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
> 	intx                       _eflags;              /*   208     8 */
> 	intx                       _arg_local;           /*   216     8 */
> 	intx                       _arg_stack;           /*   224     8 */
> 	intx                       _arg_returned;        /*   232     8 */
> 	int                        _creation_mileage;    /*   240     4 */
> 	class InvocationCounter   _invocation_counter;   /*   244     4 */
> 	class InvocationCounter   _backedge_counter;     /*   248     4 */
> 	int                        _invocation_counter_start; /*   252     4 */
> 	/* --- cacheline 4 boundary (256 bytes) --- */
> 	int                        _backedge_counter_start; /*   256     4 */
> 	uint                       _tenure_traps;        /*   260     4 */
> 	int                        _invoke_mask;         /*   264     4 */
> 	int                        _backedge_mask;       /*   268     4 */
> 	short int                  _num_loops;           /*   272     2 */
> 	short int                  _num_blocks;          /*   274     2 */
> 	enum WouldProfile          _would_profile;       /*   276     4 */
> 	int                        _jvmci_ir_size;       /*   280     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	class FailedSpeculation *  _failed_speculations; /*   288     8 */
> 	int                        _data_size;           /*   296     4 */
> 	int                        _parameters_type_data_di; /*   300     4 */
> 	int                        _exception_handler_data_di; /*   304     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	intptr_t                   _data[1];             /*   312     8 */
> 
> 	/* size: 320, cachelines: 5, members: 27 */
> 	/* sum members: 304, holes: 3, sum holes: 16 */
> };
> 
> 
> There are 3 holes ...

I don't think these "Optimize XXX layouts" changes should be marked as trivial, and I worry that we overuse the trivial rule. It circumvents both the second reviewer and the 24hr rule, which are both necessary safeguards. Especially in the wake of the xz fiasco.

"Trivial" is usually reserved for either changes that need very quick reaction (e.g. reasonably simple build errors that require immediate fixing because everyone's CI is standing still) or things that are painfully obvious in being trivial, e.g. comment changes. Memory layout changes are neither urgent nor really trivial enough IMHO.
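
For context on what such a layout change involves, here is a minimal sketch of the effect `pahole` reports as "holes". The structs `Loose` and `Packed` and the helper names are hypothetical and are not taken from MethodData; actual sizes depend on the target ABI, so the sketch only asserts that reordering never makes things worse.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example types, not HotSpot code.
// A 4-byte member placed before a pointer typically forces padding on LP64,
// because the pointer must start at an 8-byte aligned offset.
struct Loose {
  std::int32_t a;  // on LP64: usually followed by a 4-byte hole
  void*        p;
  std::int32_t b;  // on LP64: usually followed by 4 bytes of tail padding
};

// Same members, ordered so that the two 4-byte fields share one 8-byte slot.
struct Packed {
  void*        p;
  std::int32_t a;
  std::int32_t b;
};

// Sum of the member sizes, i.e. what pahole calls "sum members".
constexpr std::size_t kMemberBytes = sizeof(void*) + 2 * sizeof(std::int32_t);

// Bytes lost to padding in each layout (holes plus tail padding).
constexpr std::size_t loose_padding()  { return sizeof(Loose)  - kMemberBytes; }
constexpr std::size_t packed_padding() { return sizeof(Packed) - kMemberBytes; }

static_assert(sizeof(Packed) <= sizeof(Loose),
              "reordering must not grow the struct");
```

On a typical LP64 target the reordered struct is expected to be smaller, but any code that depends on member order or offsets (for example serviceability tooling that mirrors the layout) is affected too, which is part of why such changes deserve a normal review.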

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20019#issuecomment-2208881597

From rehn at openjdk.org  Thu Jul  4 13:21:23 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 4 Jul 2024 13:21:23 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22]
In-Reply-To: <dSl5ihkZZiE0ZM_a69HgNUOwBPs-Dnd3rmwYOBmaCLc=.15750165-6919-4f8c-9177-08f0dfee8e18@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com>
 <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com>
 <RpC7ys1vEQoTjqXMQ68o68ZhWEdJcJi9msPTqz63Xl4=.d798c00c-38a2-48d4-aab5-ebd124441ef4@github.com>
 <dSl5ihkZZiE0ZM_a69HgNUOwBPs-Dnd3rmwYOBmaCLc=.15750165-6919-4f8c-9177-08f0dfee8e18@github.com>
Message-ID: <kPV0mynNKt-zkoeR924qfwLg_9BjAeOb0CBPfWj495g=.9c5ca84a-f607-4a26-9891-0d014f040e75@github.com>

On Thu, 4 Jul 2024 11:04:52 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> I used to be explicit that this is the normal **ld**.
>> But I see I did not do the same for **jalr** => **_jalr**. (as we are in MASM we can use 'private' method).
>> I'll change to normal ld and add an assert that we are in an incompressible region?
>> 
>> (I would like to revert that at some time, so the user of reloc_call doesn't need to know that reloc_call needs an incompressible region)
>> 
>> Suggested:
>> 
>> @@ -982,0 +983 @@ void MacroAssembler::load_link_jump(const address source, Register temp) {
>> +  assert(!in_compressible_region(), "Must be");
>> @@ -987 +988 @@ void MacroAssembler::load_link_jump(const address source, Register temp) {
>> -  Assembler::_ld(temp, temp, ((int32_t)distance << 20) >> 20);
>> +  Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20);
>
> Seems no need to worry about that? I think it's up to the caller to decide if it wants compressed instructions.
> Here in this case, the only call site is within a relocate() which will disable compressed instructions for you[1].
> 
>   relocate(entry.rspec(), [&] {
>     load_link_jump(target);
>   });
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/assembler_riscv.hpp#L2130

When you use reloc_call the size of the call must be an exact size we already specified (3 * NativeInstruction::instruction_size). (If those sizes don't apply, you don't want a reloc_call.)
So there is nothing to choose from the caller's perspective.
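
As background for the `((int32_t)distance << 20) >> 20` expression in the suggested diff quoted above: it is intended to keep the low 12 bits of the offset and sign-extend them, which is the form a 12-bit load immediate takes. A minimal portable sketch follows, assuming the usual RISC-V split in which `auipc` supplies the upper 20 bits and the load supplies the signed low 12 bits. The helper names `low12` and `hi20` are hypothetical and do not exist in HotSpot.

```cpp
#include <cstdint>

// Hypothetical helper: sign-extended low 12 bits of a 32-bit offset.
// Written without shifting negative values, so it does not rely on
// implementation-defined shift behaviour.
constexpr std::int32_t low12(std::int32_t distance) {
  return static_cast<std::int32_t>(
             (static_cast<std::uint32_t>(distance) & 0xFFFu) ^ 0x800u) - 0x800;
}

// Hypothetical helper: the upper part such that
//   hi20(d) * 4096 + low12(d) == d.
// The subtraction is done in 64 bits so it cannot overflow near INT32_MAX;
// the division is exact because the difference is a multiple of 4096.
constexpr std::int32_t hi20(std::int32_t distance) {
  return static_cast<std::int32_t>(
      (static_cast<std::int64_t>(distance) - low12(distance)) / 4096);
}

static_assert(low12(0x7FF) == 2047, "largest positive 12-bit immediate");
static_assert(low12(0x800) == -2048, "bit 11 set means a negative immediate");
static_assert(hi20(0x800) == 1, "upper part compensates for the negative low part");
```

Because the low part can be negative, the upper part has to be rounded up in those cases; that is why the two halves must be computed together rather than by plain masking.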

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1665703906

From jsjolen at openjdk.org  Thu Jul  4 13:23:43 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Thu, 4 Jul 2024 13:23:43 GMT
Subject: RFR: 8335701: Make GrowableArray templated by an Index
Message-ID: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com>

Hi,

Today the GrowableArray has a set index type of `int`, this PR makes it so that you can set your own index type through a template parameter.

This opens up for a few new design choices:

- Do you know that you have a very small array? Use an `uint8_t` for len and cap, each.
- Do you have a very large one? Use an `uint64_t`.

The code has opted for `int` being default, as to keep identical semantics for all existing code and to let users not have to worry about the index if they don't care.

One "major" change that I don't want to get lost in the review: I've changed the mid-point calculation to be overflow insensitive without casting.



// Old 
mid = ((max + min) / 2);
// New
mid = min + ((max - min) / 2);

Some semi-rigorous thinking:
min \in [0, len)
max \in [0, len)
min <= max
(max - min) / 2 \in [0, len/2)
Maximizing min and max => len + 0
Maximizing max, minimizing min => len/2
Minimizing max, maximizing min => max = min => min


// Proof that they're identical when m, h, l \in N
(1) m = l + (h - l) / 2 <=>
2m = 2l + h - l = h + l

(2) m = (h + l) / 2 <=>
2m = h + l
(1) = (2)
QED
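
A minimal sketch of the difference between the two mid-point forms, using a 32-bit unsigned index so that the wrap-around is well defined. The function names are hypothetical and are not the ones used in growableArray.hpp; the explicit cast only makes the naive form wrap at 32 bits regardless of how wide `int` is.

```cpp
#include <cstdint>

// Hypothetical name: the old form. The sum can exceed the index type's range
// and wrap around before the division.
constexpr std::uint32_t midpoint_naive(std::uint32_t min, std::uint32_t max) {
  return static_cast<std::uint32_t>(max + min) / 2;
}

// Hypothetical name: the new form. Requires min <= max; the difference always
// fits in the index type, so no intermediate value leaves the range.
constexpr std::uint32_t midpoint_safe(std::uint32_t min, std::uint32_t max) {
  return min + ((max - min) / 2);
}

// For small values both forms agree.
static_assert(midpoint_naive(2u, 9u) == 5u, "naive form, small values");
static_assert(midpoint_safe(2u, 9u) == 5u, "safe form, small values");

// Near the top of the range only the safe form gives the true mid-point.
static_assert(midpoint_safe(0x80000000u, 0xFFFFFFFEu) == 0xBFFFFFFFu,
              "safe form does not wrap");
static_assert(midpoint_naive(0x80000000u, 0xFFFFFFFEu) == 0x3FFFFFFFu,
              "naive form wraps and lands below min");
```

Note that for index types narrower than `int`, such as `uint8_t`, integer promotion means the sum is computed in `int` and the naive form does not wrap; the issue shows up for `int` itself (signed overflow) and for wider unsigned types.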

-------------

Commit messages:
 - Fix spelling and actually include the growableArray is it used in cpp file
 - Move
 - Handle unhandled oops
 - Make GrowableArray receive an Index type optionally

Changes: https://git.openjdk.org/jdk/pull/20031/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20031&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335701
  Stats: 262 lines in 32 files changed: 19 ins; 25 del; 218 mod
  Patch: https://git.openjdk.org/jdk/pull/20031.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20031/head:pull/20031

PR: https://git.openjdk.org/jdk/pull/20031

From rehn at openjdk.org  Thu Jul  4 13:24:22 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 4 Jul 2024 13:24:22 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22]
In-Reply-To: <kPV0mynNKt-zkoeR924qfwLg_9BjAeOb0CBPfWj495g=.9c5ca84a-f607-4a26-9891-0d014f040e75@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com>
 <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com>
 <RpC7ys1vEQoTjqXMQ68o68ZhWEdJcJi9msPTqz63Xl4=.d798c00c-38a2-48d4-aab5-ebd124441ef4@github.com>
 <dSl5ihkZZiE0ZM_a69HgNUOwBPs-Dnd3rmwYOBmaCLc=.15750165-6919-4f8c-9177-08f0dfee8e18@github.com>
 <kPV0mynNKt-zkoeR924qfwLg_9BjAeOb0CBPfWj495g=.9c5ca84a-f607-4a26-9891-0d014f040e75@github.com>
Message-ID: <dD4ZxN87ZZkJVYPvC3wzpucTt2TDq7-PEXloNDYsu-0=.445db78a-858a-4204-9fcf-c98dc9e6bbe7@github.com>

On Thu, 4 Jul 2024 13:18:57 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Seems no need to worry about that? I think it's up to the caller to decide if it wants compressed instructions.
>> Here in this case, the only call site is within a relocate() which will disable compressed instructions for you[1].
>> 
>>   relocate(entry.rspec(), [&] {
>>     load_link_jump(target);
>>   });
>> 
>> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/assembler_riscv.hpp#L2130
>
> When you use reloc_call the size of the call must be an exact size we already specified (3 * NativeInstruction::instruction_size). (If those sizes don't apply, you don't want a reloc_call.)
> So there is nothing to choose from the caller's perspective.

Maybe in the future load_link_jump will be used outside reloc_call, was that your thinking?

Anyhow change _ld to ld ?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1665707305

From jsjolen at openjdk.org  Thu Jul  4 13:35:36 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Thu, 4 Jul 2024 13:35:36 GMT
Subject: RFR: 8335701: Make GrowableArray templated by an Index [v2]
In-Reply-To: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com>
References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com>
Message-ID: <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com>

> Hi,
> 
> Today the GrowableArray has a set index type of `int`, this PR makes it so that you can set your own index type through a template parameter.
> 
> This opens up for a few new design choices:
> 
> - Do you know that you have a very small array? Use an `uint8_t` for len and cap, each.
> - Do you have a very large one? Use an `uint64_t`.
> 
> The code has opted for `int` being default, as to keep identical semantics for all existing code and to let users not have to worry about the index if they don't care.
> 
> One "major" change that I don't want to get lost in the review: I've changed the mid-point calculation to be overflow insensitive without casting.
> 
> 
> 
> // Old 
> mid = ((max + min) / 2);
> // New
> mid = min + ((max - min) / 2);
> 
> Some semi-rigorous thinking:
> min \in [0, len)
> max \in [0, len)
> min <= max
> (max - min) / 2 \in [0, len/2)
> Maximizing min and max => len + 0
> Maximizing max, minimizing min => len/2
> Minimizing max, maximizing min => max = min => min
> 
> 
> // Proof that they're identical when m, h, l \in N
> (1) m = l + (h - l) / 2 <=>
> 2m = 2l + h - l = h + l
> 
> (2) m = (h + l) / 2 <=>
> 2m = h + l
> (1) = (2)
> QED

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Attempt at fixing GA VMStruct

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20031/files
  - new: https://git.openjdk.org/jdk/pull/20031/files/7407a151..b5a87422

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20031&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20031&range=00-01

  Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20031.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20031/head:pull/20031

PR: https://git.openjdk.org/jdk/pull/20031

From jsjolen at openjdk.org  Thu Jul  4 13:41:17 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Thu, 4 Jul 2024 13:41:17 GMT
Subject: RFR: 8335701: Make GrowableArray templated by an Index [v2]
In-Reply-To: <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com>
References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com>
 <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com>
Message-ID: <OpawmhVDx5yj4qXdiiYcp08Qxc0KyAVPFv95F272n3o=.bc3aa06c-2954-4376-a0a5-54363b2b76d6@github.com>

On Thu, 4 Jul 2024 13:35:36 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Hi,
>> 
>> Today the GrowableArray has a set index type of `int`, this PR makes it so that you can set your own index type through a template parameter.
>> 
>> This opens up for a few new design choices:
>> 
>> - Do you know that you have a very small array? Use an `uint8_t` for len and cap, each.
>> - Do you have a very large one? Use an `uint64_t`.
>> 
>> The code has opted for `int` being default, as to keep identical semantics for all existing code and to let users not have to worry about the index if they don't care.
>> 
>> One "major" change that I don't want to get lost in the review: I've changed the mid-point calculation to be overflow insensitive without casting.
>> 
>> 
>> 
>> // Old 
>> mid = ((max + min) / 2);
>> // New
>> mid = min + ((max - min) / 2);
>> 
>> Some semi-rigorous thinking:
>> min \in [0, len)
>> max \in [0, len)
>> min <= max
>> (max - min) / 2 \in [0, len/2)
>> Maximizing min and max => len + 0
>> Maximizing max, minimizing min => len/2
>> Minimizing max, maximizing min => max = min => min
>> 
>> 
>> // Proof that they're identical when m, h, l \in N
>> (1) m = l + (h - l) / 2 <=>
>> 2m = 2l + h - l = h + l
>> 
>> (2) m = (h + l) / 2 <=>
>> 2m = h + l
>> (1) = (2)
>> QED
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Attempt at fixing GA VMStruct

Always fun to grapple with vmStructs, moving this back to draft.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20031#issuecomment-2209022728

From eosterlund at openjdk.org  Thu Jul  4 13:58:26 2024
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Thu, 4 Jul 2024 13:58:26 GMT
Subject: [jdk23] RFR: 8334890: Missing unconditional cross modifying fence in
 nmethod entry barriers
Message-ID: <Zl8M3k7N0G-oBFmAyX3oO6RrxeNnCdr2lH9JyrdX0GQ=.4d2d82e8-49f7-44ec-84ff-0c0d6794b9e5@github.com>

8334890: Missing unconditional cross modifying fence in nmethod entry barriers

-------------

Commit messages:
 - Backport c0604fb823d9f3b2e347a9857b11606b223ad8ec

Changes: https://git.openjdk.org/jdk/pull/20036/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20036&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8334890
  Stats: 18 lines in 1 file changed: 1 ins; 14 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20036.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20036/head:pull/20036

PR: https://git.openjdk.org/jdk/pull/20036

From aboldtch at openjdk.org  Thu Jul  4 13:58:26 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 4 Jul 2024 13:58:26 GMT
Subject: [jdk23] RFR: 8334890: Missing unconditional cross modifying fence
 in nmethod entry barriers
In-Reply-To: <Zl8M3k7N0G-oBFmAyX3oO6RrxeNnCdr2lH9JyrdX0GQ=.4d2d82e8-49f7-44ec-84ff-0c0d6794b9e5@github.com>
References: <Zl8M3k7N0G-oBFmAyX3oO6RrxeNnCdr2lH9JyrdX0GQ=.4d2d82e8-49f7-44ec-84ff-0c0d6794b9e5@github.com>
Message-ID: <sKjd64NGI947n-5zuIRi78OPMZfwgreI54UDcyTeTW0=.a28e7639-c100-4e00-938c-15d0dcb8ca63@github.com>

On Thu, 4 Jul 2024 13:52:09 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> 8334890: Missing unconditional cross modifying fence in nmethod entry barriers

Marked as reviewed by aboldtch (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20036#pullrequestreview-2158996704

From fyang at openjdk.org  Thu Jul  4 14:04:21 2024
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 4 Jul 2024 14:04:21 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22]
In-Reply-To: <dD4ZxN87ZZkJVYPvC3wzpucTt2TDq7-PEXloNDYsu-0=.445db78a-858a-4204-9fcf-c98dc9e6bbe7@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com>
 <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com>
 <RpC7ys1vEQoTjqXMQ68o68ZhWEdJcJi9msPTqz63Xl4=.d798c00c-38a2-48d4-aab5-ebd124441ef4@github.com>
 <dSl5ihkZZiE0ZM_a69HgNUOwBPs-Dnd3rmwYOBmaCLc=.15750165-6919-4f8c-9177-08f0dfee8e18@github.com>
 <kPV0mynNKt-zkoeR924qfwLg_9BjAeOb0CBPfWj495g=.9c5ca84a-f607-4a26-9891-0d014f040e75@github.com>
 <dD4ZxN87ZZkJVYPvC3wzpucTt2TDq7-PEXloNDYsu-0=.445db78a-858a-4204-9fcf-c98dc9e6bbe7@github.com>
Message-ID: <_pvvhby0M1_L7J34xtX5ZXQSjyBndjiqUAvc7ohj5ng=.f1a0adaf-5961-4cbf-8001-b4a066fdb201@github.com>

On Thu, 4 Jul 2024 13:21:46 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> When you use reloc_call the size of the call must be an exact size we already specified (3 * NativeInstruction::instruction_size). (If those sizes don't apply, you don't want a reloc_call.)
>> So there is nothing to choose from the caller's perspective.
>
> Maybe in the future load_link_jump will be used outside reloc_call, was that your thinking?
> 
> Anyhow change _ld to ld ?

Yeah, I just feel it's better to decouple the two and let the users of `load_link_jump` decide. (Or just inline this `load_link_jump` into the caller?)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1665753700

From rehn at openjdk.org  Thu Jul  4 14:48:36 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 4 Jul 2024 14:48:36 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v23]
In-Reply-To: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
Message-ID: <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com>

> Hi all, please consider!
> 
> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
> With a very small application or a very short run we have fast patchable calls.
> But any normal application running longer will increase the code size and code churn/fragmentation.
> So whether or not you get hot fast calls relies on luck.
> 
> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
> This would be the common case for a patchable call.
> 
> Code stream:
> JAL <trampo>
> Stubs:
> AUIPC
> LD
> JALR
> <DEST>
> 
> 
> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
> Even if you don't have that problem, having a call to a jump is not the fastest way.
> Loading the address avoids the pitfalls of cmodx.
> 
> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
> and instead do by default:
> 
> Code stream:
> AUIPC
> LD
> JALR
> Stubs:
> <DEST>
> 
> An experimental option for turning trampolines back on exists.
> 
> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld when in reach, and vice-versa (the current proposal of Zjid forces the I-fetcher to fetch instructions in order, meaning we will avoid a lot of issues which arm has).
> 
> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
> 
> fop                                        (msec)    2239       |  2128       =  0.950424
> h2                                         (msec)    18660      |  16594      =  0.889282
> jython                                     (msec)    22022      |  21925      =  0.995595
> luindex                                    (msec)    2866       |  2842       =  0.991626
> lusearch                                   (msec)    4108       |  4311       =  1.04942
> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
> pmd                                        (msec)    5976       |  5897       =  0.98678
> jython                                     (msec)    22022      |  21925      =  0.995595
> Avg:                                       0.974112                              
> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
> h2(xcomp)                                  (msec)    37719      |  38004      =  1.00756
> jython(xcomp)        ...

Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:

  _ld to ld

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19453/files
  - new: https://git.openjdk.org/jdk/pull/19453/files/9eabb5fa..b958ee0f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=22
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=21-22

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19453.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453

PR: https://git.openjdk.org/jdk/pull/19453

From rehn at openjdk.org  Thu Jul  4 14:48:38 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 4 Jul 2024 14:48:38 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v2]
In-Reply-To: <VUlq4fBerYjwcZKXnY-1R_WIRuk-nlFDYqe5MjeXvRs=.a06722cd-9e8d-4117-b2dc-9f20a8f0b60c@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <3W3z-PDFsRFSclrP3FJRmnEjL4rLDRSUEFN5qkFxSUI=.feb03562-9ca3-4383-94cd-967d4234a4aa@github.com>
 <VUlq4fBerYjwcZKXnY-1R_WIRuk-nlFDYqe5MjeXvRs=.a06722cd-9e8d-4117-b2dc-9f20a8f0b60c@github.com>
Message-ID: <cg4Wr-Co1XRUaF30v4SJ-9bjTjV_kmVh_O5nJaFc6m4=.78ed450b-1e65-4072-b39a-3f5f337538cc@github.com>

On Tue, 4 Jun 2024 11:05:19 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>> 
>>  - Merge branch 'master' into 8332689
>>  - Remove accidental files
>>  - Remove accidental files
>>  - Baseline
>
> I see new classes are added in nativeInst, maybe the comments at the top of nativeInst.hpp need to be updated accordingly.

Thanks @Hamlin-Li @RealFYang !

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2209160416

From rehn at openjdk.org  Thu Jul  4 14:48:38 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 4 Jul 2024 14:48:38 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v22]
In-Reply-To: <_pvvhby0M1_L7J34xtX5ZXQSjyBndjiqUAvc7ohj5ng=.f1a0adaf-5961-4cbf-8001-b4a066fdb201@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <tqCYUNb0IDg2yp0mq1keFZiQ-ANeBFQZQbxDtHd_iLM=.facaaf44-63e1-44e0-ba18-9634dfd04fc5@github.com>
 <EARIa_wPXLx8OWa8uIZIZqLEYmE6jXyvfywSBkXdEwo=.2d52a2b5-837c-4015-a5c7-8537bf01badb@github.com>
 <RpC7ys1vEQoTjqXMQ68o68ZhWEdJcJi9msPTqz63Xl4=.d798c00c-38a2-48d4-aab5-ebd124441ef4@github.com>
 <dSl5ihkZZiE0ZM_a69HgNUOwBPs-Dnd3rmwYOBmaCLc=.15750165-6919-4f8c-9177-08f0dfee8e18@github.com>
 <kPV0mynNKt-zkoeR924qfwLg_9BjAeOb0CBPfWj495g=.9c5ca84a-f607-4a26-9891-0d014f040e75@github.com>
 <dD4ZxN87ZZkJVYPvC3wzpucTt2TDq7-PEXloNDYsu-0=.445db78a-858a-4204-9fcf-c98dc9e6bbe7@github.com>
 <_pvvhby0M1_L7J34xtX5ZXQSjyBndjiqUAvc7ohj5ng=.f1a0adaf-5961-4cbf-8001-b4a066fdb201@github.com>
Message-ID: <g5d3WUbf-ppmfPcgJ_qEBeDG333mwbMCWRVc61Vuucs=.4cae2e43-dcd2-4ae1-957f-43615c749f2a@github.com>

On Thu, 4 Jul 2024 13:58:33 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Maybe in the future load_link_jump will be used outside reloc_call, was that your thinking?
>> 
>> Anyhow change _ld to ld ?
>
> Yeah, I just feel it's better to decouple the two and let the users of `load_link_jump` decide. (Or just inline this `load_link_jump` into the caller?)

I just changed to ld.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1665807685

From chagedorn at openjdk.org  Thu Jul  4 15:00:20 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 4 Jul 2024 15:00:20 GMT
Subject: RFR: 8334231: Optimize MethodData layout
In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
Message-ID: <Cf9U1p3z4utQLP6ygOGk-1Os1CNzSzuiSElIhju-q9Y=.e30c51da-c25e-4aae-adf3-253327a9dc9f@github.com>

On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> Hi all,
>      This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of  MethodData. Here is the original layout from `pahole`:
> 
> class MethodData : public Metadata {
> public:
> 
> 	/* class Metadata            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX 8 bytes hole, try to pack */
> 
> 	class Method *             _method;              /*     8     8 */
> 	int                        _size;                /*    16     4 */
> 	int                        _hint_di;             /*    20     4 */
> 	class Mutex               _extra_data_lock;      /*    24   104 */
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class CompilerCounters    _compiler_counters;    /*   128    80 */
> 	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
> 	intx                       _eflags;              /*   208     8 */
> 	intx                       _arg_local;           /*   216     8 */
> 	intx                       _arg_stack;           /*   224     8 */
> 	intx                       _arg_returned;        /*   232     8 */
> 	int                        _creation_mileage;    /*   240     4 */
> 	class InvocationCounter   _invocation_counter;   /*   244     4 */
> 	class InvocationCounter   _backedge_counter;     /*   248     4 */
> 	int                        _invocation_counter_start; /*   252     4 */
> 	/* --- cacheline 4 boundary (256 bytes) --- */
> 	int                        _backedge_counter_start; /*   256     4 */
> 	uint                       _tenure_traps;        /*   260     4 */
> 	int                        _invoke_mask;         /*   264     4 */
> 	int                        _backedge_mask;       /*   268     4 */
> 	short int                  _num_loops;           /*   272     2 */
> 	short int                  _num_blocks;          /*   274     2 */
> 	enum WouldProfile          _would_profile;       /*   276     4 */
> 	int                        _jvmci_ir_size;       /*   280     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	class FailedSpeculation *  _failed_speculations; /*   288     8 */
> 	int                        _data_size;           /*   296     4 */
> 	int                        _parameters_type_data_di; /*   300     4 */
> 	int                        _exception_handler_data_di; /*   304     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	intptr_t                   _data[1];             /*   312     8 */
> 
> 	/* size: 320, cachelines: 5, members: 27 */
> 	/* sum members: 304, holes: 3, sum holes: 16 */
> };
> 
> 
> There are 3 holes ...

That's a fair point. It's sometimes tricky to find the boundary between trivial and non-trivial. Here I thought about it for a bit but it looked trivial (but I understand that you could think about it differently) and I was the second reviewer. Nevertheless, as you pointed out, I agree that we should probably restrict the use of this rule to the really urgent issues (most are not). My default is usually to just wait 24h for normal issues even for changes that are marked as trivial just to give everyone a chance to have a look. Sometimes, you still get some valuable feedback that you could have missed otherwise when integrating early.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20019#issuecomment-2209179254

From aph at openjdk.org  Thu Jul  4 15:29:22 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 4 Jul 2024 15:29:22 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well
 [v17]
In-Reply-To: <NQ1QNuTBkNsmBReCpdhY1lrdIYz9s8UiNd1As1sLQ7M=.17c8f789-2bf1-4beb-891f-debccad29164@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
 <NQ1QNuTBkNsmBReCpdhY1lrdIYz9s8UiNd1As1sLQ7M=.17c8f789-2bf1-4beb-891f-debccad29164@github.com>
Message-ID: <G2IfSoXv1DKf69H_Gr5O_L-FTkQQgYGBS15UCNMoVt0=.acf2acd9-337c-4d45-8321-1c1be4e3316e@github.com>

On Mon, 1 Jul 2024 14:14:50 GMT, Amit Kumar <amitkumar at openjdk.org> wrote:

>> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>> 
>> I ran the `tier1` tests with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in a fastdebug VM and didn't see any new failures, except the one I mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693), which is reproducible on every other architecture with these flags.
>> 
>> 
>> Without Patch: 
>> 
>> SecondarySuperCacheHits.test  avgt   15  0.929 ± 0.010  ns/op
>> 
>> SecondarySuperCacheInterContention.test     avgt   15  1.413 ± 0.007  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  1.415 ± 0.016  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  1.410 ± 0.017  ns/op
>> 
>> Benchmark                             Mode  Cnt   Score   Error  Units
>> SecondarySupersLookup.testNegative00  avgt   15   1.806 ± 0.325  ns/op
>> SecondarySupersLookup.testNegative01  avgt   15   2.364 ± 0.236  ns/op
>> SecondarySupersLookup.testNegative02  avgt   15   2.903 ± 0.215  ns/op
>> SecondarySupersLookup.testNegative03  avgt   15   3.417 ± 0.199  ns/op
>> SecondarySupersLookup.testNegative04  avgt   15   3.758 ± 0.102  ns/op
>> SecondarySupersLookup.testNegative05  avgt   15   4.352 ± 0.123  ns/op
>> SecondarySupersLookup.testNegative06  avgt   15   4.800 ± 0.099  ns/op
>> SecondarySupersLookup.testNegative07  avgt   15   5.365 ± 0.060  ns/op
>> SecondarySupersLookup.testNegative08  avgt   15   6.316 ± 0.092  ns/op
>> SecondarySupersLookup.testNegative09  avgt   15   6.669 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative10  avgt   15   7.041 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative16  avgt   15   9.336 ± 0.185  ns/op
>> SecondarySupersLookup.testNegative20  avgt   15  11.373 ± 0.029  ns/op
>> SecondarySupersLookup.testNegative30  avgt   15  15.236 ± 0.051  ns/op
>> SecondarySupersLookup.testNegative32  avgt   15  16.031 ± 0.091  ns/op
>> SecondarySupersLookup.testNegative40  avgt   15  19.197 ± 0.279  ns/op
>> SecondarySupersLookup.testNegative50  avgt   15  23.804 ± 2.387  ns/op
>> SecondarySupersLookup.testNegative55  avgt   15  25.610 ± 1.155  ns/op
>> SecondarySupersLookup.testNegative56  avgt   15  26.128 ± 2.203  ns/op
>> SecondarySupersLookup.testNegative57  avgt   15  26.126 ± 0.881  ns/op
>> SecondarySupersLookup.testNegative58  avgt   15  26.314 ± 0.521  ns/op
>> SecondarySupersLookup.testNegative59  avgt   15  26.750 ± 0.837  ns/op
>> SecondarySupersLookup.testNegative60  avgt   15  27.118 ± 0.557 ...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update src/hotspot/cpu/s390/macroAssembler_s390.cpp
>   
>   Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>

Looks good.

-------------

Marked as reviewed by aph (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19544#pullrequestreview-2159162829

From sgehwolf at openjdk.org  Thu Jul  4 15:45:20 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Thu, 4 Jul 2024 15:45:20 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v3]
In-Reply-To: <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com>
References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com>
 <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com>
Message-ID: <Ctdm2c5kjQzfc15BcfAWtiJKwLjNKOhAYz0Lnsy-7N0=.ded4abfe-d6df-44a4-802f-5bf17ef338bc@github.com>

On Mon, 1 Jul 2024 14:43:58 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest`, currently passes on cgroups v1 and fails on cgroups v2 because of the way [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented back in the JDK 13 time frame. The test is therefore problem-listed immediately. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges.
>> 
>> I'm adding those tests in order to not regress another time.
>> 
>> Testing:
>> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1.
>> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to  JDK-8322420)
>> - [x] GHA
>
> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Fix comments
>  - 8333446: Add tests for hierarchical container support

Gentle ping.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2209259086

From szaldana at openjdk.org  Thu Jul  4 15:46:25 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Thu, 4 Jul 2024 15:46:25 GMT
Subject: RFR: 8300732: Whitebox functions for Metaspace test should use byte
 size
Message-ID: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>

Hi all, 

This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. 

Testing: 
- [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. 

Thanks, 
Sonia

-------------

Commit messages:
 - 8300732: Whitebox functions for Metaspace test should use byte size

Changes: https://git.openjdk.org/jdk/pull/20039/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20039&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8300732
  Stats: 140 lines in 12 files changed: 48 ins; 0 del; 92 mod
  Patch: https://git.openjdk.org/jdk/pull/20039.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20039/head:pull/20039

PR: https://git.openjdk.org/jdk/pull/20039

From stuefe at openjdk.org  Thu Jul  4 15:58:18 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Jul 2024 15:58:18 GMT
Subject: RFR: 8300732: Whitebox functions for Metaspace test should use
 byte size
In-Reply-To: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>
References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>
Message-ID: <OUlxUaj3irq9j4fNiLvdP4gfgRju-zdfMyNibVp1p18=.ff70caf7-04ac-4c66-a7c3-c8cb090b3450@github.com>

On Thu, 4 Jul 2024 15:18:29 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

> Hi all, 
> 
> This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. 
> 
> Testing: 
> - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. 
> 
> Thanks, 
> Sonia

First cursory look, will look again later

src/hotspot/share/prims/whitebox.cpp line 1715:

> 1713: // MetaspaceTestContext and MetaspaceTestArena
> 1714: WB_ENTRY(jlong, WB_CreateMetaspaceTestContext(JNIEnv* env, jobject wb, jlong commit_limit, jlong reserve_limit))
> 1715:   if (commit_limit % BytesPerWord != 0) {

Use is_aligned() from utilities/align.hpp

-------------

PR Review: https://git.openjdk.org/jdk/pull/20039#pullrequestreview-2159200244
PR Review Comment: https://git.openjdk.org/jdk/pull/20039#discussion_r1665875107

From stuefe at openjdk.org  Thu Jul  4 15:58:18 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 Jul 2024 15:58:18 GMT
Subject: RFR: 8300732: Whitebox functions for Metaspace test should use
 byte size
In-Reply-To: <OUlxUaj3irq9j4fNiLvdP4gfgRju-zdfMyNibVp1p18=.ff70caf7-04ac-4c66-a7c3-c8cb090b3450@github.com>
References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>
 <OUlxUaj3irq9j4fNiLvdP4gfgRju-zdfMyNibVp1p18=.ff70caf7-04ac-4c66-a7c3-c8cb090b3450@github.com>
Message-ID: <cqRtkSTTxmfueCmWXsXIjmF0oUX5vjliE0LmwRowSHg=.6bb60863-943e-46e4-b8d2-a7ce214547ae@github.com>

On Thu, 4 Jul 2024 15:53:17 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. 
>> 
>> Testing: 
>> - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. 
>> 
>> Thanks, 
>> Sonia
>
> src/hotspot/share/prims/whitebox.cpp line 1715:
> 
>> 1713: // MetaspaceTestContext and MetaspaceTestArena
>> 1714: WB_ENTRY(jlong, WB_CreateMetaspaceTestContext(JNIEnv* env, jobject wb, jlong commit_limit, jlong reserve_limit))
>> 1715:   if (commit_limit % BytesPerWord != 0) {
> 
> Use is_aligned() from utilities/align.hpp

And I think you can just assert() here. If this happens, a test written by us is using the whitebox function incorrectly, and since it's all internal, there is no need to propagate a Java exception.
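
To illustrate the two suggestions above (an alignment predicate instead of an open-coded modulo, and an assert instead of a thrown exception), here is a minimal standalone sketch. It is not the HotSpot code: `is_aligned_sketch`, `kBytesPerWord` and `bytes_to_words_checked` are hypothetical names standing in for `is_aligned()`, `BytesPerWord` and whatever conversion the whitebox function ends up doing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for HotSpot's BytesPerWord (assumption: one word == one pointer).
constexpr std::size_t kBytesPerWord = sizeof(void*);

// True if `value` is a multiple of `alignment`.
// The bit trick is only valid when `alignment` is a power of two.
constexpr bool is_aligned_sketch(std::uint64_t value, std::uint64_t alignment) {
  return (value & (alignment - 1)) == 0;
}

// Hypothetical helper: convert a byte size to a word count. A misaligned
// size is treated as a bug in the calling test, so it trips an assert
// rather than being reported back to the caller.
inline std::size_t bytes_to_words_checked(std::uint64_t byte_size) {
  assert(is_aligned_sketch(byte_size, kBytesPerWord) && "byte size must be word-aligned");
  return static_cast<std::size_t>(byte_size / kBytesPerWord);
}
```

With this shape a misuse fails fast in debug builds, which matches the point that only our own tests call these functions.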

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20039#discussion_r1665875957

From duke at openjdk.org  Thu Jul  4 17:56:20 2024
From: duke at openjdk.org (Camel Coder)
Date: Thu, 4 Jul 2024 17:56:20 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2]
In-Reply-To: <S-BpiX60ySY6FNDfcskTHuuDsQQIno54AaOvSFlm67c=.24e8cf29-de2c-4f8e-bcdb-7cd1c7927c30@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
 <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com>
 <S-BpiX60ySY6FNDfcskTHuuDsQQIno54AaOvSFlm67c=.24e8cf29-de2c-4f8e-bcdb-7cd1c7927c30@github.com>
Message-ID: <dGgB0M0dpmd_gFfsX8XlLaeL5uk9HynqHdYuZvp7URs=.17839235-7974-4be7-a92f-2e8d5fdb1c0b@github.com>

On Mon, 1 Jul 2024 15:36:03 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   use pure scalar version when rvv is not supported
>
> with a pure scalar implementation, it also brings some performance improvement at all source sizes, so also enable the intrinsic when rvv is not supported.
> 
> performance data
> 
> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, scalar | Error | Units | Perf opt
> -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.75 | 0.38 | ns/op | 1
> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.71 | 93.824 | 1.954 | ns/op | 0.999
> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.824 | 123.487 | 0.559 | ns/op | 0.987
> Base64Encode.testBase64Encode | 6 | avgt | 10 | 138.984 | 137.697 | 0.273 | ns/op | 1.009
> Base64Encode.testBase64Encode | 7 | avgt | 10 | 161.243 | 157.696 | 0.875 | ns/op | 1.022
> Base64Encode.testBase64Encode | 9 | avgt | 10 | 169.724 | 155.223 | 1.908 | ns/op | 1.093
> Base64Encode.testBase64Encode | 10 | avgt | 10 | 185.92 | 176.339 | 5.875 | ns/op | 1.054
> Base64Encode.testBase64Encode | 48 | avgt | 10 | 408.467 | 347.269 | 1.799 | ns/op | 1.176
> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3665.34 | 2718.442 | 26.954 | ns/op | 1.348
> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7022.025 | 5290.003 | 33.216 | ns/op | 1.327
> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135819.7 | 101988.94 | 2209.887 | ns/op | 1.332

@Hamlin-Li Hi, we looked at RVV base64 encode/decode for another project before, however there wasn't one implementation that obviously was best across the different hardware: https://github.com/WojciechMula/base64simd/issues/9 (see issue for benchmark, and repo for code)

I think we currently can't tell how the complex load/stores will perform on future hardware. Segmented load/stores, for example, are quite fast on the current in-order RVV 1.0 boards, but they are very slow on the ooo C910 and XiangShan (current master, may change) cores (SiFive P670 LLVM-MCA indicates that they might also be slow on that core). I'm not sure if that is because those cores are ooo and that gives you additional constraints, but I wouldn't rely on segmented load/stores just yet.

I think the safest bet for encode would be for now "RISC-V RVV (LMUL=1)" ([`encode`](https://github.com/WojciechMula/base64simd/blob/master/encode/encode.rvv.cpp#L60C14-L60C20) + [`lookup_pshufb_improved`](https://github.com/WojciechMula/base64simd/blob/master/encode/lookup.rvv.cpp#L7)), as this only uses instructions with predictable performance, except for LMUL=1 `vrgather.vv`, which I think will need to be fast on any application class core. (See x86 equivalent vperm*)

For decode, I'm not really happy with any implementation. Yours uses multiple `vluxei8` + `vlsege4` + `vssege3`, the others from base64simd use LMUL=8 `vrgather.vv`, which will take `LMUL^2=8^2=64` times the amount of cycles a LMUL=1 `vrgather.vv` takes (on sane implementations, [see my reasoning](https://gitlab.com/riseproject/riscv-optimization-guide/-/issues/1#note_1977583125)). As I said, I'm fairly certain LMUL=1 `vrgather.vv` will have to be relatively fast, so if I had to choose, I'd prefer [my implementation](https://godbolt.org/z/7qc1xhMao) that uses LMUL=1 `vrgather.vv`s +  `vlsege4` + `vssege3`, but using `vsseg*` is not ideal. (Note that gcc currently chokes on the register allocation, so you should use clang for now)
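
To make the arithmetic behind the `LMUL^2` remark concrete, here is a tiny sketch of that cost model. It only encodes the rule of thumb stated above (an LMUL=n `vrgather.vv` costing about n^2 times an LMUL=1 one); the function names are made up for illustration and no real hardware timings are implied.

```cpp
#include <cassert>

// Rule-of-thumb cost of one vrgather.vv at a given LMUL, expressed in units
// of "one LMUL=1 vrgather.vv". This models the LMUL^2 estimate from the text.
constexpr unsigned vrgather_relative_cost(unsigned lmul) {
  return lmul * lmul;
}

// Cost of processing the same number of vector registers with separate
// LMUL=1 gathers instead (assumes the lookup table fits in one register,
// so each register needs exactly one gather).
constexpr unsigned split_gather_cost(unsigned num_registers) {
  return num_registers * vrgather_relative_cost(1);
}

static_assert(vrgather_relative_cost(8) == 64, "8^2 = 64, as quoted above");
static_assert(split_gather_cost(8) == 8, "eight LMUL=1 gathers");
```

Under this model, eight LMUL=1 gathers are about 8x cheaper than one LMUL=8 gather over the same registers, which is the reason for preferring LMUL=1 `vrgather.vv` where the table size allows it.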

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2209403751

From jrose at openjdk.org  Thu Jul  4 21:31:21 2024
From: jrose at openjdk.org (John R Rose)
Date: Thu, 4 Jul 2024 21:31:21 GMT
Subject: RFR: 8334890: Missing unconditional cross modifying fence in
 nmethod entry barriers
In-Reply-To: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
References: <592bq3FIM28SxUn6yH2iCDRT6TO_lpn_WvoS6PglM90=.b965043e-8550-45e0-be8b-5a71163a16d6@github.com>
Message-ID: <Objsw0uDMNiMohx4Xn6QedHF9Qdgoe6taf93Jfvz7Ts=.c3f0a6f7-228f-4cb5-9a05-55c68f93ebcc@github.com>

On Tue, 2 Jul 2024 15:43:08 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code.
> 
> We use asynchronous code modification in the fast path of nmethod entry barriers. If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, then any code modification to the nmethod prior to disarming it on another thread, is guaranteed to also be observed by the instruction fetcher.
> 
> However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data. If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions, without enforcing a cross modifying code fence. This is synchronous code modification.
> 
> There is a questionable optimization in the stub slow path entry (which we just got to because the nmethod was observed to be armed by the instruction fetcher). It checks "just one more time" if the nmethod concurrently got disarmed, and then exits without a cross modification fence. This is an opportunistic optimization that is very unlikely to be useful, since we got into the slow path because the nmethod was armed a couple of instructions ago. This opportunistic optimization breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully.
> 
> This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly.

I see you integrated; good job, and thank you to the Reviewers.  The narrative at the top of this PR is excellent for motivating and explaining the removal of the extra check.  Some of the other diffs are more mysterious, as Axel noted.

There are no API docs in barrierSetNMethod.[ch]pp so I would need to trace through all the code paths to properly educate myself about the effect of this change.

For example, BarrierSetNMethod::nmethod_entry_barrier is a very interesting function, along with its OSR brother, but there are no comments directly visible here that give a clue as to when it is called, or why it must be called.  I think that level of non-documentation is often a maintenance problem.  I see this file relates to the larger API in barrierSet.hpp but that file has sparse comments also.

Ideally, I'd hope to read The Narrative of the Barrier Set at the top of barrierSet.hpp, and maybe have a brief pointer to The Narrative from less-commented related files like barrierSetNMethod.cpp.

Also, if I did this change, and was feeling chatty and cautious, I'd leave behind an informative comment to the effect that "you might want to double-check the barrier state here, but don't, because races".  It's nice not to leave a seam from past history, but sometimes the absence of a warning leads people to repeat history.
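
As a rough picture of what such a comment could sit next to, here is a toy model of the slow path. It is only a sketch under loud assumptions: `NMethodSketch`, `cross_modify_fence_stub` and `slow_path_entry` are invented for illustration, the fence is a counter rather than a real instruction-stream barrier, and none of this is the actual BarrierSetNMethod code.

```cpp
#include <atomic>
#include <cassert>

// Toy nmethod: guard == 0 means disarmed, anything else means armed
// (illustrative encoding only).
struct NMethodSketch {
  std::atomic<int> guard{1};
};

// Stand-in for a cross-modifying-code fence. A real one serializes the
// instruction stream; this stub only counts calls so the sketch is testable.
static int g_fence_count = 0;
inline void cross_modify_fence_stub() { ++g_fence_count; }

// Slow path taken after the instruction fetcher saw the nmethod as armed.
// Returns true when the caller may enter the nmethod.
inline bool slow_path_entry(NMethodSketch& nm) {
  // NOTE: you might want to re-check the guard here and return early if it
  // is already disarmed, but don't: the guard is read as data, so the early
  // exit would skip the fence that makes the modified code safe to execute.
  if (nm.guard.load(std::memory_order_acquire) != 0) {
    // (Real code would fix up the nmethod here before disarming it.)
    nm.guard.store(0, std::memory_order_release);
  }
  cross_modify_fence_stub();  // unconditional, on every slow-path exit
  return true;
}
```

The point of the sketch is only the control flow: the fence is reached whether or not another thread disarmed the nmethod first.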

None of the above critiques would have stopped me from approving the change as another Reviewer, but the lack of documentation would have made me hesitate to review quickly.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19990#issuecomment-2209578133

From dholmes at openjdk.org  Thu Jul  4 23:02:21 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 4 Jul 2024 23:02:21 GMT
Subject: RFR: 8334231: Optimize MethodData layout
In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
Message-ID: <jb5ayvzoKuUpOdlf-58EBJ8BHTt9OUOByPOctf4sEgY=.23ccae9d-a92a-4e9d-b10c-4d349e6c6a3e@github.com>

On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> Hi all,
>      This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of  MethodData. Here is the original layout from `pahole`:
> 
> class MethodData : public Metadata {
> public:
> 
> 	/* class Metadata            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX 8 bytes hole, try to pack */
> 
> 	class Method *             _method;              /*     8     8 */
> 	int                        _size;                /*    16     4 */
> 	int                        _hint_di;             /*    20     4 */
> 	class Mutex               _extra_data_lock;      /*    24   104 */
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class CompilerCounters    _compiler_counters;    /*   128    80 */
> 	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
> 	intx                       _eflags;              /*   208     8 */
> 	intx                       _arg_local;           /*   216     8 */
> 	intx                       _arg_stack;           /*   224     8 */
> 	intx                       _arg_returned;        /*   232     8 */
> 	int                        _creation_mileage;    /*   240     4 */
> 	class InvocationCounter   _invocation_counter;   /*   244     4 */
> 	class InvocationCounter   _backedge_counter;     /*   248     4 */
> 	int                        _invocation_counter_start; /*   252     4 */
> 	/* --- cacheline 4 boundary (256 bytes) --- */
> 	int                        _backedge_counter_start; /*   256     4 */
> 	uint                       _tenure_traps;        /*   260     4 */
> 	int                        _invoke_mask;         /*   264     4 */
> 	int                        _backedge_mask;       /*   268     4 */
> 	short int                  _num_loops;           /*   272     2 */
> 	short int                  _num_blocks;          /*   274     2 */
> 	enum WouldProfile          _would_profile;       /*   276     4 */
> 	int                        _jvmci_ir_size;       /*   280     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	class FailedSpeculation *  _failed_speculations; /*   288     8 */
> 	int                        _data_size;           /*   296     4 */
> 	int                        _parameters_type_data_di; /*   300     4 */
> 	int                        _exception_handler_data_di; /*   304     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	intptr_t                   _data[1];             /*   312     8 */
> 
> 	/* size: 320, cachelines: 5, members: 27 */
> 	/* sum members: 304, holes: 3, sum holes: 16 */
> };
> 
> 
> There are 3 holes ...

I consider moving declarations around trivial, as functionally there is no impact. There could be a performance impact, but it is very unlikely that anyone would know for certain during a code review (unless we violate a comment saying that things need to be spaced out for caching). But I agree there is also no urgency.
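
For readers less familiar with the `pahole` output quoted above, here is a small self-contained illustration of how reordering members removes holes without changing what the fields mean. The two structs are made up for this example and are unrelated to the real `MethodData` fields; exact sizes depend on the ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A 4-byte field followed by a pointer: on a typical LP64 ABI the compiler
// inserts a 4-byte hole after `a` so that `p` is pointer-aligned, plus tail
// padding after `b`.
struct Unordered {
  std::int32_t a;
  void*        p;
  std::int32_t b;
};

// Same members, pointer first: the two 4-byte fields now share one slot.
struct Reordered {
  void*        p;
  std::int32_t a;
  std::int32_t b;
};

// Size of the hole between `a` and `p` in the unordered layout
// (4 on typical LP64, 0 where pointers are 4 bytes).
constexpr std::size_t hole_before_p() {
  return offsetof(Unordered, p) - (offsetof(Unordered, a) + sizeof(std::int32_t));
}

static_assert(sizeof(Reordered) <= sizeof(Unordered),
              "reordering never makes this layout larger");
```

On a typical 64-bit Linux build this gives 24 bytes for `Unordered` and 16 for `Reordered`; functionally the two are equivalent, which is the sense in which such a change can be called trivial.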

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20019#issuecomment-2209627272

From duke at openjdk.org  Fri Jul  5 03:25:29 2024
From: duke at openjdk.org (duke)
Date: Fri, 5 Jul 2024 03:25:29 GMT
Subject: Withdrawn: 8329204: Diagnostic command for zeroing unused parts of the
 heap
In-Reply-To: <b4mPQ-O25iKRZTAsKCoUJ2IrB78D_aEYs1kZRNRS4O4=.e64e5fee-52f4-4543-8c5e-3d6dd60cd8d7@github.com>
References: <b4mPQ-O25iKRZTAsKCoUJ2IrB78D_aEYs1kZRNRS4O4=.e64e5fee-52f4-4543-8c5e-3d6dd60cd8d7@github.com>
Message-ID: <YoIRuUtbj3T9QMht3_Ar4XDiZVZazCZzrMMFczWqk1E=.11de59a2-bcc9-4160-b782-7b5edbe3f04f@github.com>

On Wed, 27 Mar 2024 17:24:34 GMT, Volker Simonis <simonis at openjdk.org> wrote:

> Diagnostic command for zeroing unused parts of the heap
> 
> I propose to add a new diagnostic command `System.zero_unused_memory` which zeros out all unused parts of the heap. The name of the command is intentionally GC/heap agnostic because in the future it might be extended to also zero unused parts of the Metaspace and/or CodeCache.
> 
> Currently `System.zero_unused_memory` triggers a full GC and afterwards zeros unused parts of the heap. Zeroing can help snapshotting technologies like [CRIU][1] or [Firecracker][2] to shrink the snapshot size of VMs/containers with running JVM processes because pages which only contain zero bytes can be easily removed from the image by making the image *sparse* (e.g. with [`fallocate -p`][3]).
> 
> Notice that uncommitting unused heap parts in the JVM doesn't help in the context of virtualization (e.g. KVM/Firecracker) because from the host perspective they are still dirty and can't be easily removed from the snapshot image because they usually contain some non-zero data. More details can be found in my FOSDEM talk ["Zeroing and the semantic gap between host and guest"][4].
> 
> Furthermore, removing pages which only contain zero bytes (i.e. "empty pages") from a snapshot image not only decreases the image size but also speeds up the restore process because empty pages don't have to be read from the image file but will be populated by the kernel zero page first until they are used for the first time. This also decreases the initial memory footprint of a restored process. 
> 
> An additional argument for memory zeroing is security. By zeroing unused heap parts, we can make sure that secrets contained in unreferenced Java objects are deleted. This is something that's currently impossible to achieve from Java, because even if a Java program zeroes out arrays with sensitive data after usage, it can never guarantee that the corresponding object hasn't already been moved by the GC and that an old, unreferenced copy of that data doesn't still exist somewhere in the heap.
> 
> A prototype implementation for this proposal for Serial, Parallel, G1 and Shenandoah GC is available in the linked pull request.
> 
> [1]: https://criu.org
> [2]: https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md
> [3]: https://man7.org/linux/man-pages/man1/fallocate.1.html
> [4]: https://fosdem.org/2024/schedule/event/fosdem-2024-3454-zeroing-and-the-semantic-gap-between-host-and-guest/

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/18521

From duke at openjdk.org  Fri Jul  5 07:07:35 2024
From: duke at openjdk.org (duke)
Date: Fri, 5 Jul 2024 07:07:35 GMT
Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages
 unexpectedly [v30]
In-Reply-To: <9d5UWcRhfhgqpUkvy2dv77bATgCKYFjxNTDreBfk4MI=.5682e46d-b448-4936-8e98-14549669d3dc@github.com>
References: <ah1A3dIb6pD5Z7wYQnjoUPuuU5NvyNKEjUQvmp8MKXU=.1b615efe-deef-44d5-8bfa-908c2b2c9eb0@github.com>
 <9d5UWcRhfhgqpUkvy2dv77bATgCKYFjxNTDreBfk4MI=.5682e46d-b448-4936-8e98-14549669d3dc@github.com>
Message-ID: <87VKmlRTP3dMFaOxiViAj2O44e7hBgbjWkDwnG5K3ug=.2ee61b21-4d7b-4db9-94d2-05ecf94d0908@github.com>

On Fri, 26 Jan 2024 03:07:02 GMT, Liming Liu <lliu at openjdk.org> wrote:

>> As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14).
>> 
>> I ran the newly added jtreg test on 64c Neoverse-N1 machines with kernels 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. The time spent on the test, recorded in *seconds* with `VERBOSE=time`, is shown in the table below. The patch brings an improvement when the system call is supported and does no harm when it is not:
>> 
>> <table>
>>   <tr>
>>     <th>Kernel</th>
>>     <th colspan="2"><tt>-XX:-TransparentHugePages</tt></th>
>>     <th colspan="2"><tt>-XX:+TransparentHugePages</tt></th>
>>   </tr>
>>   <tr><td></td><td>Unpatched</td><td>Patched</td><td>Unpatched</td><td>Patched</td></tr>
>>   <tr><td>4.18</td><td>11.30</td><td>11.30</td><td>0.25</td><td>0.25</td></tr>
>>   <tr><td>5.13</td><td>0.22</td><td>0.22</td><td>3.42</td><td>3.42</td></tr>
>>   <tr><td>6.1</td><td>0.27</td><td>0.33</td><td>3.54</td><td>0.33</td></tr>
>> </table>
>
> Liming Liu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Make it true by default and use a lower log level when fail

@limingliu-ampere 
Your change (at version 3ac920fd2f1f99e6889f3958e13aa8d2a749e17c) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15781#issuecomment-1911765848

From tschatzl at openjdk.org  Fri Jul  5 07:21:27 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Fri, 5 Jul 2024 07:21:27 GMT
Subject: RFR: 8331385: G1: Prefix HeapRegion helper classes with G1
In-Reply-To: <BbCLtLUIqyaA9lNeheVeZJV2fb49kWP2p5t8vRAJ1Uw=.6f72af7b-c5dd-4814-95b4-04e91f32b2c7@github.com>
References: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com>
 <BbCLtLUIqyaA9lNeheVeZJV2fb49kWP2p5t8vRAJ1Uw=.6f72af7b-c5dd-4814-95b4-04e91f32b2c7@github.com>
Message-ID: <G5mdMZ4HOP4K43zFH2JbNEBpec8tRLpv8Sqe5H5WVEI=.cdb0f0f1-e720-4abf-8a9d-3131411f7fb5@github.com>

On Tue, 2 Jul 2024 10:21:35 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Hi all,
>> 
>>   after [JDK-8330694](https://bugs.openjdk.org/browse/JDK-8330694) which renamed HeapRegion to G1HeapRegion, there were  a few related helper classes in this CR that were not renamed.
>> 
>> It's purely mechanical renaming without even further renaming of files etc.
>> 
>> This change updates them.
>> 
>> (Fwiw, the "Viewed" checkbox at the top right of the file change helps a lot review this change incrementally)
>> 
>> Testing: tier1, tier4, tier5
>> 
>> Thanks,
>>   Thomas
>
> Marked as reviewed by ayang (Reviewer).

Thanks @albertnetymk @dholmes-ora for your reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19967#issuecomment-2210331294

From tschatzl at openjdk.org  Fri Jul  5 07:21:29 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Fri, 5 Jul 2024 07:21:29 GMT
Subject: Integrated: 8331385: G1: Prefix HeapRegion helper classes with G1
In-Reply-To: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com>
References: <q2rzIb9CIlSji4pbk0GdDk-y6jrRgZCsvNFkrYI4CJM=.136951b5-f2bc-4169-83dc-b44d20b42f07@github.com>
Message-ID: <eMo7F30Tw-fsx2MNBVThXkJFfG8-0JCUnrPDTPV8PTM=.5a6a6e6f-2b05-4910-9d12-86665cafe80b@github.com>

On Mon, 1 Jul 2024 09:35:00 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

> Hi all,
> 
>   after [JDK-8330694](https://bugs.openjdk.org/browse/JDK-8330694) which renamed HeapRegion to G1HeapRegion, there were  a few related helper classes in this CR that were not renamed.
> 
> It's purely mechanical renaming without even further renaming of files etc.
> 
> This change updates them.
> 
> (Fwiw, the "Viewed" checkbox at the top right of the file change helps a lot review this change incrementally)
> 
> Testing: tier1, tier4, tier5
> 
> Thanks,
>   Thomas

This pull request has now been integrated.

Changeset: 4ec1ae10
Author:    Thomas Schatzl <tschatzl at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/4ec1ae109710aa150e27acf5706475d335c4655c
Stats:     887 lines in 68 files changed: 163 ins; 165 del; 559 mod

8331385: G1: Prefix HeapRegion helper classes with G1

Reviewed-by: ayang, dholmes

-------------

PR: https://git.openjdk.org/jdk/pull/19967

From eosterlund at openjdk.org  Fri Jul  5 07:57:22 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Fri, 5 Jul 2024 07:57:22 GMT
Subject: [jdk23] RFR: 8334890: Missing unconditional cross modifying fence
 in nmethod entry barriers
In-Reply-To: <sKjd64NGI947n-5zuIRi78OPMZfwgreI54UDcyTeTW0=.a28e7639-c100-4e00-938c-15d0dcb8ca63@github.com>
References: <Zl8M3k7N0G-oBFmAyX3oO6RrxeNnCdr2lH9JyrdX0GQ=.4d2d82e8-49f7-44ec-84ff-0c0d6794b9e5@github.com>
 <sKjd64NGI947n-5zuIRi78OPMZfwgreI54UDcyTeTW0=.a28e7639-c100-4e00-938c-15d0dcb8ca63@github.com>
Message-ID: <3eH8AAPY5p1IQ9e7q9K_kIxbmLW_g0Y36ZGF3_MC1eE=.aeb6e1b7-2ce0-4586-a7d9-9a200c008cd6@github.com>

On Thu, 4 Jul 2024 13:55:15 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> 8334890: Missing unconditional cross modifying fence in nmethod entry barriers
>
> Marked as reviewed by aboldtch (Reviewer).

Thanks for the review @xmas92!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20036#issuecomment-2210381271

From eosterlund at openjdk.org  Fri Jul  5 07:57:23 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Fri, 5 Jul 2024 07:57:23 GMT
Subject: [jdk23] Integrated: 8334890: Missing unconditional cross modifying
 fence in nmethod entry barriers
In-Reply-To: <Zl8M3k7N0G-oBFmAyX3oO6RrxeNnCdr2lH9JyrdX0GQ=.4d2d82e8-49f7-44ec-84ff-0c0d6794b9e5@github.com>
References: <Zl8M3k7N0G-oBFmAyX3oO6RrxeNnCdr2lH9JyrdX0GQ=.4d2d82e8-49f7-44ec-84ff-0c0d6794b9e5@github.com>
Message-ID: <iHwQYcMFiNCQYEUCbnJ1XKsBMVWLdBlu2IJ016J9Tcg=.d90c13c2-1fbd-4a67-8223-9c2b688b495e@github.com>

On Thu, 4 Jul 2024 13:52:09 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> 8334890: Missing unconditional cross modifying fence in nmethod entry barriers

This pull request has now been integrated.

Changeset: d383365e
Author:    Erik Österlund <eosterlund at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/d383365ea4196cd5f40de217547392b820c4ad01
Stats:     18 lines in 1 file changed: 1 ins; 14 del; 3 mod

8334890: Missing unconditional cross modifying fence in nmethod entry barriers

Reviewed-by: aboldtch
Backport-of: c0604fb823d9f3b2e347a9857b11606b223ad8ec

-------------

PR: https://git.openjdk.org/jdk/pull/20036

From azafari at openjdk.org  Fri Jul  5 11:15:25 2024
From: azafari at openjdk.org (Afshin Zafari)
Date: Fri, 5 Jul 2024 11:15:25 GMT
Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument
 to family of os::reserve/commit/uncommit memory API [v4]
In-Reply-To: <VuFm0lU78YLRVyTMOvNd1rofkOkF6VyvzhAePmQMFJc=.d8e6ca76-c405-4148-9fe9-007f0a3e616d@github.com>
References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com>
 <VuFm0lU78YLRVyTMOvNd1rofkOkF6VyvzhAePmQMFJc=.d8e6ca76-c405-4148-9fe9-007f0a3e616d@github.com>
Message-ID: <gch2E5eiUMmfTLIqawsbkP0QlwuJQy_Eg0K5ZUzX7aQ=.5f6d33dc-fd8f-4814-99ed-2d6c1f569d62@github.com>

On Fri, 24 May 2024 13:46:15 GMT, Afshin Zafari <azafari at openjdk.org> wrote:

>> This PR fixes the problems that existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here:
>>   1- The `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in the constructor. Since reserving memory regions may go through a try-and-fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves.
>> Also, some assertions are added to the getters of `ReservedSpace` to check that the region was successfully reserved.
>> 
>>   2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub-regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores the release of `R1` and `R2`. This is problematic when a requested region `R` is first size-aligned to `R1---R---R2` and then `R1` and `R2` are released (the `chop_extra_memory` function is called for this). In this case, NMT skips tracking the release of `R1` and `R2` on the false assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT reserved-regions list, and when a new reserve happens in those regions, NMT complains by raising an exception.
>> 
>> Tests:
>> mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug}
>
> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision:
> 
>   more fixes.

Withdrawn.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19343#issuecomment-2210686409

From azafari at openjdk.org  Fri Jul  5 11:15:26 2024
From: azafari at openjdk.org (Afshin Zafari)
Date: Fri, 5 Jul 2024 11:15:26 GMT
Subject: Withdrawn: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument
 to family of os::reserve/commit/uncommit memory API
In-Reply-To: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com>
References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com>
Message-ID: <AH1L3pa_NYr7AJG595_g3d_8iQvLkpTZMLmXz573NbM=.eb1f09bf-8a9e-4062-be19-eb326fcd945b@github.com>

On Wed, 22 May 2024 08:29:05 GMT, Afshin Zafari <azafari at openjdk.org> wrote:

> This PR fixes the problems that existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here:
>   1- The `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in the constructor. Since reserving memory regions may go through a try-and-fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves.
> Also, some assertions are added to the getters of `ReservedSpace` to check that the region was successfully reserved.
> 
>   2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub-regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores the release of `R1` and `R2`. This is problematic when a requested region `R` is first size-aligned to `R1---R---R2` and then `R1` and `R2` are released (the `chop_extra_memory` function is called for this). In this case, NMT skips tracking the release of `R1` and `R2` on the false assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT reserved-regions list, and when a new reserve happens in those regions, NMT complains by raising an exception.
> 
> Tests:
> mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug}
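
The second point in the description above can be modelled with a small sketch. This is not the NMT implementation; `RegionTracker` and its methods are hypothetical names used only to illustrate why a release of `R1` and `R2` has to be reflected in the tracked list, otherwise a later reserve at those addresses looks like an overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical model of a reserved-regions list (not the real NMT code).
class RegionTracker {
  std::map<std::size_t, std::size_t> _regions;  // start -> end (exclusive)

 public:
  // Returns false if [start, end) overlaps a region that is still tracked.
  bool reserve(std::size_t start, std::size_t end) {
    assert(start < end);
    auto next = _regions.lower_bound(start);  // first region starting at or after start
    if (next != _regions.end() && next->first < end) return false;
    if (next != _regions.begin() && std::prev(next)->second > start) return false;
    _regions[start] = end;
    return true;
  }

  // Releases [start, end), which must lie inside one tracked region.
  // Whatever remains of that region on either side stays tracked.
  bool release(std::size_t start, std::size_t end) {
    assert(start < end);
    auto it = _regions.upper_bound(start);
    if (it == _regions.begin()) return false;
    --it;
    const std::size_t r_start = it->first;
    const std::size_t r_end = it->second;
    if (start < r_start || end > r_end) return false;
    _regions.erase(it);
    if (r_start < start) _regions[r_start] = start;  // keep the head
    if (end < r_end) _regions[end] = r_end;          // keep the tail
    return true;
  }

  std::size_t count() const { return _regions.size(); }
};
```

In this model, reserving `[0, 300)` and then releasing the head `[0, 100)` and tail `[200, 300)` leaves only `[100, 200)` tracked, so a new reserve of `[0, 100)` succeeds. If the two releases were ignored, that reserve would be reported as overlapping.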

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/19343

From coleenp at openjdk.org  Fri Jul  5 12:38:21 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 5 Jul 2024 12:38:21 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665
In-Reply-To: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
Message-ID: <KSG0PgqjRhlVE2khvuSnf_CYg2sSqJ_oRaQKqqB4nT4=.aa29fd29-c476-4144-8454-78cf536ed55e@github.com>

On Wed, 3 Jul 2024 16:24:20 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.
> 
> The number of locals and expression stack entries a method can have before we have to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't incur any performance penalty, contrary to what the comment says. This comment dates back to 2002, when instead of 128 entries we could have only 32, given that 32-bit CPUs were still in main use (see bug for more history details).
> 
> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase, I added an explicit assert to verify that each InterpreterOopMap is only used once.
> 
> I tested the patch by running it through mach5 tiers 1-6.
> 
> Thanks,
> Patricio
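
The `4*64/2 = 128` arithmetic in the description can be spelled out as a small sketch. The constant and function names below are hypothetical, and reading the formula as "4 inline words of 64 bits, 2 bits per entry" is an interpretation of the quoted numbers rather than a copy of the HotSpot source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants mirroring the arithmetic quoted above.
constexpr std::size_t kInlineWords = 4;    // words stored inline in the object
constexpr std::size_t kBitsPerWord = 64;   // 64-bit platform
constexpr std::size_t kBitsPerEntry = 2;   // assumed: two bits per local/stack slot

constexpr std::size_t kInlineEntries =
    kInlineWords * kBitsPerWord / kBitsPerEntry;

static_assert(kInlineEntries == 128, "4*64/2 entries fit inline");

// True when the locals plus expression stack slots fit in the inline
// bit mask, i.e. no extra allocation is needed.
constexpr bool fits_inline(std::size_t locals, std::size_t stack_slots) {
  return locals + stack_slots <= kInlineEntries;
}
```

Under these assumptions a method with 100 locals and 28 stack slots still fits inline, while one more slot would push the mask out of line.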

Also a couple of nits, but this looks good.  Thanks for tracking down the history and verifying that it's an unusual situation that we were optimizing for.

src/hotspot/share/interpreter/oopMapCache.hpp line 45:

> 43: // For InterpreterOopMap the bit_mask is allocated in the C heap
> 44: // to avoid issues with allocations from the resource area that have
> 45: // to live accross the oop closure (see 8335409). InterpreterOopMap

We don't usually put bug numbers in the code and after this change nobody will want to move this back to resource area, so putting the bug number as a caution shouldn't be needed.  If one wants to know the details, they can git blame this file.

src/hotspot/share/interpreter/oopMapCache.hpp line 46:

> 44: // to avoid issues with allocations from the resource area that have
> 45: // to live accross the oop closure (see 8335409). InterpreterOopMap
> 46: // should only be created and deleted during same garbage collection.

Can you add 'the' to "during the same garbage collection."

-------------

PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2160631864
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666753678
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666754397

From coleenp at openjdk.org  Fri Jul  5 12:38:22 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 5 Jul 2024 12:38:22 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665
In-Reply-To: <iZb_AvCGeJYQ51-UTqMhkxRKQwt0F6UgdM6nppalaEo=.d3c5ad91-9342-42a6-83c9-03a9e4a104bb@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
 <iZb_AvCGeJYQ51-UTqMhkxRKQwt0F6UgdM6nppalaEo=.d3c5ad91-9342-42a6-83c9-03a9e4a104bb@github.com>
Message-ID: <ICN_RiO5Rpx7xM9JGERyUT7Dh2VB-DoW-h4jGmyKDdY=.62faa586-b9f2-4915-95fd-e74e184e0bac@github.com>

On Thu, 4 Jul 2024 04:53:59 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.
>> 
>> The number of locals and expression stack entries a method can have before we have to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't incur any performance penalty, contrary to what the comment says. This comment dates back to 2002, when instead of 128 entries we could have only 32, given that 32-bit CPUs were still in main use (see bug for more history details).
>> 
>> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase, I added an explicit assert to verify that each InterpreterOopMap is only used once.
>> 
>> I tested the patch by running it through mach5 tiers 1-6.
>> 
>> Thanks,
>> Patricio
>
> src/hotspot/share/interpreter/oopMapCache.hpp line 138:
> 
>> 136:   // allocated space (i.e., the bit mask was to large to hold
>> 137:   // in-line), allocate the space from the C heap.
>> 138:   void resource_copy(OopMapCacheEntry* from);
> 
> The name `resource_copy` seems somewhat of a misnomer given it may be C heap. Is it worth changing?

I agree, this should probably be copy_from, and rename the parameter src.  Or something like that.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666755529

From duke at openjdk.org  Fri Jul  5 13:29:24 2024
From: duke at openjdk.org (duke)
Date: Fri, 5 Jul 2024 13:29:24 GMT
Subject: Withdrawn: 8331608: Consolidate EncodeGCModeConcurrentFrameClosure and
 TransformStackChunkClosure
In-Reply-To: <RjjYzSdzZdei0SN7GLMfHcTXoa_-HJItLxMJEM5UdYo=.6a3d2b3f-788e-4f82-98d7-68e38e62241b@github.com>
References: <RjjYzSdzZdei0SN7GLMfHcTXoa_-HJItLxMJEM5UdYo=.6a3d2b3f-788e-4f82-98d7-68e38e62241b@github.com>
Message-ID: <qHaXyKEgvO610PiIeRxc7rjkzlYjUIjnkxJzuUMMka4=.3f1ed153-e4d6-4cba-a59a-8ea8db98f9b5@github.com>

On Fri, 3 May 2024 11:58:36 GMT, Guoxiong Li <gli at openjdk.org> wrote:

> Hi all,
> 
> After [JDK-8296875](https://bugs.openjdk.org/browse/JDK-8296875), the classes `EncodeGCModeConcurrentFrameClosure` and `TransformStackChunkClosure` almost have the same code. This patch consolidates them into one.
> 
> The tests `make test-hotspot_loom` and `make test-hotspot_gc` passed locally (linux & x64). Thanks for taking the time to review.
> 
> Best Regards,
> -- Guoxiong

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/19084

From aph at openjdk.org  Fri Jul  5 13:36:42 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 5 Jul 2024 13:36:42 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter
Message-ID: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>

This patch expands the use of a hash table for secondary superclasses
to the interpreter, C1, and runtime. It also adds a C2 implementation
of hashed lookup in cases where the superclass isn't known at compile
time.

HotSpot shared runtime
----------------------

Building hashed secondary tables is now unconditional. It takes very
little time, and now that the shared runtime always has the tables, it
might as well take advantage of them. The shared code is easier to
follow now, I think.

There might be a performance issue with x86-64 in that we build
HotSpot for a default x86-64 target that does not support popcount.
This means that HotSpot C++ runtime on x86 always uses a software
emulation for popcount, even though the vast majority of machines made
for the past 20 years can do popcount in a single instruction. It
wouldn't be terribly hard to do something about that.

Having said that, the software popcount is really not bad.
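
For readers unfamiliar with software popcount, here is a minimal sketch of the usual branch-free (SWAR) technique, plus the generic way a bitmap and a popcount give an index into a packed table. The function names are hypothetical, and this is an illustration of the general idea, not the code in this patch or in the HotSpot runtime.

```cpp
#include <cassert>
#include <cstdint>

// Branch-free population count for a 64-bit word (classic SWAR method).
constexpr int popcount64(std::uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // 8-bit sums
  return static_cast<int>((x * 0x0101010101010101ULL) >> 56);            // add the bytes
}

// Generic "rank" helper: given a bitmap of occupied slots, returns how
// many occupied slots precede `slot`, which is the index of that slot's
// entry in a packed array. Assumes slot < 64.
constexpr int packed_index(std::uint64_t bitmap, unsigned slot) {
  return popcount64(bitmap & ((std::uint64_t{1} << slot) - 1));
}

static_assert(popcount64(0) == 0, "empty word");
static_assert(popcount64(~std::uint64_t{0}) == 64, "full word");
```

The whole count is a dozen or so simple ALU operations with no branches or table lookups, which is consistent with the remark that the software version is not bad.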

x86
---

x86 is rather tricky, because we still support
`-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
well as 32- and 64-bit ports. There's some further complication in
that only `RCX` can be used as a shift count, so there's some register
shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
rather gnarly, with multiple levels of conditionals at compile time
and runtime.

AArch64
-------

AArch64 is considerably more straightforward. We always have a
popcount instruction and (thankfully) no 32-bit code to worry about.

Generally
---------

I would dearly love simply to rip out the "old" secondary supers cache
support, but I've left it in just in case someone has a performance
regression.

The versions of `MacroAssembler::lookup_secondary_supers_table` that
work with variable superclasses don't take a fixed set of temp
registers, and neither do they call out to a slow path subroutine.
Instead, the slow path is expanded inline.

I don't think this is necessarily bad. Apart from the very rare cases
where C2 can't determine the superclass to search for at compile time,
this code is only used for generating stubs, and it seemed to me
ridiculous to have stubs calling other stubs.

I've followed the guidance from @iwanowww not to obsess too much about
the performance of C1-compiled secondary supers lookups, and to prefer
simplicity over absolute performance. Nonetheless, this is a
complicated patch that touches many areas.

-------------

Commit messages:
 - Cleanup tests
 - small
 - Small
 - Temp
 - Merge remote-tracking branch 'refs/remotes/origin/JDK-8331658-work' into JDK-8331658-work
 - Fix x86-32
 - Fix x86
 - Temp
 - Temp
 - Temp
 - ... and 16 more: https://git.openjdk.org/jdk/compare/747e1e47...7d7694cc

Changes: https://git.openjdk.org/jdk/pull/19989/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8331341
  Stats: 886 lines in 13 files changed: 755 ins; 69 del; 62 mod
  Patch: https://git.openjdk.org/jdk/pull/19989.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989

PR: https://git.openjdk.org/jdk/pull/19989

From mli at openjdk.org  Fri Jul  5 13:48:24 2024
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 5 Jul 2024 13:48:24 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v4]
In-Reply-To: <FZMjsZWO9NKx4v5svo8qQPE5HKqvoiM-lc0oiDCah80=.2d250429-524a-4e93-a453-bf1db0238626@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
 <FZMjsZWO9NKx4v5svo8qQPE5HKqvoiM-lc0oiDCah80=.2d250429-524a-4e93-a453-bf1db0238626@github.com>
Message-ID: <JH625cZxMHDvjzWakK6XGICFywENU6G0odkwwzpzLvU=.8e62af09-bdbb-4d77-a63d-fb77f9bb6a92@github.com>

On Tue, 2 Jul 2024 14:16:35 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> 
>> I'm also working on a Base64 decode intrinsic, but there is a performance regression in some cases. Since decode and encode are totally independent of each other, I will send out the decode review in another PR once I fix that regression.
>> 
>> Thanks.
>> 
>> ## Test
>> benchmarks run on CanVM-K230
>> 
>> I've tried several implementations, respectively with vector group
>> * m2+m1+scalar
>> * m2+scalar
>> * m1+scalar
>> * pure scalar
>> The best one is the combination of m2+m1; it has the best performance for all source sizes.
>> 
>> this implementation (m2+m1)
>> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
>> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
>> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
>> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
>> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
>> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
>> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
>> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
>> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
>> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
>> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>> 
>> vector with only m2
>> ...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   move label

Thanks a lot for sharing the information.

> @Hamlin-Li Hi, we looked at RVV base64 encode/decode for another project before, however there wasn't one implementation that obviously was best across the different hardware: [WojciechMula/base64simd#9](https://github.com/WojciechMula/base64simd/issues/9) (see issue for benchmark, and repo for code)

Agree, I think your observation is right.

> I think we currently can't tell how the complex load/stores will perform on future hardware. Segmented load/stores for example are quite fast on the current in-order RVV 1.0 boards, however it's very slow on the ooo C910, and XiangShan (current master, may change) cores (SiFive P670 LLVM-MCA indicates that it might also be slow on that core). I'm not sure if that is because they are ooo and that gives you additional constraints, but I wouldn't rely on it just yet.

I don't know why that (`it's very slow on the ooo`) happens, and I currently don't have access to these types of machines. It's a bit strange that they are very slow with those instructions; could it be that those instructions are not fully optimized on these machines?

> I think the safest bet for encode would be for now "RISC-V RVV (LMUL=1)" ([`encode`](https://github.com/WojciechMula/base64simd/blob/master/encode/encode.rvv.cpp#L60C14-L60C20) + [`lookup_pshufb_improved`](https://github.com/WojciechMula/base64simd/blob/master/encode/lookup.rvv.cpp#L7)), as this only uses instructions with predictable performance, except for LMUL=1 `vrgather.vv`, which I think will need to be fast on any application class core. (See x86 equivalent vperm*)

My current tests on the K230 show that m2+m1+scalar brings the best performance for all size values; I'd like to see test data from other hardware if someone can help run the tests.
Also, with the current implementation it's easy to adjust the LMUL value in the algorithm, so I'm flexible on the LMUL value, depending on the test data.
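
As context for the "scalar" part of m2+m1+scalar, here is a minimal sketch of a plain scalar Base64 encoder (standard alphabet with `=` padding, as in RFC 4648). It is a reference illustration only, not the intrinsic from this PR; the function name is hypothetical. In a vectorized scheme, a loop like this would typically handle whatever input is left after the vector passes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Scalar Base64 encoder: 3 input bytes become 4 output characters.
std::string base64_encode_scalar(const std::string& input) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const auto* src = reinterpret_cast<const std::uint8_t*>(input.data());
  const std::size_t len = input.size();

  std::string out;
  out.reserve((len + 2) / 3 * 4);

  std::size_t i = 0;
  for (; i + 3 <= len; i += 3) {  // full 3-byte groups
    const std::uint32_t bits = (std::uint32_t{src[i]} << 16) |
                               (std::uint32_t{src[i + 1]} << 8) |
                               std::uint32_t{src[i + 2]};
    out.push_back(kAlphabet[(bits >> 18) & 0x3F]);
    out.push_back(kAlphabet[(bits >> 12) & 0x3F]);
    out.push_back(kAlphabet[(bits >> 6) & 0x3F]);
    out.push_back(kAlphabet[bits & 0x3F]);
  }

  const std::size_t rem = len - i;  // 0, 1 or 2 trailing bytes
  if (rem == 1) {
    const std::uint32_t bits = std::uint32_t{src[i]} << 16;
    out.push_back(kAlphabet[(bits >> 18) & 0x3F]);
    out.push_back(kAlphabet[(bits >> 12) & 0x3F]);
    out.append("==");
  } else if (rem == 2) {
    const std::uint32_t bits = (std::uint32_t{src[i]} << 16) |
                               (std::uint32_t{src[i + 1]} << 8);
    out.push_back(kAlphabet[(bits >> 18) & 0x3F]);
    out.push_back(kAlphabet[(bits >> 12) & 0x3F]);
    out.push_back(kAlphabet[(bits >> 6) & 0x3F]);
    out.push_back('=');
  }
  assert(out.size() == (len + 2) / 3 * 4);
  return out;
}
```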

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2210902916

From mli at openjdk.org  Fri Jul  5 13:48:25 2024
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 5 Jul 2024 13:48:25 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2]
In-Reply-To: <dGgB0M0dpmd_gFfsX8XlLaeL5uk9HynqHdYuZvp7URs=.17839235-7974-4be7-a92f-2e8d5fdb1c0b@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
 <i74xW_pCw7qGaDg6Dk9VokHRJiyhMFQ5PDz8Mi0BLr4=.939e76e4-caa2-4c9f-b33a-f29c901fc193@github.com>
 <S-BpiX60ySY6FNDfcskTHuuDsQQIno54AaOvSFlm67c=.24e8cf29-de2c-4f8e-bcdb-7cd1c7927c30@github.com>
 <dGgB0M0dpmd_gFfsX8XlLaeL5uk9HynqHdYuZvp7URs=.17839235-7974-4be7-a92f-2e8d5fdb1c0b@github.com>
Message-ID: <j_zMt6H-xeKbTEySzai9jsiS8jS0vTgyloYW0DHANF4=.c89a1fde-7f90-49bc-a8e2-29e9df5142f4@github.com>

On Thu, 4 Jul 2024 17:49:56 GMT, Camel Coder <duke at openjdk.org> wrote:

> For decode, I'm not really happy with any implementation. Yours uses multiple `vluxei8` + `vlsege4` + `vssege3`, the others from base64simd use LMUL=8 `vrgather.vv`, which will take `LMUL^2=8^2=64` times the amount of cycles a LMUL=1 `vrgather.vv` takes (on sane implementations, [see my reasoning](https://gitlab.com/riseproject/riscv-optimization-guide/-/issues/1#note_1977583125)). As I said, I'm fairly certain LMUL=1 `vrgather.vv` will have to be relatively fast, so if I had to choose, I'd prefer [my implementation](https://godbolt.org/z/hrs61x9aP) that uses LMUL=1 `vrgather.vv`s + `vlsege4` + `vssege3`, but using `vsseg*` is not ideal. (Note that gcc currently chokes on the register allocation, so you should use clang for now)

I imported [your implementation](https://godbolt.org/z/hrs61x9aP) into the JDK, but compared to my current decode implementation it shows a significant regression.
Let's discuss decode in https://github.com/openjdk/jdk/pull/20026.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2210907011

From pchilanomate at openjdk.org  Fri Jul  5 14:15:33 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Fri, 5 Jul 2024 14:15:33 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v2]
In-Reply-To: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
Message-ID: <jgJVKPLVStPVPXoOtDJO5RcwYG4ForKGV-ZPpyIEOnk=.9bd869b5-5856-49c5-9fd0-727ea2e04c9f@github.com>

> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.
> 
> The number of locals and expression stack entries a method can have before we have to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't incur any performance penalty, contrary to what the comment says. This comment dates back to 2002, when instead of 128 entries we could have only 32, given that 32-bit CPUs were still in main use (see bug for more history details).
> 
> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase, I added an explicit assert to verify that each InterpreterOopMap is only used once.
> 
> I tested the patch by running it through mach5 tiers 1-6.
> 
> Thanks,
> Patricio

Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision:

 - Coleen's comments
 - David's comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20012/files
  - new: https://git.openjdk.org/jdk/pull/20012/files/ca0db02b..805358e7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20012&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20012&range=00-01

  Stats: 23 lines in 2 files changed: 0 ins; 2 del; 21 mod
  Patch: https://git.openjdk.org/jdk/pull/20012.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20012/head:pull/20012

PR: https://git.openjdk.org/jdk/pull/20012

From pchilanomate at openjdk.org  Fri Jul  5 14:15:36 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Fri, 5 Jul 2024 14:15:36 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v2]
In-Reply-To: <iZb_AvCGeJYQ51-UTqMhkxRKQwt0F6UgdM6nppalaEo=.d3c5ad91-9342-42a6-83c9-03a9e4a104bb@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
 <iZb_AvCGeJYQ51-UTqMhkxRKQwt0F6UgdM6nppalaEo=.d3c5ad91-9342-42a6-83c9-03a9e4a104bb@github.com>
Message-ID: <BxVXPXx1uYVm3LYXBOIgQ26i8VhKpaH-r0zio3ykvAI=.00a0a0bc-92fa-40cc-ae77-6f25f6f9be0d@github.com>

On Thu, 4 Jul 2024 04:49:53 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Coleen's comments
>>  - David's comments
>
> src/hotspot/share/interpreter/oopMapCache.cpp line 179:
> 
>> 177: #ifdef ASSERT
>> 178:   _used = false;
>> 179: #endif
> 
> Nit pre-existing: use of DEBUG_ONLY would be more consistent with later setting of `_used`.

Fixed.
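
For context, the difference between the two styles can be sketched as follows. The macro below is a stand-in written for this example (it assumes, as in HotSpot, that `ASSERT` is defined only in debug builds), and `OopMapSketch` is a hypothetical class, not the real `InterpreterOopMap`.

```cpp
#include <cassert>

// Stand-in for a debug-only macro: the argument is compiled only when
// ASSERT is defined, and vanishes otherwise.
#ifdef ASSERT
#define DEBUG_ONLY(code) code
#else
#define DEBUG_ONLY(code)
#endif

// Hypothetical class with a debug-only "used once" check.
class OopMapSketch {
  DEBUG_ONLY(bool _used;)
  int _value;

 public:
  OopMapSketch() : _value(0) {
    DEBUG_ONLY(_used = false;)  // replaces an #ifdef ASSERT ... #endif block
  }

  void copy_from(int value) {
    DEBUG_ONLY(assert(!_used);)  // each object is expected to be filled once
    DEBUG_ONLY(_used = true;)
    _value = value;
  }

  int value() const { return _value; }
};
```

Using the macro at both the initialization and the later assignment keeps the two sites visually consistent, which is the point of the nit.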

> src/hotspot/share/interpreter/oopMapCache.cpp line 408:
> 
>> 406: 
>> 407: void InterpreterOopMap::resource_copy(OopMapCacheEntry* from) {
>> 408:   // The expectation is that this InterpreterOopMap is a recently created
> 
> s/is a recently/is recently/

Fixed.

> src/hotspot/share/interpreter/oopMapCache.hpp line 136:
> 
>> 134:   // Copy the OopMapCacheEntry in parameter "from" into this
>> 135:   // InterpreterOopMap.  If the _bit_mask[0] in "from" points to
>> 136:   // allocated space (i.e., the bit mask was to large to hold
> 
> Nit pre-existing: s/to/too/

Fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666856873
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666856765
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666856975

From pchilanomate at openjdk.org  Fri Jul  5 14:17:09 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Fri, 5 Jul 2024 14:17:09 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v2]
In-Reply-To: <KSG0PgqjRhlVE2khvuSnf_CYg2sSqJ_oRaQKqqB4nT4=.aa29fd29-c476-4144-8454-78cf536ed55e@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
 <KSG0PgqjRhlVE2khvuSnf_CYg2sSqJ_oRaQKqqB4nT4=.aa29fd29-c476-4144-8454-78cf536ed55e@github.com>
Message-ID: <qc9NFhzOYAVZNhWoHtHkob6X1_iNYUPtsogLgp1zLm8=.32c19818-6de4-463a-b213-6131e0dcca6c@github.com>

On Fri, 5 Jul 2024 12:32:53 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Coleen's comments
>>  - David's comments
>
> src/hotspot/share/interpreter/oopMapCache.hpp line 45:
> 
>> 43: // For InterpreterOopMap the bit_mask is allocated in the C heap
>> 44: // to avoid issues with allocations from the resource area that have
>> 45: // to live accross the oop closure (see 8335409). InterpreterOopMap
> 
> We don't usually put bug numbers in the code and after this change nobody will want to move this back to resource area, so putting the bug number as a caution shouldn't be needed.  If one wants to know the details, they can git blame this file.

Removed.

> src/hotspot/share/interpreter/oopMapCache.hpp line 46:
> 
>> 44: // to avoid issues with allocations from the resource area that have
>> 45: // to live accross the oop closure (see 8335409). InterpreterOopMap
>> 46: // should only be created and deleted during same garbage collection.
> 
> Can you add 'the' to "during the same garbage collection."

Fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666861364
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666861445

From pchilanomate at openjdk.org  Fri Jul  5 14:17:10 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Fri, 5 Jul 2024 14:17:10 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v2]
In-Reply-To: <ICN_RiO5Rpx7xM9JGERyUT7Dh2VB-DoW-h4jGmyKDdY=.62faa586-b9f2-4915-95fd-e74e184e0bac@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
 <iZb_AvCGeJYQ51-UTqMhkxRKQwt0F6UgdM6nppalaEo=.d3c5ad91-9342-42a6-83c9-03a9e4a104bb@github.com>
 <ICN_RiO5Rpx7xM9JGERyUT7Dh2VB-DoW-h4jGmyKDdY=.62faa586-b9f2-4915-95fd-e74e184e0bac@github.com>
Message-ID: <spVCFBfSxNDbMApL5o8zC8KJA5wJ1L2VISkpHfFT8Eg=.5f07f8d4-e8cd-470c-9c78-b49bf08f3ef0@github.com>

On Fri, 5 Jul 2024 12:34:55 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> src/hotspot/share/interpreter/oopMapCache.hpp line 138:
>> 
>>> 136:   // allocated space (i.e., the bit mask was to large to hold
>>> 137:   // in-line), allocate the space from the C heap.
>>> 138:   void resource_copy(OopMapCacheEntry* from);
>> 
>> The name `resource_copy` seems somewhat of a misnomer given it may be C heap. Is it worth changing?
>
> I agree, this should probably be copy_from, and rename the parameter src.  Or something like that.

I also thought about renaming it but ended up leaving it as is in v1. I changed it to Coleen's suggestion.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666861013

From jvernee at openjdk.org  Fri Jul  5 14:27:37 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Fri, 5 Jul 2024 14:27:37 GMT
Subject: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively
 throws wrong exception [v2]
In-Reply-To: <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com>
References: <gD4D2MSMO5dqwOf-XWA1u-a50e59goP8F_6be-mermA=.d172f4cf-14ad-492b-bdcc-8cf39d77c8ef@github.com>
 <1yQze0X7kl1oxFtlWu0rtJwHF2WtnZYJ7t6OteIJAnQ=.85eae267-7848-4978-aa11-9f2720e67e00@github.com>
Message-ID: <s3ecyFzSeSB-ZY_HYForZPoga3JUOjMRftb91Zt_Wzs=.a6708d5f-e74d-43ea-afbb-e4136a356ca3@github.com>

On Thu, 4 Jul 2024 06:22:31 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this change adds special casing for `VarHandle.{access-mode}` methods.
>> 
>> The exception message is less exact, but I think that's acceptable.
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   address comments

I think this needs a CSR, to document the change in behavior. (See e.g. https://bugs.openjdk.org/browse/JDK-8335554 which is a very similar case)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20015#issuecomment-2210971780

From coleenp at openjdk.org  Fri Jul  5 14:38:32 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 5 Jul 2024 14:38:32 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v2]
In-Reply-To: <jgJVKPLVStPVPXoOtDJO5RcwYG4ForKGV-ZPpyIEOnk=.9bd869b5-5856-49c5-9fd0-727ea2e04c9f@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
 <jgJVKPLVStPVPXoOtDJO5RcwYG4ForKGV-ZPpyIEOnk=.9bd869b5-5856-49c5-9fd0-727ea2e04c9f@github.com>
Message-ID: <MFRBX5ILKT3OucHWesIQe52aS13yoxRet-jHkUjpks8=.742dddb8-52b8-4a09-94e7-2374b47535ba@github.com>

On Fri, 5 Jul 2024 14:15:33 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> The ResourceMark added in 8329665, which handles the case where extra memory has to be allocated for the _bit_mask, prevents code in the closure from allocating memory from the resource area and retaining it across the closure, which such code does by relying on a ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.
>> 
>> The number of locals and expression stack entries a method can have before extra memory must be allocated for the _bit_mask is 4*64/2 = 128. This is already large enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't incur any performance penalty, contrary to what the existing comment says. That comment dates back to 2002, when 32-bit CPUs were still in main use and the limit was only 32 entries instead of 128 (see the bug for more history details).
>> 
>> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation path that considers the possibility that the entry is already in use. This assert actually looks wrong, since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase, I added an explicit assert to verify that each InterpreterOopMap is only used once.
>> 
>> I tested the patch by running it through mach5 tiers 1-6.
>> 
>> Thanks,
>> Patricio
>
> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Coleen's comments
>  - David's comments

One tiny nit.

src/hotspot/share/interpreter/oopMapCache.hpp line 93:

> 91:  protected:
> 92: #ifdef ASSERT
> 93:   bool           _used;

Can you make this a DEBUG_ONLY() too?

-------------

PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2160851216
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666882419

From pchilanomate at openjdk.org  Fri Jul  5 15:01:05 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Fri, 5 Jul 2024 15:01:05 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v3]
In-Reply-To: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
Message-ID: <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com>

> The ResourceMark added in 8329665, which handles the case where extra memory has to be allocated for the _bit_mask, prevents code in the closure from allocating memory from the resource area and retaining it across the closure, which such code does by relying on a ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.
> 
> The number of locals and expression stack entries a method can have before extra memory must be allocated for the _bit_mask is 4*64/2 = 128. This is already large enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't incur any performance penalty, contrary to what the existing comment says. That comment dates back to 2002, when 32-bit CPUs were still in main use and the limit was only 32 entries instead of 128 (see the bug for more history details).
> 
> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation path that considers the possibility that the entry is already in use. This assert actually looks wrong, since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase, I added an explicit assert to verify that each InterpreterOopMap is only used once.
> 
> I tested the patch by running it through mach5 tiers 1-6.
> 
> Thanks,
> Patricio

Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:

  use DEBUG_ONLY on _used declaration
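
For readers following the thread, here is a minimal standalone sketch of the idea behind this PR. It is not the HotSpot code: `SketchOopMap`, `SKETCH_DEBUG_ONLY`, `words_for` and the `copy_from` signature shown here are hypothetical stand-ins, and the sketch assumes 64-bit mask words with 2 bits per entry, which yields the 4*64/2 = 128 in-line entries mentioned in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for HotSpot's DEBUG_ONLY(): keeps its argument only
// when assertions are enabled (NDEBUG not defined).
#ifdef NDEBUG
#define SKETCH_DEBUG_ONLY(code)
#else
#define SKETCH_DEBUG_ONLY(code) code
#endif

constexpr std::size_t kInlineWords   = 4;   // mask words stored inside the object
constexpr std::size_t kBitsPerWord   = 64;  // assumption: 64-bit mask words
constexpr std::size_t kBitsPerEntry  = 2;   // assumption: 2 bits per local/stack slot
constexpr std::size_t kInlineEntries = kInlineWords * kBitsPerWord / kBitsPerEntry;
static_assert(kInlineEntries == 128, "4*64/2 = 128 entries fit in-line");

// Number of mask words needed to describe `entries` slots.
constexpr std::size_t words_for(std::size_t entries) {
  return (entries * kBitsPerEntry + kBitsPerWord - 1) / kBitsPerWord;
}

class SketchOopMap {
 public:
  SketchOopMap() = default;
  SketchOopMap(const SketchOopMap&) = delete;
  SketchOopMap& operator=(const SketchOopMap&) = delete;
  ~SketchOopMap() { std::free(_heap_mask); }  // free(nullptr) is a no-op

  // Copies the mask for `entries` slots from `src`. Small masks are stored
  // in-line; larger ones are allocated with malloc (the "C heap"), so their
  // lifetime does not depend on any enclosing arena or mark.
  void copy_from(const std::uint64_t* src, std::size_t entries) {
    SKETCH_DEBUG_ONLY(assert(!_used && "each map may only be filled once");)
    SKETCH_DEBUG_ONLY(_used = true;)
    const std::size_t words = words_for(entries);
    std::uint64_t* dst = _inline_mask;
    if (entries > kInlineEntries) {
      _heap_mask = static_cast<std::uint64_t*>(std::malloc(words * sizeof(std::uint64_t)));
      if (_heap_mask == nullptr) std::abort();
      dst = _heap_mask;
    }
    if (words != 0) std::memcpy(dst, src, words * sizeof(std::uint64_t));
  }

  bool uses_heap() const { return _heap_mask != nullptr; }
  const std::uint64_t* mask() const { return uses_heap() ? _heap_mask : _inline_mask; }

 private:
  std::uint64_t  _inline_mask[kInlineWords] = {};
  std::uint64_t* _heap_mask = nullptr;
  SKETCH_DEBUG_ONLY(bool _used = false;)  // exists only in debug builds
};
```

In this sketch the destructor releases the heap copy, which is why the object has to be created and destroyed within one well-defined scope; the debug-only flag costs nothing in builds compiled with `NDEBUG`.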

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20012/files
  - new: https://git.openjdk.org/jdk/pull/20012/files/805358e7..7ce559cb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20012&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20012&range=01-02

  Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20012.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20012/head:pull/20012

PR: https://git.openjdk.org/jdk/pull/20012

From pchilanomate at openjdk.org  Fri Jul  5 15:01:06 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Fri, 5 Jul 2024 15:01:06 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v2]
In-Reply-To: <MFRBX5ILKT3OucHWesIQe52aS13yoxRet-jHkUjpks8=.742dddb8-52b8-4a09-94e7-2374b47535ba@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
 <jgJVKPLVStPVPXoOtDJO5RcwYG4ForKGV-ZPpyIEOnk=.9bd869b5-5856-49c5-9fd0-727ea2e04c9f@github.com>
 <MFRBX5ILKT3OucHWesIQe52aS13yoxRet-jHkUjpks8=.742dddb8-52b8-4a09-94e7-2374b47535ba@github.com>
Message-ID: <1yESUDNXtrEnuUC0qHYA81qWZpXhrZEE1K9atEtZsI0=.53f41358-b333-4300-af2d-d441c8efe537@github.com>

On Fri, 5 Jul 2024 14:34:57 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Coleen's comments
>>  - David's comments
>
> src/hotspot/share/interpreter/oopMapCache.hpp line 93:
> 
>> 91:  protected:
>> 92: #ifdef ASSERT
>> 93:   bool           _used;
> 
> Can you make this a DEBUG_ONLY() too?

Fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1666904522

From nprasad at openjdk.org  Fri Jul  5 15:01:10 2024
From: nprasad at openjdk.org (Neethu Prasad)
Date: Fri, 5 Jul 2024 15:01:10 GMT
Subject: RFR: 8334230: Optimize C2 classes layout
Message-ID: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>

**Notes**

Rearrange C2 class fields to optimize footprint.


**Verification**

1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests.
2. Ran pahole on a 64-bit machine after the reordering and verified that there are no holes and that the total size in bytes is reduced.

| Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline (bytes) | Padding |
| ----- | ---- | ---------- | ----------- | ----- | --------- | ---------------------- | ------- |
| ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 -> 0 | 11 -> 0 | 56 -> 48 | 0 -> 3 |
| CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 -> 0 | 5 -> 0 | 24 -> 16 | 7 -> 4 |
| C2Access | 56 -> 48 | 1 -> 1 | 42 -> 0 | 1 -> 0 | 7 -> 0 | 56 -> 48 | 7 -> 6 |
| VectorSet | 32 -> 24 | 1 -> 1 | 24 -> 0 | 1 -> 0 | 8 -> 0 | 32 -> 24 | 1 -> 1 |

class ArrayPointer {
	const class Node  *        _pointer;             /*     0     8 */
	const class Node  *        _base;                /*     8     8 */
	const jlong                _constant_offset;     /*    16     8 */
	const class Node  *        _int_offset;          /*    24     8 */
	const class GrowableArray<Node*>  * _other_offsets; /*    32     8 */
	const jint                 _int_offset_shift;    /*    40     4 */
	const bool                 _is_valid;            /*    44     1 */
public:


	/* size: 48, cachelines: 1, members: 7 */
	/* padding: 3 */
	/* last cacheline: 48 bytes */
};



class CallJavaNode : public CallNode {
public:

	/* class CallNode            <ancestor>; */      /*     0   128 */
protected:

	/* --- cacheline 2 boundary (128 bytes) --- */
	class ciMethod *           _method;              /*   128     8 */
	bool                       _optimized_virtual;   /*   136     1 */
	bool                       _method_handle_invoke; /*   137     1 */
	bool                       _override_symbolic_info; /*   138     1 */
	bool                       _arg_escape;          /*   139     1 */
public:

protected:

public:


	/* size: 144, cachelines: 3, members: 6 */
	/* padding: 4 */
	/* last cacheline: 16 bytes */

	/* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
};



class C2Access : public StackObj {
public:

	/* class StackObj            <ancestor>; */      /*     0     0 */

	/* XXX last struct has 1 byte of padding */

	int ()(void) * *           _vptr.C2Access;       /*     0     8 */
protected:

	DecoratorSet               _decorators;          /*     8     8 */
	class Node *               _base;                /*    16     8 */
	class C2AccessValuePtr &   _addr;                /*    24     8 */
	class Node *               _raw_access;          /*    32     8 */
	enum BasicType             _type;                /*    40     1 */
	uint8_t                    _barrier_data;        /*    41     1 */
public:

protected:

public:


	/* size: 48, cachelines: 1, members: 8 */
	/* padding: 6 */
	/* paddings: 1, sum paddings: 1 */
	/* last cacheline: 48 bytes */
};



class VectorSet : public AnyObj {
public:

	/* class AnyObj              <ancestor>; */      /*     0     0 */

	/* XXX last struct has 1 byte of padding */

	static const uint                 word_bits;     /*     0     0 */
	static const uint                 bit_mask;      /*     0     0 */
	uint                       _size;                /*     0     4 */
	uint                       _data_size;           /*     4     4 */
	uint32_t *                 _data;                /*     8     8 */
	class Arena *              _set_arena;           /*    16     8 */

	/* size: 24, cachelines: 1, members: 5, static members: 2 */
	/* paddings: 1, sum paddings: 1 */
	/* last cacheline: 24 bytes */
};

I wrote a simple program that just assigns an integer value to a variable and observed the following:
Number of ArrayPointer instances = 58.
Number of C2Access instances = 1390.
Number of CallJavaNode instances = 1626.
58 * 8 bytes + 1390 * 8 bytes + 1626 * 8 bytes = 24,592 bytes (about 24 KB).
That is a saving of at least 24 KB for this trivial program, and the footprint savings should be larger for more complex programs.
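
As a rough illustration of why reordering helps, here is a small standalone sketch. The two structs are hypothetical and only loosely modelled on the member types shown in the pahole output above; they are not the actual C2 classes, and exact sizes depend on the ABI. On a typical LP64 platform the first layout is expected to take 40 bytes and the second 32 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical "before" layout: small members are interleaved with
// pointer-sized ones, so the compiler must insert alignment holes.
struct BeforeReorder {
  bool          is_valid;         // 1 byte, usually followed by a 7-byte hole
  const void*   pointer;
  std::int32_t  shift;            // 4 bytes, usually followed by a 4-byte hole
  const void*   base;
  std::int64_t  constant_offset;
};

// Same members, ordered from largest to smallest alignment requirement.
// Only tail padding remains.
struct AfterReorder {
  const void*   pointer;
  const void*   base;
  std::int64_t  constant_offset;
  std::int32_t  shift;
  bool          is_valid;
};

// Holds on common ABIs; the C++ standard itself does not guarantee it.
static_assert(sizeof(AfterReorder) <= sizeof(BeforeReorder),
              "reordering should not make the struct larger");

// Bytes saved when `instances` objects each shrink by `bytes_per_instance`.
constexpr std::size_t saved_bytes(std::size_t instances, std::size_t bytes_per_instance) {
  return instances * bytes_per_instance;
}

// The arithmetic from the message above: three classes, 8 bytes saved each.
constexpr std::size_t kTotalSaved =
    saved_bytes(58, 8) + saved_bytes(1390, 8) + saved_bytes(1626, 8);
static_assert(kTotalSaved == 24592, "58*8 + 1390*8 + 1626*8 = 24592 bytes, about 24 KB");
```

Sorting members by decreasing alignment is only a rule of thumb; checking a real build with pahole, as done above, remains the reliable way to confirm the layout.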

-------------

Commit messages:
 - 8334230: Keep constructor order same as before & optimize VectorSet
 - 8334230: Optimize C2 classes layout

Changes: https://git.openjdk.org/jdk/pull/19861/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19861&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8334230
  Stats: 20 lines in 4 files changed: 8 ins; 8 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/19861.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19861/head:pull/19861

PR: https://git.openjdk.org/jdk/pull/19861

From coleenp at openjdk.org  Fri Jul  5 15:08:40 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 5 Jul 2024 15:08:40 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v3]
In-Reply-To: <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
 <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com>
Message-ID: <irnzAa0yJB8hugTmGr7TSJi6kqutUJ_2-nrfQj1w0Rc=.d279c33e-7535-47c5-bcfb-6e4903677d79@github.com>

On Fri, 5 Jul 2024 15:01:05 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> The ResourceMark added in 8329665, which handles the case where extra memory has to be allocated for the _bit_mask, prevents code in the closure from allocating memory from the resource area and retaining it across the closure, which such code does by relying on a ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.
>> 
>> The number of locals and expression stack entries a method can have before extra memory must be allocated for the _bit_mask is 4*64/2 = 128. This is already large enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't incur any performance penalty, contrary to what the existing comment says. That comment dates back to 2002, when 32-bit CPUs were still in main use and the limit was only 32 entries instead of 128 (see the bug for more history details).
>> 
>> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation path that considers the possibility that the entry is already in use. This assert actually looks wrong, since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase, I added an explicit assert to verify that each InterpreterOopMap is only used once.
>> 
>> I tested the patch by running it through mach5 tiers 1-6.
>> 
>> Thanks,
>> Patricio
>
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   use DEBUG_ONLY on _used declaration

Perfect, thanks!

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2160900015

From pchilanomate at openjdk.org  Fri Jul  5 15:20:13 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Fri, 5 Jul 2024 15:20:13 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom
Message-ID: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>

Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler, while optimizing, relies on a poll in the native method Continuation.doYield. This is a special native method that doesn't always poll for safepoints; in particular, it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler.

I included a test which reproduces the issue with Graal (I couldn't reproduce it with C2). The test times out without the fix and passes with it. I also ran the patch through mach5 tiers1-3.

Thanks,
Patricio

-------------

Commit messages:
 - v1

Changes: https://git.openjdk.org/jdk/pull/20016/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335269
  Stats: 81 lines in 2 files changed: 81 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20016.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20016/head:pull/20016

PR: https://git.openjdk.org/jdk/pull/20016

From ccheung at openjdk.org  Fri Jul  5 16:41:32 2024
From: ccheung at openjdk.org (Calvin Cheung)
Date: Fri, 5 Jul 2024 16:41:32 GMT
Subject: RFR: 8312125: Refactor CDS enum class handling [v2]
In-Reply-To: <xxU06cCiROZP1kPcY6pWxomBLPTGPPnxrbc22c-K08E=.4c6d7a88-29d0-43eb-a917-4e90766ddfc9@github.com>
References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
 <xxU06cCiROZP1kPcY6pWxomBLPTGPPnxrbc22c-K08E=.4c6d7a88-29d0-43eb-a917-4e90766ddfc9@github.com>
Message-ID: <qmGt-PH377XuHmbiyWcBVmOns3mAHT1yBjtb5zLvVds=.8cec38c7-6af9-4603-8659-10d23f7943f1@github.com>

On Wed, 3 Jul 2024 19:57:51 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>> Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change.
>
> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fixed copyright

Refactoring looks good. I have one suggestion.

src/hotspot/share/cds/cdsEnumKlass.cpp line 136:

> 134:   return true;
> 135: }
> 136: #endif

Suggestion: `#endif // INCLUDE_CDS_JAVA_HEAP`

-------------

Marked as reviewed by ccheung (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20013#pullrequestreview-2161016601
PR Review Comment: https://git.openjdk.org/jdk/pull/20013#discussion_r1666987825

From duke at openjdk.org  Fri Jul  5 17:25:34 2024
From: duke at openjdk.org (Mikhail Ablakatov)
Date: Fri, 5 Jul 2024 17:25:34 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64
In-Reply-To: <nq7CFhZoe0rxlErqNKGguqM2rPTQPpg_9Fr6Cj5JHXE=.856ec966-04d8-41b8-b1cf-33ece5ff843a@github.com>
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
 <Yg1PC9SInsa5q1qJvsDjEJuqUHoW5WLMYMUEo9Rx_WE=.9b40b7c4-a21e-4139-b6c3-52c2757e933d@github.com>
 <nq7CFhZoe0rxlErqNKGguqM2rPTQPpg_9Fr6Cj5JHXE=.856ec966-04d8-41b8-b1cf-33ece5ff843a@github.com>
Message-ID: <HhMPWbuQUf5pmev0UqK8WkDpVZdmzw211OLOa7OQLp8=.f41ed4f0-e1c9-4ee1-afa0-74c052e29ce5@github.com>

On Thu, 16 May 2024 12:40:30 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Hi,
>> 
>>>  I can update the patch with current results on Monday and we could decide how to proceed with this PR after that. Sounds good?
>> 
>> Yes, that's right.
>
>> Hi @theRealAph ! You may find the latest version here: [mikabl-arm at b3db421](https://github.com/mikabl-arm/jdk/commit/b3db421c795f683db1a001853990026bafc2ed4b) . I gave a short explanation in the commit message, feel free to ask for more details if required.
>> 
>> Unfortunately, it still contains critical bugs and I won't be able to take a look into the issue before the next week at best. Until it's fixed, it's not possible to run the benchmarks. Although I expect it to improve performance on longer integer arrays based on a benchmark I've written in C++ and Assembly. The results aren't comparable to the jmh results, so I won't post them here.
> 
> OK. One small thing, I think it's possible to rearrange things a bit to use `mlav`, which may help performance. No need for that until the code is correct, though.

Hi @theRealAph ! This took a while, but please find a fixed version here: https://github.com/mikabl-arm/jdk/tree/285826-vmul

Here are performance numbers collected for Neoverse V2 compared to the common baseline and the latest state of this PR:

                                                          |    d2ea6b1e657    |    f19203015fb    |    5504227bfe3   |
                                                          |     baseline      |        PR         |    285826-vmul   |
----------------------------------------------------------|---------------------------------------|------------------|------
Benchmark                               (size)  Mode  Cnt |    Score    Error |    Score    Error |    Score   Error | Units
----------------------------------------------------------|---------------------------------------|------------------|------
ArraysHashCode.bytes                         1  avgt   15 |    0.859 ±  0.166 |    0.720 ±  0.103 |    0.732 ± 0.105 | ns/op
ArraysHashCode.bytes                        10  avgt   15 |    4.440 ±  0.013 |    2.262 ±  0.009 |    3.454 ± 0.057 | ns/op
ArraysHashCode.bytes                       100  avgt   15 |   78.642 ±  0.119 |   15.997 ±  0.023 |   12.753 ± 0.072 | ns/op
ArraysHashCode.bytes                     10000  avgt   15 | 9248.961 ± 11.332 | 1879.905 ± 11.609 | 1345.014 ± 1.947 | ns/op
ArraysHashCode.chars                         1  avgt   15 |    0.695 ±  0.036 |    0.694 ±  0.035 |    0.682 ± 0.036 | ns/op
ArraysHashCode.chars                        10  avgt   15 |    4.436 ±  0.015 |    2.428 ±  0.034 |    3.352 ± 0.031 | ns/op
ArraysHashCode.chars                       100  avgt   15 |   78.660 ±  0.113 |   14.508 ±  0.075 |   11.784 ± 0.088 | ns/op
ArraysHashCode.chars                     10000  avgt   15 | 9253.807 ± 13.660 | 2010.053 ±  3.549 | 1344.716 ± 1.936 | ns/op
ArraysHashCode.ints                          1  avgt   15 |    0.635 ±  0.022 |    0.640 ±  0.022 |    0.640 ± 0.022 | ns/op
ArraysHashCode.ints                         10  avgt   15 |    4.424 ±  0.006 |    2.752 ±  0.012 |    3.388 ± 0.004 | ns/op
ArraysHashCode.ints                        100  avgt   15 |   78.680 ±  0.120 |   14.794 ±  0.131 |   11.090 ± 0.055 | ns/op
ArraysHashCode.ints                      10000  avgt   15 | 9249.520 ± 13.305 | 1997.441 ±  3.299 | 1340.916 ± 1.843 | ns/op
ArraysHashCode.multibytes                    1  avgt   15 |    0.566 ±  0.023 |    0.563 ±  0.021 |    0.554 ± 0.012 | ns/op
ArraysHashCode.multibytes                   10  avgt   15 |    2.679 ±  0.018 |    1.798 ±  0.038 |    1.973 ± 0.021 | ns/op
ArraysHashCode.multibytes                  100  avgt   15 |   36.934 ±  0.055 |    9.118 ±  0.018 |   12.712 ± 0.026 | ns/op
ArraysHashCode.multibytes                10000  avgt   15 | 4861.700 ±  6.563 | 1005.809 ±  2.260 |  721.366 ± 1.570 | ns/op
ArraysHashCode.multichars                    1  avgt   15 |    0.557 ±  0.016 |    0.552 ±  0.001 |    0.563 ± 0.021 | ns/op
ArraysHashCode.multichars                   10  avgt   15 |    2.700 ±  0.018 |    1.840 ±  0.024 |    1.978 ± 0.008 | ns/op
ArraysHashCode.multichars                  100  avgt   15 |   36.932 ±  0.054 |    8.633 ±  0.020 |    8.678 ± 0.052 | ns/op
ArraysHashCode.multichars                10000  avgt   15 | 4859.462 ±  6.693 | 1063.788 ±  3.057 |  752.857 ± 5.262 | ns/op
ArraysHashCode.multiints                     1  avgt   15 |    0.574 ±  0.023 |    0.554 ±  0.011 |    0.559 ± 0.017 | ns/op
ArraysHashCode.multiints                    10  avgt   15 |    2.707 ±  0.028 |    1.907 ±  0.031 |    1.992 ± 0.036 | ns/op
ArraysHashCode.multiints                   100  avgt   15 |   36.942 ±  0.056 |    9.141 ±  0.013 |    8.174 ± 0.029 | ns/op
ArraysHashCode.multiints                 10000  avgt   15 | 4872.540 ±  7.479 | 1187.393 ± 12.083 |  785.256 ± 9.472 | ns/op
ArraysHashCode.multishorts                   1  avgt   15 |    0.558 ±  0.016 |    0.555 ±  0.012 |    0.566 ± 0.022 | ns/op
ArraysHashCode.multishorts                  10  avgt   15 |    2.696 ±  0.015 |    1.854 ±  0.027 |    1.983 ± 0.009 | ns/op
ArraysHashCode.multishorts                 100  avgt   15 |   36.930 ±  0.051 |    8.652 ±  0.011 |    8.681 ± 0.039 | ns/op
ArraysHashCode.multishorts               10000  avgt   15 | 4863.966 ±  6.736 | 1068.627 ±  1.902 |  760.280 ± 5.150 | ns/op
ArraysHashCode.shorts                        1  avgt   15 |    0.665 ±  0.058 |    0.644 ±  0.022 |    0.636 ± 0.023 | ns/op
ArraysHashCode.shorts                       10  avgt   15 |    4.431 ±  0.006 |    2.432 ±  0.024 |    3.332 ± 0.026 | ns/op
ArraysHashCode.shorts                      100  avgt   15 |   78.630 ±  0.103 |   14.521 ±  0.077 |   11.783 ± 0.093 | ns/op
ArraysHashCode.shorts                    10000  avgt   15 | 9249.908 ± 12.039 | 2010.461 ±  2.548 | 1344.441 ± 1.818 | ns/op
StringHashCode.Algorithm.defaultLatin1       1  avgt   15 |    0.770 ±  0.001 |    0.770 ±  0.001 |    0.770 ± 0.001 | ns/op
StringHashCode.Algorithm.defaultLatin1      10  avgt   15 |    4.305 ±  0.009 |    2.260 ±  0.009 |    3.433 ± 0.015 | ns/op
StringHashCode.Algorithm.defaultLatin1     100  avgt   15 |   78.355 ±  0.102 |   16.140 ±  0.038 |   12.767 ± 0.023 | ns/op
StringHashCode.Algorithm.defaultLatin1   10000  avgt   15 | 9269.665 ± 13.817 | 1893.354 ±  3.677 | 1345.571 ± 1.930 | ns/op
StringHashCode.Algorithm.defaultUTF16        1  avgt   15 |    0.736 ±  0.100 |    0.653 ±  0.083 |    0.690 ± 0.101 | ns/op
StringHashCode.Algorithm.defaultUTF16       10  avgt   15 |    4.280 ±  0.018 |    2.374 ±  0.021 |    3.394 ± 0.010 | ns/op
StringHashCode.Algorithm.defaultUTF16      100  avgt   15 |   78.312 ±  0.118 |   14.603 ±  0.103 |   11.837 ± 0.016 | ns/op
StringHashCode.Algorithm.defaultUTF16    10000  avgt   15 | 9249.562 ± 13.113 | 2011.717 ±  4.097 | 1344.715 ± 1.896 | ns/op
StringHashCode.cached                      N/A  avgt   15 |    0.539 ±  0.027 |    0.525 ±  0.018 |    0.525 ± 0.018 | ns/op
StringHashCode.empty                       N/A  avgt   15 |    0.861 ±  0.163 |    0.670 ±  0.079 |    0.694 ± 0.093 | ns/op
StringHashCode.notCached                   N/A  avgt   15 |    0.698 ±  0.108 |    0.648 ±  0.024 |    0.637 ± 0.023 | ns/op


There are several known issues:

- [ ] For arrays shorter than the number of elements processed by a single iteration of the Neon loop, performance is not optimal, though still better than the baseline's.
- [ ] The intrinsic takes 364 bytes in the worst case (for BYTE/BOOLEAN types), which may either significantly increase code size or limit inlining opportunities.
- [ ] As mentioned before, the implementation might be affected by https://bugs.openjdk.org/browse/JDK-8139457 .

To address the first two, we could implement the vectorized part of the algorithm as a separate stub method. Please let me know if this sounds like the right approach or if you have other suggestions.
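
For context, here is a minimal scalar sketch of the reassociation that makes this kind of hash amenable to a multi-lane loop. It is plain C++, not the AArch64 intrinsic or the code in the branch above: the function names are hypothetical, and it assumes the usual 31-based polynomial hash with initial value 1, as used by `java.util.Arrays.hashCode`. Unsigned 32-bit arithmetic stands in for Java's wrapping `int` arithmetic.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar reference: h = 31 * h + a[i], computed modulo 2^32.
inline std::uint32_t hash_scalar(const std::int32_t* a, std::size_t n,
                                 std::uint32_t h = 1) {
  for (std::size_t i = 0; i < n; ++i) {
    h = 31u * h + static_cast<std::uint32_t>(a[i]);
  }
  return h;
}

// Four elements per iteration, using the expanded recurrence
//   h' = 31^4*h + 31^3*a[i] + 31^2*a[i+1] + 31*a[i+2] + a[i+3].
// The four products are independent of each other, which is what a
// vector multiply(-accumulate) over several lanes can exploit.
inline std::uint32_t hash_unrolled4(const std::int32_t* a, std::size_t n,
                                    std::uint32_t h = 1) {
  constexpr std::uint32_t p1 = 31u;
  constexpr std::uint32_t p2 = p1 * p1;
  constexpr std::uint32_t p3 = p2 * p1;
  constexpr std::uint32_t p4 = p3 * p1;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    h = p4 * h
      + p3 * static_cast<std::uint32_t>(a[i])
      + p2 * static_cast<std::uint32_t>(a[i + 1])
      + p1 * static_cast<std::uint32_t>(a[i + 2])
      +      static_cast<std::uint32_t>(a[i + 3]);
  }
  // Scalar tail for the remaining 0..3 elements.
  for (; i < n; ++i) {
    h = 31u * h + static_cast<std::uint32_t>(a[i]);
  }
  return h;
}
```

The scalar tail in this sketch corresponds to the short-array case listed in the first known issue: inputs shorter than one unrolled iteration never reach the wide loop at all.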

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2211186951

From aph at openjdk.org  Fri Jul  5 17:46:35 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 5 Jul 2024 17:46:35 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
Message-ID: <_8D8tMevrVR00rHIHSQRHnfBxjoApH7UcHH-1HRl2mo=.4866b276-5856-43b2-927e-86c2f4e9d60a@github.com>

On Mon, 1 Jul 2024 16:54:55 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Could you help review this patch?
>> This PR is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
>> 
>> Compared with the previous PRs, the major change in this PR is to integrate the SLEEF source (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`) rather than depend on external SLEEF artifacts (headers or libraries) at build or run time.
>> In addition, the previous changes are adjusted accordingly, e.g. some unnecessary files and changes are removed, especially in the make directory of the JDK.
>> 
>> Besides the code changes, one important task is to handle the legal process.
>> 
>> Thanks!
>> 
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>> 
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>> 
>> ## Performance
>> 
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>> 
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN2 | 1024 | thrpt | 10 | 0.021 | ops/ms | 135.088 | 32.449 | 4.163 | 135.721 | 32.579 | 4.166
>> Float128Vector.CBRT | 1024 | thrpt | 10 | 0.004 | ops/ms | 114.547 | 39.517 | 2....
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits:
> 
>  - Merge branch 'master' into sleef-aarch64-integrate-source
>  - merge master
>  - sleef 3.6.1 for riscv
>  - sleef 3.6.1
>  - update header files for arm
>  - add inline header file for riscv64
>  - remove notes about sleef changes
>  - fix performance issue
>  - disable unused-function warnings; add log msg
>  - minor
>  - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863

I have now wasted two hours trying to duplicate your results.

I need you to write here the _exact_ command line that produced your numbers above, along with the full configure and build options you used.

I also had problems with javac running out of heap space, which was very odd. I fixed it with this:

```
diff --git a/make/autoconf/boot-jdk.m4 b/make/autoconf/boot-jdk.m4
index 8d272c28ad5..617ccfd8fff 100644
--- a/make/autoconf/boot-jdk.m4
+++ b/make/autoconf/boot-jdk.m4
@@ -470,7 +470,7 @@ AC_DEFUN_ONCE([BOOTJDK_SETUP_BOOT_JDK_ARGUMENTS],
   # Maximum amount of heap memory.
   JVM_HEAP_LIMIT_32="768"
   # Running a 64 bit JVM allows for and requires a bigger heap
-  JVM_HEAP_LIMIT_64="1600"
+  JVM_HEAP_LIMIT_64="6400"
```

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2211202867
PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2211202959

From iklam at openjdk.org  Sun Jul  7 01:50:17 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Sun, 7 Jul 2024 01:50:17 GMT
Subject: RFR: 8312125: Refactor CDS enum class handling [v3]
In-Reply-To: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
Message-ID: <DcNbL1qEcDW0knnYfhWkR7hhD5UFwFw1Ko-qcUmx64Y=.50b0c364-aed1-46c6-a32e-62a347195c05@github.com>

> Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change.

Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Merge branch 'master' into 8312125-refactor-cds-enum-class-handling
 - @calvinccheung comments
 - fixed copyright
 - 8312125: Refactor CDS enum class handling

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20013/files
  - new: https://git.openjdk.org/jdk/pull/20013/files/49dc109e..fcd987fb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20013&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20013&range=01-02

  Stats: 5501 lines in 320 files changed: 2525 ins; 1662 del; 1314 mod
  Patch: https://git.openjdk.org/jdk/pull/20013.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20013/head:pull/20013

PR: https://git.openjdk.org/jdk/pull/20013

From iklam at openjdk.org  Sun Jul  7 04:23:31 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Sun, 7 Jul 2024 04:23:31 GMT
Subject: RFR: 8312125: Refactor CDS enum class handling [v2]
In-Reply-To: <uy6CKlyFbVZ-yLe6Mklejpa6AmToFoMFuV_tL6VJ-f4=.0d123829-80e2-43f3-8d2c-c7ff8973cb0a@github.com>
References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
 <xxU06cCiROZP1kPcY6pWxomBLPTGPPnxrbc22c-K08E=.4c6d7a88-29d0-43eb-a917-4e90766ddfc9@github.com>
 <uy6CKlyFbVZ-yLe6Mklejpa6AmToFoMFuV_tL6VJ-f4=.0d123829-80e2-43f3-8d2c-c7ff8973cb0a@github.com>
Message-ID: <BuBqsQfsbxbDl8S0LldmtcVabdNX6WxUS9Bkf_G1lyM=.d2244af7-0f1d-4e65-bca5-eec48c93ae97@github.com>

On Wed, 3 Jul 2024 20:05:17 GMT, Matias Saavedra Silva <matsaave at openjdk.org> wrote:

>> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fixed copyright
>
> Thanks for the changes and clarification!

Thanks @matias9927 @calvinccheung for the review

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20013#issuecomment-2212317191

From duke at openjdk.org  Sun Jul  7 15:16:02 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Sun, 7 Jul 2024 15:16:02 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v2]
In-Reply-To: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
Message-ID: <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com>

> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK.

ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision:

 - Use t2 directly instead of temp2
 - Rename temp1 -> x0
 - Left a note on a side effect of generate_vle32_pack4

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19960/files
  - new: https://git.openjdk.org/jdk/pull/19960/files/02dc4e29..9f5c7831

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=00-01

  Stats: 20 lines in 1 file changed: 6 ins; 4 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/19960.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19960/head:pull/19960

PR: https://git.openjdk.org/jdk/pull/19960

From duke at openjdk.org  Sun Jul  7 15:16:03 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Sun, 7 Jul 2024 15:16:03 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v2]
In-Reply-To: <BWV1qtKhP0MV1SrYotttrc0LqUNWLVWjUqwF5ZQQPj0=.c586f3d6-d2f3-4325-a2c8-9de67f67b6ec@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
 <BWV1qtKhP0MV1SrYotttrc0LqUNWLVWjUqwF5ZQQPj0=.c586f3d6-d2f3-4325-a2c8-9de67f67b6ec@github.com>
Message-ID: <S-lKZVVFKzT6MT8LKqxPRIbzQjOP5i2FSAfE4qwWtdo=.96cb5827-835b-492c-8669-0109b7144a67@github.com>

On Mon, 1 Jul 2024 06:37:32 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

>> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision:
>> 
>>  - Use t2 directly instead of temp2
>>  - Rename temp1 -> x0
>>  - Left a note on a side effect of generate_vle32_pack4
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2348:
> 
>> 2346:     __ lwu(keylen, Address(key, arrayOopDesc::length_offset_in_bytes() - arrayOopDesc::base_offset_in_bytes(T_INT)));
>> 2347: 
>> 2348:     __ vsetivli(temp1, 4, Assembler::e32, Assembler::m1);
> 
> There is no use of `temp1` after, should we replace with `x0`?

Replaced, thanks!

> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2351:
> 
>> 2349:     __ vle32_v(res, from);
>> 2350:     __ vmv_v_x(vzero, zr);
>> 2351:     generate_vle32_pack4(key, vtmp1, vtmp2, vtmp3, vtmp4);
> 
> It would be great to add a quick comment mentioning the side effect on `key` of this function call. Same at https://github.com/openjdk/jdk/pull/19960/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R2355 and https://github.com/openjdk/jdk/pull/19960/files#diff-97f199af6d1c8c17b2fa4f50eb1bbc0081858cc59a899f32792a2d31f933ccc4R2359

Done

> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2362:
> 
>> 2360:     generate_rev8_pack2(vtmp1, vtmp2);
>> 2361: 
>> 2362:     __ mv(temp2, 44);
> 
> You could replace `temp2` by `t0`/`t1`/`t2`

Ok, done! I used `t2`

> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2448:
> 
>> 2446:     __ lwu(keylen, Address(key, arrayOopDesc::length_offset_in_bytes() - arrayOopDesc::base_offset_in_bytes(T_INT)));
>> 2447: 
>> 2448:     __ vsetivli(temp1, 4, Assembler::e32, Assembler::m1);
> 
> Same as for encrypt, there is no use of `temp1`, could you replace by `x0`?

Replaced

> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2459:
> 
>> 2457:     generate_aesdecrypt_round(res, vzero, vtmp1, vtmp2, vtmp3, vtmp4);
>> 2458: 
>> 2459:     generate_vle32_pack4(key, vtmp1, vtmp2, vtmp3, vtmp4);
> 
> Same as above, please add a comment on the side effect on `key`.

All done!

> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2466:
> 
>> 2464:     generate_rev8_pack2(vtmp1, vtmp2);
>> 2465: 
>> 2466:     __ mv(temp2, 44);
> 
> Same as above, could you use `t0`/`t1`/`t2` instead?

Done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1667713599
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1667713593
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1667713596
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1667713589
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1667713582
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1667713587

From duke at openjdk.org  Mon Jul  8 05:30:40 2024
From: duke at openjdk.org (duke)
Date: Mon, 8 Jul 2024 05:30:40 GMT
Subject: Withdrawn: 8330171: Lazy W^X switch implementation
In-Reply-To: <9eymaXovxUNFdkAkzojFQP5trwl_yyY0jE2GzcMEjR4=.02ee2ef9-c476-4c7c-9e4a-e021425c38bc@github.com>
References: <9eymaXovxUNFdkAkzojFQP5trwl_yyY0jE2GzcMEjR4=.02ee2ef9-c476-4c7c-9e4a-e021425c38bc@github.com>
Message-ID: <Ed367SfEDzhRnhlB4mzXMj-ULsHh-0tK3oQ1orsj6aA=.da1f0f1b-37ec-477f-bf9d-27498c224052@github.com>

On Fri, 12 Apr 2024 14:40:05 GMT, Sergey Nazarkin <snazarki at openjdk.org> wrote:

> An alternative for preemptively switching the W^X thread mode on macOS with an AArch64 CPU. This implementation triggers the switch in response to the SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this approach, it is now feasible to eliminate all WX guards and avoid potentially costly operations. However, no significant improvement or degradation in performance has been observed.  Additionally, considering the issue with AsyncGetCallTrace, the patched JVM has been successfully operated with [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and [async-profiler](https://github.com/async-profiler/async-profiler). 
> 
> Additional testing:
> - [x] MacOS AArch64 server fastdebug *gtest*
> - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4*
> - [ ] Benchmarking
> 
> @apangin and @parttimenerd could you please check the patch on your scenarios??

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/18762

From tanksherman27 at gmail.com  Mon Jul  8 06:04:11 2024
From: tanksherman27 at gmail.com (Julian Waters)
Date: Mon, 8 Jul 2024 14:04:11 +0800
Subject: Where does VMError::print_native_stack and os::get_sender_for_C_frame
 load/use the frame pointer?
Message-ID: <CAP2b4GNCAh20cyz_JgF+kg34zzyNHznGSUB4_5_E0ot1ZJnwoA@mail.gmail.com>

Hi all,

I have a question regarding os::get_sender_for_C_frame and
VMError::print_native_stack. In Windows-specific code, comments allude
to both needing the rbp register to be saved, which is why
VMError::print_native_stack doesn't work on Windows, since Microsoft
Visual C doesn't save the frame pointer, as stated:

/*
* Windows/x64 does not use stack frames the way expected by Java:
* [1] in most cases, there is no frame pointer. All locals are addressed via RSP
* [2] in rare cases, when alloca() is used, a frame pointer is used, but this may
* not be RBP.
* See http://msdn.microsoft.com/en-us/library/ew5tede7.aspx
*
* So it's not possible to print the native stack using the
* while (...) {... fr = os::get_sender_for_C_frame(&fr); }
* loop in vmError.cpp. We need to roll our own loop.
*/

// VC++ does not save frame pointer on stack in optimized build. It
// can be turned off by -Oy-. If we really want to walk C frames,
// we can use the StackWalk() API.

I can't seem to find where rbp is loaded and used on platforms and
compilers that do save the frame pointer though. Eclipse cannot find
it through the vast collection of member methods inside the frame
class and related code. Does anyone by any chance know where the code that
loads and uses the frame pointer for os::get_sender_for_C_frame and
VMError::print_native_stack is located on such platforms?

best regards,
Julian

From xpeng at openjdk.org  Mon Jul  8 06:23:32 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Mon, 8 Jul 2024 06:23:32 GMT
Subject: RFR: 8334231: Optimize MethodData layout
In-Reply-To: <qBONEcrgJyYqSsBdiDRbA9NeV8sC8uXKRY2zbpDE8Fc=.1dfd2cbc-5982-4958-b7cb-313d0c52139a@github.com>
References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
 <qBONEcrgJyYqSsBdiDRbA9NeV8sC8uXKRY2zbpDE8Fc=.1dfd2cbc-5982-4958-b7cb-313d0c52139a@github.com>
Message-ID: <UpB5_GiY7tF1AcVV86gvr2GY3RCIoYwRpgKEPMlCmco=.efdb7ab9-e11b-49b1-b313-3ba12cc738d4@github.com>

On Thu, 4 Jul 2024 12:41:15 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Hi all,
>>      This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of  MethodData. Here is the original layout from `pahole`:
>> 
>> class MethodData : public Metadata {
>> public:
>> 
>> 	/* class Metadata            <ancestor>; */      /*     0     0 */
>> 
>> 	/* XXX 8 bytes hole, try to pack */
>> 
>> 	class Method *             _method;              /*     8     8 */
>> 	int                        _size;                /*    16     4 */
>> 	int                        _hint_di;             /*    20     4 */
>> 	class Mutex               _extra_data_lock;      /*    24   104 */
>> 	/* --- cacheline 2 boundary (128 bytes) --- */
>> 	class CompilerCounters    _compiler_counters;    /*   128    80 */
>> 	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
>> 	intx                       _eflags;              /*   208     8 */
>> 	intx                       _arg_local;           /*   216     8 */
>> 	intx                       _arg_stack;           /*   224     8 */
>> 	intx                       _arg_returned;        /*   232     8 */
>> 	int                        _creation_mileage;    /*   240     4 */
>> 	class InvocationCounter   _invocation_counter;   /*   244     4 */
>> 	class InvocationCounter   _backedge_counter;     /*   248     4 */
>> 	int                        _invocation_counter_start; /*   252     4 */
>> 	/* --- cacheline 4 boundary (256 bytes) --- */
>> 	int                        _backedge_counter_start; /*   256     4 */
>> 	uint                       _tenure_traps;        /*   260     4 */
>> 	int                        _invoke_mask;         /*   264     4 */
>> 	int                        _backedge_mask;       /*   268     4 */
>> 	short int                  _num_loops;           /*   272     2 */
>> 	short int                  _num_blocks;          /*   274     2 */
>> 	enum WouldProfile          _would_profile;       /*   276     4 */
>> 	int                        _jvmci_ir_size;       /*   280     4 */
>> 
>> 	/* XXX 4 bytes hole, try to pack */
>> 
>> 	class FailedSpeculation *  _failed_speculations; /*   288     8 */
>> 	int                        _data_size;           /*   296     4 */
>> 	int                        _parameters_type_data_di; /*   300     4 */
>> 	int                        _exception_handler_data_di; /*   304     4 */
>> 
>> 	/* XXX 4 bytes hole, try to pack */
>> 
>> 	intptr_t                   _data[1];             /*   312     8 */
>> 
>> 	/* size: 320, cachelin...
>
> I don't think these "Optimize XXX layouts" should be marked as trivial, and I worry that we overuse the trivial rule. It circumvents the second reviewer as well as the 24hr rule, which both are necessary safeties. Especially in the wake of the xz fiasco.
> 
> "Trivial" is usually reserved for either changes that need very quick reaction (e.g. reasonably simple build errors that require immediate fixing because everyone's CI is standing still) or things that are painfully obvious in being trivial, e.g. comment changes. Memory layout changes are neither urgent nor really trivial enough IMHO.

Thanks @tstuefe @chhagedorn @dholmes-ora for the reviews and the discussion about the "trivial" topic. I agree that this may not be trivial, and I'll be cautious when declaring a PR trivial. Honestly, I don't know if there are explicit rules for trivial/non-trivial; I felt that swapping the locations of two fields was a very simple change and should qualify as trivial (a subjective judgement).
I'm OK with removing the declaration about trivial. As for the PR itself, there are already two reviewer approvals, and we have probably got enough eyes on it. If you don't have other concerns, I'll integrate it next week @tstuefe

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20019#issuecomment-2213124652

From dholmes at openjdk.org  Mon Jul  8 06:41:33 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 8 Jul 2024 06:41:33 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v3]
In-Reply-To: <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
 <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com>
Message-ID: <dUqU_Wdq2TZmik1puxHRebJyRI6fFugtaOPW9saIpfg=.1da6f230-c9a8-4ff9-8fdb-1e6d63590076@github.com>

On Fri, 5 Jul 2024 15:01:05 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.
>> 
>> The number of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty, contrary to what the comment says. This comment dates back to 2002, when instead of 128 entries we could have only 32, considering 32-bit CPUs as still in main use (see bug for more history details).
>> 
>> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used once.
>> 
>> I tested the patch by running it through mach5 tiers 1-6.
>> 
>> Thanks,
>> Patricio
>
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   use DEBUG_ONLY on _used declaration

Still good.

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2162398031

From david.holmes at oracle.com  Mon Jul  8 07:15:18 2024
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 8 Jul 2024 17:15:18 +1000
Subject: Where does VMError::print_native_stack and
 os::get_sender_for_C_frame load/use the frame pointer?
In-Reply-To: <CAP2b4GNCAh20cyz_JgF+kg34zzyNHznGSUB4_5_E0ot1ZJnwoA@mail.gmail.com>
References: <CAP2b4GNCAh20cyz_JgF+kg34zzyNHznGSUB4_5_E0ot1ZJnwoA@mail.gmail.com>
Message-ID: <e8667640-8e29-468d-8a51-e6f996921885@oracle.com>

Hi Julian,

On 8/07/2024 4:04 pm, Julian Waters wrote:
> Hi all,
> 
> I have a question with regards to os::get_sender_for_C_frame and
> VMError::print_native_stack. In Windows specific code comments allude
> to both needing the rbp register to be saved, which is why
> VMError::print_native_stack
> doesn't work on Windows since Microsoft Visual C doesn't save the frame
> pointer, as stated:
> 
> /*
> * Windows/x64 does not use stack frames the way expected by Java:
> * [1] in most cases, there is no frame pointer. All locals are addressed via RSP
> * [2] in rare cases, when alloca() is used, a frame pointer is used, but this may
> * not be RBP.
> * See http://msdn.microsoft.com/en-us/library/ew5tede7.aspx
> *
> * So it's not possible to print the native stack using the
> * while (...) {... fr = os::get_sender_for_C_frame(&fr); }
> * loop in vmError.cpp. We need to roll our own loop.
> */
> 
> // VC++ does not save frame pointer on stack in optimized build. It
> // can be turned off by -Oy-. If we really want to walk C frames,
> // we can use the StackWalk() API.
> 
> I can't seem to find where rbp is loaded and used on platforms and
> compilers that do save the frame pointer though. Eclipse cannot find
> it through the vast collection of member methods inside the frame
> class and related code. Does anyone by any chance know where the code that
> loads and uses the frame pointer for os::get_sender_for_C_frame and
> VMError::print_native_stack is located on such platforms?

Isn't this part of the ABI for these platforms, so the C/C++ compiler 
maintains it?

David
-----

> best regards,
> Julian

From mli at openjdk.org  Mon Jul  8 07:55:36 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 8 Jul 2024 07:55:36 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <_8D8tMevrVR00rHIHSQRHnfBxjoApH7UcHH-1HRl2mo=.4866b276-5856-43b2-927e-86c2f4e9d60a@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
 <_8D8tMevrVR00rHIHSQRHnfBxjoApH7UcHH-1HRl2mo=.4866b276-5856-43b2-927e-86c2f4e9d60a@github.com>
Message-ID: <-UPo0QHxKRA0bBqkX8prOTaMedIwLVygqzwvABbA4mY=.2ebd0262-a275-4d80-94a7-77225b4d54d7@github.com>

On Fri, 5 Jul 2024 17:44:14 GMT, Andrew Haley <aph at openjdk.org> wrote:

> I also had problems with javac running out of heap space, which was very odd. I fixed it with this:
> 
> ```
> diff --git a/make/autoconf/boot-jdk.m4 b/make/autoconf/boot-jdk.m4
> index 8d272c28ad5..617ccfd8fff 100644
> --- a/make/autoconf/boot-jdk.m4
> +++ b/make/autoconf/boot-jdk.m4
> @@ -470,7 +470,7 @@ AC_DEFUN_ONCE([BOOTJDK_SETUP_BOOT_JDK_ARGUMENTS],
>    # Maximum amount of heap memory.
>    JVM_HEAP_LIMIT_32="768"
>    # Running a 64 bit JVM allows for and requires a bigger heap
> -  JVM_HEAP_LIMIT_64="1600"
> +  JVM_HEAP_LIMIT_64="6400"
> ```

For the command to run the tests, I use `make test TEST="org.openjdk.bench.jdk.incubator.vector.operation.Float" MICRO="FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs"`.
I copy and run only the Double/Float benchmark tests (without copying the other tests under `org.openjdk.bench.jdk.incubator.vector.operation`), which I think is why I do not hit this OOM issue.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2213280630

From tanksherman27 at gmail.com  Mon Jul  8 07:59:20 2024
From: tanksherman27 at gmail.com (Julian Waters)
Date: Mon, 8 Jul 2024 15:59:20 +0800
Subject: Where does VMError::print_native_stack and
 os::get_sender_for_C_frame load/use the frame pointer?
Message-ID: <CAP2b4GNPq3Fr3X=v=_8nFLwYTbC7e=0N5Xd2i2jOvXfqqftCrQ@mail.gmail.com>

Hi David,

Ah, I think you misunderstood me, I'm aware that the frame pointer is
saved as required by the compiler (With the exception of the Microsoft
compiler, which doesn't save it at all). What I meant was that the
comments in Windows code imply that VMError::print_native_stack and
os::get_sender_for_C_frame need to use the frame pointer, yet I can't
seem to find where or how either of them obtains the frame pointer for
whatever they use it for on platforms and compilers where the frame
pointer is saved (for instance, on Linux), whether through handwritten
assembly code or some other means. It follows that if they need to use
the frame pointer, then they must grab it from somewhere, after all.

best regards,
Julian

From aboldtch at openjdk.org  Mon Jul  8 08:25:02 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 8 Jul 2024 08:25:02 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping
Message-ID: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>

When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, which forces the GC to take this into account everywhere.

This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code.

A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 

This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 

Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.

# Cleanups

Cleaned up displaced header usage for:
  * BasicLock
    * Contains some Zero changes
    * Renames one exported JVMCI field
  * ObjectMonitor
    * Updates comments and tests consistencies

# Refactoring

`ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.

The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 

_There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._

# LightweightSynchronizer

Working on adapting and incorporating the following section as a comment in the source code

## Fast Locking

  CAS on locking bits in markWord. 
  0b00 (Fast Locked) <--> 0b01 (Unlocked)

  When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.

  If 0b10 (Inflated) is observed, or there is too much contention or critical sections are too long for spinning to be feasible, inflated locking is performed.
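
As a rough illustration of the transition above, here is a minimal standalone sketch that uses `std::atomic` rather than HotSpot's own types. The names `try_fast_lock`, `try_fast_unlock`, `FastLockResult` and the constants are hypothetical and are not taken from the patch; the per-thread lock stack that real lightweight locking maintains is left out.

```c++
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two low lock bits of a markWord.
// Only the bit values come from the description; the names are illustrative.
constexpr uintptr_t kLockMask   = 0b11;
constexpr uintptr_t kFastLocked = 0b00;
constexpr uintptr_t kUnlocked   = 0b01;
constexpr uintptr_t kInflated   = 0b10;

enum class FastLockResult { Locked, Contended, Inflated };

// Try to move the lock bits from Unlocked to Fast Locked with one CAS,
// leaving every other bit of the word untouched.
inline FastLockResult try_fast_lock(std::atomic<uintptr_t>& mark) {
  uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  const uintptr_t bits = old_mark & kLockMask;
  if (bits == kInflated) return FastLockResult::Inflated;
  // Any other non-unlocked pattern is treated as contention in this sketch.
  if (bits != kUnlocked) return FastLockResult::Contended;
  const uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
  if (mark.compare_exchange_strong(old_mark, new_mark,
                                   std::memory_order_acquire,
                                   std::memory_order_relaxed)) {
    return FastLockResult::Locked;
  }
  // The CAS failed, so old_mark now holds the value another thread installed.
  return (old_mark & kLockMask) == kInflated ? FastLockResult::Inflated
                                             : FastLockResult::Contended;
}

// Reverse transition: Fast Locked back to Unlocked.
inline bool try_fast_unlock(std::atomic<uintptr_t>& mark) {
  uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kFastLocked) return false;
  const uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}
```

A caller that gets `Contended` would spin or inflate, and one that gets `Inflated` would go straight to the inflated path.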

### Fast Lock Spinning (UseObjectMonitorTable)

  When a thread fails fast locking when a monitor is not yet inflated, it will spin on the markWord using an exponential backoff scheme. The thread will attempt the fast lock CAS and then SpinWait() for some time, doubling with every failed attempt, up to a maximum number of attempts. There is a diagnostic VM option LightweightFastLockingSpins which can be used to tune this value. The behavior of SpinWait() can be hardware dependent.

  A future improvement may be to adapt this spinning limit to observed behavior, which would automatically adapt to the different hardware behavior of SpinWait(). 
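
The backoff loop described above could look roughly like the following sketch. It is not the code from the patch: `spin_with_backoff` and `spin_pause` are hypothetical names, `std::this_thread::yield()` merely stands in for the hardware-dependent SpinWait(), and `max_attempts` plays the role of the tunable limit.

```c++
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical placeholder for a hardware pause hint.
inline void spin_pause() { std::this_thread::yield(); }

// Attempt try_lock() up to max_attempts times. After each failed attempt,
// pause for a period that doubles every time (1, 2, 4, ... pause steps).
template <typename TryLock>
bool spin_with_backoff(TryLock try_lock, int max_attempts) {
  assert(max_attempts >= 0);
  uint32_t pauses = 1;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (try_lock()) return true;   // the fast lock CAS succeeded
    for (uint32_t i = 0; i < pauses; ++i) spin_pause();
    pauses *= 2;                   // exponential backoff
  }
  return false;                    // give up; the caller would inflate
}
```

Passing the lock attempt as a callable keeps the loop independent of how the CAS itself is written.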

## Inflated Locking

  Inflated locking means that an ObjectMonitor is associated with the object and is used for locking instead of the locking bits in the markWord.

## Inflated Locking without table (!UseObjectMonitorTable)

  An inflating thread will create an ObjectMonitor and CAS the ObjectMonitor* into the markWord along with the 0b10 (Inflated) lock bits. If the transition of the lock bits is from 0b00 (Fast Locked) the ObjectMonitor must be published with an anonymous owner (setting _owner to ANONYMOUS_OWNER). If the transition of the lock bits is from 0b01 (Unlocked) the ObjectMonitor is published with no owner.

  When encountering an ObjectMonitor with an anonymous owner the thread checks its lock stack to see if it is the owner, in which case it removes the object from its lock stack and sets itself as the owner of the ObjectMonitor along with fixing the recursion level to correspond to the number of removed lock stack entries.
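
The anonymous-owner fix-up can be pictured with the single-threaded model below. All names (`Monitor`, `anonymous_owner`, `claim_anonymous_monitor`) are hypothetical, the lock stack is modelled as a plain `std::vector`, and the choice to store "removed entries minus one" as the recursion count is an assumption of this sketch, not something stated in the description.

```c++
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, single-threaded model of a monitor's ownership state.
struct Monitor {
  const void* owner;
  int recursions;
};

// Unique tag standing in for ANONYMOUS_OWNER.
inline const void* anonymous_owner() {
  static const char tag = 0;
  return &tag;
}

// If the monitor is anonymously owned and `obj` is on this thread's lock
// stack, remove every entry for `obj`, take ownership as `self`, and derive
// the recursion level from the number of removed entries.
inline bool claim_anonymous_monitor(Monitor& m,
                                    std::vector<const void*>& lock_stack,
                                    const void* obj,
                                    const void* self) {
  if (m.owner != anonymous_owner()) return false;
  auto new_end = std::remove(lock_stack.begin(), lock_stack.end(), obj);
  const auto removed = lock_stack.end() - new_end;
  if (removed == 0) return false;  // some other thread is the real owner
  lock_stack.erase(new_end, lock_stack.end());
  m.owner = self;
  // Assumption: the first entry is the lock itself, the rest are re-entries.
  m.recursions = static_cast<int>(removed) - 1;
  return true;
}
```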

## Inflated Locking with table (UseObjectMonitorTable)

  Because publishing the ObjectMonitor* and signaling that an object's monitor is inflated is not atomic, more care must be taken (in the presence of deflation) so that all threads agree on which ObjectMonitor* to use.

  When encountering an ObjectMonitor with an anonymous owner the thread checks its lock stack to see if it is the owner, in which case it removes the object from its lock stack and sets itself as the owner of the ObjectMonitor along with fixing the recursion level to correspond to the number of removed lock stack entries.

  All complications arise from deflation, or the process of disassociating an ObjectMonitor from its Java Object. So first the mechanism used for deflation is explained, followed by retrieval and creation of ObjectMonitors.

### Deflation

  An ObjectMonitor can only be deflated if it has no owner, its queues are empty and no thread is in a scope where it has incremented and checked the contentions reference counter.

  The interaction between deflation and wait is handled by having the owner and wait queue entry overlap to block out deflation; the wait queue entry is protected by a waiters reference counter which is only modified by the waiters while holding the monitor: it is incremented before exiting the monitor and decremented after reentering the monitor.

  For enter and exit, where the deflator may observe empty queues and no owner, a two-step mechanism is used to synchronize deflation with concurrently locking threads; deflation is synchronized using the contentions reference counter.

  In the text below we refer to "holding the contentions reference counter". This means that a thread has incremented the contentions reference counter and verified that it is not negative.
  ```c++
    if (Atomic::fetch_and_add(&monitor->_contentions, 1) >= 0) {
      // holding the contentions reference counter
    }
    Atomic::decrement(&monitor->_contentions);
  ```

#### Deflation protocol

  The first step for the deflator is to try and CAS the owner from no owner to a special marker (DEFLATER_MARKER). If this is successful it blocks any entering thread from successfully installing itself as the owner and causes compiled code to take a slow path and call into the runtime.

  The second step for the deflator is to check the waiters reference counter and, if it is 0, try to CAS the contentions reference counter from 0 to a large negative value (INT_MIN). If this succeeds the monitor is deflated.

  The deflator does not have to check the entry queues because every thread on the entry queues must either hold the contentions reference counter or, in the case where it was moved from the wait queue to the entry queues by a notify, have incremented the waiters reference counter. The deflator checks the waiters reference counter, with the memory ordering of Waiter: { increment waiters reference counter; release owner }, Deflator: { acquire owner; check waiters reference counter }. All threads on the entry queues or wait queue invariantly hold the contentions reference counter or the waiters reference counter.
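
  As a rough illustration of the two steps, the deflator side can be modelled with standard C++ atomics. This is only a sketch under simplifying assumptions: `DeflatableMonitor`, `marker_value()`, `DeflateResult` and `try_deflate()` are hypothetical names, the default sequentially consistent ordering is used instead of the acquire/release pairing described above, and restoring the owner after a failed attempt is left to the caller.

```c++
#include <atomic>
#include <cassert>
#include <climits>

// Hypothetical stand-in for the ObjectMonitor fields the protocol touches.
struct DeflatableMonitor {
  std::atomic<const void*> owner{nullptr};
  std::atomic<int> contentions{0};
  std::atomic<int> waiters{0};
};

// Hypothetical equivalent of DEFLATER_MARKER: a unique, non-null address.
inline const void* marker_value() {
  static const char marker = 0;
  return &marker;
}

enum class DeflateResult { Deflated, Owned, MustRestoreOwner };

inline DeflateResult try_deflate(DeflatableMonitor& m) {
  // Step 1: CAS the owner from "no owner" to the marker.
  const void* expected = nullptr;
  if (!m.owner.compare_exchange_strong(expected, marker_value())) {
    return DeflateResult::Owned;
  }
  // Step 2: with no waiters, CAS contentions from 0 to INT_MIN.
  if (m.waiters.load() == 0) {
    int no_contention = 0;
    if (m.contentions.compare_exchange_strong(no_contention, INT_MIN)) {
      return DeflateResult::Deflated;
    }
  }
  // The marker is still installed; the owner field must be restored.
  return DeflateResult::MustRestoreOwner;
}
```

  Once `contentions` is negative, a thread that increments it still reads a negative value and therefore knows it does not hold the reference counter.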

#### Deflation cleanup

  If deflation succeeds, locking bits are then transitioned back to 0b01 (Unlocked). With UseObjectMonitorTable it is required that this is done by the deflator, or it could lead to ABA problems in the locking bits. Without the table the whole ObjectMonitor* is part of the markWord transition, with its pointer being phased out of the system with a handshake, making every value distinguishable and avoiding ABA issues. 

  For UseObjectMonitorTable the deflated monitor is also removed from the table. This is done after transitioning the markWord to allow concurrently entering threads to fast lock on the object while the monitor is being removed from the hash table.

  If deflation fails after the marker (DEFLATER_MARKER) has been CASed into the owner field, the owner must be restored. From the deflation thread's point of view it is as simple as CASing from the marker to no owner. However, to not have all threads depend on the deflation thread making progress here, we allow any thread to CAS from the marker if that thread has both incremented and checked the contentions counter. This thread has now effectively canceled the deflation, but it is important that the deflator observes this fact; we do this by forgetting to decrement the contentions counter. The effect is that the contentions CAS will fail, which will force the deflator to try and restore the owner, but this will also fail because it got canceled. So the deflator instead decrements the contentions counter on behalf of the canceling thread to balance the reference counting. (Currently this is implemented by doing a +1 +1 -1 reference count on the locking thread, but a simple +1 alone would suffice.)
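
  The cancellation handshake can be sketched as follows. This is an illustration only: `CancellableMonitor`, `deflation_marker()`, `try_cancel_deflation()` and `restore_owner_after_failed_deflation()` are hypothetical names, the sketch uses the simple +1 variant rather than +1 +1 -1, and it assumes the canceling thread installs itself as the owner.

```c++
#include <atomic>
#include <cassert>
#include <climits>

// Hypothetical stand-in for the ObjectMonitor fields involved.
struct CancellableMonitor {
  std::atomic<const void*> owner{nullptr};
  std::atomic<int> contentions{0};
};

// Hypothetical equivalent of DEFLATER_MARKER.
inline const void* deflation_marker() {
  static const char marker = 0;
  return &marker;
}

// Locking thread: returns true if it canceled an in-progress deflation and
// now owns the monitor. `self` identifies the calling thread.
inline bool try_cancel_deflation(CancellableMonitor& m, const void* self) {
  if (m.contentions.fetch_add(1) < 0) {
    m.contentions.fetch_sub(1);  // already deflated, nothing to cancel
    return false;
  }
  const void* expected = deflation_marker();
  if (m.owner.compare_exchange_strong(expected, self)) {
    // Deliberately keep the increment: it makes the deflator's contentions
    // CAS fail, and the deflator later decrements on our behalf.
    return true;
  }
  m.contentions.fetch_sub(1);
  return false;
}

// Deflator: called after its contentions CAS (0 -> INT_MIN) has failed.
inline void restore_owner_after_failed_deflation(CancellableMonitor& m) {
  const void* expected = deflation_marker();
  if (!m.owner.compare_exchange_strong(expected, nullptr)) {
    // The marker is gone, so some thread canceled the deflation.
    // Balance that thread's extra increment.
    m.contentions.fetch_sub(1);
  }
}
```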

### Retrieve ObjectMonitor

#### HashTable

  Maintains a mapping between Java Objects and ObjectMonitors. Lookups are done via the object's identity_hash. If the hash table contains an ObjectMonitor for a specific object then that ObjectMonitor is used for locking unless it is being deflated.

  Only deflation removes entries for live (not dead) objects from the HashTable.

#### ThreadLocal Cache (UseObjectMonitorTable)

  The ObjectMonitors most recently locked by a thread are cached in that thread's local storage. These are used to elide hash table lookups. These caches use raw oops to make cache lookups trivial. However, this requires special handling of the cache at safepoints. The caches are cleared when a safepoint is triggered (instead of letting the GC visit them); this avoids keeping cache entries as GC roots.

  These cache entries may become deflated, but locking on such a monitor still participates in the normal deflation protocol. Because these entries are cleared during a safepoint, the handshake performed by monitor deflation to phase out ObjectMonitor* from the system will also phase these out.
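
  A per-thread cache of this kind might look roughly like the sketch below. The class name, the capacity of 8 and the shift-on-insert replacement policy are assumptions made for illustration, and plain `const void*` addresses stand in for raw oops.

```c++
#include <array>
#include <cassert>
#include <cstddef>

struct MonitorStub {};  // hypothetical stand-in for ObjectMonitor

class ThreadLocalMonitorCache {
  struct Entry {
    const void* obj = nullptr;       // raw object address, not a GC root
    MonitorStub* monitor = nullptr;
  };
  static constexpr std::size_t kCapacity = 8;  // arbitrary size for the sketch
  std::array<Entry, kCapacity> entries_{};

 public:
  // Linear scan; returns nullptr on a miss so the caller falls back to the
  // hash table lookup.
  MonitorStub* lookup(const void* obj) const {
    for (const Entry& e : entries_) {
      if (e.obj == obj) {
        return e.monitor;
      }
    }
    return nullptr;
  }

  // Puts the most recently locked monitor first and drops the oldest entry.
  void remember(const void* obj, MonitorStub* monitor) {
    for (std::size_t i = kCapacity - 1; i > 0; --i) {
      entries_[i] = entries_[i - 1];
    }
    entries_[0] = Entry{obj, monitor};
  }

  // Called when a safepoint is triggered, so that no raw address survives a
  // point where the GC may move or reclaim objects.
  void clear() { entries_.fill(Entry{}); }
};
```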

#### StackLocal Cache

  Each monitorenter has a corresponding BasicLock entry on the stack. Each successful inflated monitorenter saves the ObjectMonitor* inside this BasicLock entry and retrieves it when performing the corresponding monitorexit.

  This means it is important that the BasicLock entry is always initialized to a known state (nullptr is used). 

  The RAII class CacheSetter is used to ensure that the BasicLock gets initialized before leaving the runtime code, and that both caches get updated correctly (only once, with the same locked ObjectMonitor).

  The cache entries are set when a monitor is entered and never used again after that monitor has been exited. So there are no interactions with deflation here. Similarly, these caches do not track the associated oop, but rely on the fact that the same BasicLock data created for a monitorenter is used when executing the corresponding monitorexit.
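
  The RAII idea can be illustrated with the sketch below. It is not the actual CacheSetter: `BasicLockSketch`, `CacheSetterSketch` and the single-slot thread cache are hypothetical simplifications that only show how a destructor can guarantee that the BasicLock entry is always written before leaving the scope.

```c++
#include <cassert>

struct MonitorStub {};  // hypothetical stand-in for ObjectMonitor

// Hypothetical stand-in for the on-stack BasicLock entry.
struct BasicLockSketch {
  MonitorStub* cached_monitor;
};

class CacheSetterSketch {
 public:
  CacheSetterSketch(BasicLockSketch* lock, MonitorStub** thread_slot)
      : lock_(lock), thread_slot_(thread_slot) {}
  CacheSetterSketch(const CacheSetterSketch&) = delete;
  CacheSetterSketch& operator=(const CacheSetterSketch&) = delete;

  // Runs on every exit path: the BasicLock entry ends up either nullptr or
  // the monitor that was actually locked, never a stale value.
  ~CacheSetterSketch() {
    lock_->cached_monitor = monitor_;
    if (monitor_ != nullptr) {
      *thread_slot_ = monitor_;
    }
  }

  // May be called at most once, with the monitor that was locked.
  void set_monitor(MonitorStub* monitor) {
    assert(monitor_ == nullptr);
    monitor_ = monitor;
  }

 private:
  BasicLockSketch* lock_;
  MonitorStub** thread_slot_;
  MonitorStub* monitor_ = nullptr;
};
```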

### Creating ObjectMonitor

  If retrieval of the ObjectMonitor fails because there is no ObjectMonitor, either because this is the first time inflating or because the ObjectMonitor has been deflated, a new ObjectMonitor must be created and associated with the object.

  The inflating thread will then attempt to insert a newly created ObjectMonitor in the hash table. The important invariant is that any ObjectMonitor inserted must have an anonymous owner (setting _owner to ANONYMOUS_OWNER). 

  This solves the issue of not being able to atomically insert the ObjectMonitor in the hash table and transition the markWord to 0b10 (Inflated). We instead have all inflating threads insert an identical anonymously owned ObjectMonitor in the table and then decide ownership based on how the markWord is transitioned to 0b10 (Inflated). Note: Only one ObjectMonitor can be inserted.

  This also has the effect of blocking deflation on a newly inserted ObjectMonitor, until the contentions reference counter can be incremented. The contentions reference counter is held while transitioning the markWord to block out deflation.

  * If a thread observes 0b10 (Inflated)
    * If the current thread is the thread that fast locked, take ownership.
      Update ObjectMonitor _recursions based on fast locked recursions.
      Call ObjectMonitor::enter(current);
    * Otherwise some other thread is the owner, and will claim ownership.
      Call ObjectMonitor::enter(current); 
  * If a thread succeeds with the CAS to 0b10 (Inflated)
    * From 0b00 (Fast Locked)
      * If the current thread is the thread that fast locked, take ownership.
        Update ObjectMonitor _recursions based on fast locked recursions.
        Call ObjectMonitor::enter(current);
      * Otherwise some other thread is the owner, and will claim ownership.
        Call ObjectMonitor::enter(current); 
    * From 0b01 (Unlocked)
      * Claim ownership, no ObjectMonitor::enter is required.
  * If a thread fails the CAS, reload the markWord and retry
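
  The case analysis above can be written as a small pure function. The enum and function names are hypothetical; the function only encodes which action the list prescribes and performs no locking itself.

```c++
#include <cassert>

// Lock-bit encodings as given in the text above.
enum class LockBits { FastLocked = 0b00, Unlocked = 0b01, Inflated = 0b10 };

enum class NextStep {
  TakeOwnershipThenEnter,  // fix owner and _recursions, then ObjectMonitor::enter
  EnterOnly,               // another thread owns it; ObjectMonitor::enter
  ClaimWithoutEnter,       // claim ownership, no enter required
  ReloadAndRetry           // CAS failed; reload the markWord and retry
};

// `observed` is the lock-bit state read from the markWord, `cas_succeeded`
// says whether the CAS to Inflated succeeded (ignored if already Inflated),
// and `fast_locked_by_current` says whether the object is on the current
// thread's lock stack.
inline NextStep next_step(LockBits observed, bool cas_succeeded,
                          bool fast_locked_by_current) {
  if (observed == LockBits::Inflated) {
    return fast_locked_by_current ? NextStep::TakeOwnershipThenEnter
                                  : NextStep::EnterOnly;
  }
  if (!cas_succeeded) {
    return NextStep::ReloadAndRetry;
  }
  if (observed == LockBits::Unlocked) {
    return NextStep::ClaimWithoutEnter;
  }
  // CAS from Fast Locked succeeded.
  return fast_locked_by_current ? NextStep::TakeOwnershipThenEnter
                                : NextStep::EnterOnly;
}
```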

### Un-contended Inflated Locking

  CAS on _owner field in ObjectMonitor.
  JavaThread* (Locked By Thread) <--> nullptr (Unlocked)
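
  A minimal model of this transition, assuming a bare owner word and ignoring recursion, queues and successor wake-up (`OwnerWord`, `try_enter` and `exit_uncontended` are hypothetical names, and unlocking with a release store rather than a CAS is an assumption of the sketch):

```c++
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the _owner field of an ObjectMonitor.
struct OwnerWord {
  std::atomic<const void*> owner{nullptr};
};

// nullptr (Unlocked) -> self (Locked By Thread); fails if already owned.
inline bool try_enter(OwnerWord& w, const void* self) {
  const void* expected = nullptr;
  return w.owner.compare_exchange_strong(expected, self,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed);
}

// self (Locked By Thread) -> nullptr (Unlocked); only the owner may call this.
inline void exit_uncontended(OwnerWord& w, const void* self) {
  assert(w.owner.load(std::memory_order_relaxed) == self);
  w.owner.store(nullptr, std::memory_order_release);
}
```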

### Contended Inflated Locking

  Blocks out deflation.

  Spin CAS on _owner field in ObjectMonitor.
  JavaThread* (Locked By Thread) <--> nullptr (Unlocked)

  Details in ObjectMonitor.hpp

### HashTable Resizing and Cleanup

  Resizing is currently handled with logic similar to what the string and symbol tables use, and is delegated to the ServiceThread.

  The goal is to eventually move this to the deflation thread, to allow for better interactions with the deflation cycles, making it possible to also shrink the table. But this will be done incrementally as a separate enhancement. The ServiceThread is currently used to deal with the fact that we currently allow the deflation thread to be turned off via JVM options.

  Cleanup is mostly handled by the deflator, which actively removes deflated monitors, including monitors for dead objects. However, we allow any thread to remove dead objects' ObjectMonitor* associations. Actual memory reclamation of the ObjectMonitor is always handled by the deflator.

  The table is currently initialized before `init_globals`; as such, the max size of the table, which is based on `MaxHeapSize`, may be incorrect because `MaxHeapSize` is not yet finalized.

-------------

Commit messages:
 - 8315884: New Object to ObjectMonitor mapping

Changes: https://git.openjdk.org/jdk/pull/20067/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8315884
  Stats: 3613 lines in 70 files changed: 2700 ins; 313 del; 600 mod
  Patch: https://git.openjdk.org/jdk/pull/20067.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067

PR: https://git.openjdk.org/jdk/pull/20067

From alanb at openjdk.org  Mon Jul  8 08:30:33 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Mon, 8 Jul 2024 08:30:33 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom
In-Reply-To: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
Message-ID: <9E_nLqk5ThBlynnp1khLmG1iislzXOY8eH0VV3J1itA=.a776501d-b4a3-4a2f-8162-eb1c536a7839@github.com>

On Wed, 3 Jul 2024 19:54:46 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler.
> 
> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also run the patch through mach5 tiers1-3.
> 
> Thanks,
> Patricio

test/jdk/java/lang/Thread/virtual/ThreadYield.java line 29:

> 27:  * @summary Test that Thread.yield loop polls for safepoints
> 28:  * @requires vm.continuations
> 29:  * @modules java.base/java.lang:+open

I assume the `@modules` isn't needed as this test doesn't need to open java.lang.

test/jdk/java/lang/Thread/virtual/ThreadYield.java line 47:

> 45: import static org.junit.jupiter.api.Assertions.*;
> 46: 
> 47: class ThreadYield {

This isn't a unit test for Thread.yield so I think it would be better to rename it to something specific like ThreadYieldPollsSafepoint (or a better name).

test/jdk/java/lang/Thread/virtual/ThreadYield.java line 49:

> 47: class ThreadYield {
> 48:     static void foo(AtomicBoolean done) {
> 49:         synchronized (done) {

When this test makes it to the loom repo then we'll need to change it to pin by other means.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1668202884
PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1668202828
PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1668207758

From aph at openjdk.org  Mon Jul  8 08:46:36 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 8 Jul 2024 08:46:36 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
Message-ID: <u9kWsTkbNZZ5_D9E9EENwp-S5xBjQYdAXr4LHDI9VeU=.99f598ab-ca7d-40bd-adc0-b2f082ad6c59@github.com>

On Mon, 1 Jul 2024 16:54:55 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
>> 
>> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time.
>> Besides of this change, also modify the previous changes accordingly, e.g. remove some uncessary files or changes especially in make dir of jdk.
>> 
>> Besides of the code changes, one important task is to handle the legal process.
>> 
>> Thanks!
>> 
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>> 
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>> 
>> ## Performance
>> 
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>> 
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN2 | 1024 | thrpt | 10 | 0.021 | ops/ms | 135.088 | 32.449 | 4.163 | 135.721 | 32.579 | 4.166
>> Float128Vector.CBRT | 1024 | thrpt | 10 | 0.004 | ops/ms | 114.547 | 39.517 | 2....
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits:
> 
>  - Merge branch 'master' into sleef-aarch64-integrate-source
>  - merge master
>  - sleef 3.6.1 for riscv
>  - sleef 3.6.1
>  - update header files for arm
>  - add inline header file for riscv64
>  - remove notes about sleef changes
>  - fix performance issue
>  - disable unused-function warnings; add log msg
>  - minor
>  - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863

That doesn't work.


Running tests using MICRO control variable 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
Unknown test selection: 'org.openjdk.bench.jdk.incubator.vector.operation.Float'

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2213389656

From david.holmes at oracle.com  Mon Jul  8 08:48:52 2024
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 8 Jul 2024 18:48:52 +1000
Subject: Where does VMError::print_native_stack and
 os::get_sender_for_C_frame load/use the frame pointer?
In-Reply-To: <CAP2b4GNPq3Fr3X=v=_8nFLwYTbC7e=0N5Xd2i2jOvXfqqftCrQ@mail.gmail.com>
References: <CAP2b4GNPq3Fr3X=v=_8nFLwYTbC7e=0N5Xd2i2jOvXfqqftCrQ@mail.gmail.com>
Message-ID: <cb2380cc-11ad-4f89-a0f2-92281ad7d5a0@oracle.com>

On 8/07/2024 5:59 pm, Julian Waters wrote:
> Hi David,
> 
> Ah, I think you misunderstood me, I'm aware that the frame pointer is
> saved as required by the compiler (With the exception of the Microsoft
> compiler, which doesn't save it at all). What I meant was that the
> comments in Windows code imply that VMError::print_native_stack and
> os::get_sender_for_C_frame need to use the frame pointer, yet I can't
> seem to find where or how either of them obtain the frame pointer for
> whatever they use it for on platforms and compilers where the frame
> pointer is saved (For instance, on Linux), whether through handwritten
> assembly code or some other means. It follows that if they need to use
> the frame pointer, then they must grab it from somewhere, after all

Ah sorry. AFAICS we just create the frame() objects and walk the stack 
via those. We use fetch_frame_from_context to kick things off in the 
case of a crash.

David

> best regards,
> Julian

From mli at openjdk.org  Mon Jul  8 09:27:34 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 8 Jul 2024 09:27:34 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <u9kWsTkbNZZ5_D9E9EENwp-S5xBjQYdAXr4LHDI9VeU=.99f598ab-ca7d-40bd-adc0-b2f082ad6c59@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
 <u9kWsTkbNZZ5_D9E9EENwp-S5xBjQYdAXr4LHDI9VeU=.99f598ab-ca7d-40bd-adc0-b2f082ad6c59@github.com>
Message-ID: <GMhDa917-zWp4z0VZGYxdTruAnjh3gmrt-qhFQ9AS1s=.be41e581-a79c-4f75-bb61-f5a14db05a88@github.com>

On Mon, 8 Jul 2024 08:43:34 GMT, Andrew Haley <aph at openjdk.org> wrote:

> That doesn't work.
> 
> ```
> Running tests using MICRO control variable 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
> Unknown test selection: 'org.openjdk.bench.jdk.incubator.vector.operation.Float'
> ```

I think copying the Float*.java and dependent files under test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from the vectorIntrinsics branch in the panama-vector repo should resolve the issue.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2213501489

From duke at openjdk.org  Mon Jul  8 09:42:31 2024
From: duke at openjdk.org (Thomas Wuerthinger)
Date: Mon, 8 Jul 2024 09:42:31 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping
In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
Message-ID: <LjKHmY3_Qbp5xirMG8E6xo-Dqv2XLblJaWmGKw2-MF4=.2765e99c-dc5f-42fb-baad-56ae8295dba4@github.com>

On Mon, 8 Jul 2024 08:18:42 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
> 
> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
> 
> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
> 
> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
> 
> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
> 
> # Cleanups
> 
> Cleaned up displaced header usage for:
>   * BasicLock
>     * Contains some Zero changes
>     * Renames one exported JVMCI field
>   * ObjectMonitor
>     * Updates comments and tests consistencies
> 
> # Refactoring
> 
> `ObjectMonitor::enter` has been refactored an a `ObjectMonitorContentionMark` witness object has been introduced to the signatures. Which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
> 
> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
> 
> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
> 
> # LightweightSynchronizer
> 
> Working on adapting and incorporating the following section as a comment in the source code
> 
> ## Fast Locking
> 
>   CAS on locking bits in markWord. 
>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
> 
>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
> 
>   If 0b10 (Inflated) is observed or there is to much contention or to long critical sections for spinning to be feasible, inf...

Is this change expected to require JVMCI and/or Graal JIT changes?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2213534062

From aboldtch at openjdk.org  Mon Jul  8 10:14:32 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 8 Jul 2024 10:14:32 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping
In-Reply-To: <LjKHmY3_Qbp5xirMG8E6xo-Dqv2XLblJaWmGKw2-MF4=.2765e99c-dc5f-42fb-baad-56ae8295dba4@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <LjKHmY3_Qbp5xirMG8E6xo-Dqv2XLblJaWmGKw2-MF4=.2765e99c-dc5f-42fb-baad-56ae8295dba4@github.com>
Message-ID: <YqFykA6XLEjqokNUW4xO7Rlu9yv7lbvuWRKtSr8uEio=.6746b15f-26ce-4993-a1c0-bb56a6ed3147@github.com>

On Mon, 8 Jul 2024 09:39:32 GMT, Thomas Wuerthinger <duke at openjdk.org> wrote:

> Is this change expected to require JVMCI and/or Graal JIT changes?

Support for `UseObjectMonitorTable` would require changes to Graal JIT. (`UseObjectMonitorTable` is off by default). 
Minimal support would be to call into the VM for inflated monitors. (Similar to what this patch does for C2 on non-x86 / aarch64 platforms).

For starting the VM normally without `UseObjectMonitorTable` no semantic change is required. All locking modes and VM invariants w.r.t. locking are the same.

As mentioned this patch contains a refactoring which renames one exported `JVMCI` symbol which I suspect should only be used by Graal JIT for `LM_LEGACY`. As such the Graal JIT needs to be updated to use this new symbol name.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2213602121

From shade at openjdk.org  Mon Jul  8 10:35:36 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 8 Jul 2024 10:35:36 GMT
Subject: RFR: 8334231: Optimize MethodData layout
In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
Message-ID: <QdFHYY07hwmxrteM-UEmCUepyqyWWi4IP7uEBsqFBRs=.790c912c-009b-44f3-994f-93082ec22afc@github.com>

On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> Hi all,
>      This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of  MethodData. Here is the original layout from `pahole`:
> 
> class MethodData : public Metadata {
> public:
> 
> 	/* class Metadata            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX 8 bytes hole, try to pack */
> 
> 	class Method *             _method;              /*     8     8 */
> 	int                        _size;                /*    16     4 */
> 	int                        _hint_di;             /*    20     4 */
> 	class Mutex               _extra_data_lock;      /*    24   104 */
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class CompilerCounters    _compiler_counters;    /*   128    80 */
> 	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
> 	intx                       _eflags;              /*   208     8 */
> 	intx                       _arg_local;           /*   216     8 */
> 	intx                       _arg_stack;           /*   224     8 */
> 	intx                       _arg_returned;        /*   232     8 */
> 	int                        _creation_mileage;    /*   240     4 */
> 	class InvocationCounter   _invocation_counter;   /*   244     4 */
> 	class InvocationCounter   _backedge_counter;     /*   248     4 */
> 	int                        _invocation_counter_start; /*   252     4 */
> 	/* --- cacheline 4 boundary (256 bytes) --- */
> 	int                        _backedge_counter_start; /*   256     4 */
> 	uint                       _tenure_traps;        /*   260     4 */
> 	int                        _invoke_mask;         /*   264     4 */
> 	int                        _backedge_mask;       /*   268     4 */
> 	short int                  _num_loops;           /*   272     2 */
> 	short int                  _num_blocks;          /*   274     2 */
> 	enum WouldProfile          _would_profile;       /*   276     4 */
> 	int                        _jvmci_ir_size;       /*   280     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	class FailedSpeculation *  _failed_speculations; /*   288     8 */
> 	int                        _data_size;           /*   296     4 */
> 	int                        _parameters_type_data_di; /*   300     4 */
> 	int                        _exception_handler_data_di; /*   304     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	intptr_t                   _data[1];             /*   312     8 */
> 
> 	/* size: 320, cachelines: 5, members: 27 */
> 	/* sum members: 304, holes: 3, sum holes: 16 */
> };
> 
> 
> There are 3 holes ...

This looks fine to me. I think Thomas' comment was generally about handling the trivial PRs, which does not hold up the integration of this PR. The patch is simple, there have been enough eyes on this, and so we can just integrate.

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20019#pullrequestreview-2162912408

From xpeng at openjdk.org  Mon Jul  8 10:35:36 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Mon, 8 Jul 2024 10:35:36 GMT
Subject: Integrated: 8334231: Optimize MethodData layout
In-Reply-To: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
References: <LQiX4CeXNNdQNrc_ig6dqqBxbLdMVaFQkW4hB_9WpBY=.38d6d8ec-0dc7-4cf1-b957-4529938fd709@github.com>
Message-ID: <r_mjQmOGKGHi0of5r2BdPN0KHldq-g4IaaCADyel1DA=.5dd714ba-8689-4a5c-af97-9c0f0320c51b@github.com>

On Thu, 4 Jul 2024 00:08:35 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> Hi all,
>      This PR is a part of https://bugs.openjdk.org/browse/JDK-8334227 to optimize Hotspot C++ class layouts, this one is for the layout of  MethodData. Here is the original layout from `pahole`:
> 
> class MethodData : public Metadata {
> public:
> 
> 	/* class Metadata            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX 8 bytes hole, try to pack */
> 
> 	class Method *             _method;              /*     8     8 */
> 	int                        _size;                /*    16     4 */
> 	int                        _hint_di;             /*    20     4 */
> 	class Mutex               _extra_data_lock;      /*    24   104 */
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class CompilerCounters    _compiler_counters;    /*   128    80 */
> 	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
> 	intx                       _eflags;              /*   208     8 */
> 	intx                       _arg_local;           /*   216     8 */
> 	intx                       _arg_stack;           /*   224     8 */
> 	intx                       _arg_returned;        /*   232     8 */
> 	int                        _creation_mileage;    /*   240     4 */
> 	class InvocationCounter   _invocation_counter;   /*   244     4 */
> 	class InvocationCounter   _backedge_counter;     /*   248     4 */
> 	int                        _invocation_counter_start; /*   252     4 */
> 	/* --- cacheline 4 boundary (256 bytes) --- */
> 	int                        _backedge_counter_start; /*   256     4 */
> 	uint                       _tenure_traps;        /*   260     4 */
> 	int                        _invoke_mask;         /*   264     4 */
> 	int                        _backedge_mask;       /*   268     4 */
> 	short int                  _num_loops;           /*   272     2 */
> 	short int                  _num_blocks;          /*   274     2 */
> 	enum WouldProfile          _would_profile;       /*   276     4 */
> 	int                        _jvmci_ir_size;       /*   280     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	class FailedSpeculation *  _failed_speculations; /*   288     8 */
> 	int                        _data_size;           /*   296     4 */
> 	int                        _parameters_type_data_di; /*   300     4 */
> 	int                        _exception_handler_data_di; /*   304     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	intptr_t                   _data[1];             /*   312     8 */
> 
> 	/* size: 320, cachelines: 5, members: 27 */
> 	/* sum members: 304, holes: 3, sum holes: 16 */
> };
> 
> 
> There are 3 holes ...

This pull request has now been integrated.

Changeset: c5a668bb
Author:    Xiaolong Peng <xpeng at openjdk.org>
Committer: Aleksey Shipilev <shade at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/c5a668bb653feb3408a9efa3274ceabf9f01a2c7
Stats:     3 lines in 1 file changed: 1 ins; 2 del; 0 mod

8334231: Optimize MethodData layout

Reviewed-by: dholmes, chagedorn, shade

-------------

PR: https://git.openjdk.org/jdk/pull/20019

From duke at openjdk.org  Mon Jul  8 11:01:32 2024
From: duke at openjdk.org (Thomas Wuerthinger)
Date: Mon, 8 Jul 2024 11:01:32 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping
In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
Message-ID: <4BqqOjbfNOV6NjtPe-hIf-98N8kT6ce_FBCuQ-vqBBY=.6e02875c-908f-43f0-8d66-ba4f5b01d488@github.com>

On Mon, 8 Jul 2024 08:18:42 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
> 
> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
> 
> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
> 
> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
> 
> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
> 
> # Cleanups
> 
> Cleaned up displaced header usage for:
>   * BasicLock
>     * Contains some Zero changes
>     * Renames one exported JVMCI field
>   * ObjectMonitor
>     * Updates comments and tests consistencies
> 
> # Refactoring
> 
> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
> 
> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
> 
> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
> 
> # LightweightSynchronizer
> 
> Working on adapting and incorporating the following section as a comment in the source code
> 
> ## Fast Locking
> 
>   CAS on locking bits in markWord. 
>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
> 
>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
> 
>   If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...

OK. Will there be a CSR or JEP for this? When do you approximately expect this to land in main line?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2213689308

From aboldtch at openjdk.org  Mon Jul  8 11:55:33 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 8 Jul 2024 11:55:33 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping
In-Reply-To: <4BqqOjbfNOV6NjtPe-hIf-98N8kT6ce_FBCuQ-vqBBY=.6e02875c-908f-43f0-8d66-ba4f5b01d488@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <4BqqOjbfNOV6NjtPe-hIf-98N8kT6ce_FBCuQ-vqBBY=.6e02875c-908f-43f0-8d66-ba4f5b01d488@github.com>
Message-ID: <RdSvPsChCFViQ2aqHZFfZyF-2G1TWV0smRdto9q7jmY=.bfb714a5-5882-43bf-96a0-5260024454b7@github.com>

On Mon, 8 Jul 2024 10:58:29 GMT, Thomas Wuerthinger <duke at openjdk.org> wrote:

> OK. Will there be a CSR or JEP for this? 

There is no plan for this, nor should it be required. It's an internal implementation. 

> When do you approximately expect this to land in main line?

ASAP. Compatibility for the field name is being worked on in Graal JIT. The plan is not to integrate this prior to this work being completed. 

We should probably add a more graceful transition. Such that `UseObjectMonitorTable` is turned off ergonomically even if the user specified it when running with JVMCI enabled. 

The main goal here is to get something in main line which does not affect the default behaviour of the VM. But allows for supporting future features such as Lilliput, along with enabling incremental improvement to `UseObjectMonitorTable` via support for more platforms and compiler backends, performance improvements, etc to be done towards the JDK project.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2213806034

From shade at openjdk.org  Mon Jul  8 11:58:34 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 8 Jul 2024 11:58:34 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v3]
In-Reply-To: <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
 <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com>
Message-ID: <FkDjtpGwpMtPg7NxC6vDwFgkBfK_t3noWiVmR0V5Tjk=.12715c7e-4a90-4ba9-9be2-2486d5ff77de@github.com>

On Fri, 5 Jul 2024 15:01:05 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.
>> 
>> The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details).
>> 
>> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used once. 
>> 
>> I tested the patch by running it through mach5 tiers 1-6.
>> 
>> Thanks,
>> Patricio
>
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   use DEBUG_ONLY on _used declaration

Looks fine. I am tracking this for backport to 21.0.5, which already got the `ResourceMark` in `frame::oops_interpreted_do` due to JDK-8329665 backport.
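
For reference, the `4*64/2 = 128` figure in the quoted description can be checked with a small sketch. The constant names below are illustrative assumptions, not the identifiers used in `oopMapCache.hpp`; only the three numbers come from the description.

```cpp
#include <cassert>
#include <cstddef>

// Values taken from the quoted "4*64/2 = 128"; names are hypothetical.
constexpr std::size_t kInlineWords  = 4;   // words embedded in the object
constexpr std::size_t kBitsPerWord  = 64;  // 64-bit platform
constexpr std::size_t kBitsPerEntry = 2;   // bits per local / stack slot

// Number of locals plus expression stack entries that fit inline.
constexpr std::size_t kInlineEntries =
    kInlineWords * kBitsPerWord / kBitsPerEntry;
static_assert(kInlineEntries == 128, "matches the figure in the description");

// True when a method needs an out-of-line (heap-allocated) bit mask.
constexpr bool needs_heap_mask(std::size_t entries) {
  return entries * kBitsPerEntry > kInlineWords * kBitsPerWord;
}
```

Only methods with more than 128 entries take the allocating path, which is why the description expects the move to the C heap to be cheap in practice.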

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2163081402

From aboldtch at openjdk.org  Mon Jul  8 12:13:07 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 8 Jul 2024 12:13:07 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v2]
In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
Message-ID: <vy20BNZfIGQRLPRQrXH9R3pE6nUCY5kKNBDCtyhg0Y4=.f6fa0bd2-42ed-4a9b-a08e-ef56c54e8e48@github.com>

> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
> 
> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
> 
> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
> 
> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
> 
> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
> 
> # Cleanups
> 
> Cleaned up displaced header usage for:
>   * BasicLock
>     * Contains some Zero changes
>     * Renames one exported JVMCI field
>   * ObjectMonitor
>     * Updates comments and tests consistencies
> 
> # Refactoring
> 
> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
> 
> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
> 
> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
> 
> # LightweightSynchronizer
> 
> Working on adapting and incorporating the following section as a comment in the source code
> 
> ## Fast Locking
> 
>   CAS on locking bits in markWord. 
>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
> 
>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
> 
>   If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...
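
The fast-locking transition described above can be sketched roughly as follows. This is a simplified illustration using `std::atomic`, not the HotSpot implementation: the function names are hypothetical, and ownership tracking, spinning, and inflation are all omitted. Only the lock-bit encodings are taken from the description.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Lock-bit encodings as given in the description above.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

// Try to move the mark word from Unlocked to Fast Locked with a single CAS.
// The upper bits of the mark word are preserved, never displaced.
bool try_fast_lock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kUnlocked) {
    return false;  // already fast locked or inflated: caller spins or inflates
  }
  std::uintptr_t locked = (old_mark & ~kLockMask) | kFastLocked;
  return mark.compare_exchange_strong(old_mark, locked,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed);
}

// Reverse transition: Fast Locked back to Unlocked.
bool try_fast_unlock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kFastLocked) {
    return false;  // inflated or not locked: needs the slow path
  }
  std::uintptr_t unlocked = (old_mark & ~kLockMask) | kUnlocked;
  return mark.compare_exchange_strong(old_mark, unlocked,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}
```

Because only the two low bits change, the remaining header bits stay readable in place at all times, which is the property the description highlights for concurrent GCs.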

Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:

  More graceful JVMCI VM option interaction

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20067/files
  - new: https://git.openjdk.org/jdk/pull/20067/files/4d835b94..28143503

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=00-01

  Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20067.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067

PR: https://git.openjdk.org/jdk/pull/20067

From duke at openjdk.org  Mon Jul  8 12:18:34 2024
From: duke at openjdk.org (Thomas Wuerthinger)
Date: Mon, 8 Jul 2024 12:18:34 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v2]
In-Reply-To: <vy20BNZfIGQRLPRQrXH9R3pE6nUCY5kKNBDCtyhg0Y4=.f6fa0bd2-42ed-4a9b-a08e-ef56c54e8e48@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <vy20BNZfIGQRLPRQrXH9R3pE6nUCY5kKNBDCtyhg0Y4=.f6fa0bd2-42ed-4a9b-a08e-ef56c54e8e48@github.com>
Message-ID: <15fwghNOC4j6Hctnqn7sVKsRFbHGs7mwcgpadcok1qo=.63b2a07a-2fe0-4557-843d-f6b131e37a09@github.com>

On Mon, 8 Jul 2024 12:13:07 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
> 
>   More graceful JVMCI VM option interaction

OK, thank you.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2213887069

From stuefe at openjdk.org  Mon Jul  8 12:33:35 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 8 Jul 2024 12:33:35 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v3]
In-Reply-To: <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
 <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com>
Message-ID: <IqgSXiXCZOj9b0mibWQ2BWb4qzDuNwXqgcix6pd0QxA=.ac7cde2a-45c5-46b4-9208-6a2c25346614@github.com>

On Fri, 5 Jul 2024 15:01:05 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.
>> 
>> The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details).
>> 
>> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used once. 
>> 
>> I tested the patch by running it through mach5 tiers 1-6.
>> 
>> Thanks,
>> Patricio
>
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   use DEBUG_ONLY on _used declaration

Small nits, otherwise good. Thanks a lot for fixing.

src/hotspot/share/interpreter/oopMapCache.cpp line 183:

> 181:   if (mask_size() > small_mask_limit) {
> 182:     assert(!Thread::current()->resource_area()->contains((void*)_bit_mask[0]),
> 183:            "The bit mask should be allocated from the C heap");

Arguably, this assert is not needed. In debug builds, we have NMT enabled, and that does a check on os::free.

However, an assert that _bit_mask[0] != 0 *would* make sense, since the free quietly swallows null pointers.
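
A minimal sketch of why a null check is worth asserting before the free, using the standard `std::free` as a stand-in for `os::free`. The `MaskSlot` type and `release_mask` helper are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for the slot that holds the out-of-line mask pointer.
struct MaskSlot {
  std::uintptr_t word0 = 0;
};

// Frees the block referenced by the slot. Returns false when the slot was
// empty: std::free accepts a null pointer and silently does nothing, so
// without an explicit check a missing allocation would go unnoticed.
bool release_mask(MaskSlot& slot) {
  const bool had_block = slot.word0 != 0;
  std::free(reinterpret_cast<void*>(slot.word0));
  slot.word0 = 0;
  return had_block;
}
```

An `assert(slot.word0 != 0)` ahead of the free would turn the silent no-op into a debug-build failure.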

src/hotspot/share/interpreter/oopMapCache.cpp line 405:

> 403: // Implementation of OopMapCache
> 404: 
> 405: void InterpreterOopMap::copy_from(OopMapCacheEntry* src) {

Possibly for another RFE: src pointer should be const

src/hotspot/share/interpreter/oopMapCache.cpp line 423:

> 421:   } else {
> 422:     _bit_mask[0] = (uintptr_t) NEW_C_HEAP_ARRAY(uintptr_t, mask_word_size(), mtClass);
> 423:     assert(_bit_mask[0] != 0, "bit mask was not allocated");

The assert can be removed, no? NEW_C_HEAP_ARRAY does a null check by default.

src/hotspot/share/interpreter/oopMapCache.cpp line 424:

> 422:     _bit_mask[0] = (uintptr_t) NEW_C_HEAP_ARRAY(uintptr_t, mask_word_size(), mtClass);
> 423:     assert(_bit_mask[0] != 0, "bit mask was not allocated");
> 424:     memcpy((void*) _bit_mask[0], (void*) src->_bit_mask[0], mask_word_size() * BytesPerWord);

Are the (void*) casts really needed?

src/hotspot/share/interpreter/oopMapCache.hpp line 92:

> 90: 
> 91:  protected:
> 92:   DEBUG_ONLY(bool _used;)

Minor nit. This changes memory layout between debug and release builds, and this is used as part of OopMapCache. Not a big concern, but I usually prefer having the same layout between debug and release to test what we ship.

Can't we just assert that mask size == USHRT_MAX?
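
To illustrate the layout concern, here is a sketch with hypothetical structs (not the real `InterpreterOopMap` fields). Adding a debug-only flag shifts the offsets of the other members, whereas reusing a sentinel value in an existing field keeps debug and release layouts identical. Treating `USHRT_MAX` as the "unused" marker is an assumption based on the question above.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Release build: no tracking field.
struct ReleaseLayout {
  void*          method;
  unsigned short mask_size;
  unsigned short bci;
};

// Debug build with an extra leading field, as DEBUG_ONLY(bool _used;) would add.
struct DebugLayout {
  bool           used;
  void*          method;
  unsigned short mask_size;
  unsigned short bci;
};

// Alternative: no extra field; a sentinel in mask_size marks "not yet used".
struct SentinelLayout {
  void*          method    = nullptr;
  unsigned short mask_size = USHRT_MAX;
  unsigned short bci       = 0;

  bool is_unused() const { return mask_size == USHRT_MAX; }
};
```

With the sentinel variant, the "used only once" check becomes `assert(entry.is_unused())` and the object has the same size and offsets in every build flavour.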

-------------

PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2163133516
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1668523910
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1668536249
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1668539154
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1668535458
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1668531734

From luhenry at openjdk.org  Mon Jul  8 13:06:35 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Mon, 8 Jul 2024 13:06:35 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v2]
In-Reply-To: <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
 <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com>
Message-ID: <J2n1vjnGIfkPollglwm_WzuJcFmQrvx-QvzMxKHQdEA=.f25df6b1-4a3f-4a58-8640-0f3fe2e94001@github.com>

On Sun, 7 Jul 2024 15:16:02 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

>> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK.
>
> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - Use t2 directly instead of temp2
>  - Rename temp1 -> x0
>  - Left a note on a side effect of generate_vle32_pack4

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2360:

> 2358:     generate_aescrypt_round(res, vzero, vtmp1, vtmp2, vtmp3, vtmp4);
> 2359: 
> 2360:     generate_vle32_pack2(key, vtmp1, vtmp2);

Could you add the comment for `key` here as well.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2370:

> 2368:     __ vaesem_vv(res, vzero);
> 2369: 
> 2370:     generate_vle32_pack2(key, vtmp1, vtmp2);

And here as well for `key`.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2380:

> 2378:     __ vaesem_vv(res, vzero);
> 2379: 
> 2380:     generate_vle32_pack2(key, vtmp1, vtmp2);

Here as well for `key`.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2465:

> 2463:     generate_aesdecrypt_round(res, vzero, vtmp1, vtmp2, vtmp3, vtmp4);
> 2464: 
> 2465:     generate_vle32_pack2(key, vtmp1, vtmp2);

Same here for `key`.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2476:

> 2474:     __ vaesdm_vv(res, vzero);
> 2475: 
> 2476:     generate_vle32_pack2(key, vtmp1, vtmp2);

Same here for `key`.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2487:

> 2485:     __ vaesdm_vv(res, vzero);
> 2486: 
> 2487:     generate_vle32_pack2(key, vtmp1, vtmp2);

Same here for `key`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668308026
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668308210
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668308458
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668308689
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668308755
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668308837

From aph at openjdk.org  Mon Jul  8 13:39:36 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 8 Jul 2024 13:39:36 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
Message-ID: <-85jb7zkPiyjtG47_knDVtXF5iVTYH8hgMD5BTW1AM0=.ced6fcc8-4a11-409c-85ba-00d30cc35d47@github.com>

On Mon, 1 Jul 2024 16:54:55 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
>> 
>> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time.
>> Besides this change, also modify the previous changes accordingly, e.g. remove some unnecessary files or changes, especially in the make dir of jdk.
>> 
>> Besides of the code changes, one important task is to handle the legal process.
>> 
>> Thanks!
>> 
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>> 
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>> 
>> ## Performance
>> 
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>> 
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN2 | 1024 | thrpt | 10 | 0.021 | ops/ms | 135.088 | 32.449 | 4.163 | 135.721 | 32.579 | 4.166
>> Float128Vector.CBRT | 1024 | thrpt | 10 | 0.004 | ops/ms | 114.547 | 39.517 | 2....
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits:
> 
>  - Merge branch 'master' into sleef-aarch64-integrate-source
>  - merge master
>  - sleef 3.6.1 for riscv
>  - sleef 3.6.1
>  - update header files for arm
>  - add inline header file for riscv64
>  - remove notes about sleef changes
>  - fix performance issue
>  - disable unused-function warnings; add log msg
>  - minor
>  - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863

There is something that makes me nervous.
The big slab of preprocessed code in libvectormath/sleefinline_rvvm1.h is problematic.
Firstly, in all open source software the code should be the preferred form:

"The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed."
https://opensource.org/osd

Also, any such intermediate form is a golden example of a vector in which to hide something nasty. No one is going to read that file, and a malicious person with access to the JDK source base, either in our own github repo or in many other places downstream of OpenJDK could hide all manner of things. In its form in this PR it's no better than checking in a binary.
See https://arstechnica.com/security/2024/04/what-we-know-about-the-xz-utils-backdoor-that-almost-infected-the-world/

I'd look at including the SLEEF source code, along with a script which generates the preprocessed form we use in the JDK build, so that more paranoid JDK builders can regenerate the preprocessed code.

Of course, I cannot be sure that my fellow reviewers will agree, but I think it's the right thing to do.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2214099558

From erikj at openjdk.org  Mon Jul  8 14:08:38 2024
From: erikj at openjdk.org (Erik Joelsson)
Date: Mon, 8 Jul 2024 14:08:38 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <-85jb7zkPiyjtG47_knDVtXF5iVTYH8hgMD5BTW1AM0=.ced6fcc8-4a11-409c-85ba-00d30cc35d47@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
 <-85jb7zkPiyjtG47_knDVtXF5iVTYH8hgMD5BTW1AM0=.ced6fcc8-4a11-409c-85ba-00d30cc35d47@github.com>
Message-ID: <6WE1CCFfFAgdyHzI32vo1L2u3t5o6JQvl214RmPeho4=.6ebc52a3-d50f-4695-b950-9458f1d71d84@github.com>

On Mon, 8 Jul 2024 13:36:36 GMT, Andrew Haley <aph at openjdk.org> wrote:

> There is something that makes me nervous. The big slab of preprocessed code in libvectormath/sleefinline_rvvm1.h is problematic. Firstly, in all open source software the code should be the preferred form:
> 
> "The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed." https://opensource.org/osd
> 
> Also, any such intermediate form is a golden example of a vector in which to hide something nasty. No one is going to read that file, and a malicious person with access to the JDK source base, either in our own github repo or in many other places downstream of OpenJDK could hide all manner of things. In its form in this PR it's no better than checking in a binary. See https://arstechnica.com/security/2024/04/what-we-know-about-the-xz-utils-backdoor-that-almost-infected-the-world/
> 
> I'd look at including the SLEEF source code, along with a script which generates the preprocessed form we use in the JDK build, so that more paranoid JDK builders can regenerate the preprocessed code.
> 
> Of course, I cannot be sure that my fellow reviewers will agree, but I think it's the right thing to do.

While I agree with you in principle, we chose to import Sleef this way for practical reasons. (The actual importing of Sleef is happening in https://github.com/openjdk/jdk/pull/19185 / [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816).) The "preprocessing/code-generation" part of the Sleef build was considered too complex to reasonably replicate in the OpenJDK build system. Sleef is built using Cmake and we do not want to add a build dependency on Cmake and call out to a foreign build system at build time, for efficiency and complexity reasons. JDK-8329816 comes with a script to automatically generate the imported source files, to make it easy to update Sleef in the future. It should also be easy enough to verify the imported contents using the same script for anyone who wants to check the validity of the import step.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2214172864

From yzheng at openjdk.org  Mon Jul  8 14:55:36 2024
From: yzheng at openjdk.org (Yudi Zheng)
Date: Mon, 8 Jul 2024 14:55:36 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v2]
In-Reply-To: <vy20BNZfIGQRLPRQrXH9R3pE6nUCY5kKNBDCtyhg0Y4=.f6fa0bd2-42ed-4a9b-a08e-ef56c54e8e48@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <vy20BNZfIGQRLPRQrXH9R3pE6nUCY5kKNBDCtyhg0Y4=.f6fa0bd2-42ed-4a9b-a08e-ef56c54e8e48@github.com>
Message-ID: <MF3dxDlQF9N18kCQGi3Fym9NLujEcStavCSfIMckNB0=.5e9a9ea1-553f-4d2e-a94e-86c9c6178bd1@github.com>

On Mon, 8 Jul 2024 12:13:07 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
> 
>   More graceful JVMCI VM option interaction

Could you please revert 2814350 and export the following symbols to JVMCI?

diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
index faf2cb24616..7be31aa0f5f 100644
--- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
+++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
@@ -241,6 +241,7 @@
   nonstatic_field(JavaThread,                  _stack_overflow_state._reserved_stack_activation, address)                            \
   nonstatic_field(JavaThread,                  _held_monitor_count,                           intx)                                  \
   nonstatic_field(JavaThread,                  _lock_stack,                                   LockStack)                             \
+  nonstatic_field(JavaThread,                  _om_cache,                                     OMCache)                               \
   JVMTI_ONLY(nonstatic_field(JavaThread,       _is_in_VTMS_transition,                        bool))                                 \
   JVMTI_ONLY(nonstatic_field(JavaThread,       _is_in_tmp_VTMS_transition,                    bool))                                 \
   JVMTI_ONLY(nonstatic_field(JavaThread,       _is_disable_suspend,                           bool))                                 \
@@ -531,6 +532,8 @@
                                                                           \
   declare_constant_with_value("CardTable::dirty_card", CardTable::dirty_card_val()) \
   declare_constant_with_value("LockStack::_end_offset", LockStack::end_offset()) \
+  declare_constant_with_value("OMCache::oop_to_oop_difference", OMCache::oop_to_oop_difference()) \
+  declare_constant_with_value("OMCache::oop_to_monitor_difference", OMCache::oop_to_monitor_difference()) \
                                                                           \
   declare_constant(CodeInstaller::VERIFIED_ENTRY)                         \
   declare_constant(CodeInstaller::UNVERIFIED_ENTRY)                       \

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2214322632

From duke at openjdk.org  Mon Jul  8 15:24:13 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Mon, 8 Jul 2024 15:24:13 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v3]
In-Reply-To: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
Message-ID: <F1yms2X9VVITjLPANuQqABre5E199ILHQ4ywpS4cicY=.3e2c0af1-8070-497a-bfa0-5732eb199974@github.com>

> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK.

ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision:

  Left a note on a side effect of generate_vle32_pack2

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19960/files
  - new: https://git.openjdk.org/jdk/pull/19960/files/9f5c7831..8520bc3a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=01-02

  Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19960.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19960/head:pull/19960

PR: https://git.openjdk.org/jdk/pull/19960

From duke at openjdk.org  Mon Jul  8 15:24:13 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Mon, 8 Jul 2024 15:24:13 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v2]
In-Reply-To: <J2n1vjnGIfkPollglwm_WzuJcFmQrvx-QvzMxKHQdEA=.f25df6b1-4a3f-4a58-8640-0f3fe2e94001@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
 <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com>
 <J2n1vjnGIfkPollglwm_WzuJcFmQrvx-QvzMxKHQdEA=.f25df6b1-4a3f-4a58-8640-0f3fe2e94001@github.com>
Message-ID: <BghNsitWJWaTDOZjWSDL4t2zEkdsZv5UPmvoJyHH-w8=.3aa312f3-9024-4827-9761-d3ca475f4a58@github.com>

On Mon, 8 Jul 2024 09:30:36 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

>> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision:
>> 
>>  - Use t2 directly instead of temp2
>>  - Rename temp1 -> x0
>>  - Left a note on a side effect of generate_vle32_pack4
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2360:
> 
>> 2358:     generate_aescrypt_round(res, vzero, vtmp1, vtmp2, vtmp3, vtmp4);
>> 2359: 
>> 2360:     generate_vle32_pack2(key, vtmp1, vtmp2);
> 
> Could you add the comment for `key` here as well.

All done!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668842170

From aboldtch at openjdk.org  Mon Jul  8 16:21:16 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 8 Jul 2024 16:21:16 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v3]
In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
Message-ID: <5CNKzDumOf1MJQXM9OBHQh0Mj7eLv2ONio1V-AXeSJI=.54302b45-2dd2-4f18-a094-6b2c6a59517c@github.com>

> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere.
> 
> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code.
> 
> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
> 
> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
> 
> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.
> 
> # Cleanups
> 
> Cleaned up displaced header usage for:
>   * BasicLock
>     * Contains some Zero changes
>     * Renames one exported JVMCI field
>   * ObjectMonitor
>     * Updates comments and tests consistencies
> 
> # Refactoring
> 
> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
> 
> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
> 
> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
> 
> # LightweightSynchronizer
> 
> Working on adapting and incorporating the following section as a comment in the source code
> 
> ## Fast Locking
> 
>   CAS on locking bits in markWord. 
>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
> 
>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
> 
>   If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...
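
The fast-locking transition quoted above (a CAS between 0b01 and 0b00 on the lock bits, leaving the rest of the mark word untouched) can be sketched roughly as follows. This is only an illustration, not the HotSpot code: `try_fast_lock`, `try_fast_unlock`, `LockResult` and the `k...` constants are hypothetical names, the mark word is modelled as a plain `std::atomic<std::uintptr_t>`, and ownership tracking (the per-thread lock stack), spinning, and inflation are left out.

```cpp
#include <atomic>
#include <cstdint>

// Lock-bit encodings as given in the description; names are hypothetical.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

enum class LockResult { Acquired, Contended, Inflated };

// Tries to flip the lock bits from Unlocked to Fast Locked with a CAS.
// All other bits of the word are preserved.
inline LockResult try_fast_lock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t observed = mark.load(std::memory_order_relaxed);
  while ((observed & kLockMask) == kUnlocked) {
    const std::uintptr_t locked = (observed & ~kLockMask) | kFastLocked;
    if (mark.compare_exchange_weak(observed, locked,
                                   std::memory_order_acquire,
                                   std::memory_order_relaxed)) {
      return LockResult::Acquired;
    }
    // On failure, `observed` holds the current value; re-check the lock bits.
  }
  // A caller would spin for a while on Contended and fall back to the
  // inflated (ObjectMonitor) path otherwise; neither is modelled here.
  return (observed & kLockMask) == kInflated ? LockResult::Inflated
                                             : LockResult::Contended;
}

// Reverse transition: Fast Locked back to Unlocked. Returns false if the
// word is not in the Fast Locked state. The sketch does not check that the
// caller is actually the owner.
inline bool try_fast_unlock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t observed = mark.load(std::memory_order_relaxed);
  while ((observed & kLockMask) == kFastLocked) {
    const std::uintptr_t unlocked = (observed & ~kLockMask) | kUnlocked;
    if (mark.compare_exchange_weak(observed, unlocked,
                                   std::memory_order_release,
                                   std::memory_order_relaxed)) {
      return true;
    }
  }
  return false;
}
```

Because only the two lock bits change, whatever else lives in the word stays readable at all times, which is the property the description relies on for concurrent GCs.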

Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision:

 - Add JVMCI symbol exports
 - Revert "More graceful JVMCI VM option interaction"
   
   This reverts commit 2814350370cf142e130fe1d38610c646039f976d.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20067/files
  - new: https://git.openjdk.org/jdk/pull/20067/files/28143503..173b75b8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=01-02

  Stats: 8 lines in 2 files changed: 3 ins; 5 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20067.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067

PR: https://git.openjdk.org/jdk/pull/20067

From aph at openjdk.org  Mon Jul  8 16:23:37 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 8 Jul 2024 16:23:37 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
Message-ID: <TcrB6zIH-yx-6fyLfnQy4NHk5w8VqXm3anTAxbQJtXY=.8181016f-5d4d-4349-a8d7-343db9817f40@github.com>

On Mon, 1 Jul 2024 16:54:55 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
>> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers.
>> 
>> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external sleef things (header or lib) at build or run time.
>> Besides this change, the previous changes are also modified accordingly, e.g. some unnecessary files or changes are removed, especially in the make dir of the jdk.
>> 
>> Besides the code changes, one important task is to handle the legal process.
>> 
>> Thanks!
>> 
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>> 
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>> 
>> ## Performance
>> 
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>> 
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN...
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits:
> 
>  - Merge branch 'master' into sleef-aarch64-integrate-source
>  - merge master
>  - sleef 3.6.1 for riscv
>  - sleef 3.6.1
>  - update header files for arm
>  - add inline header file for riscv64
>  - remove notes about sleef changes
>  - fix performance issue
>  - disable unused-function warnings; add log msg
>  - minor
>  - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863

I finally did some measurements. It would be nice if the JMH test were part of this patch.

It mostly looks good, but I can see an odd regression of DoubleMaxVector.TANH (by 39%) on Apple M1. I don't really know why this is, given that tanh(x) is almost certainly based on expm1(x). This probably isn't important, but it is odd.
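
For reference, the relationship between tanh(x) and expm1(x) mentioned above is the standard identity tanh(|x|) = -t / (t + 2) with t = expm1(-2|x|). The sketch below only illustrates that identity; `tanh_via_expm1` is a hypothetical name, and whether SLEEF actually computes tanh this way is not confirmed here.

```cpp
#include <cmath>

// tanh expressed through expm1. With t = expm1(-2|x|) = e^{-2|x|} - 1:
//   tanh(|x|) = (1 - e^{-2|x|}) / (1 + e^{-2|x|}) = -t / (t + 2)
// Using the negative exponent keeps t in [-1, 0], so nothing overflows for
// large |x|; the sign of x is restored at the end.
inline double tanh_via_expm1(double x) {
  const double t = std::expm1(-2.0 * std::fabs(x));
  return std::copysign(-t / (t + 2.0), x);
}
```

If tanh is a thin wrapper like this, its cost should track that of expm1 closely, which is why a regression confined to TANH looks odd.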

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2214587236

From ayang at openjdk.org  Mon Jul  8 16:24:08 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Mon, 8 Jul 2024 16:24:08 GMT
Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and
 VM_ParallelGCSystemGC
Message-ID: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>

Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` share much similarity.

The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect` and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts.

Test: tier1-6; perf-neutral for dacapo, specjvm2008, specjbb2015 and cachestresser.

-------------

Commit messages:
 - pgc-vm-operation

Changes: https://git.openjdk.org/jdk/pull/20077/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20077&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335902
  Stats: 352 lines in 14 files changed: 96 ins; 169 del; 87 mod
  Patch: https://git.openjdk.org/jdk/pull/20077.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20077/head:pull/20077

PR: https://git.openjdk.org/jdk/pull/20077

From ayang at openjdk.org  Mon Jul  8 16:31:43 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Mon, 8 Jul 2024 16:31:43 GMT
Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and
 VM_ParallelGCSystemGC [v2]
In-Reply-To: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
Message-ID: <N4uBvRzIP52a4DgIeIx3ArjKPF0JrTI2bVsmHtD0rJg=.f7e1bb49-9bcd-420c-97fb-2617c798b5b7@github.com>

> Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` share much similarity.
> 
> The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect` and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts.
> 
> Test: tier1-6; perf-neutral for dacapo, specjvm2008, specjbb2015 and cachestresser.

Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:

  pgc-vm-operation

-------------

Changes: https://git.openjdk.org/jdk/pull/20077/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20077&range=01
  Stats: 352 lines in 14 files changed: 96 ins; 169 del; 87 mod
  Patch: https://git.openjdk.org/jdk/pull/20077.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20077/head:pull/20077

PR: https://git.openjdk.org/jdk/pull/20077

From aph at openjdk.org  Mon Jul  8 16:43:37 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 8 Jul 2024 16:43:37 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
Message-ID: <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com>

On Mon, 1 Jul 2024 16:54:55 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
>> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers.
>> 
>> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external sleef things (header or lib) at build or run time.
>> Besides this change, the previous changes are also modified accordingly, e.g. some unnecessary files or changes are removed, especially in the make dir of the jdk.
>> 
>> Besides the code changes, one important task is to handle the legal process.
>> 
>> Thanks!
>> 
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>> 
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>> 
>> ## Performance
>> 
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>> 
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN...
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits:
> 
>  - Merge branch 'master' into sleef-aarch64-integrate-source
>  - merge master
>  - sleef 3.6.1 for riscv
>  - sleef 3.6.1
>  - update header files for arm
>  - add inline header file for riscv64
>  - remove notes about sleef changes
>  - fix performance issue
>  - disable unused-function warnings; add log msg
>  - minor
>  - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863

> While I agree with you in principle, we chose to import Sleef this way for practical reasons. (The actual importing of Sleef is happening in #19185 / [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816).) The "preprocessing/code-generation" part of the Sleef build was considered too complex to reasonably replicate in the OpenJDK build system. Sleef is built using CMake and we do not want to add a build dependency on CMake and call out to a foreign build system at build time, for efficiency and complexity reasons.

Of course, there is no reason to rebuild the preprocessed headers every time we build the JDK. I'd never ask for that; the last thing I want is to make building the JDK slower. However, it should be possible to do so on a checked-out JDK source tree, at the builder's option.

If there is a script, it doesn't have to be included in the OpenJDK build system itself, but it does have to be in the OpenJDK source tree. (It could be part of make/devkit, for example.)

With a script to produce preprocessed files, it should be possible for anyone building the JDK to run that script, and produce the preprocessed source. SLEEF won't take up a prohibitive amount of space.

We shouldn't be depending on some other web site somewhere being able to come up with the exact SLEEF sources we used, either. That fails the test of reproducibility.

> JDK-8329816 comes with a script to automatically generate the imported source files, to make it easy to update Sleef in the future. It should also be easy enough to verify the imported contents using the same script for anyone who wants to check the validity of the import step.

I get it, but not including everything we use in the OpenJDK tree is a dangerous precedent. It should be no big deal to do this right, given that we have the SLEEF sources and the build scripts already. I'm not asking for anything that doesn't exist already, I'm just saying that it must be checked in.

Avoiding inconvenience, however great, is not sufficient to justify such a step. This is perhaps something to discuss at the next Committers' Workshop.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2214663777

From amitkumar at openjdk.org  Mon Jul  8 16:50:40 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 8 Jul 2024 16:50:40 GMT
Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII
 [v3]
In-Reply-To: <b4NrQs2S9jYAEddRJvmJelnXOXo8tGRulqW7b9Q_RO8=.0a33e434-6096-427d-940b-6f87facc3db6@github.com>
References: <YKa7IgCjp0GLJDZFTlLVoBfDavVdj1Fc5XmQV-xVBM8=.46792106-0555-47bd-899f-056fa5219d03@github.com>
 <nyLYOhw7-wSPlKjeWi3FyuLY0UzFwWJdj-19ijEInU4=.6f539aaf-0cff-4ab8-8ca0-3acd3b44d071@github.com>
 <EliUQk2e0HZE3BQ3BKOGvF81KROy_lLp4OgK-hRWazA=.79466db9-87df-403c-a928-15e1dea8bbd5@github.com>
 <b4NrQs2S9jYAEddRJvmJelnXOXo8tGRulqW7b9Q_RO8=.0a33e434-6096-427d-940b-6f87facc3db6@github.com>
Message-ID: <-QmwjnH5R3sEqzJJItuuVirwUQawa36T3V5iECXwZ7I=.d4bd0d93-20f0-4371-9eb4-45b550b06130@github.com>

On Thu, 4 Jul 2024 06:18:13 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This isn't really the area of my expertise, but the patch seems reasonable to me.
>
> Many thanks, @jerboaa !

Hi @tstuefe, 

GTestWrapper.java has been failing consistently on s390x since this commit. I have opened [JDK-8335906](https://bugs.openjdk.org/browse/JDK-8335906).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19835#issuecomment-2214677436

From pchilanomate at openjdk.org  Mon Jul  8 18:14:54 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 8 Jul 2024 18:14:54 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v4]
In-Reply-To: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
Message-ID: <SDFSzAJLVcfhnlfPyRDZTI2hiF7sLfYqbymrGe8-BUw=.1004d539-7085-4b89-81eb-0e411b960385@github.com>

> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.
> 
> The number of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty, contrary to what the comment says. This comment dates back to 2002, when instead of 128 entries we could have only 32, considering 32-bit CPUs were still in main use (see bug for more history details).
> 
> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used once. 
> 
> I tested the patch by running it through mach5 tiers 1-6.
> 
> Thanks,
> Patricio
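
The 4*64/2 = 128 figure in the description can be spelled out as a compile-time check. The reading used here (4 inline mask words, 64 bits per word, 2 mask bits per local or expression stack entry) is an interpretation of that formula, and the names below are hypothetical rather than the HotSpot identifiers.

```cpp
#include <cstddef>

// Assumed meaning of the factors in "4*64/2" (not taken from the sources):
constexpr std::size_t kInlineWords  = 4;  // words stored inline in the object
constexpr std::size_t kBitsPerEntry = 2;  // mask bits per local/stack entry

// Number of entries that fit in the inline words for a given word size.
constexpr std::size_t inline_entry_capacity(std::size_t bits_per_word) {
  return kInlineWords * bits_per_word / kBitsPerEntry;
}

// True when the mask for `entries` no longer fits inline, i.e. when extra
// memory would have to be allocated.
constexpr bool needs_extra_allocation(std::size_t entries,
                                      std::size_t bits_per_word) {
  return entries * kBitsPerEntry > kInlineWords * bits_per_word;
}

static_assert(inline_entry_capacity(64) == 128, "4*64/2 = 128 entries");
static_assert(!needs_extra_allocation(128, 64), "128 entries still fit inline");
static_assert(needs_extra_allocation(129, 64), "129 entries need extra memory");
```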

Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:

  address Thomas' comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20012/files
  - new: https://git.openjdk.org/jdk/pull/20012/files/7ce559cb..88d866ba

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20012&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20012&range=02-03

  Stats: 14 lines in 2 files changed: 0 ins; 6 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/20012.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20012/head:pull/20012

PR: https://git.openjdk.org/jdk/pull/20012

From pchilanomate at openjdk.org  Mon Jul  8 18:14:54 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 8 Jul 2024 18:14:54 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v3]
In-Reply-To: <IqgSXiXCZOj9b0mibWQ2BWb4qzDuNwXqgcix6pd0QxA=.ac7cde2a-45c5-46b4-9208-6a2c25346614@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
 <uFD2HVD2DS8b9XI68lOXqSyyT3gdfmNFXmYIUozJ3hc=.f5aa1a99-e90e-4f5e-9159-c9724205fbd9@github.com>
 <IqgSXiXCZOj9b0mibWQ2BWb4qzDuNwXqgcix6pd0QxA=.ac7cde2a-45c5-46b4-9208-6a2c25346614@github.com>
Message-ID: <gr0mBxJWj5dhm4l-NDkbi9tHYcB12dHKy4jw0-BvxuA=.f49eb2b3-4073-494d-a735-bde26c6c9635@github.com>

On Mon, 8 Jul 2024 12:19:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   use DEBUG_ONLY on _used declaration
>
> src/hotspot/share/interpreter/oopMapCache.cpp line 183:
> 
>> 181:   if (mask_size() > small_mask_limit) {
>> 182:     assert(!Thread::current()->resource_area()->contains((void*)_bit_mask[0]),
>> 183:            "The bit mask should be allocated from the C heap");
> 
> Arguably, this assert is not needed. In debug builds, we have NMT enabled, and that does a check on os::free.
> 
> However, an assert that _bit_mask[0] != 0 *would* make sense, since the free quietly swallows null pointers.

Fixed. We could have such a case if the mask was never filled due to an invalid bci, so I also improved the conditional.

> src/hotspot/share/interpreter/oopMapCache.cpp line 405:
> 
>> 403: // Implementation of OopMapCache
>> 404: 
>> 405: void InterpreterOopMap::copy_from(OopMapCacheEntry* src) {
> 
> Possibly for another  RFE: src pointer should be const

Fixed, should be fine to do it in this PR.

> src/hotspot/share/interpreter/oopMapCache.cpp line 423:
> 
>> 421:   } else {
>> 422:     _bit_mask[0] = (uintptr_t) NEW_C_HEAP_ARRAY(uintptr_t, mask_word_size(), mtClass);
>> 423:     assert(_bit_mask[0] != 0, "bit mask was not allocated");
> 
> The assert can be removed, no? NEW_C_HEAP_ARRAY does a null check by default.

Right, removed.

> src/hotspot/share/interpreter/oopMapCache.cpp line 424:
> 
>> 422:     _bit_mask[0] = (uintptr_t) NEW_C_HEAP_ARRAY(uintptr_t, mask_word_size(), mtClass);
>> 423:     assert(_bit_mask[0] != 0, "bit mask was not allocated");
>> 424:     memcpy((void*) _bit_mask[0], (void*) src->_bit_mask[0], mask_word_size() * BytesPerWord);
> 
> Are the (void*) casts really needed?

We need them here; otherwise we get a compilation error on the conversion from intptr_t to void*. But we don't need them above, so I removed those.

> src/hotspot/share/interpreter/oopMapCache.hpp line 92:
> 
>> 90: 
>> 91:  protected:
>> 92:   DEBUG_ONLY(bool _used;)
> 
> Minor nit. This changes memory layout between debug and release builds, and this is used as part of OopMapCache. Not a big concern, but I usually prefer having the same layout between debug and release to test what we ship.
> 
> Can't we just assert that mask size == USHRT_MAX?

Yes, fixed.
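
The storage scheme discussed in these review comments (a small mask kept inline, a large one allocated on the C heap with the pointer stored in word 0) can be sketched as below. This is a simplified stand-in, not the `InterpreterOopMap` code: `MaskSketch` and its members are hypothetical, `std::malloc`/`std::free` replace `NEW_C_HEAP_ARRAY` and its matching free, and a word count of zero stands in for the "not yet used" check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

class MaskSketch {
 public:
  static constexpr std::size_t kInlineWords = 4;

  MaskSketch() = default;
  MaskSketch(const MaskSketch&) = delete;
  MaskSketch& operator=(const MaskSketch&) = delete;

  ~MaskSketch() {
    if (is_heap_allocated()) {
      // free() silently accepts null, so a missing pointer is checked here.
      assert(_bit_mask[0] != 0 && "heap mask pointer must be set");
      std::free(reinterpret_cast<void*>(_bit_mask[0]));
    }
  }

  // Copies `words` mask words from `src` into this (previously unused) object.
  void copy_from(const std::uintptr_t* src, std::size_t words) {
    assert(_words == 0 && "sketch assumes each object is filled only once");
    _words = words;
    if (words <= kInlineWords) {
      std::memcpy(_bit_mask, src, words * sizeof(std::uintptr_t));
    } else {
      void* heap = std::malloc(words * sizeof(std::uintptr_t));
      if (heap == nullptr) std::abort();
      std::memcpy(heap, src, words * sizeof(std::uintptr_t));
      // The integer/pointer conversion needs an explicit cast in both
      // directions; this is where casts cannot be dropped.
      _bit_mask[0] = reinterpret_cast<std::uintptr_t>(heap);
    }
  }

  const std::uintptr_t* bits() const {
    return is_heap_allocated()
               ? reinterpret_cast<const std::uintptr_t*>(_bit_mask[0])
               : _bit_mask;
  }

  bool is_heap_allocated() const { return _words > kInlineWords; }

 private:
  std::size_t _words = 0;
  std::uintptr_t _bit_mask[kInlineWords] = {};
};
```

Keeping the same members in debug and release builds, as suggested in the last review comment, means the layout that gets tested is the layout that ships; the sketch follows that by using the word count itself for the "only used once" check instead of a debug-only flag.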

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1669071718
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1669072023
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1669073705
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1669073099
PR Review Comment: https://git.openjdk.org/jdk/pull/20012#discussion_r1669072432

From stuefe at openjdk.org  Mon Jul  8 18:20:33 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 8 Jul 2024 18:20:33 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v4]
In-Reply-To: <SDFSzAJLVcfhnlfPyRDZTI2hiF7sLfYqbymrGe8-BUw=.1004d539-7085-4b89-81eb-0e411b960385@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
 <SDFSzAJLVcfhnlfPyRDZTI2hiF7sLfYqbymrGe8-BUw=.1004d539-7085-4b89-81eb-0e411b960385@github.com>
Message-ID: <ycdFLutW434YRzauiklq5o_bqnLtC5Y-hw-Bzm2celI=.7744ba02-65ff-4864-ad91-92548f93f2e0@github.com>

On Mon, 8 Jul 2024 18:14:54 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.
>> 
>> The amount of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back from 2002 where instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details).
>> 
>> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used once.
>> 
>> I tested the patch by running it through mach5 tiers 1-6.
>> 
>> Thanks,
>> Patricio
>
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   address Thomas' comments

good. thanks!

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2164051598

From jrose at openjdk.org  Mon Jul  8 18:31:07 2024
From: jrose at openjdk.org (John R Rose)
Date: Mon, 8 Jul 2024 18:31:07 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
Message-ID: <kqWVtdaVUBaMBqNHBHzvXn2oCW-AcOJ3J8tu1DQWa7Y=.0fe9af2c-99dc-4c3d-bfc4-c5724d3898e2@github.com>

On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See bug for more discussion.
> 
> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
> 
> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice:
> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
> 
> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
> 
> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
> 
> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
> 
> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
> 
> Additional testing:
>  - [x] New IR tests
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`

I think we should remove `wrote_stable` and all associated logic. The full argument is in my comment on the bug.

https://bugs.openjdk.org/browse/JDK-8333791

Stable fields are in some ways "better finals", in that they can be used to store lazy but effectively final states.  But part of the "better" is that (correctly used) their race conditions are safe.  Since racing is part of their nature, the fences are an unnecessary expense.

So just removing that code would be the best outcome, unless I am missing something.  We will want to run such a change through heavy testing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2161140122

From shade at openjdk.org  Mon Jul  8 18:31:07 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 8 Jul 2024 18:31:07 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
Message-ID: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>

See bug for more discussion.

Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193

A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice:
https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160

AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
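
As a loose, program-level analogy for the "initialized or default" property described above (not the JIT folding logic and not JDK code), here is a minimal sketch of a lazily filled field that tolerates races. `CachedLength` and its members are hypothetical names. The sketch assumes that 0 is reserved as the default "not yet set" value and that the computation is idempotent, so racing writers can only store the same result.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical holder: _cached plays the role of a lazily set field whose
// default value (0) means "not computed yet".
class CachedLength {
 public:
  explicit CachedLength(std::string text) : _text(std::move(text)), _cached(0) {}

  // A reader either observes the computed value and uses it, or observes the
  // default and recomputes. Several threads may compute concurrently, but
  // they all store the same value, so no ordering beyond atomicity is needed
  // for this primitive field.
  int value() const {
    int v = _cached.load(std::memory_order_relaxed);
    if (v == 0) {
      v = compute();
      assert(v != 0);  // 0 is reserved for "unset"
      _cached.store(v, std::memory_order_relaxed);
    }
    return v;
  }

 private:
  // Deterministic and never 0 for this sketch.
  int compute() const { return static_cast<int>(_text.size()) + 1; }

  const std::string        _text;
  mutable std::atomic<int> _cached;
};
```

The sketch covers only a primitive value. It does not model publishing a reference to a freshly constructed object, which is the case where final-field-like ordering becomes relevant.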

I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.

Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.

C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.

Additional testing:
 - [x] New IR tests
 - [x] Linux x86_64 server fastdebug, `all`
 - [x] Linux AArch64 server fastdebug, `all`

-------------

Commit messages:
 - Variant 2: Only final-field like semantics for stable inits
 - Variant 3: Handle everything, including reads by compilers

Changes: https://git.openjdk.org/jdk/pull/19635/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19635&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8333791
  Stats: 1063 lines in 16 files changed: 1023 ins; 20 del; 20 mod
  Patch: https://git.openjdk.org/jdk/pull/19635.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19635/head:pull/19635

PR: https://git.openjdk.org/jdk/pull/19635

From matsaave at openjdk.org  Mon Jul  8 18:45:34 2024
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Mon, 8 Jul 2024 18:45:34 GMT
Subject: RFR: 8312125: Refactor CDS enum class handling [v3]
In-Reply-To: <DcNbL1qEcDW0knnYfhWkR7hhD5UFwFw1Ko-qcUmx64Y=.50b0c364-aed1-46c6-a32e-62a347195c05@github.com>
References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
 <DcNbL1qEcDW0knnYfhWkR7hhD5UFwFw1Ko-qcUmx64Y=.50b0c364-aed1-46c6-a32e-62a347195c05@github.com>
Message-ID: <WvtC8lNLqbThywaIJxVbV6jsQhvOw2LYfUFy5Los7xY=.9b332ee2-b287-4af7-86d2-89f16fd4755e@github.com>

On Sun, 7 Jul 2024 01:50:17 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>> Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change.
>
> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'master' into 8312125-refactor-cds-enum-class-handling
>  - @calvinccheung comments
>  - fixed copyright
>  - 8312125: Refactor CDS enum class handling

Updates look good

-------------

Marked as reviewed by matsaave (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20013#pullrequestreview-2164097860

From pchilanomate at openjdk.org  Mon Jul  8 20:08:05 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 8 Jul 2024 20:08:05 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v2]
In-Reply-To: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
Message-ID: <s2b91CxdF_c01u14qHPvobUSw0Uz9IsjLJcw5o4dtF8=.5d861285-1468-4979-816d-824e8fec0f9c@github.com>

> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler.
> 
> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also ran the patch through mach5 tiers1-3.
> 
> Thanks,
> Patricio

Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:

  use VThreadPinner

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20016/files
  - new: https://git.openjdk.org/jdk/pull/20016/files/0490e6c8..ce777598

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=00-01

  Stats: 13 lines in 1 file changed: 5 ins; 2 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/20016.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20016/head:pull/20016

PR: https://git.openjdk.org/jdk/pull/20016

From pchilanomate at openjdk.org  Mon Jul  8 20:08:05 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 8 Jul 2024 20:08:05 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v2]
In-Reply-To: <9E_nLqk5ThBlynnp1khLmG1iislzXOY8eH0VV3J1itA=.a776501d-b4a3-4a2f-8162-eb1c536a7839@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
 <9E_nLqk5ThBlynnp1khLmG1iislzXOY8eH0VV3J1itA=.a776501d-b4a3-4a2f-8162-eb1c536a7839@github.com>
Message-ID: <iaBqzib4909AgZHO95lUgxtZHa5dgRNU1KvAjOjit-w=.e2acabb6-a421-4b87-8df2-c3a3c75f2c6f@github.com>

On Mon, 8 Jul 2024 08:25:47 GMT, Alan Bateman <alanb at openjdk.org> wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   use VThreadPinner
>
> test/jdk/java/lang/Thread/virtual/ThreadYield.java line 29:
> 
>> 27:  * @summary Test that Thread.yield loop polls for safepoints
>> 28:  * @requires vm.continuations
>> 29:  * @modules java.base/java.lang:+open
> 
> I assume the `@modules` isn't needed as this test doesn't need to open java.lang.

Right, removed.

> test/jdk/java/lang/Thread/virtual/ThreadYield.java line 47:
> 
>> 45: import static org.junit.jupiter.api.Assertions.*;
>> 46: 
>> 47: class ThreadYield {
> 
> This isn't a unit test for Thread.yield so I think it would be better to rename to something specific like ThreadYieldPollsSafepoint (or better name).

How about ThreadPollOnYield?

> test/jdk/java/lang/Thread/virtual/ThreadYield.java line 49:
> 
>> 47: class ThreadYield {
>> 48:     static void foo(AtomicBoolean done) {
>> 49:         synchronized (done) {
> 
> When this test makes it to the loom repo then we'll need to change it to pin by other means.

I changed it to use VThreadPinner. I verified the test still times out with Graal.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1669230583
PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1669229847
PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1669231244

From iklam at openjdk.org  Mon Jul  8 20:17:38 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Mon, 8 Jul 2024 20:17:38 GMT
Subject: Integrated: 8312125: Refactor CDS enum class handling
In-Reply-To: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
References: <ZPjUqMhW1Tgk-cnp16sjKnn1JV1JN9qoEoVjaCA5GNY=.a98686ed-8472-4e2b-bb66-58e21644c69c@github.com>
Message-ID: <PJ9_Gwsu7-gd-BGAwasKlTf8QvFiVhmur6XOabDU_3c=.0ee3de33-c94a-4a6e-8c49-6b09090a0bc5@github.com>

On Wed, 3 Jul 2024 17:00:30 GMT, Ioi Lam <iklam at openjdk.org> wrote:

> Please review this simple refactoring of the CDS code for handling enum classes. The code is moved to new files cdsEnumKlass.cpp/hpp. There's otherwise no change.

This pull request has now been integrated.

Changeset: 9c7a6eab
Author:    Ioi Lam <iklam at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/9c7a6eabb93c570fdb74076edc931576ed6be3e0
Stats:     285 lines in 5 files changed: 190 ins; 93 del; 2 mod

8312125: Refactor CDS enum class handling

Reviewed-by: matsaave, ccheung

-------------

PR: https://git.openjdk.org/jdk/pull/20013

From tanksherman27 at gmail.com  Tue Jul  9 05:14:58 2024
From: tanksherman27 at gmail.com (Julian Waters)
Date: Tue, 9 Jul 2024 13:14:58 +0800
Subject: Where does VMError::print_native_stack and
 os::get_sender_for_C_frame load/use the frame pointer?
In-Reply-To: <cb2380cc-11ad-4f89-a0f2-92281ad7d5a0@oracle.com>
References: <CAP2b4GNPq3Fr3X=v=_8nFLwYTbC7e=0N5Xd2i2jOvXfqqftCrQ@mail.gmail.com>
 <cb2380cc-11ad-4f89-a0f2-92281ad7d5a0@oracle.com>
Message-ID: <CAP2b4GNOabGyT15SxtD9tP2ok15-2joy=Fo1ag8EB5f_iOTB=A@mail.gmail.com>

Hi David,

I just looked at the code for both, and it weirdly doesn't seem that
fetch_frame_from_context is used in either. Out of curiosity I tried
removing HAVE_PLATFORM_PRINT_NATIVE_STACK from Windows/x64 and
deliberately crashed HotSpot after compiling the JDK, and the
resulting hs_err file had almost no frame information as a result:

---------------  S U M M A R Y ------------

Command Line: --enable-preview Crash

Host: AMD Ryzen 9 7845HX with Radeon Graphics        , 24 cores, 15G,
Windows 11 , 64 bit Build 22621 (10.0.22621.3672)
Time: Tue Jul  9 02:41:20 2024 Malay Peninsula Standard Time elapsed
time: 0.070338 seconds (0d 0h 0m 0s)

---------------  T H R E A D  ---------------

Current thread (0x0000017e8cead250):  JavaThread "main"
[_thread_in_vm, id=33760, stack(0x0000005c11f00000,0x0000005c12000000)
(1024K)]

Stack: [0x0000005c11f00000,0x0000005c12000000],
sp=0x0000005c11fff000,  free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0x195ab57]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  jdk.internal.misc.Unsafe.putLong(Ljava/lang/Object;JJ)V+0 java.base@24-internal
j  jdk.internal.misc.Unsafe.putAddress(Ljava/lang/Object;JJ)V+24 java.base@24-internal
j  jdk.internal.misc.Unsafe.putAddress(JJ)V+4 java.base@24-internal
j  sun.misc.Unsafe.putAddress(JJ)V+8 jdk.unsupported@24-internal
j  Crash.main()V+53
v  ~StubRoutines::call_stub 0x0000017e9fc70fcd

siginfo: EXCEPTION_ACCESS_VIOLATION (0xc0000005), writing address
0x0000000000000000

Native frames only has 1 frame in it, indicative of further frames not
being found. I can't really tell what else is required to get it to
work with the regular VMError::print_native_stack without requiring
the Windows specific os::win32::platform_print_native_stack. I
compiled HotSpot with gcc and verified that the frame pointer is
indeed saved, so this not working is a little odd to me (Was testing
in the off chance that the Microsoft compiler could be forced to
preserve the frame pointer). There has to somehow be a way to walk the
frames on Windows when the frame pointer is available for use.
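
For reference, the kind of walk that a frame(fr->sender_sp(), fr->link(), fr->sender_pc()) step relies on can be modeled on a synthetic stack. The sketch below is only an illustration under stated assumptions: it assumes the conventional x86-64 layout left by "push rbp; mov rbp, rsp" (caller's saved frame pointer at fp[0], return address at fp[1]), and FrameSketch, readable_link, sender_of and walk are hypothetical names, not HotSpot's frame class. Whether every frame in a gcc-built jvm.dll follows this layout is not something the sketch can establish.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one frame in a frame-pointer-linked stack.
struct FrameSketch {
  const std::uintptr_t* sp;  // stack pointer of this frame
  const std::uintptr_t* fp;  // frame pointer: fp[0] = saved caller fp, fp[1] = return pc
  std::uintptr_t        pc;  // code address executing in this frame
};

// True if both fp[0] and fp[1] lie inside the stack range [lo, hi).
inline bool readable_link(const std::uintptr_t* fp,
                          const std::uintptr_t* lo,
                          const std::uintptr_t* hi) {
  const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(fp);
  const std::uintptr_t l = reinterpret_cast<std::uintptr_t>(lo);
  const std::uintptr_t h = reinterpret_cast<std::uintptr_t>(hi);
  return a >= l && a + 2 * sizeof(std::uintptr_t) <= h;
}

// Caller of 'fr': sender sp sits just above the saved fp and return address.
inline FrameSketch sender_of(const FrameSketch& fr) {
  FrameSketch s;
  s.sp = fr.fp + 2;
  s.fp = reinterpret_cast<const std::uintptr_t*>(fr.fp[0]);  // the "link"
  s.pc = fr.fp[1];
  return s;
}

// Collects pcs from 'fr' outward until the link is unreadable, stops moving
// toward the stack base, or max_frames is reached.
inline std::vector<std::uintptr_t> walk(FrameSketch fr,
                                        const std::uintptr_t* lo,
                                        const std::uintptr_t* hi,
                                        std::size_t max_frames) {
  assert(lo < hi);
  std::vector<std::uintptr_t> pcs;
  while (pcs.size() < max_frames) {
    pcs.push_back(fr.pc);
    if (!readable_link(fr.fp, lo, hi)) break;  // nothing trustworthy to follow
    const FrameSketch next = sender_of(fr);
    const bool outermost = (next.fp == nullptr);
    if (!outermost &&
        reinterpret_cast<std::uintptr_t>(next.fp) <= reinterpret_cast<std::uintptr_t>(fr.fp)) {
      break;  // a sane link always points to a higher address
    }
    fr = next;
  }
  return pcs;
}
```

If the compiler does not maintain the link (as with optimized MSVC builds), fp[0] holds arbitrary data and a walk like this stops after the first frame, which is consistent with the single native frame in the hs_err output above.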

best regards,
Julian

P.S. The attempted patch is attached below, if anyone is curious

diff --git a/src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp b/src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp
index 7e0814c014b..a4fa45ed78f 100644
--- a/src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp
+++ b/src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp
@@ -71,7 +71,7 @@ extern LONG WINAPI topLevelExceptionFilter(_EXCEPTION_POINTERS* );

 // Install a win32 structured exception handler around thread.
 void os::os_exception_wrapper(java_call_t f, JavaValue* value, const methodHandle& method, JavaCallArguments* args, JavaThread* thread) {
-  __try {
+  WIN32_TRY {

 #ifndef AMD64
     // We store the current thread in this wrapperthread location
@@ -111,7 +111,7 @@ void os::os_exception_wrapper(java_call_t f, JavaValue* value, const methodHandl
 #endif // !AMD64

     f(value, method, args, thread);
-  } __except(topLevelExceptionFilter((_EXCEPTION_POINTERS*)_exception_info())) {
+  } WIN32_EXCEPT (topLevelExceptionFilter(GetExceptionInformation())) {
       // Nothing to do.
   }
 }
@@ -396,16 +396,32 @@ bool os::win32::get_frame_at_stack_banging_point(JavaThread* thread,


 // VC++ does not save frame pointer on stack in optimized build. It
-// can be turned off by /Oy-. If we really want to walk C frames,
+// can be turned off by -Oy-. If we really want to walk C frames,
 // we can use the StackWalk() API.
 frame os::get_sender_for_C_frame(frame* fr) {
+#ifdef __GNUC__
+  return frame(fr->sender_sp(), fr->link(), fr->sender_pc());
+#elif defined(_MSC_VER)
   ShouldNotReachHere();
   return frame();
+#endif
 }

 frame os::current_frame() {
+#ifdef __GNUC__
+  frame f(reinterpret_cast<intptr_t*>(os::current_stack_pointer()),
+          reinterpret_cast<intptr_t*>(__builtin_frame_address(1)),
+          CAST_FROM_FN_PTR(address, &os::current_frame));
+  if (os::is_first_C_frame(&f)) {
+    // stack is not walkable
+    return frame();
+  } else {
+    return os::get_sender_for_C_frame(&f);
+  }
+#elif defined(_MSC_VER)
   return frame();  // cannot walk Windows frames this way.  See os::get_native_stack
                    // and os::platform_print_native_stack
+#endif
 }

 void os::print_context(outputStream *st, const void *context) {
diff --git a/src/hotspot/os_cpu/windows_x86/os_windows_x86.inline.hpp b/src/hotspot/os_cpu/windows_x86/os_windows_x86.inline.hpp
index f7622611da7..3461cd4c0b0 100644
--- a/src/hotspot/os_cpu/windows_x86/os_windows_x86.inline.hpp
+++ b/src/hotspot/os_cpu/windows_x86/os_windows_x86.inline.hpp
@@ -29,12 +29,14 @@
 #include "os_windows.hpp"

 #ifdef AMD64
+#ifdef _MSC_VER
 #define HAVE_PLATFORM_PRINT_NATIVE_STACK 1
 inline bool os::platform_print_native_stack(outputStream* st, const void* context,
                                      char *buf, int buf_size, address& lastpc) {
   return os::win32::platform_print_native_stack(st, context, buf, buf_size, lastpc);
 }
 #endif
+#endif

 inline jlong os::rdtsc() {
   // 32 bit: 64 bit result in edx:eax
diff --git a/src/hotspot/share/runtime/os.cpp b/src/hotspot/share/runtime/os.cpp
index 7b766707b0d..d3613652f45 100644
--- a/src/hotspot/share/runtime/os.cpp
+++ b/src/hotspot/share/runtime/os.cpp
@@ -179,7 +179,7 @@ char* os::iso8601_time(jlong milliseconds_since_19700101, char* buffer, size_t b
   // No offset when dealing with UTC
   time_t UTC_to_local = 0;
   if (!utc) {
-#if (defined(_ALLBSD_SOURCE) || defined(_GNU_SOURCE)) && !defined(AIX)
+#if (defined(_ALLBSD_SOURCE) || defined(_GNU_SOURCE)) && !defined(AIX) && !defined(_WIN32)
     UTC_to_local = -(time_struct.tm_gmtoff);
 #elif defined(_WINDOWS)
     long zone;
@@ -1349,7 +1349,9 @@ static bool is_pointer_bad(intptr_t* ptr) {
 bool os::is_first_C_frame(frame* fr) {

 #ifdef _WINDOWS
+#ifdef _MSC_VER
   return true; // native stack isn't walkable on windows this way.
+#endif
 #endif
   // Load up sp, fp, sender sp and sender fp, check for reasonable values.
   // Check usp first, because if that's bad the other accessors may fault


On Mon, Jul 8, 2024 at 4:49 PM David Holmes <david.holmes at oracle.com> wrote:
>
> On 8/07/2024 5:59 pm, Julian Waters wrote:
> > Hi David,
> >
> > Ah, I think you misunderstood me, I'm aware that the frame pointer is
> > saved as required by the compiler (With the exception of the Microsoft
> > compiler, which doesn't save it at all). What I meant was that the
> > comments in Windows code imply that VMError::print_native_stack and
> > os::get_sender_for_C_frame need to use the frame pointer, yet I can't
> > seem to find where or how either of them obtain the frame pointer for
> > whatever they use it for on platforms and compilers where the frame
> > pointer is saved (For instance, on Linux), whether through handwritten
> > assembly code or some other means. It follows that if they need to use
> > the frame pointer, then they must grab it from somewhere, after all
>
> Ah sorry. AFAICS we just create the frame() objects and walk the stack
> via those. We use fetch_frame_from_context to kick things off in the
> case of a crash.
>
> David
>
> > best regards,
> > Julian

From fyang at openjdk.org  Tue Jul  9 05:30:32 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 9 Jul 2024 05:30:32 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v3]
In-Reply-To: <F1yms2X9VVITjLPANuQqABre5E199ILHQ4ywpS4cicY=.3e2c0af1-8070-497a-bfa0-5732eb199974@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
 <F1yms2X9VVITjLPANuQqABre5E199ILHQ4ywpS4cicY=.3e2c0af1-8070-497a-bfa0-5732eb199974@github.com>
Message-ID: <IATUuy7OYBIasXTq1KFmVEjeg2eQ9qFM2UP5B0UhoHw=.7a112155-e875-4752-b6f4-fbeb56248759@github.com>

On Mon, 8 Jul 2024 15:24:13 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

>> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK.
>
> ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Left a note on a side effect of generate_vle32_pack2

Changes requested by fyang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19960#pullrequestreview-2163577505

From fyang at openjdk.org  Tue Jul  9 05:30:34 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 9 Jul 2024 05:30:34 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v2]
In-Reply-To: <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
 <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com>
Message-ID: <vknSXGLwqD-p-lOrVwzn8rU6mTY3o4NP3eRbp4smvoI=.33dba76f-cd79-4d55-9e87-58e37adfeaf8@github.com>

On Sun, 7 Jul 2024 15:16:02 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

>> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK.
>
> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - Use t2 directly instead of temp2
>  - Rename temp1 -> x0
>  - Left a note on a side effect of generate_vle32_pack4

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2282:

> 2280:     __ vrev8_v(vtmp1, vtmp1);
> 2281:     __ vrev8_v(vtmp2, vtmp2);
> 2282:   }

Please leave a new line after each of these newly-added functions.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2332:

> 2330:     const Register key         = c_rarg2;  // key array address
> 2331:     const Register keylen      = c_rarg3;
> 2332:     const Register x0          = c_rarg4;

I think you can use the global `x0` (aka the zero register) instead for `vsetivli`. It is very confusing to have a register alias named `x0` like here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668794931
PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1668790141

From dean.long at oracle.com  Tue Jul  9 05:32:40 2024
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 8 Jul 2024 22:32:40 -0700
Subject: Where does VMError::print_native_stack and
 os::get_sender_for_C_frame load/use the frame pointer?
In-Reply-To: <CAP2b4GNCAh20cyz_JgF+kg34zzyNHznGSUB4_5_E0ot1ZJnwoA@mail.gmail.com>
References: <CAP2b4GNCAh20cyz_JgF+kg34zzyNHznGSUB4_5_E0ot1ZJnwoA@mail.gmail.com>
Message-ID: <281d9f6c-1fdc-4411-9f83-dc6e82a0faaa@oracle.com>

It sounds like you are looking for frame::link(). See also
frame::real_fp() and frame::fp().

dl

On 7/7/24 11:04 PM, Julian Waters wrote:
> Hi all,
>
> I have a question with regards to os::get_sender_for_C_frame and
> VMError::print_native_stack. In Windows specific code comments allude
> to both needing the rbp register to be saved, which is why
> VMError::print_native_stack
> doesn't work on Windows since Microsoft Visual C doesn't save the frame
> pointer, as stated:
>
> /*
> * Windows/x64 does not use stack frames the way expected by Java:
> * [1] in most cases, there is no frame pointer. All locals are addressed via RSP
> * [2] in rare cases, when alloca() is used, a frame pointer is used,
> but this may
> * not be RBP.
> * See http://msdn.microsoft.com/en-us/library/ew5tede7.aspx
> *
> * So it's not possible to print the native stack using the
> * while (...) {... fr = os::get_sender_for_C_frame(&fr); }
> * loop in vmError.cpp. We need to roll our own loop.
> */
>
> // VC++ does not save frame pointer on stack in optimized build. It
> // can be turned off by -Oy-. If we really want to walk C frames,
> // we can use the StackWalk() API.
>
> I can't seem to find where rbp is loaded and used on platforms and
> compilers that do save the frame pointer though. Eclipse cannot find
> it through the vast collection of member methods inside the frame
> class and related code. Does anyone by any chance know where the code that
> loads and uses the frame pointer for os::get_sender_for_C_frame and
> VMError::print_native_stack is located on such platforms?
>
> best regards,
> Julian

From dnsimon at openjdk.org  Tue Jul  9 07:52:58 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 9 Jul 2024 07:52:58 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
Message-ID: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>

This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions:
1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing).
2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails.
3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1.

An OOME thrown in these specific conditions should not exit the VM as it is not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue.

To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible.
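
The idea of telling apart a wrapped error whose underlying cause is an out-of-memory condition can be sketched in isolation. The following is a C++ analogy only, not the exception translation code changed by this PR: `CompilerErrorSketch`, `caused_by_oom`, `compile_sketch`, `run_sketch` and `Outcome` are hypothetical names, `std::bad_alloc` stands in for the OOME, and the wrapping uses the standard nested-exception facility.

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <stdexcept>

// Hypothetical wrapper error, standing in for a compiler-side error type.
struct CompilerErrorSketch : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// True if 'e' itself, or any exception nested inside it, is std::bad_alloc.
inline bool caused_by_oom(const std::exception& e) {
  if (dynamic_cast<const std::bad_alloc*>(&e) != nullptr) return true;
  try {
    std::rethrow_if_nested(e);  // no-op when nothing is nested
  } catch (const std::exception& inner) {
    return caused_by_oom(inner);
  } catch (...) {
    // Nested exception of unknown type: not treated as OOM here.
  }
  return false;
}

// Simulates a call that fails and has its failure wrapped by the caller.
inline void compile_sketch(bool fail_with_oom) {
  try {
    if (fail_with_oom) throw std::bad_alloc();
    throw std::logic_error("unrelated failure");
  } catch (...) {
    std::throw_with_nested(CompilerErrorSketch("compilation failed"));
  }
}

enum class Outcome { Bailout, Fatal };

// Treats an OOM-caused failure as a bailout and anything else as fatal.
inline Outcome run_sketch(bool fail_with_oom) {
  try {
    compile_sketch(fail_with_oom);
  } catch (const std::exception& e) {
    return caused_by_oom(e) ? Outcome::Bailout : Outcome::Fatal;
  }
  return Outcome::Fatal;  // not reached: compile_sketch always throws
}
```

In this sketch the decision is made by inspecting the cause chain rather than the wrapper type, so the wrapper can stay generic while the handler still recognizes the out-of-memory case.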

-------------

Commit messages:
 - improved exception translation between HotSpot and libgraal heaps

Changes: https://git.openjdk.org/jdk/pull/20083/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20083&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335553
  Stats: 84 lines in 5 files changed: 50 ins; 22 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/20083.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20083/head:pull/20083

PR: https://git.openjdk.org/jdk/pull/20083

From dnsimon at openjdk.org  Tue Jul  9 07:52:58 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 9 Jul 2024 07:52:58 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
In-Reply-To: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
Message-ID: <rVd_Q0quLUtgmICEEtFkSzbGfPWD2_RkwX1y5cUS40w=.2fe82b2b-5b49-477a-81a5-9e39bf72a377@github.com>

On Mon, 8 Jul 2024 19:01:05 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

> This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions:
> 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing).
> 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails.
> 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1.
> 
> An OOME thrown in these specific conditions should not exit the VM as it is not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue.
> 
> To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible.

src/hotspot/share/utilities/exceptions.cpp line 114:

> 112: #endif // ASSERT
> 113: 
> 114:   if (h_exception.is_null() && !thread->can_call_java()) {

There is no reason to replace an existing exception object with a dummy exception object in the case where the current thread cannot call into Java. Since the exception object already exists, no Java call is necessary.

This change is necessary to allow the libgraal exception translation mechanism to know that an OOME is being translated.

src/hotspot/share/utilities/exceptions.cpp line 208:

> 206:                               Handle h_loader, Handle h_protection_domain) {
> 207:   // Check for special boot-strapping/compiler-thread handling
> 208:   if (special_exception(thread, file, line, h_cause)) return;

This fixes a long-standing bug where `special_exception` is being queried with the *cause* of the exception being thrown instead of the *name* of the exception being thrown.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1669153819
PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1669148553

From fyang at openjdk.org  Tue Jul  9 08:40:35 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 9 Jul 2024 08:40:35 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v2]
In-Reply-To: <vknSXGLwqD-p-lOrVwzn8rU6mTY3o4NP3eRbp4smvoI=.33dba76f-cd79-4d55-9e87-58e37adfeaf8@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
 <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com>
 <vknSXGLwqD-p-lOrVwzn8rU6mTY3o4NP3eRbp4smvoI=.33dba76f-cd79-4d55-9e87-58e37adfeaf8@github.com>
Message-ID: <T59CuchKVcFhqy7VAzIHxakveuo2bJFrORdrKQwoFLE=.1b43c0cb-d05e-45eb-b85c-026b44dea080@github.com>

On Mon, 8 Jul 2024 14:53:03 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision:
>> 
>>  - Use t2 directly instead of temp2
>>  - Rename temp1 -> x0
>>  - Left a note on a side effect of generate_vle32_pack4
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2282:
> 
>> 2280:     __ vrev8_v(vtmp1, vtmp1);
>> 2281:     __ vrev8_v(vtmp2, vtmp2);
>> 2282:   }
> 
> Please leave a new line after each of these newly-added functions.

BTW: Did you compare this with the OpenSSL version, which also makes use of the `vaesz_vs` instruction from `Zvkned` [1]?

[1] https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aes-riscv64-zvkb-zvkned.pl

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1670009486

From mli at openjdk.org  Tue Jul  9 11:48:12 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 9 Jul 2024 11:48:12 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v10]
In-Reply-To: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
Message-ID: <jOPhbkdyBhmThSdvmjaKeFrSa-TbIh4bLY1SPKQgmq8=.c6e21cd9-3e7c-4781-b68f-aff19d7e3552@github.com>

> Hi,
> Can you help to review the patch?
> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers.
> 
> Compared with previous PRs, the major change in this PR is to integrate the SLEEF source (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external SLEEF artifacts (header or lib) at build or run time.
> Besides this change, the previous changes are also modified accordingly, e.g. some unnecessary files or changes are removed, especially in the make dir of the JDK.
> 
> Besides the code changes, one important task is to handle the legal process.
> 
> Thanks!
> 
> ## Test
> tests:
> * test/jdk/jdk/incubator/vector/
> * test/hotspot/jtreg/compiler/vectorapi/
> 
> options:
> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
> 
> ## Performance
> 
> ### Options
> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
> 
> ### Float
> data
> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
> Float128Vector.ATAN2 | 1024 | thrpt | 10 | 0.021 | ops/ms | 135.088 | 32.449 | 4.163 | 135.72...

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  minor

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18605/files
  - new: https://git.openjdk.org/jdk/pull/18605/files/b54fc863..da65cfa5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=08-09

  Stats: 17 lines in 3 files changed: 11 ins; 4 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/18605.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18605/head:pull/18605

PR: https://git.openjdk.org/jdk/pull/18605

From mli at openjdk.org  Tue Jul  9 12:08:50 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 9 Jul 2024 12:08:50 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
Message-ID: <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>

> Hi,
> Can you help to review the patch?
> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers.
> 
> Compared with previous PRs, the major change in this PR is to integrate the SLEEF source (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external SLEEF artifacts (header or lib) at build or run time.
> Besides this change, the previous changes are also modified accordingly, e.g. some unnecessary files or changes are removed, especially in the make dir of the JDK.
> 
> Besides the code changes, one important task is to handle the legal process.
> 
> Thanks!
> 
> ## Test
> tests:
> * test/jdk/jdk/incubator/vector/
> * test/hotspot/jtreg/compiler/vectorapi/
> 
> options:
> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
> 
> ## Performance
> 
> ### Options
> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
> 
> ### Float
> data
> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
> Float128Vector.ATAN2 | 1024 | thrpt | 10 | 0.021 | ops/ms | 135.088 | 32.449 | 4.163 | 135.72...

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  skip TANH

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18605/files
  - new: https://git.openjdk.org/jdk/pull/18605/files/da65cfa5..6061c25d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=09-10

  Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/18605.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18605/head:pull/18605

PR: https://git.openjdk.org/jdk/pull/18605

From galder at openjdk.org  Tue Jul  9 12:12:53 2024
From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=)
Date: Tue, 9 Jul 2024 12:12:53 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and
 Math.min(long,long)
Message-ID: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>

This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.

Currently vectorization does not kick in for loops containing either of these calls because of the following error:


VLoop::check_preconditions: failed: control flow in loop not allowed


The control flow is due to the Java implementation of these methods, e.g.


public static long max(long a, long b) {
    return (a >= b) ? a : b;
}


This patch intrinsifies the calls, replacing the CmpL + Bool nodes with MaxL/MinL nodes respectively.
By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
E.g.


SuperWord::transform_loop:
    Loop: N518/N126  counted [int,int),+4 (1025 iters)  main has_sfpt strip_mined
 518  CountedLoop  === 518 246 126  [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
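
For readers unfamiliar with the pattern, the kind of loop under discussion looks roughly like the sketch below. The class and method names are hypothetical and are not taken from `ReductionPerf` or from the test that produced the trace above; whether a given loop is actually vectorized depends on the platform and the JIT's decisions:

```java
// Illustrative only: a counted loop whose body calls Math.max(long, long).
final class LongMaxLoop {
    // Stores the element-wise maximum of a and b into out.
    // Assumes a and b are at least as long as out.
    static void maxInto(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            // Without the intrinsic, this call inlines to a compare-and-branch.
            out[i] = Math.max(a[i], b[i]);
        }
    }
}
```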


Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):


==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR
   jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
                                                         1     1     0     0
==============================
TEST SUCCESS

long min   1155
long max   1173


After the patch, on darwin/aarch64 (M1):


==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR
   jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
                                                         1     1     0     0
==============================
TEST SUCCESS

long min   1042
long max   1042


This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
Therefore, it still relies on the macro expansion to transform those into CMoveL.

I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:


==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR
   jtreg:test/hotspot/jtreg:tier1                     2500  2500     0     0
>> jtreg:test/jdk:tier1                               2413  2412     1     0 <<
   jtreg:test/langtools:tier1                         4556  4556     0     0
   jtreg:test/jaxp:tier1                                 0     0     0     0
   jtreg:test/lib-test:tier1                            33    33     0     0
==============================


The failure I got is [CODETOOLS-7903745](https://bugs.openjdk.org/browse/CODETOOLS-7903745) so unrelated to these changes.

-------------

Commit messages:
 - 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)

Changes: https://git.openjdk.org/jdk/pull/20098/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8307513
  Stats: 32 lines in 5 files changed: 32 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20098.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20098/head:pull/20098

PR: https://git.openjdk.org/jdk/pull/20098

From dnsimon at openjdk.org  Tue Jul  9 13:46:46 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 9 Jul 2024 13:46:46 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
 [v2]
In-Reply-To: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
Message-ID: <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com>

> This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions:
> 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing).
> 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails.
> 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1.
> 
> An OOME thrown in these specific conditions should not exit the VM as it is not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue.
> 
> To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible.

Doug Simon has updated the pull request incrementally with one additional commit since the last revision:

  fixed TestTranslatedException

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20083/files
  - new: https://git.openjdk.org/jdk/pull/20083/files/ff544be3..aa32491c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20083&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20083&range=00-01

  Stats: 19 lines in 2 files changed: 12 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/20083.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20083/head:pull/20083

PR: https://git.openjdk.org/jdk/pull/20083

From alanb at openjdk.org  Tue Jul  9 14:01:34 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Tue, 9 Jul 2024 14:01:34 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v2]
In-Reply-To: <iaBqzib4909AgZHO95lUgxtZHa5dgRNU1KvAjOjit-w=.e2acabb6-a421-4b87-8df2-c3a3c75f2c6f@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
 <9E_nLqk5ThBlynnp1khLmG1iislzXOY8eH0VV3J1itA=.a776501d-b4a3-4a2f-8162-eb1c536a7839@github.com>
 <iaBqzib4909AgZHO95lUgxtZHa5dgRNU1KvAjOjit-w=.e2acabb6-a421-4b87-8df2-c3a3c75f2c6f@github.com>
Message-ID: <4Nq-9jBCNPXjTt2JoVVAR4IMCWiur0HI8zTZNZepZqM=.86b3fd9d-d4c5-40b9-be91-14df861ae4eb@github.com>

On Mon, 8 Jul 2024 20:04:04 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> test/jdk/java/lang/Thread/virtual/ThreadYield.java line 47:
>> 
>>> 45: import static org.junit.jupiter.api.Assertions.*;
>>> 46: 
>>> 47: class ThreadYield {
>> 
>> This isn't a unit test for Thread.yield so I think it would be better to rename to something specific like ThreadYieldPollsSafepoint (or better name).
>
> How about ThreadPollOnYield?

That would be okay, main thing is to avoid any suggestion that it's a general test for Thread.yield.

>> test/jdk/java/lang/Thread/virtual/ThreadYield.java line 49:
>> 
>>> 47: class ThreadYield {
>>> 48:     static void foo(AtomicBoolean done) {
>>> 49:         synchronized (done) {
>> 
>> When this test makes it to the loom repo then we'll need to change it to pin by other means.
>
> I changed it to use VThreadPinner. I verified the test still times out with Graal.

Thanks, that avoids needing to update when it meets up with the monitor changes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1670582899
PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1670583501

From yzheng at openjdk.org  Tue Jul  9 14:40:33 2024
From: yzheng at openjdk.org (Yudi Zheng)
Date: Tue, 9 Jul 2024 14:40:33 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
 [v2]
In-Reply-To: <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
 <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com>
Message-ID: <h9dpaL1Pnl8D1T4inI12kuGJN8-QmLte0VCFMJdp0Ig=.7c58fb03-a5be-40cb-85a1-52ee9943f63e@github.com>

On Tue, 9 Jul 2024 13:46:46 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

>> This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions:
>> 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing).
>> 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails.
>> 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1.
>> 
>> An OOME thrown in these specific conditions should not exit the VM as it is not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue.
>> 
>> To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible.
>
> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fixed TestTranslatedException

src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 782:

> 780:       while (true) {
> 781:         // Trigger an OutOfMemoryError
> 782:         objArrayOop next = oopFactory::new_objectArray(0x7FFFFFFF, CHECK_NULL);

Shall we check for pending exception and break here?

test/jdk/jdk/internal/vm/TestTranslatedException.java line 167:

> 165:     private static void assertThrowableEquals(Throwable originalIn, Throwable decodedIn) {
> 166:         Throwable original = originalIn;
> 167:         Throwable decoded = decodedIn;

What is the purpose of this renaming?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1670646934
PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1670607742

From dnsimon at openjdk.org  Tue Jul  9 14:45:33 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 9 Jul 2024 14:45:33 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
 [v2]
In-Reply-To: <h9dpaL1Pnl8D1T4inI12kuGJN8-QmLte0VCFMJdp0Ig=.7c58fb03-a5be-40cb-85a1-52ee9943f63e@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
 <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com>
 <h9dpaL1Pnl8D1T4inI12kuGJN8-QmLte0VCFMJdp0Ig=.7c58fb03-a5be-40cb-85a1-52ee9943f63e@github.com>
Message-ID: <vgV9ewwD3yK8VwAqF6Uuy6zFeGju_9Ubd0tPHnQakv4=.d7988221-7ac9-46fc-b88c-a2edf4e85d64@github.com>

On Tue, 9 Jul 2024 14:37:47 GMT, Yudi Zheng <yzheng at openjdk.org> wrote:

>> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fixed TestTranslatedException
>
> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 782:
> 
>> 780:       while (true) {
>> 781:         // Trigger an OutOfMemoryError
>> 782:         objArrayOop next = oopFactory::new_objectArray(0x7FFFFFFF, CHECK_NULL);
> 
> Shall we check for pending exception and break here?

The `CHECK_NULL` macro effectively does that.

> test/jdk/jdk/internal/vm/TestTranslatedException.java line 167:
> 
>> 165:     private static void assertThrowableEquals(Throwable originalIn, Throwable decodedIn) {
>> 166:         Throwable original = originalIn;
>> 167:         Throwable decoded = decodedIn;
> 
> What is the purpose of this renaming?

So that the printing down the bottom of this message shows the complete throwable, not just the cause on which the comparison failed.
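
A simplified sketch of that pattern, for illustration only: the class name, method name and comparison criteria here are hypothetical and much reduced compared with the real `assertThrowableEquals`. The loop advances local copies down the cause chain, while the untouched `...In` parameters remain available for the failure message:

```java
import java.util.Objects;

// Hypothetical, reduced version of a cause-chain comparison.
final class ThrowableChainCompare {
    static void assertSameChain(Throwable originalIn, Throwable decodedIn) {
        // Local cursors are reassigned while walking the chain.
        Throwable original = originalIn;
        Throwable decoded = decodedIn;
        while (original != null && decoded != null) {
            if (original.getClass() != decoded.getClass()
                    || !Objects.equals(original.getMessage(), decoded.getMessage())) {
                break;
            }
            original = original.getCause();
            decoded = decoded.getCause();
        }
        if (original != null || decoded != null) {
            // Report the complete throwables, not the cursors at the point of mismatch.
            throw new AssertionError("chains differ: original=" + originalIn
                    + ", decoded=" + decodedIn);
        }
    }
}
```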

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1670656254
PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1670654917

From yzheng at openjdk.org  Tue Jul  9 15:01:34 2024
From: yzheng at openjdk.org (Yudi Zheng)
Date: Tue, 9 Jul 2024 15:01:34 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
 [v2]
In-Reply-To: <vgV9ewwD3yK8VwAqF6Uuy6zFeGju_9Ubd0tPHnQakv4=.d7988221-7ac9-46fc-b88c-a2edf4e85d64@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
 <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com>
 <h9dpaL1Pnl8D1T4inI12kuGJN8-QmLte0VCFMJdp0Ig=.7c58fb03-a5be-40cb-85a1-52ee9943f63e@github.com>
 <vgV9ewwD3yK8VwAqF6Uuy6zFeGju_9Ubd0tPHnQakv4=.d7988221-7ac9-46fc-b88c-a2edf4e85d64@github.com>
Message-ID: <PPjzrPv0uDmVtaDPJOM0fJeBITvDsjC7_MuE1ZAOCxg=.4195a091-f5e1-461e-ad25-5270c2119d1d@github.com>

On Tue, 9 Jul 2024 14:42:42 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

>> test/jdk/jdk/internal/vm/TestTranslatedException.java line 167:
>> 
>>> 165:     private static void assertThrowableEquals(Throwable originalIn, Throwable decodedIn) {
>>> 166:         Throwable original = originalIn;
>>> 167:         Throwable decoded = decodedIn;
>> 
>> What is the purpose of this renaming?
>
> So that the printing down the bottom of this message shows the complete throwable, not just the cause on which the comparison failed.

Thanks! I missed the reassignment in the folded unchanged code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1670683400

From yzheng at openjdk.org  Tue Jul  9 15:01:33 2024
From: yzheng at openjdk.org (Yudi Zheng)
Date: Tue, 9 Jul 2024 15:01:33 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
 [v2]
In-Reply-To: <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
 <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com>
Message-ID: <YtiLAGXigNYv4VlL2owOwZL0Xsi6aoayUYJxZqaZx3I=.0da95499-63b9-4279-9ea5-85e80888ba4c@github.com>

On Tue, 9 Jul 2024 13:46:46 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

>> This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions:
>> 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing).
>> 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails.
>> 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1.
>> 
>> An OOME thrown in these specific conditions should not exit the VM as it is not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue.
>> 
>> To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible.
>
> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fixed TestTranslatedException

LGTM

-------------

Marked as reviewed by yzheng (Committer).

PR Review: https://git.openjdk.org/jdk/pull/20083#pullrequestreview-2166581323

From duke at openjdk.org  Tue Jul  9 16:11:59 2024
From: duke at openjdk.org (Robert Toyonaga)
Date: Tue, 9 Jul 2024 16:11:59 GMT
Subject: RFR: 8330144: Revise os::free_memory()
Message-ID: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>

### Summary
On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting.  The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. 

`os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::free_memory_without_uncommit(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory.  

**Transparent huge pages:**
`madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.

To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `free_memory_without_uncommit`, RSS decreases, indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well. 

**Explicit huge pages:**
`madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to explicit huge pages remains the same. We can remove the "alignment_hint" parameter.

To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo.  After calling `free_memory_without_uncommit`, /proc/meminfo shows no change in the number of huge pages in use.  Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range` function instead.  After calling `free_memory_without_uncommit`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an improvement upon existing behavior, but at least it's not a regression either.

#### Testing

- Added the gtest: free_without_uncommit. This test is excluded for AIX and Windows since on those platforms `free_memory_without_uncommit` does nothing. Interestingly, unlike Linux, [madvise(DONTNEED) on BSD](https://man.freebsd.org/cgi/man.cgi?query=madvise&sektion=2&n=1) doesn't free pages; it only lowers their priority, increasing the likelihood of future page faults. So this test has special handling for BSD.  If our intention is to have the pages freed, maybe MADV_FREE is a better choice.
- tier1

-------------

Commit messages:
 - Improve free_memory and use madvise on linux.

Changes: https://git.openjdk.org/jdk/pull/20080/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20080&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8330144
  Stats: 55 lines in 10 files changed: 35 ins; 8 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/20080.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20080/head:pull/20080

PR: https://git.openjdk.org/jdk/pull/20080

From duke at openjdk.org  Tue Jul  9 16:11:59 2024
From: duke at openjdk.org (Robert Toyonaga)
Date: Tue, 9 Jul 2024 16:11:59 GMT
Subject: RFR: 8330144: Revise os::free_memory()
In-Reply-To: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
Message-ID: <2uIEexD6zsgJMPAzXuBT2P88TvUTfrvPT2PLr38RPnE=.75a51a49-a339-48d3-846c-cb639897b740@github.com>

On Mon, 8 Jul 2024 17:33:41 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

> ### Summary
> On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting.  The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. 
> 
> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::free_memory_without_uncommit(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory.  
> 
> **Transparent huge pages:**
> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.
> 
> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `free_memory_without_uncommit`, RSS decreases, indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well. 
> 
> **Explicit huge pages:**
> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to explicit huge pages remains the same. We can remove the "alignment_hint" parameter.
> 
> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo.  After calling `free_memory_without_uncommit`, /proc/meminfo shows no change in the number of huge pages in use.  Explicit huge pages are not reflected in RSS, so I used the `os::committed_in_range` function instead.  After calling `free_memory_without_uncommit`, the `os::committed_in_range` function reports that the memory is still live. Unfortu...

Failing GHA:

- [linux-x86 / test (hs/tier1 compiler part 2)](https://github.com/roberttoyonaga/jdk/actions/runs/9857950912/job/27219620860): compiler/interpreter/Test6833129.java encounters an error with exit code 1. I don't think this is related, since these changes don't relate to compilation.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20080#issuecomment-2217943553

From pchilanomate at openjdk.org  Tue Jul  9 16:38:13 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Tue, 9 Jul 2024 16:38:13 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v3]
In-Reply-To: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
Message-ID: <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com>

> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler.
> 
> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also ran the patch through mach5 tiers1-3.
> 
> Thanks,
> Patricio

Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:

  Rename test to ThreadPollOnYield.java

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20016/files
  - new: https://git.openjdk.org/jdk/pull/20016/files/ce777598..79be1fcc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=01-02

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20016.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20016/head:pull/20016

PR: https://git.openjdk.org/jdk/pull/20016

From alanb at openjdk.org  Tue Jul  9 16:48:34 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Tue, 9 Jul 2024 16:48:34 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v3]
In-Reply-To: <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
 <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com>
Message-ID: <WuesF9Q5ft_qBS-SToSKAHFbJKj_LXZkUp-bEfmoUcQ=.a0952d22-9988-45dc-82e3-e4c0cb69e250@github.com>

On Tue, 9 Jul 2024 16:38:13 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler.
>> 
>> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also ran the patch through mach5 tiers1-3.
>> 
>> Thanks,
>> Patricio
>
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename test to ThreadPollOnYield.java

Marked as reviewed by alanb (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20016#pullrequestreview-2166891851

From stuefe at openjdk.org  Tue Jul  9 18:29:17 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 9 Jul 2024 18:29:17 GMT
Subject: RFR: 8330144: Revise os::free_memory()
In-Reply-To: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
Message-ID: <RkdpuSUNmZ4sLShuFs-FxWivLrnc7Hd_0t5eAQspR0g=.75741bbc-6af3-42fb-acd5-1cc413060f8a@github.com>

On Mon, 8 Jul 2024 17:33:41 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

> ### Summary
> On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting.  The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. 
> 
> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::free_memory_without_uncommit(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory.  
> 
> **Transparent huge pages:**
> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.
> 
> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `free_memory_without_uncommit`, RSS decreases, indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well. 
> 
> **Explicit huge pages:**
> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to explicit huge pages remains the same. We can remove the "alignment_hint" parameter.
> 
> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo.  After calling `free_memory_without_uncommit`, /proc/meminfo shows no change in the number of huge pages in use.  Explicit huge pages are not reflected in RSS, so I used the `os::committed_in_range` function instead.  After calling `free_memory_without_uncommit`, the `os::committed_in_range` function reports that the memory is still live. Unfortu...

Great, thanks @roberttoyonaga. The main work was the analysis work beforehand.

About naming, I would name the thing "os::disclaim_memory". free_without_uncommit is a mouthful. There is a precedent in the "disclaim" API on AIX, which in a future RFE may be used to implement os::disclaim_memory.

test/hotspot/gtest/runtime/test_os.cpp line 988:

> 986:   const size_t size = pages * page_sz;
> 987: 
> 988:   char *base = os::reserve_memory(size, false, mtTest);

I prefer char* base (star at type) syntax, and it's much more common in hotspot.

test/hotspot/gtest/runtime/test_os.cpp line 1002:

> 1000:   size_t committed_size;
> 1001:   address committed_start;
> 1002:   ASSERT_FALSE(os::committed_in_range((address) base, size, committed_start, committed_size));

Is there a chance of this generating false positives? Do we know if the madvise effect is immediate or delayed?

-------------

Changes requested by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20080#pullrequestreview-2167064443
PR Review Comment: https://git.openjdk.org/jdk/pull/20080#discussion_r1670980361
PR Review Comment: https://git.openjdk.org/jdk/pull/20080#discussion_r1670985051

From stuefe at openjdk.org  Tue Jul  9 19:22:20 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 9 Jul 2024 19:22:20 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v4]
In-Reply-To: <aiYvWQf9AqVWcE_I4yy5e4l1CL3pN9KjZyYEMa0t0N8=.67276cf4-29f0-41d7-8ef2-a1eb1d4dc68e@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
 <aiYvWQf9AqVWcE_I4yy5e4l1CL3pN9KjZyYEMa0t0N8=.67276cf4-29f0-41d7-8ef2-a1eb1d4dc68e@github.com>
Message-ID: <7pLl_uA6UDHCkT7qHS4czxdPaTfYBDjcdLumY0eFR00=.0f0ea68e-9962-40bd-980e-6d86ef583067@github.com>

On Thu, 4 Jul 2024 07:49:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> See JBS issue.
>> 
>> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure.
>> 
>> The patch:
>> - exposes os::available_memory via Whitebox
>> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException`
>> 
>> I have some misgivings about this solution, though:
>> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. 
>> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions)
>> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException.
>> 
>> Despite my doubts, I think this is the best we can come up with if we want to have such a test.
>> 
>> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice.
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   comma

Friendly ping

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19803#issuecomment-2218464968

From aturbanov at openjdk.org  Tue Jul  9 20:06:22 2024
From: aturbanov at openjdk.org (Andrey Turbanov)
Date: Tue, 9 Jul 2024 20:06:22 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v4]
In-Reply-To: <aiYvWQf9AqVWcE_I4yy5e4l1CL3pN9KjZyYEMa0t0N8=.67276cf4-29f0-41d7-8ef2-a1eb1d4dc68e@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
 <aiYvWQf9AqVWcE_I4yy5e4l1CL3pN9KjZyYEMa0t0N8=.67276cf4-29f0-41d7-8ef2-a1eb1d4dc68e@github.com>
Message-ID: <179ivC-StXqp1a8UuPYS1igE8x7h36P5On75huXrLAM=.2402195a-f3e3-4a0b-a7fb-683219d7594f@github.com>

On Thu, 4 Jul 2024 07:49:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> See JBS issue.
>> 
>> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure.
>> 
>> The patch:
>> - exposes os::available_memory via Whitebox
>> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException`
>> 
>> I have some misgivings about this solution, though:
>> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. 
>> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions)
>> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException.
>> 
>> Despite my doubts, I think this is the best we can come up with if we want to have such a test.
>> 
>> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice.
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   comma

test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 144:

> 142:     final static long expectedMaxNonHeapRSS = M * 256;
> 143:     // How much memory we require the host to have available before even starting the test
> 144:     final static  long requiredAvailableBefore = heapsize * 2 + expectedMaxNonHeapRSS;

Suggestion:

    final static long requiredAvailableBefore = heapsize * 2 + expectedMaxNonHeapRSS;

test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 148:

> 146:     // count the low RSS as a real error - an indication for a misfunctioning pretouch, not just a low-memory
> 147:     // condition on the system.
> 148:     final static  long requiredAvailableDuring = expectedMaxNonHeapRSS;

Suggestion:

    final static long requiredAvailableDuring = expectedMaxNonHeapRSS;

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1671167332
PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1671167658

From coleenp at openjdk.org  Tue Jul  9 21:20:21 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 9 Jul 2024 21:20:21 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v3]
In-Reply-To: <5CNKzDumOf1MJQXM9OBHQh0Mj7eLv2ONio1V-AXeSJI=.54302b45-2dd2-4f18-a094-6b2c6a59517c@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <5CNKzDumOf1MJQXM9OBHQh0Mj7eLv2ONio1V-AXeSJI=.54302b45-2dd2-4f18-a094-6b2c6a59517c@github.com>
Message-ID: <-hS6aTxhzI_HzVegg0EziUtGxdq6orpF9s1rF3l2hZY=.0c4296b2-d27a-4578-a160-d17b65163655@github.com>

On Mon, 8 Jul 2024 16:21:16 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Add JVMCI symbol exports
>  - Revert "More graceful JVMCI VM option interaction"
>    
>    This reverts commit 2814350370cf142e130fe1d38610c646039f976d.

This is really great work, Axel!  I've been reading this code for a while, and have done one pass looking through the PR with a few comments.
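
As a reading aid for the "Fast Locking" part of the quoted description, the transition between 0b01 (Unlocked) and 0b00 (Fast Locked) via a CAS on the lock bits can be sketched as follows. This is a simplified standalone illustration, not the HotSpot code: the `fastlock_sketch` namespace, `Header`, `try_fast_lock` and `try_fast_unlock` are hypothetical names, only the three bit patterns come from the description, and ownership tracking, spinning and inflation are left out entirely.

```cpp
#include <atomic>
#include <cstdint>

namespace fastlock_sketch {

// Lock-bit encodings taken from the quoted description.
constexpr uintptr_t lock_mask   = 0b11;
constexpr uintptr_t fast_locked = 0b00;
constexpr uintptr_t unlocked    = 0b01;
constexpr uintptr_t inflated    = 0b10;

// Stand-in for an object header; the upper bits carry unrelated data.
struct Header {
  std::atomic<uintptr_t> mark;
  explicit Header(uintptr_t m) : mark(m) {}
};

// Try to move the lock bits from Unlocked to Fast Locked with one CAS.
// The remaining header bits are left untouched.
inline bool try_fast_lock(Header& h) {
  uintptr_t old_mark = h.mark.load(std::memory_order_relaxed);
  if ((old_mark & lock_mask) != unlocked) {
    return false;  // already fast locked or inflated: caller must take another path
  }
  const uintptr_t new_mark = (old_mark & ~lock_mask) | fast_locked;
  return h.mark.compare_exchange_strong(old_mark, new_mark,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed);
}

// Try to move the lock bits from Fast Locked back to Unlocked.
inline bool try_fast_unlock(Header& h) {
  uintptr_t old_mark = h.mark.load(std::memory_order_relaxed);
  if ((old_mark & lock_mask) != fast_locked) {
    return false;  // not fast locked (for example inflated in the meantime)
  }
  const uintptr_t new_mark = (old_mark & ~lock_mask) | unlocked;
  return h.mark.compare_exchange_strong(old_mark, new_mark,
                                        std::memory_order_release,
                                        std::memory_order_relaxed);
}

}  // namespace fastlock_sketch
```

Because only the two lock bits change, the rest of the header never has to be displaced, which is the property the patch description is after.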

src/hotspot/share/opto/library_call.cpp line 4620:

> 4618:       Node *unlocked_val      = _gvn.MakeConX(markWord::unlocked_value);
> 4619:       Node *chk_unlocked      = _gvn.transform(new CmpXNode(lmasked_header, unlocked_val));
> 4620:       Node *test_not_unlocked = _gvn.transform(new BoolNode(chk_unlocked, BoolTest::ne));

I don't really know what this does.  Someone from the c2 compiler group should look at this.

src/hotspot/share/runtime/arguments.cpp line 1830:

> 1828:     FLAG_SET_CMDLINE(LockingMode, LM_LIGHTWEIGHT);
> 1829:     warning("UseObjectMonitorTable requires LM_LIGHTWEIGHT");
> 1830:   }

Maybe we want this to have the opposite sense - turn off UseObjectMonitorTable if not LM_LIGHTWEIGHT?

src/hotspot/share/runtime/javaThread.inline.hpp line 258:

> 256:   }
> 257: 
> 258:   _om_cache.clear();

This could be shorter, ie:  if (UseObjectMonitorTable) _om_cache.clear();
I think the not having an assert was to make the caller unconditional, which is good.

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 393:

> 391: 
> 392: ObjectMonitor* LightweightSynchronizer::get_or_insert_monitor(oop object, JavaThread* current, const ObjectSynchronizer::InflateCause cause, bool try_read) {
> 393:   assert(LockingMode == LM_LIGHTWEIGHT, "must be");

This assert should be assert(UseObjectMonitorTable not LM_LIGHTWEIGHT).

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 732:

> 730: 
> 731:   markWord mark = object->mark();
> 732:   assert(!mark.is_unlocked(), "must be unlocked");

"must be locked" makes more sense.

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 763:

> 761:   assert(mark.has_monitor(), "must be");
> 762:   // The monitor exists
> 763:   ObjectMonitor* monitor = ObjectSynchronizer::read_monitor(current, object, mark);

This looks in the table for the monitor in UseObjectMonitorTable, but could it first check the BasicLock?  Or we got here because BasicLock.metadata was not the ObjectMonitor?

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 773:

> 771: }
> 772: 
> 773: ObjectMonitor* LightweightSynchronizer::inflate_locked_or_imse(oop obj, const ObjectSynchronizer::InflateCause cause, TRAPS) {

I figured out at one point why we now check IMSE here but now cannot remember.  Can you add a comment why above this function?

-------------

PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2167461168
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671214948
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671216649
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671220251
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671225452
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671229697
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671231155
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671231863

From dholmes at openjdk.org  Wed Jul 10 05:28:20 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 10 Jul 2024 05:28:20 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v3]
In-Reply-To: <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
 <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com>
Message-ID: <4SmCasO8fGVxb0wnRWQcMDUM63yub0jqnDbVyRr-xBs=.042f56b8-d4f1-4460-95b9-ed09df545b3e@github.com>

On Tue, 9 Jul 2024 16:38:13 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler.
>> 
>> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also ran the patch through mach5 tiers1-3.
>> 
>> Thanks,
>> Patricio
>
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename test to ThreadPollOnYield.java

test/jdk/java/lang/Thread/virtual/ThreadPollOnYield.java line 39:

> 37:  * @requires vm.continuations
> 38:  * @library /test/lib
> 39:  * @run junit/othervm -Xcomp -XX:-TieredCompilation -XX:CompileCommand=inline,*::yield* -XX:CompileCommand=inline,*::*Yield ThreadPollOnYield

Given this forces -Xcomp shouldn't we skip running it when compilation mode is set via jtreg flags?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1671637893

From dholmes at openjdk.org  Wed Jul 10 05:38:20 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 10 Jul 2024 05:38:20 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v4]
In-Reply-To: <SDFSzAJLVcfhnlfPyRDZTI2hiF7sLfYqbymrGe8-BUw=.1004d539-7085-4b89-81eb-0e411b960385@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
 <SDFSzAJLVcfhnlfPyRDZTI2hiF7sLfYqbymrGe8-BUw=.1004d539-7085-4b89-81eb-0e411b960385@github.com>
Message-ID: <UtACtbpQujJHrXFzh_GqeAIzPtttQEM5T48LRQhZB84=.0183e5bf-5a04-45ed-8fe1-aea9558a301c@github.com>

On Mon, 8 Jul 2024 18:14:54 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.
>> 
>> The number of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty as the comment otherwise says. This comment dates back to 2002, when instead of 128 entries we could have only 32, considering 32 bits cpus as still in main use (see bug for more history details).
>> 
>> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used once. 
>> 
>> I tested the patch by running it through mach5 tiers 1-6.
>> 
>> Thanks,
>> Patricio
>
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   address Thomas' comments

Still good.
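
For reference, the capacity arithmetic in the quoted description (4*64/2 = 128) can be written out as a compile-time check. Reading the three factors as "words in the inline mask", "bits per word" and "bits per entry" is an assumption made for this sketch; `oopmap_sketch` and `inline_entries` are hypothetical names, not HotSpot identifiers.

```cpp
#include <cstddef>

namespace oopmap_sketch {

// Number of entries that fit in an inline bit mask of `words` machine words
// when each entry takes `bits_per_entry` bits.
constexpr size_t inline_entries(size_t words, size_t bits_per_word,
                                size_t bits_per_entry) {
  return words * bits_per_word / bits_per_entry;
}

// The figure from the description: 4 * 64 / 2 = 128 entries before any
// extra memory has to be allocated.
static_assert(inline_entries(4, 64, 2) == 128,
              "inline mask should cover 128 locals/expression stack entries");

}  // namespace oopmap_sketch
```

A method only takes the allocation path when its locals plus expression stack entries exceed this inline capacity, which matches the observation that it is rarely hit.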

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20012#pullrequestreview-2168073270

From dholmes at openjdk.org  Wed Jul 10 05:51:16 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 10 Jul 2024 05:51:16 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
 [v2]
In-Reply-To: <rVd_Q0quLUtgmICEEtFkSzbGfPWD2_RkwX1y5cUS40w=.2fe82b2b-5b49-477a-81a5-9e39bf72a377@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
 <rVd_Q0quLUtgmICEEtFkSzbGfPWD2_RkwX1y5cUS40w=.2fe82b2b-5b49-477a-81a5-9e39bf72a377@github.com>
Message-ID: <euEkVDmhbZAK3bZW_b60yHwNzbwl0BWj8d-CuHFNGsQ=.587007e5-ec4d-470c-82db-50301067390f@github.com>

On Mon, 8 Jul 2024 19:09:47 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

>> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fixed TestTranslatedException
>
> src/hotspot/share/utilities/exceptions.cpp line 208:
> 
>> 206:                               Handle h_loader, Handle h_protection_domain) {
>> 207:   // Check for special boot-strapping/compiler-thread handling
>> 208:   if (special_exception(thread, file, line, h_cause)) return;
> 
> This fixes a long standing bug where `special_exception` is being queried with the *cause* of the exception being thrown instead of the *name* of the exception being thrown.

I'm not so sure this is in fact a bug. If we are throwing with a cause, but we can't actually throw and so will do vm_exit, then the exception of interest is the cause not the more generic exception that would otherwise contain the cause.

Though I have to wonder why there is not an original `_throw` for the "cause" exception, that would have triggered the special_exception handling anyway?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1671652583

From dholmes at openjdk.org  Wed Jul 10 05:51:17 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 10 Jul 2024 05:51:17 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
 [v2]
In-Reply-To: <euEkVDmhbZAK3bZW_b60yHwNzbwl0BWj8d-CuHFNGsQ=.587007e5-ec4d-470c-82db-50301067390f@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
 <rVd_Q0quLUtgmICEEtFkSzbGfPWD2_RkwX1y5cUS40w=.2fe82b2b-5b49-477a-81a5-9e39bf72a377@github.com>
 <euEkVDmhbZAK3bZW_b60yHwNzbwl0BWj8d-CuHFNGsQ=.587007e5-ec4d-470c-82db-50301067390f@github.com>
Message-ID: <3rVX0mcF68BflX71dFK30ztQEn_RJp9UPrb04AS6ZJM=.c12a4765-e310-43e2-a8ab-c4c3b2628d0c@github.com>

On Wed, 10 Jul 2024 05:46:31 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> src/hotspot/share/utilities/exceptions.cpp line 208:
>> 
>>> 206:                               Handle h_loader, Handle h_protection_domain) {
>>> 207:   // Check for special boot-strapping/compiler-thread handling
>>> 208:   if (special_exception(thread, file, line, h_cause)) return;
>> 
>> This fixes a long standing bug where `special_exception` is being queried with the *cause* of the exception being thrown instead of the *name* of the exception being thrown.
>
> I'm not so sure this is in fact a bug. If we are throwing with a cause, but we can't actually throw and so will do vm_exit, then the exception of interest is the cause not the more generic exception that would otherwise contain the cause.
> 
> Though I have to wonder why there is not an original `_throw` for the "cause" exception, that would have triggered the special_exception handling anyway?

Though I see this is inconsistent with `Exceptions::_throw_msg_cause`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1671653968

From stuefe at openjdk.org  Wed Jul 10 06:10:41 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 10 Jul 2024 06:10:41 GMT
Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is
 failing [v5]
In-Reply-To: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
References: <ipqRXRam7YQZwHjVSJSkGEuijRakCtopFe4BZzdKIOQ=.c84dabac-e588-437f-97c8-ae25370d5ee9@github.com>
Message-ID: <JWAEg-gsIOnQEC1GeQ5GFO8vZDGf4UHo5O2y7RBbFF4=.26f9da9a-3886-425f-b533-657f7929aff4@github.com>

> See JBS issue.
> 
> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure.
> 
> The patch:
> - exposes os::available_memory via Whitebox
> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException`
> 
> I have some misgivings about this solution, though:
> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. 
> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions)
> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException.
> 
> Despite my doubts, I think this is the best we can come up with if we want to have such a test.
> 
> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice.

Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:

 - Update test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java
   
   Co-authored-by: Andrey Turbanov <turbanoff at gmail.com>
 - Update test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java
   
   Co-authored-by: Andrey Turbanov <turbanoff at gmail.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19803/files
  - new: https://git.openjdk.org/jdk/pull/19803/files/eba72ed9..109e9172

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=03-04

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/19803.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19803/head:pull/19803

PR: https://git.openjdk.org/jdk/pull/19803

From dholmes at openjdk.org  Wed Jul 10 06:23:15 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 10 Jul 2024 06:23:15 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
 [v2]
In-Reply-To: <3rVX0mcF68BflX71dFK30ztQEn_RJp9UPrb04AS6ZJM=.c12a4765-e310-43e2-a8ab-c4c3b2628d0c@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
 <rVd_Q0quLUtgmICEEtFkSzbGfPWD2_RkwX1y5cUS40w=.2fe82b2b-5b49-477a-81a5-9e39bf72a377@github.com>
 <euEkVDmhbZAK3bZW_b60yHwNzbwl0BWj8d-CuHFNGsQ=.587007e5-ec4d-470c-82db-50301067390f@github.com>
 <3rVX0mcF68BflX71dFK30ztQEn_RJp9UPrb04AS6ZJM=.c12a4765-e310-43e2-a8ab-c4c3b2628d0c@github.com>
Message-ID: <UdqI44rJgcX2ZaV-LzZm54NUA5v3NLT724p61yB_B44=.9218d574-5402-43cf-ada7-6930c0458396@github.com>

On Wed, 10 Jul 2024 05:48:23 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> I'm not so sure this is in fact a bug. If we are throwing with a cause, but we can't actually throw and so will do vm_exit, then the exception of interest is the cause not the more generic exception that would otherwise contain the cause.
>> 
>> Though I have to wonder why there is not an original `_throw` for the "cause" exception, that would have triggered the special_exception handling anyway?
>
> Though I see this is inconsistent with `Exceptions::_throw_msg_cause`

Okay I think I see how the logic works. If we were going to abort we would never reach `_throw_cause` as the initial `_throw` would have exited. But for the `!thread->can_call_Java()` case the original `_throw` would replace the intended real exception with the dummy `VM_exception()`, which is then "caught" and we try to replace with a more specific exception to be thrown via `throw_cause`, which will again replace whichever exception is requested with the dummy `VM_exception()` - so the end result is we will throw the dummy regardless of whether the cause or wrapping exception is specified. So your fix here makes sense.
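
To make the reasoning above concrete, here is a minimal toy model of it. This is only a sketch: `ToyThread`, `toy_special_exception`, `toy_throw`, `toy_throw_cause` and `toy_scenario` are hypothetical names invented for illustration and are not the real `Exceptions` code. It assumes only what is described above, namely that a thread which cannot call Java gets the dummy `VM_exception()` installed in place of whatever was requested.

```cpp
#include <string>

// Toy stand-in for a HotSpot thread; NOT the real JavaThread.
struct ToyThread {
  bool can_call_java;
  std::string pending_exception;  // empty means "nothing pending"
};

// Toy version of the special-case check: if the thread cannot call Java,
// install the dummy exception no matter which name was passed in.
bool toy_special_exception(ToyThread& t, const std::string& /*queried_name*/) {
  if (!t.can_call_java) {
    t.pending_exception = "VM_exception";
    return true;
  }
  return false;
}

void toy_throw(ToyThread& t, const std::string& name) {
  if (toy_special_exception(t, name)) return;
  t.pending_exception = name;
}

// query_with_cause selects which name the special-case check is asked about:
// the cause (old behaviour) or the wrapping exception (behaviour after the fix).
void toy_throw_cause(ToyThread& t, const std::string& name,
                     const std::string& cause, bool query_with_cause) {
  if (toy_special_exception(t, query_with_cause ? cause : name)) return;
  t.pending_exception = name + " caused by " + cause;
}

// Original throw, "catch" it, then rethrow wrapped in a more specific exception.
std::string toy_scenario(bool can_call_java, bool query_with_cause) {
  ToyThread t{can_call_java, ""};
  toy_throw(t, "OutOfMemoryError");
  std::string cause = t.pending_exception;  // the "caught" exception
  t.pending_exception.clear();
  toy_throw_cause(t, "InternalError", cause, query_with_cause);
  return t.pending_exception;
}
```

In this model, a thread that cannot call Java ends up with the dummy pending whichever name is queried, which is the point made above; the toy deliberately leaves out the abort/vm_exit path.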

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1671680471

From luhenry at openjdk.org  Wed Jul 10 07:44:19 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Wed, 10 Jul 2024 07:44:19 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
 <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com>
Message-ID: <pwP98IvP1jzN3sU1d_fa9Lkdzf4yfxHkNAmWKQsRP3w=.29bba507-a7c4-480a-88ae-28e2dbb280dd@github.com>

On Mon, 8 Jul 2024 16:40:50 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits:
>> 
>>  - Merge branch 'master' into sleef-aarch64-integrate-source
>>  - merge master
>>  - sleef 3.6.1 for riscv
>>  - sleef 3.6.1
>>  - update header files for arm
>>  - add inline header file for riscv64
>>  - remove notes about sleef changes
>>  - fix performance issue
>>  - disable unused-function warnings; add log msg
>>  - minor
>>  - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863
>
>> While I agree with you in principle, we chose to import Sleef this way for practical reasons. (The actual importing of Sleef is happening in #19185 / [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816).) The "preprocessing/code-generation" part of the Sleef build was considered too complex to reasonably replicate in the OpenJDK build system. Sleef is built using Cmake and we do not want to add a build dependency on Cmake and call out to a foreign build system at build time, for efficiency and complexity reasons.
> 
> Of course, there is no reason to rebuild the preprocessed headers every time we build the JDK. I'd never ask for that; the last thing I want is to make building the JDK slower. However, it should be possible to do so on a checked-out JDK source tree, at the builder's option.
> 
> If there is a script, it doesn't have to be included in the OpenJDK build system itself, but it does have to be in the OpenJDK source tree. (It could be part of make/devkit, for example.)
> 
> With a script to produce preprocessed files, it should be possible for anyone building the JDK to run that script, and produce the preprocessed source. SLEEF won't take up a prohibitive amount of space.
> 
> We shouldn't be depending on some other web site somewhere being able to come up with the exact SLEEF sources we used, either. That fails the test of reproducibility.
> 
>> JDK-8329816 comes with a script to automatically generate the imported source files, to make it easy to update Sleef in the future. It should also be easy enough to verify the imported contents using the same script for anyone who wants to check the validity of the import step.
> 
> I get it, but not including everything we use in the OpenJDK tree is a dangerous precedent. It should be no big deal to do this right, given that we have the SLEEF sources and the build scripts already. I'm not asking for anything that doesn't exist already, I'm just saying that it must be checked in.
> 
> Avoiding inconvenience, however great, is not sufficient to justify such a step. This is perhaps something to discuss at the next Committers' Workshop.

@theRealAph a precedent already exists for binutils/llvm/capstone and hsdis. Would it be sufficient for the user to choose to build SLEEF from a separate source directory, assuming all the dependencies are installed already (the sources are checked out by the user; cmake and other build dependencies are installed; etc.)? We would then invoke the [make/devkit/createSleef.sh](https://github.com/openjdk/jdk/pull/19185/files#diff-4fe89562540474e866588cd87ca7385b920a06bd428da013cd3d3e4b375fdd10) script on the user's SLEEF checkout to regenerate the header files. And by default, we use the header files already checked in to OpenJDK.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2219783035

From aboldtch at openjdk.org  Wed Jul 10 09:46:13 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 10 Jul 2024 09:46:13 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v4]
In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
Message-ID: <IfTY6OjNfJHzo0dq7zQEm45B8ZcYk5aFPilG6g7oB5o=.a1491217-2db3-46d4-b803-83029f96c525@github.com>

> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere.
> 
> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
> 
> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
> 
> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
> 
> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.
> 
> # Cleanups
> 
> Cleaned up displaced header usage for:
>   * BasicLock
>     * Contains some Zero changes
>     * Renames one exported JVMCI field
>   * ObjectMonitor
>     * Updates comments and tests consistencies
> 
> # Refactoring
> 
> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
> 
> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
> 
> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
> 
> # LightweightSynchronizer
> 
> Working on adapting and incorporating the following section as a comment in the source code
> 
> ## Fast Locking
> 
>   CAS on locking bits in markWord. 
>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
> 
>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
> 
>   If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...
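
As a rough illustration of the fast-locking transition described above, the following sketch does a CAS on only the two low lock bits of a header word and leaves the remaining bits untouched. It is a toy under stated assumptions: `ToyHeader`, `LockResult`, `toy_fast_lock` and `toy_fast_unlock` are hypothetical names, the bit encodings are the ones listed in the description, and the ownership tracking and inflation path that real lightweight locking needs are left out.

```cpp
#include <atomic>
#include <cstdint>

// Lock-bit encodings as listed in the PR description.
constexpr uintptr_t kLockMask   = 0b11;
constexpr uintptr_t kFastLocked = 0b00;
constexpr uintptr_t kUnlocked   = 0b01;
constexpr uintptr_t kInflated   = 0b10;

// Hypothetical stand-in for an object header word; not HotSpot's markWord.
struct ToyHeader {
  std::atomic<uintptr_t> mark;
};

enum class LockResult { Acquired, Contended, Inflated };

// Try to flip 0b01 (Unlocked) -> 0b00 (Fast Locked), spinning at most
// max_spins extra times while another owner holds the fast lock.
LockResult toy_fast_lock(ToyHeader& h, int max_spins) {
  for (int i = 0; i <= max_spins; ++i) {
    uintptr_t old_mark = h.mark.load(std::memory_order_relaxed);
    const uintptr_t bits = old_mark & kLockMask;
    if (bits == kInflated) {
      return LockResult::Inflated;  // caller would take the monitor path
    }
    if (bits == kUnlocked) {
      const uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
      if (h.mark.compare_exchange_strong(old_mark, new_mark,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed)) {
        return LockResult::Acquired;
      }
    }
    // Otherwise the word is fast-locked or changed under us: spin and retry.
  }
  return LockResult::Contended;  // caller would inflate instead of spinning on
}

// Flip 0b00 (Fast Locked) -> 0b01 (Unlocked); the toy does not check ownership.
bool toy_fast_unlock(ToyHeader& h) {
  uintptr_t old_mark = h.mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kFastLocked) return false;
  const uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
  return h.mark.compare_exchange_strong(old_mark, new_mark,
                                        std::memory_order_release,
                                        std::memory_order_relaxed);
}
```

Because only the masked bits change, whatever else is stored in the upper bits of the word survives a lock/unlock round trip, which is the property the description relies on.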

Axel Boldt-Christmas has updated the pull request incrementally with five additional commits since the last revision:

 - Add comment LightweightSynchronizer::inflate_locked_or_imse
 - Fix BasicLock::object_monitor_cache() for other platforms
 - Update LightweightSynchronizer::exit assert
 - Update LightweightSynchronizer::get_or_insert_monitor assert
 - Update JavaThread::om_clear_monitor_cache

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20067/files
  - new: https://git.openjdk.org/jdk/pull/20067/files/173b75b8..d12aa5f6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=02-03

  Stats: 23 lines in 3 files changed: 17 ins; 2 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/20067.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067

PR: https://git.openjdk.org/jdk/pull/20067

From aboldtch at openjdk.org  Wed Jul 10 09:46:18 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 10 Jul 2024 09:46:18 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v3]
In-Reply-To: <-hS6aTxhzI_HzVegg0EziUtGxdq6orpF9s1rF3l2hZY=.0c4296b2-d27a-4578-a160-d17b65163655@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <5CNKzDumOf1MJQXM9OBHQh0Mj7eLv2ONio1V-AXeSJI=.54302b45-2dd2-4f18-a094-6b2c6a59517c@github.com>
 <-hS6aTxhzI_HzVegg0EziUtGxdq6orpF9s1rF3l2hZY=.0c4296b2-d27a-4578-a160-d17b65163655@github.com>
Message-ID: <P4vwJuFYdy9C2GugO5UgMllMPgrFZyjQkRPCW1d3NxM=.13a88311-ce1e-4be8-8b14-b48177a75960@github.com>

On Tue, 9 Jul 2024 20:44:58 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Add JVMCI symbol exports
>>  - Revert "More graceful JVMCI VM option interaction"
>>    
>>    This reverts commit 2814350370cf142e130fe1d38610c646039f976d.
>
> src/hotspot/share/runtime/arguments.cpp line 1830:
> 
>> 1828:     FLAG_SET_CMDLINE(LockingMode, LM_LIGHTWEIGHT);
>> 1829:     warning("UseObjectMonitorTable requires LM_LIGHTWEIGHT");
>> 1830:   }
> 
> Maybe we want this to have the opposite sense - turn off UseObjectMonitorTable if not LM_LIGHTWEIGHT?

Maybe. It boils down to what to do when the JVM receives `-XX:LockingMode={LM_LEGACY,LM_MONITOR} -XX:+UseObjectMonitorTable` 
The options I see are
1. Select `LockingMode=LM_LIGHTWEIGHT`
2. Select `UseObjectMonitorTable=false`
3. Do not start the VM

Between 1. and 2. it is impossible to know what the real intentions were. But with `-XX:+UseObjectMonitorTable` being the newer option, 1. somehow seems more likely.

Option 3. is probably the sensible solution, but it is hard to determine. We tend not to refuse to start the VM because of incompatible options, but rather fix them up. But I believe there is precedent for both. If we do this, however, we will have to figure out all the interactions with our testing framework, and probably add some safeguards.
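
The three options can be sketched side by side as follows. This is a hedged illustration, not the code in `arguments.cpp`: `ToyLockingMode`, `ToyFlags`, `ConflictPolicy`, `Resolution` and `resolve_conflict` are hypothetical names, and only the first warning string is taken from the snippet quoted above; the other two messages are made up.

```cpp
#include <string>

// Hypothetical mirror of the three locking modes mentioned in this thread.
enum class ToyLockingMode { Monitor, Legacy, Lightweight };

// One policy per option enumerated above.
enum class ConflictPolicy { PreferTable, PreferLockingMode, RefuseToStart };

struct ToyFlags {
  ToyLockingMode locking_mode;
  bool use_object_monitor_table;
};

struct Resolution {
  bool start_vm;        // false models "do not start the VM"
  ToyFlags flags;       // effective flags after resolution
  std::string message;  // warning or error text, empty if no conflict
};

Resolution resolve_conflict(ToyFlags requested, ConflictPolicy policy) {
  const bool conflict = requested.use_object_monitor_table &&
                        requested.locking_mode != ToyLockingMode::Lightweight;
  if (!conflict) {
    return {true, requested, ""};
  }
  switch (policy) {
    case ConflictPolicy::PreferTable:        // option 1
      requested.locking_mode = ToyLockingMode::Lightweight;
      return {true, requested, "UseObjectMonitorTable requires LM_LIGHTWEIGHT"};
    case ConflictPolicy::PreferLockingMode:  // option 2 (message is made up)
      requested.use_object_monitor_table = false;
      return {true, requested, "UseObjectMonitorTable disabled"};
    case ConflictPolicy::RefuseToStart:      // option 3 (message is made up)
      return {false, requested, "incompatible options"};
  }
  return {false, requested, ""};  // not reached; keeps compilers quiet
}
```

Keeping the policy as an explicit parameter makes it easy to compare how each choice would interact with a test harness that passes both flags.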

> src/hotspot/share/runtime/javaThread.inline.hpp line 258:
> 
>> 256:   }
>> 257: 
>> 258:   _om_cache.clear();
> 
> This could be shorter, ie:  if (UseObjectMonitorTable) _om_cache.clear();
> I think the not having an assert was to make the caller unconditional, which is good.

Done.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 393:
> 
>> 391: 
>> 392: ObjectMonitor* LightweightSynchronizer::get_or_insert_monitor(oop object, JavaThread* current, const ObjectSynchronizer::InflateCause cause, bool try_read) {
>> 393:   assert(LockingMode == LM_LIGHTWEIGHT, "must be");
> 
> This assert should be assert(UseObjectMonitorTable not LM_LIGHTWEIGHT).

Done.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 732:
> 
>> 730: 
>> 731:   markWord mark = object->mark();
>> 732:   assert(!mark.is_unlocked(), "must be unlocked");
> 
> "must be locked" makes more sense.

Done.

> This looks in the table for the monitor in UseObjectMonitorTable, but could it first check the BasicLock?

We could. 

> Or we got here because BasicLock.metadata was not the ObjectMonitor?

That is one reason we got here. We also get here from C1/interpreter as well as if there are other threads on the entry queues. 

I think there was an assumption that it would not be that crucial in those cases.

One of the reasons we do not read the `BasicLock` cache from the runtime is that we are not as careful with keeping the `BasicLock` initialised on platforms without `UseObjectMonitorTable`. The idea was that as long as they call into the VM, we do not need to keep it invariant. 

But this made me realise `BasicLock::print_on` will be broken on non x86/aarch64 platforms if running with `UseObjectMonitorTable`. 

Rather than fix all platforms I will condition `BasicLock::object_monitor_cache` to return nullptr on unsupported platforms. 

We could add this then. We should probably add an overload to `ObjectSynchronizer::read_monitor` which takes the lock and push it all the way here.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 773:
> 
>> 771: }
>> 772: 
>> 773: ObjectMonitor* LightweightSynchronizer::inflate_locked_or_imse(oop obj, const ObjectSynchronizer::InflateCause cause, TRAPS) {
> 
> I figured out at one point why we now check IMSE here but now cannot remember.  Can you add a comment why above this function?

Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671959198
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671959362
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671959515
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671959614
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671959763
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1671959852

From jkarthikeyan at openjdk.org  Wed Jul 10 20:07:04 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Wed, 10 Jul 2024 20:07:04 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and
 Math.min(long,long)
In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID: <l3QGajoAAxigBK5cfIYwdGPTKfbJJJLvnSYisn7O7x8=.15bd4030-3af2-4d3a-a013-8f9c392223f1@github.com>

On Tue, 9 Jul 2024 12:07:37 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
> 
> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
> 
> 
> VLoop::check_preconditions: failed: control flow in loop not allowed
> 
> 
> The control flow is due to the java implementation for these methods, e.g.
> 
> 
> public static long max(long a, long b) {
>     return (a >= b) ? a : b;
> }
> 
> 
> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively.
> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
> E.g.
> 
> 
> SuperWord::transform_loop:
>     Loop: N518/N126  counted [int,int),+4 (1025 iters)  main has_sfpt strip_mined
>  518  CountedLoop  === 518 246 126  [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
> 
> 
> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
> 
> long min   1155
> long max   1173
> 
> 
> After the patch, on darwin/aarch64 (M1):
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
> 
> long min   1042
> long max   1042
> 
> 
> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
> Therefore, it still relies on the macro expansion to transform those into CMoveL.
> 
> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg:tier1                     2500  2500     0     0
> >> jtreg:test/jdk:tier1                     ...

The C2 changes look nice! I just added one comment here about style. It would also be good to add some IR tests checking that the intrinsic is creating `MaxL`/`MinL` nodes before macro expansion, and a microbenchmark to compare results.

src/hotspot/share/opto/library_call.cpp line 8244:

> 8242: bool LibraryCallKit::inline_long_min_max(vmIntrinsics::ID id) {
> 8243:   assert(callee()->signature()->size() == 4, "minL/maxL has 2 parameters of size 2 each.");
> 8244:   Node *a = argument(0);

Suggestion:

  Node* a = argument(0);

And the same for `b` and `n` as well.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20098#pullrequestreview-2169250610
PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1672350809

From never at openjdk.org  Wed Jul 10 20:07:46 2024
From: never at openjdk.org (Tom Rodriguez)
Date: Wed, 10 Jul 2024 20:07:46 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
 [v2]
In-Reply-To: <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
 <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com>
Message-ID: <rhD6RNQA26jgX4TALJRlCPGiuF4GYBzMosX-mgBnAQs=.eaaa928e-dd4f-40f0-8696-bf3012c480ed@github.com>

On Tue, 9 Jul 2024 13:46:46 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

>> This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions:
>> 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing).
>> 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails.
>> 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1.
>> 
>> An OOME thrown in these specific conditions should not exit the VM as it is not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue.
>> 
>> To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible.
>
> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fixed TestTranslatedException

looks good.

-------------

Marked as reviewed by never (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20083#pullrequestreview-2169495478

From dnsimon at openjdk.org  Wed Jul 10 20:07:48 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Wed, 10 Jul 2024 20:07:48 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
 [v2]
In-Reply-To: <UdqI44rJgcX2ZaV-LzZm54NUA5v3NLT724p61yB_B44=.9218d574-5402-43cf-ada7-6930c0458396@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
 <rVd_Q0quLUtgmICEEtFkSzbGfPWD2_RkwX1y5cUS40w=.2fe82b2b-5b49-477a-81a5-9e39bf72a377@github.com>
 <euEkVDmhbZAK3bZW_b60yHwNzbwl0BWj8d-CuHFNGsQ=.587007e5-ec4d-470c-82db-50301067390f@github.com>
 <3rVX0mcF68BflX71dFK30ztQEn_RJp9UPrb04AS6ZJM=.c12a4765-e310-43e2-a8ab-c4c3b2628d0c@github.com>
 <UdqI44rJgcX2ZaV-LzZm54NUA5v3NLT724p61yB_B44=.9218d574-5402-43cf-ada7-6930c0458396@github.com>
Message-ID: <heTvTZOQzc0I9H3RoQujcXIubmr5S9h6dSMZQYgHSCo=.c8da13bf-f724-43d3-bf2a-a91b7460e4dc@github.com>

On Wed, 10 Jul 2024 06:19:52 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Though I see this is inconsistent with `Exceptions::_throw_msg_cause`
>
> Okay I think I see how the logic works. If we were going to abort we would never reach `_throw_cause` as the initial `_throw` would have exited. But for the `!thread->can_call_Java()` case the original `_throw` would replace the intended real exception with the dummy `VM_exception()`, which is then "caught" and we try to replace with a more specific exception to be thrown via `throw_cause`, which will again replace whichever exception is requested with the dummy `VM_exception()` - so the end result is we will throw the dummy regardless of whether the cause or wrapping exception is specified. So your fix here makes sense.

Great. Would you mind approving this PR, as this is the only non-JVMCI file changed?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20083#discussion_r1672520461

From duke at openjdk.org  Wed Jul 10 20:09:45 2024
From: duke at openjdk.org (Robert Toyonaga)
Date: Wed, 10 Jul 2024 20:09:45 GMT
Subject: RFR: 8330144: Revise os::free_memory() [v2]
In-Reply-To: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
Message-ID: <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>

> ### Summary
> On Linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting.  The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. 
> 
> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory.  
> 
> **Transparent huge pages:**
> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.
> 
> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases, indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well. 
> 
> **Explicit huge pages:**
> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter.
> 
> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo.  After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use.  Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range` function instead.  After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an improvement upon existing behav...
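
The small-page behaviour described above (memory is released, yet the range stays mapped and can be touched again without recommitting) can be reproduced outside the JVM with a few lines of Linux-only code. This is a standalone sketch, not the `os::disclaim_memory` implementation; `toy_disclaim_roundtrip` is a hypothetical helper, and it relies on the documented Linux behaviour that private anonymous pages read back zero-filled after `MADV_DONTNEED`.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Maps `pages` private anonymous pages, dirties them, disclaims them with
// madvise(MADV_DONTNEED), and checks that the range is still usable.
// Returns true if every step behaved as expected.
bool toy_disclaim_roundtrip(size_t pages) {
  const size_t page_size = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  const size_t len = pages * page_size;

  void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return false;
  char* base = static_cast<char*>(p);

  // Touch every byte so the pages become resident.
  std::memset(base, 0x5A, len);

  // Drop the backing pages; the mapping itself stays in place.
  bool ok = (madvise(base, len, MADV_DONTNEED) == 0);

  // Re-reading a disclaimed private anonymous page yields zero-filled memory.
  for (size_t off = 0; ok && off < len; off += page_size) {
    ok = (base[off] == 0);
  }

  // Writing again works without any explicit recommit.
  if (ok) {
    base[0] = 1;
    ok = (base[0] == 1);
  }

  munmap(base, len);
  return ok;
}
```

The sketch does not measure RSS or exercise huge pages; it only shows the retouch-without-recommit property that the experiments above depend on.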

Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:

 - Minor cleanup and comments.
 - rename to disclaim_memory and update test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20080/files
  - new: https://git.openjdk.org/jdk/pull/20080/files/dcf6c80f..6c9e6d5c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20080&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20080&range=00-01

  Stats: 26 lines in 10 files changed: 2 ins; 11 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/20080.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20080/head:pull/20080

PR: https://git.openjdk.org/jdk/pull/20080

From stuefe at openjdk.org  Wed Jul 10 20:09:49 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 10 Jul 2024 20:09:49 GMT
Subject: RFR: 8330144: Revise os::free_memory() [v2]
In-Reply-To: <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
 <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>
Message-ID: <6K4CYd2I1hDSi8nlwA8CEWyVkoqCsJtZ_FwE1Z6ufMQ=.d0e9c909-d1d6-4767-81a6-57f7bbda170f@github.com>

On Wed, 10 Jul 2024 17:58:25 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> ### Summary
>> On Linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting.  The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. 
>> 
>> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory.  
>> 
>> **Transparent huge pages:**
>> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.
>> 
>> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases, indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well. 
>> 
>> **Explicit huge pages:**
>> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter.
>> 
>> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo.  After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use.  Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range` function instead.  After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an imp...
>
> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Minor cleanup and comments.
>  - rename to disclaim_memory and update test

minor nits, fine otherwise

src/hotspot/os/windows/os_windows.cpp line 3896:

> 3894: 
> 3895: void os::pd_realign_memory(char *addr, size_t bytes, size_t alignment_hint) { }
> 3896: void os::pd_disclaim_memory(char *addr, size_t bytes) { }

Give us a little comment about what this API does?

"Hints to the OS that the memory is not needed anymore and can be reclaimed by the OS; will destroy memory content; it will be re-acquired on touch, no explicit committing needed"

Something like that

test/hotspot/gtest/runtime/test_os.cpp line 44:

> 42: #include <sys/mman.h>
> 43: #endif
> 44: 

Not needed anymore

-------------

Changes requested by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20080#pullrequestreview-2169683106
PR Review Comment: https://git.openjdk.org/jdk/pull/20080#discussion_r1672614807
PR Review Comment: https://git.openjdk.org/jdk/pull/20080#discussion_r1672616393

From duke at openjdk.org  Wed Jul 10 20:09:53 2024
From: duke at openjdk.org (Robert Toyonaga)
Date: Wed, 10 Jul 2024 20:09:53 GMT
Subject: RFR: 8330144: Revise os::free_memory() [v2]
In-Reply-To: <RkdpuSUNmZ4sLShuFs-FxWivLrnc7Hd_0t5eAQspR0g=.75741bbc-6af3-42fb-acd5-1cc413060f8a@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
 <RkdpuSUNmZ4sLShuFs-FxWivLrnc7Hd_0t5eAQspR0g=.75741bbc-6af3-42fb-acd5-1cc413060f8a@github.com>
Message-ID: <heNwdf-AALEgj3UMZHGlj2JRbpD2ziefcLrDPpgAYUo=.ab7733d2-c323-4d28-9ddf-1568dbf6c5cb@github.com>

On Tue, 9 Jul 2024 18:26:52 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Minor cleanup and comments.
>>  - rename to disclaim_memory and update test
>
> Great, thanks @roberttoyonaga. The main work was the analysis work beforehand.
> 
> About naming, I would name the thing "os::disclaim_memory". free_without_uncommit is a mouthful. There is a precedent in the "disclaim" API on AIX, which in a future RFE may be used to implement os::disclaim_memory.

Thank you @tstuefe for the review feedback! I've renamed `free_memory_without_uncommit` to `disclaim_memory` and removed the `committed_in_range` check so the unit test can be more reliable.

> src/hotspot/os/windows/os_windows.cpp line 3896:
> 
>> 3894: 
>> 3895: void os::pd_realign_memory(char *addr, size_t bytes, size_t alignment_hint) { }
>> 3896: void os::pd_disclaim_memory(char *addr, size_t bytes) { }
> 
> Give us a little comment about what this API does?
> 
> "Hints to the OS that the memory is not needed anymore and can be reclaimed by the OS; will destroy memory content; it will be re-acquired on touch, no explicit committing needed"
> 
> Something like that

Ok I've added a comment with a description. 

Is it good practice to add these types of descriptions in the shared code header files (os.hpp), in the platform dependent code (os_linux.hpp), or both? I see some examples of all 3 cases, but I'm wondering if there's a best practice.

> test/hotspot/gtest/runtime/test_os.cpp line 1002:
> 
>> 1000:   size_t committed_size;
>> 1001:   address committed_start;
>> 1002:   ASSERT_FALSE(os::committed_in_range((address) base, size, committed_start, committed_size));
> 
> Is there a chance of this generating false positives? Do we know if the madvise effect is immediate or delayed?

That's a good point. Based on the Linux [docs](https://man7.org/linux/man-pages/man2/madvise.2.html) it might not happen immediately, causing the test to be flaky. I'll remove the `committed_in_range` check.  I suppose we could poll with a timeout, but there's still no guarantee the pages actually get freed in a timely manner.
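
A minimal sketch of the behavior discussed above, assuming Linux and a private anonymous mapping backed by ordinary (non-huge) pages. `disclaim_then_retouch` is a hypothetical helper, not HotSpot code. It checks only that the range reads back as zero after `madvise(MADV_DONTNEED)` and can be touched again without remapping; it deliberately does not look at RSS or page residency, so it does not depend on when the kernel's accounting catches up.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not the HotSpot implementation): maps `bytes` of
// private anonymous memory, dirties it, discards it with MADV_DONTNEED,
// and reports whether the whole range reads back as zero afterwards.
bool disclaim_then_retouch(std::size_t bytes) {
  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) {
    return false;
  }
  char* base = static_cast<char*>(p);

  // Touch every byte so the pages are actually populated.
  std::memset(base, 0xAB, bytes);

  // Tell the kernel the content is no longer needed; the mapping stays valid.
  bool ok = (madvise(base, bytes, MADV_DONTNEED) == 0);

  // Re-touch without any explicit recommit: private anonymous pages are
  // expected to come back zero-filled on demand.
  for (std::size_t i = 0; ok && i < bytes; i++) {
    ok = (base[i] == 0);
  }

  munmap(base, bytes);
  return ok;
}
```

Calling `disclaim_then_retouch(1 << 20)` exercises a 1 MB range. Explicit huge pages are outside the assumptions of this sketch.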

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20080#issuecomment-2220609342
PR Review Comment: https://git.openjdk.org/jdk/pull/20080#discussion_r1672735056
PR Review Comment: https://git.openjdk.org/jdk/pull/20080#discussion_r1672249380

From aboldtch at openjdk.org  Wed Jul 10 20:10:07 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 10 Jul 2024 20:10:07 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v5]
In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
Message-ID: <nho0iQHJu__oLvxJF3oE1qBlFiSvUoZ6dLEIc139KqA=.5ba0e931-a6c4-443d-b9d7-715da000d045@github.com>

> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere.
> 
> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
> 
> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
> 
> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
> 
> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
> 
> # Cleanups
> 
> Cleaned up displaced header usage for:
>   * BasicLock
>     * Contains some Zero changes
>     * Renames one exported JVMCI field
>   * ObjectMonitor
>     * Updates comments and tests consistencies
> 
> # Refactoring
> 
> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
> 
> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
> 
> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
> 
> # LightweightSynchronizer
> 
> Working on adapting and incorporating the following section as a comment in the source code
> 
> ## Fast Locking
> 
>   CAS on locking bits in markWord. 
>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
> 
>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
> 
>   If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...

Axel Boldt-Christmas has updated the pull request incrementally with four additional commits since the last revision:

 - Add extra comments in LightweightSynchronizer::inflate_fast_locked_object
 - Fix typos
 - Remove unused variable
 - Add missing inline qualifiers

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20067/files
  - new: https://git.openjdk.org/jdk/pull/20067/files/d12aa5f6..a207544b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=03-04

  Stats: 16 lines in 3 files changed: 8 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/20067.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067

PR: https://git.openjdk.org/jdk/pull/20067

From liach at openjdk.org  Wed Jul 10 20:11:11 2024
From: liach at openjdk.org (Chen Liang)
Date: Wed, 10 Jul 2024 20:11:11 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
Message-ID: <-jOMegvM_uFyEogeqPY8GwECPw70jvpiPZsabUMXB30=.976616cc-023e-4559-ad31-bebad0f92982@github.com>

On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See bug for more discussion.
> 
> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
> 
> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice:
> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
> 
> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
> 
> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
> 
> The current PR implements Variant 2 from the discussion: it makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
> 
> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
> 
> Additional testing:
>  - [x] New IR tests
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`

Can the barrier issue be bypassed with this pattern:

private @Stable Value field;

Value getter() {
    var local = field; // avoid double read
    if (local == null)
        local = computeAndSet(); // avoid double read, no fence here in getter
    return local;
}

private Value computeAndSet() {
    var result = ... // compute value
    field = result; // write must be here, or barrier will be in getter
    return result; // to avoid double read
}


And since you are still inserting barriers the same way constructor barriers are inserted, can I say that such a more usual pattern:

private @Stable Value field;

Value getter() {
    var local = field; // avoid double read
    if (local == null)
        local = field = ... // inserts StoreStore after
    return local;
}

will still suffer from the regression observed in https://github.com/openjdk/jdk/pull/19433#discussion_r1619053915, or is that completely fixed?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2221087037

From rehn at openjdk.org  Wed Jul 10 20:12:07 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 10 Jul 2024 20:12:07 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v23]
In-Reply-To: <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com>
Message-ID: <oMF9O0P5E3HDqMFBYXEjS6Vg1AErulnyVarmMrTBGSk=.38ecb986-596b-4f28-8e3b-b5dd9c18998e@github.com>

On Thu, 4 Jul 2024 14:48:36 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi all, please consider!
>> 
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> Using a very small application or running for a very short time we have fast patchable calls.
>> But any normal application running longer will increase the code size and code churn/fragmentation.
>> So whether or not you get hot fast calls relies on luck.
>> 
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>> 
>> Code stream:
>> JAL <trampo>
>> Stubs:
>> AUIPC
>> LD
>> JALR
>> <DEST>
>> 
>> 
>> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem, having a call to a jump is not the fastest way.
>> Loading the address avoids the pitfalls of cmodx.
>> 
>> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>> 
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>> <DEST>
>> 
>> An experimental option for turning trampolines back on exists.
>> 
>> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
>> 
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>> 
>> fop                                        (msec)    2239       |  2128       =  0.950424
>> h2                                         (msec)    18660      |  16594      =  0.889282
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> luindex                                    (msec)    2866       |  2842       =  0.991626
>> lusearch                                   (msec)    4108       |  4311       =  1.04942
>> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
>> pmd                                        (msec)    5976       |  5897       =  0.98678
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> Avg:                                       0.974112                              
>> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   _ld to ld

I have not seen (new) issues in testing. I would have preferred one or two more reviewers, but since RV is not the biggest platform I'll settle with just passing the bar.
I'll go ahead and integrate if @RealFYang and @Hamlin-Li re-review (as the new rules are in effect, which require the latest rev to be reviewed).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2220282126

From fyang at openjdk.org  Wed Jul 10 20:12:07 2024
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 10 Jul 2024 20:12:07 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v23]
In-Reply-To: <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com>
Message-ID: <gBaz5XlGA4DywDyB2NIlCqY4A1zbkN5y7zhXTvEgFbM=.9fd83e49-0a5a-4530-a87d-321e05b66016@github.com>

On Thu, 4 Jul 2024 14:48:36 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi all, please consider!
>> 
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> Using a very small application or running for a very short time we have fast patchable calls.
>> But any normal application running longer will increase the code size and code churn/fragmentation.
>> So whether or not you get hot fast calls relies on luck.
>> 
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>> 
>> Code stream:
>> JAL <trampo>
>> Stubs:
>> AUIPC
>> LD
>> JALR
>> <DEST>
>> 
>> 
>> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem, having a call to a jump is not the fastest way.
>> Loading the address avoids the pitfalls of cmodx.
>> 
>> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>> 
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>> <DEST>
>> 
>> An experimental option for turning trampolines back on exists.
>> 
>> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
>> 
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>> 
>> fop                                        (msec)    2239       |  2128       =  0.950424
>> h2                                         (msec)    18660      |  16594      =  0.889282
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> luindex                                    (msec)    2866       |  2842       =  0.991626
>> lusearch                                   (msec)    4108       |  4311       =  1.04942
>> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
>> pmd                                        (msec)    5976       |  5897       =  0.98678
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> Avg:                                       0.974112                              
>> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   _ld to ld

Also performed tier1-3 and hotspot:tier4 on my unmatched boards. Result looks fine.
Just witnessed several unnecessary uses of namespace `Assembler`. Guess you might want to clean it up? Still good otherwise.


diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index b39ac79be6b..e349eab3177 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -983,9 +983,9 @@ void MacroAssembler::load_link_jump(const address source, Register temp) {
   assert_cond(source != nullptr);
   int64_t distance = source - pc();
   assert(is_simm32(distance), "Must be");
-  Assembler::auipc(temp, (int32_t)distance + 0x800);
-  Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20);
-  Assembler::jalr(x1, temp, 0);
+  auipc(temp, (int32_t)distance + 0x800);
+  ld(temp, Address(temp, ((int32_t)distance << 20) >> 20));
+  jalr(temp);
 }

 void MacroAssembler::jump_link(const address dest, Register temp) {
@@ -994,7 +994,7 @@ void MacroAssembler::jump_link(const address dest, Register temp) {
   int64_t distance = dest - pc();
   assert(is_simm21(distance), "Must be");
   assert((distance % 2) == 0, "Must be");
-  Assembler::jal(x1, distance);
+  jal(x1, distance);
 }

 void MacroAssembler::j(const address dest, Register temp) {
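
The immediate arithmetic in `load_link_jump` can be checked in isolation. The sketch below uses hypothetical names (`SplitOffset`, `split_pc_relative`) and assumes the usual RISC-V encoding, where `auipc` consumes the upper 20 bits of the offset and the load's 12-bit immediate is sign-extended. That sign extension is why 0x800 is added before taking the upper part. It is an illustration of the split, not the assembler's own code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not part of the patch.
struct SplitOffset {
  int64_t hi;  // multiple of 4096: the part auipc would add to pc
  int32_t lo;  // signed 12-bit remainder, in [-2048, 2047]
};

SplitOffset split_pc_relative(int32_t distance) {
  // Low 12 bits, sign-extended the same way the load's immediate is.
  int32_t lo = static_cast<int32_t>(static_cast<uint32_t>(distance) & 0xfffu);
  if (lo >= 0x800) {
    lo -= 0x1000;
  }
  // Adding 0x800 rounds the upper part up exactly when lo is negative,
  // so that hi + lo reproduces the original distance.
  int64_t hi = (static_cast<int64_t>(distance) + 0x800) & ~static_cast<int64_t>(0xfff);
  assert(hi + lo == distance);
  return {hi, lo};
}
```

For `distance = 0x12345FFF` this yields `hi = 0x12346000` and `lo = -1`: without the rounding, the sign-extended low part would leave the result 4096 bytes short.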

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2220677274

From aph at openjdk.org  Wed Jul 10 20:14:06 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 10 Jul 2024 20:14:06 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
 <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com>
Message-ID: <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com>

On Mon, 8 Jul 2024 16:40:50 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits:
>> 
>>  - Merge branch 'master' into sleef-aarch64-integrate-source
>>  - merge master
>>  - sleef 3.6.1 for riscv
>>  - sleef 3.6.1
>>  - update header files for arm
>>  - add inline header file for riscv64
>>  - remove notes about sleef changes
>>  - fix performance issue
>>  - disable unused-function warnings; add log msg
>>  - minor
>>  - ... and 23 more: https://git.openjdk.org/jdk/compare/2f4f6cc3...b54fc863
>
>> While I agree with you in principle, we chose to import Sleef this way for practical reasons. (The actual importing of Sleef is happening in #19185 / [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816).) The "preprocessing/code-generation" part of the Sleef build was considered too complex to reasonably replicate in the OpenJDK build system. Sleef is built using Cmake and we do not want to add a build dependency on Cmake and call out to a foreign build system at build time, for efficiency and complexity reasons.
> 
> Of course, there is no reason to rebuild the preprocessed headers every time we build the JDK. I'd never ask for that; the last thing I want is to make building the JDK slower. However, it should be possible to do so on a checked-out JDK source tree, at the builder's option.
> 
> If there is a script, it doesn't have to be included in the OpenJDK build system itself, but it does have to be in the OpenJDK source tree. (It could be part of make/devkit, for example.)
> 
> With a script to produce preprocessed files, it should be possible for anyone building the JDK to run that script, and produce the preprocessed source. SLEEF won't take up a prohibitive amount of space.
> 
> We shouldn't be depending on some other web site somewhere being able to come up with the exact SLEEF sources we used, either. That fails the test of reproducibility.
> 
>> JDK-8329816 comes with a script to automatically generate the imported source files, to make it easy to update Sleef in the future. It should also be easy enough to verify the imported contents using the same script for anyone who wants to check the validity of the import step.
> 
> I get it, but not including everything we use in the OpenJDK tree is a dangerous precedent. It should be no big deal to do this right, given that we have the SLEEF sources and the build scripts already. I'm not asking for anything that doesn't exist already, I'm just saying that it must be checked in.
> 
> Avoiding inconvenience, however great, is not sufficient to justify such a step. This is perhaps something to discuss at the next Committers' Workshop.

> @theRealAph a precedent that exists is for binutils/llvm/capstone and hsdis. Would it be sufficient for the user to choose to build SLEEF from a separate source directory assuming all the dependencies are installed already (the sources are checked out by the user; cmake and other build dependencies are installed; etc.)? 

I believe that it's those who want to deviate from the standard best practice of providing source code in its preferred form who must come up with a compelling argument why it is necessary.

I can't tell what problem we're trying to solve by not simply checking in the source code, in its preferred form, to the OpenJDK tree. This has practical advantages to do with traceability and security, and in-principle reasons to do with basic Open Source practice too. On the other side, there are no disadvantages.

We've been here before, and the response from @PaulSandoz to a similar case (checking in compiler-generated asm) was:

> I don't think this should be considered a generally acceptable approach for Vector API operations (most code for operations does not and should not follow this approach), nor is it generally acceptable for other kinds of intrinsic in HotSpot (I believe there are a few special cases under os_cpu).

https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2021-May/047094.html

Having said that, the problem in that case was much worse, in that the corresponding source code was not available at all.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2220180978

From mli at openjdk.org  Wed Jul 10 20:14:06 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 10 Jul 2024 20:14:06 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <TcrB6zIH-yx-6fyLfnQy4NHk5w8VqXm3anTAxbQJtXY=.8181016f-5d4d-4349-a8d7-343db9817f40@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
 <TcrB6zIH-yx-6fyLfnQy4NHk5w8VqXm3anTAxbQJtXY=.8181016f-5d4d-4349-a8d7-343db9817f40@github.com>
Message-ID: <QyiEmIS1_Pev-mjPz602JVskO6NUBdr1qwKolAmBpFo=.a672265f-3029-4b8e-bd8f-adcd42899a31@github.com>

On Mon, 8 Jul 2024 16:20:40 GMT, Andrew Haley <aph at openjdk.org> wrote:

> I finally did some measurements. 

Thanks for testing it!

> It would be nice if the JMH test were part of this patch.

OK, I can do that later.

> 
> It mostly looks good, but I can see an odd regression of DoubleMaxVector.TANH (by 39%) on Apple M1. I don't really know why this is, given that tanh(x) is almost certainly based on expm1(x). This probably isn't important, but it is odd.

Yes, it has some regression in TANH, I have modified the code to skip TANH (https://github.com/openjdk/jdk/pull/18605/commits/6061c25de00423f2c92c08ce40af4815c0fa3933)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2220231384

From pchilanomate at openjdk.org  Wed Jul 10 20:17:31 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 10 Jul 2024 20:17:31 GMT
Subject: RFR: 8335409: Can't allocate and retain memory from resource area
 in frame::oops_interpreted_do oop closure after 8329665 [v4]
In-Reply-To: <UtACtbpQujJHrXFzh_GqeAIzPtttQEM5T48LRQhZB84=.0183e5bf-5a04-45ed-8fe1-aea9558a301c@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
 <SDFSzAJLVcfhnlfPyRDZTI2hiF7sLfYqbymrGe8-BUw=.1004d539-7085-4b89-81eb-0e411b960385@github.com>
 <UtACtbpQujJHrXFzh_GqeAIzPtttQEM5T48LRQhZB84=.0183e5bf-5a04-45ed-8fe1-aea9558a301c@github.com>
Message-ID: <Y5wE-fbEaWcKgFRtjfG-GxwlmkUIxYhEoCjg3RdKCkw=.b3a8a340-a9aa-454e-b1e3-25506dd3b618@github.com>

On Wed, 10 Jul 2024 05:35:40 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   address Thomas' comments
>
> Still good.

Thanks for the reviews @dholmes-ora, @coleenp, @shipilev and @tstuefe!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20012#issuecomment-2220963716

From pchilanomate at openjdk.org  Wed Jul 10 20:17:32 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 10 Jul 2024 20:17:32 GMT
Subject: Integrated: 8335409: Can't allocate and retain memory from resource
 area in frame::oops_interpreted_do oop closure after 8329665
In-Reply-To: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
References: <6VmOqQJ-XTqstwhxY2YIP_zXpsicPqC1jczOzhkOhzc=.b7f48933-b3bc-4c80-9466-2d78cd9cdfb2@github.com>
Message-ID: <nmx8GK5dgR3wmGoPDk-HxquxG_yOySOvwi2lhfYJz5g=.b271ab2d-fa96-4796-8dea-16be764bc42a@github.com>

On Wed, 3 Jul 2024 16:24:20 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

> The ResourceMark added in 8329665 to address the case of having to allocate extra memory for the _bit_mask, prevents code in the closure from allocating and retaining memory from the resource area across the closure, relying on some ResourceMark in scope further up the stack from frame::oops_interpreted_do(). There is in fact one case today in JFR code where this kind of allocation happens.
> 
> The number of locals and expression stack entries a method can have before having to allocate extra memory for the _bit_mask is 4*64/2 = 128. This is already big enough that we almost never have to allocate. A test run through mach5 tiers1-6 shows only a handful of methods that fall into this case, and most are artificial ones created to trigger this condition. So moving the allocation to the C heap shouldn't have any performance penalty, contrary to what the comment says. This comment dates back to 2002, when instead of 128 entries we could have only 32, since 32-bit CPUs were still in main use (see bug for more history details).
> 
> The current code in InterpreterOopMap::resource_copy() has a comment expecting the InterpreterOopMap object to be recently created and empty, but it also has an assert in the allocation case path where it considers the entry might be in use already. This assert actually looks wrong since a used InterpreterOopMap object will not necessarily contain a pointer to resource area memory in _bit_mask[0]. I added an example case in the bug details. In any case, since we don't have any such cases in the codebase I added an explicit assert to verify each InterpreterOopMap is only used one. 
> 
> I tested the patch by running it through mach5 tiers 1-6.
> 
> Thanks,
> Patricio
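
For reference, the 4*64/2 = 128 figure in the description above can be reproduced with a small sketch. The class and method names below are hypothetical and are not HotSpot code; the only inputs taken from the description are the three constants in the quoted expression, read here as 4 mask words of 64 bits with 2 bits per entry (that reading is an assumption).

```java
// Hypothetical helper; the constants mirror the 4*64/2 expression quoted above.
final class BitMaskCapacity {

    // Number of entries that fit in the inline mask before extra memory is needed.
    static int inlineEntries(int maskWords, int bitsPerWord, int bitsPerEntry) {
        return maskWords * bitsPerWord / bitsPerEntry;
    }

    public static void main(String[] args) {
        // Assumed reading: 4 words of 64 bits, 2 bits per local or expression stack entry.
        System.out.println(inlineEntries(4, 64, 2));
    }
}
```

A method whose locals plus expression stack entries exceed that count is the case where the extra allocation happens.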

This pull request has now been integrated.

Changeset: 7ab96c74
Author:    Patricio Chilano Mateo <pchilanomate at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/7ab96c74e2c39f430a5c2f65a981da7314a2385b
Stats:     55 lines in 3 files changed: 6 ins; 20 del; 29 mod

8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665

Reviewed-by: dholmes, stuefe, coleenp, shade

-------------

PR: https://git.openjdk.org/jdk/pull/20012

From ayang at openjdk.org  Wed Jul 10 20:29:37 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Wed, 10 Jul 2024 20:29:37 GMT
Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and
 VM_ParallelGCSystemGC [v3]
In-Reply-To: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
Message-ID: <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com>

> Similar cleanup to https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` is now very similar.
> 
> The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect`, and follow the new code directly, where one can see how an allocation failure triggers various GCs with different collection efforts.
> 
> Test: tier1-6; perf-neutral for dacapo, specjvm2008, specjbb2015 and cachestresser.

Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - review
 - Merge branch 'master' into pgc-vm-operation
 - pgc-vm-operation

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20077/files
  - new: https://git.openjdk.org/jdk/pull/20077/files/a7c69102..1d10dd5b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20077&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20077&range=01-02

  Stats: 1388 lines in 122 files changed: 508 ins; 309 del; 571 mod
  Patch: https://git.openjdk.org/jdk/pull/20077.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20077/head:pull/20077

PR: https://git.openjdk.org/jdk/pull/20077

From zgu at openjdk.org  Wed Jul 10 20:29:40 2024
From: zgu at openjdk.org (Zhengyu Gu)
Date: Wed, 10 Jul 2024 20:29:40 GMT
Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and
 VM_ParallelGCSystemGC [v2]
In-Reply-To: <N4uBvRzIP52a4DgIeIx3ArjKPF0JrTI2bVsmHtD0rJg=.f7e1bb49-9bcd-420c-97fb-2617c798b5b7@github.com>
References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
 <N4uBvRzIP52a4DgIeIx3ArjKPF0JrTI2bVsmHtD0rJg=.f7e1bb49-9bcd-420c-97fb-2617c798b5b7@github.com>
Message-ID: <BpWx1BDGYxzS7d-mUGi1KcIUsD9sScds0q-Gu_nV1R4=.8dd4d293-6ffd-472f-9d78-39ad1d97f446@github.com>

On Mon, 8 Jul 2024 16:31:43 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Similar cleanup to https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` is now very similar.
>> 
>> The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect`, and follow the new code directly, where one can see how an allocation failure triggers various GCs with different collection efforts.
>> 
>> Test: tier1-6; perf-neutral for dacapo, specjvm2008, specjbb2015 and cachestresser.
>
> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:
> 
>   pgc-vm-operation

I really like this refactoring, which brings Parallel closer to the other GCs. Just a few nits; otherwise, LGTM.

src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 273:

> 271: 
> 272:   bool is_tlab = false;
> 273:   return mem_allocate_work(size, is_tlab, gc_overhead_limit_was_exceeded);

Suggest: `return mem_allocate_work(size, false /* is_tlab */, gc_overhead_limit_was_exceeded);`

src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 478:

> 476: 
> 477:     const bool clear_all_soft_refs = true;
> 478:     do_full_collection_no_gc_locker(clear_all_soft_refs);

Suggest: don't define `const bool clear_all_soft_refs = true;` and instead call
`do_full_collection_no_gc_locker(true /* clear_all_soft_refs */);`

src/hotspot/share/gc/parallel/psVMOperations.cpp line 68:

> 66: 
> 67:   GCCauseSetter gccs(heap, _gc_cause);
> 68:   heap->try_collect_at_safepoint(is_cause_full(_gc_cause));

can be simplified to `heap->try_collect_at_safepoint(_full);`

-------------

Marked as reviewed by zgu (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20077#pullrequestreview-2166570482
PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1670678592
PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1671439533
PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1672170373

From rehn at openjdk.org  Wed Jul 10 20:31:27 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 10 Jul 2024 20:31:27 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v23]
In-Reply-To: <gBaz5XlGA4DywDyB2NIlCqY4A1zbkN5y7zhXTvEgFbM=.9fd83e49-0a5a-4530-a87d-321e05b66016@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com>
 <gBaz5XlGA4DywDyB2NIlCqY4A1zbkN5y7zhXTvEgFbM=.9fd83e49-0a5a-4530-a87d-321e05b66016@github.com>
Message-ID: <5bnFsKi9By23yhgIvs-Kzx4KghK4lOaqE7a9igZyZnM=.4fc8a62b-1373-48ef-8c24-484933fea402@github.com>

On Wed, 10 Jul 2024 14:31:23 GMT, Fei Yang <fyang at openjdk.org> wrote:

> Also performed tier1-3 and hotspot:tier4 on my unmatched boards. Result looks fine. Just witnessed several unnecessary uses of namespace `Assembler`. Guess you might want to clean it up? Still good otherwise.

Thanks, fixed!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2221370260

From pchilanomate at openjdk.org  Wed Jul 10 20:31:43 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 10 Jul 2024 20:31:43 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v3]
In-Reply-To: <4SmCasO8fGVxb0wnRWQcMDUM63yub0jqnDbVyRr-xBs=.042f56b8-d4f1-4460-95b9-ed09df545b3e@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
 <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com>
 <4SmCasO8fGVxb0wnRWQcMDUM63yub0jqnDbVyRr-xBs=.042f56b8-d4f1-4460-95b9-ed09df545b3e@github.com>
Message-ID: <RWb7Mt_BMrYVBR3UwJvh7tRR504wpP0RNwvfC5H1R4E=.440e6564-74fb-4758-a4ad-6d2938243893@github.com>

On Wed, 10 Jul 2024 05:25:48 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Rename test to ThreadPollOnYield.java
>
> test/jdk/java/lang/Thread/virtual/ThreadPollOnYield.java line 39:
> 
>> 37:  * @requires vm.continuations
>> 38:  * @library /test/lib
>> 39:  * @run junit/othervm -Xcomp -XX:-TieredCompilation -XX:CompileCommand=inline,*::yield* -XX:CompileCommand=inline,*::*Yield ThreadPollOnYield
> 
> Given this forces -Xcomp shouldn't we skip running it when compilation mode is set via jtreg flags?

The test should never fail even with external flags, so if anything it's just extra testing. But I can add vm.flagless if you prefer.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1672931556

From rehn at openjdk.org  Wed Jul 10 20:31:27 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 10 Jul 2024 20:31:27 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v24]
In-Reply-To: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
Message-ID: <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com>

> Hi all, please consider!
> 
> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
> With a very small application, or one that runs for a very short time, we get fast patchable calls.
> But any normal application running longer will increase the code size and code churn/fragmentation.
> So whether or not you get fast hot calls relies on luck.
> 
> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
> This would be the common case for a patchable call.
> 
> Code stream:
> JAL <trampo>
> Stubs:
> AUIPC
> LD
> JALR
> <DEST>
> 
> 
> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
> Even if you don't have that problem, a call to a jump is not the fastest way.
> Loading the address avoids the pitfalls of cmodx.
> 
> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
> and instead do by default:
> 
> Code stream:
> AUIPC
> LD
> JALR
> Stubs:
> <DEST>
> 
> An experimental option for turning trampolines back on exists.
> 
> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld when in reach, and vice versa (the current Zjid proposal forces the I-fetcher to fetch instructions in order, meaning we will avoid a lot of the issues which arm has).
> 
> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
> 
> fop                                        (msec)    2239       |  2128       =  0.950424
> h2                                         (msec)    18660      |  16594      =  0.889282
> jython                                     (msec)    22022      |  21925      =  0.995595
> luindex                                    (msec)    2866       |  2842       =  0.991626
> lusearch                                   (msec)    4108       |  4311       =  1.04942
> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
> pmd                                        (msec)    5976       |  5897       =  0.98678
> jython                                     (msec)    22022      |  21925      =  0.995595
> Avg:                                       0.974112                              
> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
> h2(xcomp)                                  (msec)    37719      |  38004      =  1.00756
> jython(xcomp)        ...

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits:

 - Skip qualify ins
 - Merge branch 'master' into 8332689
 - _ld to ld
 - Merge branch 'master' into 8332689
 - Rename to reloc_call
 - Merge branch 'master' into 8332689
 - Rename lc
 - Merge branch 'master' into 8332689
 - Merge branch 'master' into 8332689
 - Comments
 - ... and 24 more: https://git.openjdk.org/jdk/compare/242f1133...242c3790

-------------

Changes: https://git.openjdk.org/jdk/pull/19453/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=23
  Stats: 897 lines in 16 files changed: 622 ins; 177 del; 98 mod
  Patch: https://git.openjdk.org/jdk/pull/19453.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453

PR: https://git.openjdk.org/jdk/pull/19453

From mli at openjdk.org  Wed Jul 10 20:42:06 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 10 Jul 2024 20:42:06 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v23]
In-Reply-To: <gBaz5XlGA4DywDyB2NIlCqY4A1zbkN5y7zhXTvEgFbM=.9fd83e49-0a5a-4530-a87d-321e05b66016@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com>
 <gBaz5XlGA4DywDyB2NIlCqY4A1zbkN5y7zhXTvEgFbM=.9fd83e49-0a5a-4530-a87d-321e05b66016@github.com>
Message-ID: <SN7gZ_XJWn2jG_DXGmzHWqVfV1xz_vG-BTkotAbuzkM=.c48958e0-a982-4e38-bb0c-fac37d4de7f1@github.com>

On Wed, 10 Jul 2024 14:31:23 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   _ld to ld
>
> Also performed tier1-3 and hotspot:tier4 on my unmatched boards. Result looks fine.
> Just witnessed several unnecessary uses of namespace `Assembler`. Guess you might want to clean it up? Still good otherwise.
> 
> 
> diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
> index b39ac79be6b..e349eab3177 100644
> --- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
> +++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
> @@ -983,9 +983,9 @@ void MacroAssembler::load_link_jump(const address source, Register temp) {
>    assert_cond(source != nullptr);
>    int64_t distance = source - pc();
>    assert(is_simm32(distance), "Must be");
> -  Assembler::auipc(temp, (int32_t)distance + 0x800);
> -  Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20);
> -  Assembler::jalr(x1, temp, 0);
> +  auipc(temp, (int32_t)distance + 0x800);
> +  ld(temp, Address(temp, ((int32_t)distance << 20) >> 20));
> +  jalr(temp);
>  }
> 
>  void MacroAssembler::jump_link(const address dest, Register temp) {
> @@ -994,7 +994,7 @@ void MacroAssembler::jump_link(const address dest, Register temp) {
>    int64_t distance = dest - pc();
>    assert(is_simm21(distance), "Must be");
>    assert((distance % 2) == 0, "Must be");
> -  Assembler::jal(x1, distance);
> +  jal(x1, distance);
>  }
> 
>  void MacroAssembler::j(const address dest, Register temp) {
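
The `+ 0x800` and `<< 20 >> 20` expressions in the diff above split a 32-bit pc-relative distance into an upper part for AUIPC and a sign-extended 12-bit offset for LD. A minimal sketch of that arithmetic follows; the class and method names are hypothetical, and it assumes the assembler keeps only bits 31:12 of the value passed to `auipc`. It models the integer arithmetic only, not instruction encoding.

```java
// Hypothetical model of the hi/lo split used for an AUIPC + LD pair.
final class AuipcSplit {

    // Upper part as AUIPC would add it to pc: bits 31:12 of (distance + 0x800),
    // with the low 12 bits cleared. Assumption: the assembler drops the low bits.
    static int hi20(int distance) {
        return (distance + 0x800) & ~0xFFF;
    }

    // Lower part used as the load offset: the low 12 bits, sign-extended.
    static int lo12(int distance) {
        return (distance << 20) >> 20;
    }

    public static void main(String[] args) {
        int[] samples = {0, 0x7FF, 0x800, 0xFFF, 0x12345, -1, -0x801};
        for (int d : samples) {
            // The two parts must add back up to the original distance.
            boolean ok = hi20(d) + lo12(d) == d;
            System.out.printf("d=%d hi=%d lo=%d ok=%b%n", d, hi20(d), lo12(d), ok);
        }
    }
}
```

Adding 0x800 before taking the upper bits compensates for the lower 12 bits being treated as a signed value in the range -2048..2047.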

> I have not seen (new) issues in testing. I would have preferred one or two more reviewers, but since RV is not the biggest platform I'll settle for just passing the bar. I'll go ahead and integrate if @RealFYang and @Hamlin-Li re-review (the new rules, which require the latest revision to be reviewed, are in effect).

Still good to me. Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2221406698

From pchilanomate at openjdk.org  Wed Jul 10 21:12:13 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 10 Jul 2024 21:12:13 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v4]
In-Reply-To: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
Message-ID: <3LQOJzJSDdWhZk3Xkg7WuWavMrXEbYnWZ1mnYfkGllQ=.0dbfb507-7531-4154-9429-98783fffbf38@github.com>

> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler.
> 
> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also ran the patch through mach5 tiers1-3.
> 
> Thanks,
> Patricio
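
The shape of the scenario described above, a virtual thread that owns a monitor and calls Thread.yield() in a loop, can be sketched as follows. This is not the test from the PR and is not expected to reproduce the hang; the class name and the timing values are hypothetical, and it assumes a JDK where `Thread.startVirtualThread` is available (JDK 21 or later).

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the loop shape only; it does not force any compiler.
final class PinnedYieldLoop {

    // Returns true if the virtual thread left its yield loop and terminated.
    static boolean runOnce() {
        AtomicBoolean done = new AtomicBoolean(false);
        Object lock = new Object();
        Thread vthread = Thread.startVirtualThread(() -> {
            synchronized (lock) {          // owning a monitor is the pinning case described
                while (!done.get()) {
                    Thread.yield();        // the call whose safepoint poll is in question
                }
            }
        });
        try {
            Thread.sleep(50);              // let the loop spin briefly
            done.set(true);
            vthread.join(10_000);          // bounded wait so the sketch cannot hang forever
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return !vthread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(runOnce() ? "terminated" : "still running");
    }
}
```

In the failure described, the concern is a safepoint request that the looping thread never notices, which this sketch does not attempt to trigger.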

Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:

  add new line at end

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20016/files
  - new: https://git.openjdk.org/jdk/pull/20016/files/79be1fcc..1cf425dd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20016.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20016/head:pull/20016

PR: https://git.openjdk.org/jdk/pull/20016

From dlong at openjdk.org  Wed Jul 10 23:37:54 2024
From: dlong at openjdk.org (Dean Long)
Date: Wed, 10 Jul 2024 23:37:54 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
Message-ID: <tyXcVLFV6t00-jDYMMHuaRPfScHuvsuGbPsJaCO1ALc=.c002e92d-92aa-4030-a320-a0dcfe7dc5d1@github.com>

On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See bug for more discussion.
> 
> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
> 
> A more egregious example is here, which means that every `String` constructor actually does a `Release` barrier for the `@Stable` field write, while only a `StoreStore` for the `final` field store would suffice:
> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
> 
> AFAICS, the original intent for the Release barrier in constructors for stable fields was to match the memory semantics of final fields better. `@Stable` fields are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
> 
> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
> 
> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
> 
> C1 did not do anything special for `@Stable` fields at all; I fixed that to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
> 
> Additional testing:
>  - [x] New IR tests
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`

Do we still need separate _wrote_stable and _wrote_final flags, or could we combine them into _wrote_stable_or_final?
Then we are almost back to pre-8031818, when _wrote_final was overloaded to mean write to final or stable field.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2221706861

From liach at openjdk.org  Wed Jul 10 23:41:55 2024
From: liach at openjdk.org (Chen Liang)
Date: Wed, 10 Jul 2024 23:41:55 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
Message-ID: <Dx1u2Oy8ouhWC67EP_R94bPruoOh9bKgYkJx1C4_Yjw=.790a9fe9-0480-471f-b815-7f01ad226e37@github.com>

On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See bug for more discussion.
> 
> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
> 
> A more egregious example is here, which means that every `String` constructor actually does a `Release` barrier for the `@Stable` field write, while only a `StoreStore` for the `final` field store would suffice:
> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
> 
> AFAICS, the original intent for the Release barrier in constructors for stable fields was to match the memory semantics of final fields better. `@Stable` fields are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
> 
> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
> 
> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
> 
> C1 did not do anything special for `@Stable` fields at all; I fixed that to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
> 
> Additional testing:
>  - [x] New IR tests
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`

If we merge the stable and final flags, won't this:

Value getter() {
    var local = field;
    if (local == null)
        local = field = ... // makes the getter final-writing
    return local;
}

be regarded the same as any final-writing constructor?

Then every call to `getter()` is fenced, and the issue in #19433 isn't solved at all.
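
For readers following along, the lazy-initialization pattern in the snippet above can be written out as a self-contained sketch. The class below is hypothetical: `@Stable` is a JDK-internal annotation, so a plain non-final, non-volatile field stands in for it, and the supplier and counter exist only for illustration. It shows the shape of the getter being discussed and says nothing about which barriers C1 or C2 emit for it.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a class with a lazily written @Stable field.
final class LazyHolder<T> {

    // In the JDK this field would carry @Stable; here it is just a plain field.
    private T field;
    private final Supplier<T> init;
    private int computations;   // counts initializer calls, for illustration only

    LazyHolder(Supplier<T> init) {
        this.init = init;
    }

    // Same shape as the getter in the message above: read once into a local,
    // initialize on null, and return the local.
    T getter() {
        T local = field;
        if (local == null) {
            computations++;
            local = field = init.get();
        }
        return local;
    }

    int computations() {
        return computations;
    }

    public static void main(String[] args) {
        LazyHolder<String> holder = new LazyHolder<>(() -> "value");
        System.out.println(holder.getter());
        System.out.println(holder.getter());
        System.out.println(holder.computations());
    }
}
```

Outside the JDK, this racy single-check idiom is only safe when the stored value is itself safely publishable (for example an immutable object whose fields are all final) and when computing it more than once is harmless.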

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2221711629

From fyang at openjdk.org  Thu Jul 11 01:56:03 2024
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 11 Jul 2024 01:56:03 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v24]
In-Reply-To: <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com>
Message-ID: <2z3xsgnAt9LvOSV173_Q9N2xsH889wguiWNcdeuw1Ow=.80214702-2567-4e09-bf84-932a25361e5d@github.com>

On Wed, 10 Jul 2024 20:31:27 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi all, please consider!
>> 
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> With a very small application, or one that runs for a very short time, we get fast patchable calls.
>> But any normal application running longer will increase the code size and code churn/fragmentation.
>> So whether or not you get fast hot calls relies on luck.
>> 
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>> 
>> Code stream:
>> JAL <trampo>
>> Stubs:
>> AUIPC
>> LD
>> JALR
>> <DEST>
>> 
>> 
>> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem, a call to a jump is not the fastest way.
>> Loading the address avoids the pitfalls of cmodx.
>> 
>> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>> 
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>> <DEST>
>> 
>> An experimental option for turning trampolines back on exists.
>> 
>> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld when in reach, and vice versa (the current Zjid proposal forces the I-fetcher to fetch instructions in order, meaning we will avoid a lot of the issues which arm has).
>> 
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>> 
>> fop                                        (msec)    2239       |  2128       =  0.950424
>> h2                                         (msec)    18660      |  16594      =  0.889282
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> luindex                                    (msec)    2866       |  2842       =  0.991626
>> lusearch                                   (msec)    4108       |  4311       =  1.04942
>> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
>> pmd                                        (msec)    5976       |  5897       =  0.98678
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> Avg:                                       0.974112                              
>> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits:
> 
>  - Skip qualify ins
>  - Merge branch 'master' into 8332689
>  - _ld to ld
>  - Merge branch 'master' into 8332689
>  - Rename to reloc_call
>  - Merge branch 'master' into 8332689
>  - Rename lc
>  - Merge branch 'master' into 8332689
>  - Merge branch 'master' into 8332689
>  - Comments
>  - ... and 24 more: https://git.openjdk.org/jdk/compare/242f1133...242c3790

Looks fine now. Let's ship it :-)

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19453#pullrequestreview-2170722290

From dlong at openjdk.org  Thu Jul 11 02:32:56 2024
From: dlong at openjdk.org (Dean Long)
Date: Thu, 11 Jul 2024 02:32:56 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <Dx1u2Oy8ouhWC67EP_R94bPruoOh9bKgYkJx1C4_Yjw=.790a9fe9-0480-471f-b815-7f01ad226e37@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
 <Dx1u2Oy8ouhWC67EP_R94bPruoOh9bKgYkJx1C4_Yjw=.790a9fe9-0480-471f-b815-7f01ad226e37@github.com>
Message-ID: <xPleoGsDuUPV2s-kxPiWWcqL-_xIpBdoqBy22fibr-c=.4fef483e-0603-4143-bf2a-43bbd321f69f@github.com>

On Wed, 10 Jul 2024 23:39:02 GMT, Chen Liang <liach at openjdk.org> wrote:

>> See bug for more discussion.
>> 
>> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
>> 
>> A more egregious example is here, which means that every `String` constructor actually does a `Release` barrier for the `@Stable` field write, while only a `StoreStore` for the `final` field store would suffice:
>> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
>> 
>> AFAICS, the original intent for the Release barrier in constructors for stable fields was to match the memory semantics of final fields better. `@Stable` fields are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
>> 
>> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
>> 
>> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
>> 
>> C1 did not do anything special for `@Stable` fields at all; I fixed that to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
>> 
>> Additional testing:
>>  - [x] New IR tests
>>  - [x] Linux x86_64 server fastdebug, `all`
>>  - [x] Linux AArch64 server fastdebug, `all`
>
> If we merge the stable and final flags, won't this:
> 
> Value getter() {
>     var local = field;
>     if (local == null)
>         local = field = ... // makes the getter final-writing
>     return local;
> }
> 
> be regarded the same as any final-writing constructor?
> 
> Then every call to `getter()` is fenced, and the issue in #19433 isn't solved at all.

@liach, if I understand this PR correctly, it only adds barriers for final/stable fields in constructors.  Previous code to emit barriers for stable fields outside of constructors is removed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2221869982

From dholmes at openjdk.org  Thu Jul 11 02:42:56 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 11 Jul 2024 02:42:56 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v3]
In-Reply-To: <RWb7Mt_BMrYVBR3UwJvh7tRR504wpP0RNwvfC5H1R4E=.440e6564-74fb-4758-a4ad-6d2938243893@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
 <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com>
 <4SmCasO8fGVxb0wnRWQcMDUM63yub0jqnDbVyRr-xBs=.042f56b8-d4f1-4460-95b9-ed09df545b3e@github.com>
 <RWb7Mt_BMrYVBR3UwJvh7tRR504wpP0RNwvfC5H1R4E=.440e6564-74fb-4758-a4ad-6d2938243893@github.com>
Message-ID: <KKXg1PeYIYOr45p4L6lBqNrjMIdMoQI-aydEGygCJZM=.785a668d-9f8a-4211-877b-8fd93f52a835@github.com>

On Wed, 10 Jul 2024 20:29:19 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> test/jdk/java/lang/Thread/virtual/ThreadPollOnYield.java line 39:
>> 
>>> 37:  * @requires vm.continuations
>>> 38:  * @library /test/lib
>>> 39:  * @run junit/othervm -Xcomp -XX:-TieredCompilation -XX:CompileCommand=inline,*::yield* -XX:CompileCommand=inline,*::*Yield ThreadPollOnYield
>> 
>> Given this forces -Xcomp shouldn't we skip running it when compilation mode is set via jtreg flags?
>
> The test should never fail even with external flags, so if anything it's just extra testing. But I can add vm.flagless if you prefer.

flagless might be going too far as we won't test with other GCs etc. Can we just use `@requires vm.compMode != "Xcomp"` to exclude it from the Xcomp-specific testing, which is redundant?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1673317959

From dholmes at openjdk.org  Thu Jul 11 02:44:05 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 11 Jul 2024 02:44:05 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
 [v2]
In-Reply-To: <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
 <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com>
Message-ID: <HkENYscsjtr10ThhWQrD3KLdRLO_V6W3lwy4n6t4-RU=.7a724248-60dc-4a93-937d-c6b0d8efcd31@github.com>

On Tue, 9 Jul 2024 13:46:46 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

>> This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions:
>> 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing).
>> 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails.
>> 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1.
>> 
>> An OOME thrown in these specific conditions should not exit the VM as it is not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue.
>> 
>> To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible.
>
> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fixed TestTranslatedException

Non JVMCI changes look good. Thanks

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20083#pullrequestreview-2170778148

From liach at openjdk.org  Thu Jul 11 03:11:58 2024
From: liach at openjdk.org (Chen Liang)
Date: Thu, 11 Jul 2024 03:11:58 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
Message-ID: <Axo66lnMuZLjSYfs7beOiIUJCbEQhkhknedqpD4DHE4=.0b6c900a-b6b3-4756-bc7a-7b9587835fe9@github.com>

On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See bug for more discussion.
> 
> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
> 
> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice:
> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
> 
> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
> 
> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
> 
> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in the one path that matters, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
> 
> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
> 
> Additional testing:
>  - [x] New IR tests
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`

Ah, so it's like a weaker version of always safe construction. Makes sense.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2221914037

From jkratochvil at openjdk.org  Thu Jul 11 03:41:59 2024
From: jkratochvil at openjdk.org (Jan Kratochvil)
Date: Thu, 11 Jul 2024 03:41:59 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v3]
In-Reply-To: <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com>
References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com>
 <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com>
Message-ID: <uWlQ046HkkvQZ6nmMUCFMYxlgoeqG296pNj6vBTS2uA=.2199f887-e0da-46da-831b-53fb8c5868aa@github.com>

On Mon, 1 Jul 2024 14:43:58 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest`, currently passes on cgroups v1 and fails on cgroups v2 because of how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented back when JDK 13 was current. It is therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges.
>> 
>> I'm adding those tests in order to not regress another time.
>> 
>> Testing:
>> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1.
>> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to  JDK-8322420)
>> - [x] GHA
>
> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Fix comments
>  - 8333446: Add tests for hierarchical container support

test/hotspot/jtreg/ProblemList.txt line 119:

> 117: containers/docker/TestMemoryAwareness.java 8303470 linux-all
> 118: containers/docker/TestJFREvents.java 8327723 linux-x64
> 119: containers/systemd/SystemdMemoryAwarenessTest.java 8322420 linux-all

This line should be removed as long as it gets applied after [17198](https://github.com/openjdk/jdk/pull/17198).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19530#discussion_r1673356918

From qxing at openjdk.org  Thu Jul 11 03:43:26 2024
From: qxing at openjdk.org (Qizheng Xing)
Date: Thu, 11 Jul 2024 03:43:26 GMT
Subject: RFR: 8336163: Remove declarations of some debug-only methods in
 release build
Message-ID: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com>

Some of the methods are defined only in debug mode, but their declarations still exist in release mode.

This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail.
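
For illustration only, here is a minimal sketch of the pattern described above; it is not the actual patch. It assumes a macro named `ASSERT` that is defined only in debug builds, and `FrameMapSketch` with its members are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical class. ASSERT is assumed to be defined only in debug builds
// (for example by compiling with -DASSERT).
class FrameMapSketch {
 public:
  std::size_t slots() const { return _slots; }

#ifdef ASSERT
  // Declared only where the definition below is also compiled, so a stray
  // call in a release build is rejected by the compiler at the call site.
  bool verify() const;
#endif

 private:
  std::size_t _slots = 4;
};

#ifdef ASSERT
// Debug-only definition; absent from release builds.
bool FrameMapSketch::verify() const { return _slots > 0; }
#endif
```

Without the guard around the declaration, a release-mode call to `verify()` would typically compile and only fail later, at link time, with an undefined symbol.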

-------------

Commit messages:
 - Do not declare some of the methods in release mode.

Changes: https://git.openjdk.org/jdk/pull/20131/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20131&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8336163
  Stats: 16 lines in 5 files changed: 16 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20131.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20131/head:pull/20131

PR: https://git.openjdk.org/jdk/pull/20131

From jkratochvil at openjdk.org  Thu Jul 11 03:44:56 2024
From: jkratochvil at openjdk.org (Jan Kratochvil)
Date: Thu, 11 Jul 2024 03:44:56 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v3]
In-Reply-To: <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com>
References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com>
 <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com>
Message-ID: <kSbubsK2cEF-sY-GX4AYliW9dMXZ8IYGBcKIZaalDcU=.b82004c9-2633-4996-8c61-18d3ff9b0fd0@github.com>

On Mon, 1 Jul 2024 14:43:58 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest`, currently passes on cgroups v1 and fails on cgroups v2 because of how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented back when JDK 13 was current. It is therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges.
>> 
>> I'm adding those tests in order to not regress another time.
>> 
>> Testing:
>> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1.
>> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to  JDK-8322420)
>> - [x] GHA
>
> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Fix comments
>  - 8333446: Add tests for hierarchical container support

[test.patch.txt](https://github.com/user-attachments/files/16171122/test.patch.txt)
* `CPUQuota` does not work for me (I changed it to `AllowedCPUs`): it properly distributes the load, but the JDK still sees all available CPU cores (4 on my VM).
* The change from 2 to 1 cores: we could check 2 cores ("0-1"), but then it would fail on single-core nodes / virtual machines.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2221959393

From stuefe at openjdk.org  Thu Jul 11 05:09:58 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 11 Jul 2024 05:09:58 GMT
Subject: RFR: 8330144: Revise os::free_memory() [v2]
In-Reply-To: <heNwdf-AALEgj3UMZHGlj2JRbpD2ziefcLrDPpgAYUo=.ab7733d2-c323-4d28-9ddf-1568dbf6c5cb@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
 <RkdpuSUNmZ4sLShuFs-FxWivLrnc7Hd_0t5eAQspR0g=.75741bbc-6af3-42fb-acd5-1cc413060f8a@github.com>
 <heNwdf-AALEgj3UMZHGlj2JRbpD2ziefcLrDPpgAYUo=.ab7733d2-c323-4d28-9ddf-1568dbf6c5cb@github.com>
Message-ID: <OeVK7uUIkZK572A_hPeHEQzs6YILjp5B2sv6hKv64kk=.89d76253-2ab3-40b5-9508-678a996f9d28@github.com>

On Wed, 10 Jul 2024 17:58:25 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> src/hotspot/os/windows/os_windows.cpp line 3896:
>> 
>>> 3894: 
>>> 3895: void os::pd_realign_memory(char *addr, size_t bytes, size_t alignment_hint) { }
>>> 3896: void os::pd_disclaim_memory(char *addr, size_t bytes) { }
>> 
>> Give us a little comment about what this API does?
>> 
>> "Hints to the OS that the memory is not needed anymore and can be reclaimed by the OS; will destroy memory content; it will be re-acquired on touch, no explicit committing needed"
>> 
>> Something like that
>
> Ok I've added a comment with a description. 
> 
> Is it good practice to add these types of descriptions in the shared code header files (os.hpp), in the platform dependent code (os_linux.hpp), or both? I see some examples of all 3 cases, but I'm wondering if there's a best practice.

There is no common format. Sun did not comment any APIs in the beginning. I usually do it above the prototype in the hpp file, because then IDEs can pick it up and show you the description in tooltips.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20080#discussion_r1673405671

From tanksherman27 at gmail.com  Thu Jul 11 05:52:43 2024
From: tanksherman27 at gmail.com (Julian Waters)
Date: Thu, 11 Jul 2024 05:52:43 +0000
Subject: Where does VMError::print_native_stack and
 os::get_sender_for_C_frame load/use the frame pointer?
Message-ID: <CAP2b4GNfyhviAjRakViZ6nqjBev9S3hYE0ejJ+zb+CNpa_r4GA@mail.gmail.com>

Hi Dean,

I eventually did find frame::link(), but ultimately it didn't seem to help as VMError::print_native_stack still doesn't work properly on Windows. It seems as though frame::link() calls addr_at on x86, which in turn calls frame::fp(), which returns _fp. I think whatever sets _fp for VMError::print_native_stack is the missing link here, but unfortunately I don't know where it's set.

The code that I tried on Windows x64 is attached below

best regards,
Julian

// VC++ does not save frame pointer on stack in optimized build. It
// can be turned off by -Oy-. If we really want to walk C frames,
// we can use the StackWalk() API.
frame os::get_sender_for_C_frame(frame* fr) {
#ifdef __GNUC__
  return frame(fr->sender_sp(), fr->link(), fr->sender_pc());
#elif defined(_MSC_VER)
  ShouldNotReachHere();
  return frame();
#endif
}

frame os::current_frame() {
#ifdef __GNUC__
  frame f(reinterpret_cast<intptr_t*>(os::current_stack_pointer()),
          reinterpret_cast<intptr_t*>(__builtin_frame_address(1)),
          CAST_FROM_FN_PTR(address, &os::current_frame));
  if (os::is_first_C_frame(&f)) {
    // stack is not walkable
    return frame();
  } else {
    return os::get_sender_for_C_frame(&f);
  }
#elif defined(_MSC_VER)
  return frame();  // cannot walk Windows frames this way.  See os::get_native_stack
                   // and os::platform_print_native_stack
#endif
}

From dholmes at openjdk.org  Thu Jul 11 06:57:56 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 11 Jul 2024 06:57:56 GMT
Subject: RFR: 8336163: Remove declarations of some debug-only methods in
 release build
In-Reply-To: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com>
References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com>
Message-ID: <IFhCsMwPTpiYCPHMcIiwqdX3Gx-MJkzTZUBHlDrVgAQ=.2b7f045d-3dc3-42fc-8fb3-e85a37b05cf9@github.com>

On Thu, 11 Jul 2024 03:37:06 GMT, Qizheng Xing <qxing at openjdk.org> wrote:

> Some of the methods are defined only in debug mode, but their declarations still exist in release mode.
> 
> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail.

Looks good. Thanks for cleaning this up.

Please update the copyright year in  src/hotspot/share/runtime/registerMap.hpp

-------------

Changes requested by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20131#pullrequestreview-2171067941

From dnsimon at openjdk.org  Thu Jul 11 07:06:06 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Thu, 11 Jul 2024 07:06:06 GMT
Subject: Integrated: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
In-Reply-To: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
Message-ID: <1hewrnDIw6jXlNydTQvT_A8ODaargcrVJyJkufOH-74=.baa6e8b9-e84e-43a1-bc85-ea4dc3c9b28b@github.com>

On Mon, 8 Jul 2024 19:01:05 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

> This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions:
> 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing).
> 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails.
> 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1.
> 
> An OOME thrown in these specific conditions should not exit the VM as it is not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue.
> 
> To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible.

This pull request has now been integrated.

Changeset: cf940e13
Author:    Doug Simon <dnsimon at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/cf940e139a76e5aabd52379b8a87065d82b2284c
Stats:     103 lines in 6 files changed: 62 ins; 22 del; 19 mod

8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation

Reviewed-by: yzheng, never, dholmes

-------------

PR: https://git.openjdk.org/jdk/pull/20083

From dnsimon at openjdk.org  Thu Jul 11 07:06:05 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Thu, 11 Jul 2024 07:06:05 GMT
Subject: RFR: 8335553: [Graal] Compiler thread calls into
 jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
 [v2]
In-Reply-To: <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com>
References: <vthV3LC2xWibX_cT7SOcRASLMD8FLwB84_dl1KiaxMY=.71659c02-ab14-4812-8021-c81413e83259@github.com>
 <BUPsFQTN-twZrvPQBoAMoHXNo_lqIMiTGH-pVnvVVpY=.2bfcc370-6ddb-4e12-8dcb-420aad9e4223@github.com>
Message-ID: <Ku915mcoIObN_yFr-rdMvxLTcIoewfS0x1DufVG9WPU=.4d357f51-92b0-4f1d-939f-361ea8c73b7a@github.com>

On Tue, 9 Jul 2024 13:46:46 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

>> This PR addresses intermittent failures in jtreg GC stress tests. The failures occur under these conditions:
>> 1. Using a libgraal build with assertions enabled as the top tier JIT compiler. Such a libgraal build will cause a VM exit if an assertion or GraalError occurs in a compiler thread (as this catches more errors in testing).
>> 2. A libgraal compiler thread makes a call into the VM (via `CompilerToVM`) to a routine that performs a HotSpot heap allocation that fails.
>> 3. The resulting OOME is wrapped in a GraalError, causing the VM to exit as described in 1.
>> 
>> An OOME thrown in these specific conditions should not exit the VM as it is not related to an OOME in the app or test. Instead, the failure should be treated as a bailout and the libgraal compiler should continue.
>> 
>> To accomplish this, libgraal needs to be able to distinguish a GraalError caused by an OOME. This PR modifies the exception translation code to make this possible.
>
> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fixed TestTranslatedException

Thanks for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20083#issuecomment-2222187035

From qxing at openjdk.org  Thu Jul 11 07:12:35 2024
From: qxing at openjdk.org (Qizheng Xing)
Date: Thu, 11 Jul 2024 07:12:35 GMT
Subject: RFR: 8336163: Remove declarations of some debug-only methods in
 release build [v2]
In-Reply-To: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com>
References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com>
Message-ID: <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com>

> Some of the methods are defined only in debug mode, but their declarations still exist in release mode.
> 
> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail.

Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:

  Update copyright.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20131/files
  - new: https://git.openjdk.org/jdk/pull/20131/files/5b044462..37a14107

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20131&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20131&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20131.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20131/head:pull/20131

PR: https://git.openjdk.org/jdk/pull/20131

From qxing at openjdk.org  Thu Jul 11 07:12:35 2024
From: qxing at openjdk.org (Qizheng Xing)
Date: Thu, 11 Jul 2024 07:12:35 GMT
Subject: RFR: 8336163: Remove declarations of some debug-only methods in
 release build [v2]
In-Reply-To: <IFhCsMwPTpiYCPHMcIiwqdX3Gx-MJkzTZUBHlDrVgAQ=.2b7f045d-3dc3-42fc-8fb3-e85a37b05cf9@github.com>
References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com>
 <IFhCsMwPTpiYCPHMcIiwqdX3Gx-MJkzTZUBHlDrVgAQ=.2b7f045d-3dc3-42fc-8fb3-e85a37b05cf9@github.com>
Message-ID: <XY6W7Z_T27A7__GE4smIBj2bDIIxt-E25N3jIUBT61s=.25a24a8d-c187-4358-a7f2-d072db5a44ad@github.com>

On Thu, 11 Jul 2024 06:55:29 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Please update the copyright year in src/hotspot/share/runtime/registerMap.hpp

Updated. Thanks for your suggestion!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20131#issuecomment-2222195566

From dean.long at oracle.com  Thu Jul 11 07:17:18 2024
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Thu, 11 Jul 2024 00:17:18 -0700
Subject: [External] : Re: Where does VMError::print_native_stack and
 os::get_sender_for_C_frame load/use the frame pointer?
In-Reply-To: <CAP2b4GNfyhviAjRakViZ6nqjBev9S3hYE0ejJ+zb+CNpa_r4GA@mail.gmail.com>
References: <CAP2b4GNfyhviAjRakViZ6nqjBev9S3hYE0ejJ+zb+CNpa_r4GA@mail.gmail.com>
Message-ID: <34aeae4d-22bd-45bd-85e9-4922a368c4c1@oracle.com>

Using fr->link() in get_sender_for_C_frame() gives the wrong answer 
because it refers to the current frame, not the sender frame. There is 
no frame::sender_fp() because the information we need could be anywhere 
in the frame or even nowhere in the frame. This is what the comment 
about StackWalk() API is hinting at. Even debuggers can have trouble 
giving an accurate stack trace if external debug information is missing 
and frames do not contain the needed information themselves.
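
To make the frame-pointer discussion concrete, here is a rough sketch of a conventional frame-pointer chain walk over a simulated stack. It is not HotSpot code: `walk_frame_chain` and the slot layout are assumptions for illustration. The walk only gives sensible results if every frame really saved its caller's frame pointer, which is exactly what the VC++ comment quoted below says optimized builds do not guarantee.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulated stack: each element is one word and indices stand in for addresses.
// Assumed layout for every frame (x86-style frame-pointer convention):
//   stack[fp]     = saved frame pointer of the caller (0 ends the chain)
//   stack[fp + 1] = return address into the caller
// Returns the return addresses found, innermost frame first.
std::vector<std::uintptr_t> walk_frame_chain(const std::vector<std::uintptr_t>& stack,
                                             std::size_t fp,
                                             std::size_t max_frames) {
  std::vector<std::uintptr_t> pcs;
  while (fp != 0 && fp + 1 < stack.size() && pcs.size() < max_frames) {
    pcs.push_back(stack[fp + 1]);
    const std::size_t next = static_cast<std::size_t>(stack[fp]);
    // Callers live at higher addresses; a link that does not move upward
    // means the slot did not hold a saved frame pointer, so stop walking.
    if (next != 0 && next <= fp) {
      break;
    }
    fp = next;
  }
  return pcs;
}
```

If a frame in the middle was compiled without a frame pointer, `stack[fp]` holds arbitrary data and the walk stops early or wanders off, which is the failure mode a debugger works around with external unwind information.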

dl

On 7/10/24 10:52 PM, Julian Waters wrote:
> Hi Dean,
>
> I eventually did find frame::link(), but ultimately it didn't seem to help as VMError::print_native_stack still doesn't work properly on Windows. It seems as though frame::link() calls addr_at on x86, which in turn calls frame::fp(), which returns _fp. I think whatever sets _fp for VMError::print_native_stack is the missing link here, but unfortunately I don't know where it's set.
>
> The code that I tried on Windows x64 is attached below
>
> best regards,
> Julian
>
> // VC++ does not save frame pointer on stack in optimized build. It
> // can be turned off by -Oy-. If we really want to walk C frames,
> // we can use the StackWalk() API.
> frame os::get_sender_for_C_frame(frame* fr) {
> #ifdef __GNUC__
>    return frame(fr->sender_sp(), fr->link(), fr->sender_pc());
> #elif defined(_MSC_VER)
>    ShouldNotReachHere();
>    return frame();
> #endif
> }
>
> frame os::current_frame() {
> #ifdef __GNUC__
>    frame f(reinterpret_cast<intptr_t*>(os::current_stack_pointer()),
>            reinterpret_cast<intptr_t*>(__builtin_frame_address(1)),
>            CAST_FROM_FN_PTR(address, &os::current_frame));
>    if (os::is_first_C_frame(&f)) {
>      // stack is not walkable
>      return frame();
>    } else {
>      return os::get_sender_for_C_frame(&f);
>    }
> #elif defined(_MSC_VER)
>    return frame();  // cannot walk Windows frames this way.  See os::get_native_stack
>                     // and os::platform_print_native_stack
> #endif
> }

From dholmes at openjdk.org  Thu Jul 11 07:18:55 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 11 Jul 2024 07:18:55 GMT
Subject: RFR: 8336163: Remove declarations of some debug-only methods in
 release build [v2]
In-Reply-To: <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com>
References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com>
 <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com>
Message-ID: <xCvFFDDqHcxVJONUa0VenAonG3oY6hP6Cjie3W54aYs=.9a3f6132-1eef-4c0a-b9cb-24ba590a9aa3@github.com>

On Thu, 11 Jul 2024 07:12:35 GMT, Qizheng Xing <qxing at openjdk.org> wrote:

>> Some of the methods are defined only in debug mode, but their declarations still exist in release mode.
>> 
>> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail.
>
> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update copyright.

Thanks.

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20131#pullrequestreview-2171105373

From qxing at openjdk.org  Thu Jul 11 07:30:57 2024
From: qxing at openjdk.org (Qizheng Xing)
Date: Thu, 11 Jul 2024 07:30:57 GMT
Subject: RFR: 8336163: Remove declarations of some debug-only methods in
 release build [v2]
In-Reply-To: <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com>
References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com>
 <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com>
Message-ID: <4BJhfsLXcD6wvT8iZNXOeQ3IL2llZcqCPlCbesjlH4U=.34316abb-b4b0-44f9-a4d2-0ed1c1800ea2@github.com>

On Thu, 11 Jul 2024 07:12:35 GMT, Qizheng Xing <qxing at openjdk.org> wrote:

>> Some of the methods are defined only in debug mode, but their declarations still exist in release mode.
>> 
>> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail.
>
> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update copyright.

Thanks for the review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20131#issuecomment-2222227563

From duke at openjdk.org  Thu Jul 11 07:30:58 2024
From: duke at openjdk.org (duke)
Date: Thu, 11 Jul 2024 07:30:58 GMT
Subject: RFR: 8336163: Remove declarations of some debug-only methods in
 release build [v2]
In-Reply-To: <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com>
References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com>
 <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com>
Message-ID: <5pXzgjHqBDblF_ax7idXa5W_0QCqJHL-YFx9lhmH7Ks=.dd466800-57b9-454a-8e21-dc2f9fd84c20@github.com>

On Thu, 11 Jul 2024 07:12:35 GMT, Qizheng Xing <qxing at openjdk.org> wrote:

>> Some of the methods are defined only in debug mode, but their declarations still exist in release mode.
>> 
>> This is considered a bug because these methods may be called mistakenly in release mode and cause the build to fail.
>
> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update copyright.

@MaxXSoft 
Your change (at version 37a14107e9e5542024512bced9bbd03c0c606461) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20131#issuecomment-2222230127

From stuefe at openjdk.org  Thu Jul 11 07:38:57 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 11 Jul 2024 07:38:57 GMT
Subject: RFR: 8300732: Whitebox functions for Metaspace test should use
 byte size
In-Reply-To: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>
References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>
Message-ID: <7kTS7aOEGu5r0uCYvKrIb7nvf1-MBkuCngFWHxNzj2E=.1d2e2913-d442-429f-afc1-0732171cb514@github.com>

On Thu, 4 Jul 2024 15:18:29 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

> Hi all, 
> 
> This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. 
> 
> Testing: 
> - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. 
> 
> Thanks, 
> Sonia

It looks cautiously okay. Small nits remain.

Please make sure the tests pass for both 64-bit and 32-bit (to test a 32-bit build, the simplest way is to build on x64 Linux as normal, but specify --with-target-bits=32 when configuring).

src/hotspot/share/prims/whitebox.cpp line 1769:

> 1767: 
> 1768: WB_ENTRY(jlong, WB_AllocateFromMetaspaceTestArena(JNIEnv* env, jobject wb, jlong arena, jlong size))
> 1769:   if (size % BytesPerWord != 0) {

Just assert `is_aligned(size, BytesPerWord)`
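
As a rough illustration of the suggested check, here is a minimal, self-contained sketch. `is_aligned_sketch`, `kBytesPerWord`, and `bytes_to_words` are hypothetical stand-ins chosen for this example; HotSpot's own `is_aligned` and `BytesPerWord` are not reproduced here, and a 64-bit word size is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for BytesPerWord on a 64-bit build (assumption).
constexpr std::int64_t kBytesPerWord = 8;

// True when size is a multiple of alignment.
// Only valid when alignment is a power of two.
constexpr bool is_aligned_sketch(std::int64_t size, std::int64_t alignment) {
  return (size & (alignment - 1)) == 0;
}

// Converts a byte size to a word count, asserting alignment
// rather than branching on it at runtime.
inline std::int64_t bytes_to_words(std::int64_t size_in_bytes) {
  assert(is_aligned_sketch(size_in_bytes, kBytesPerWord) &&
         "size must be word-aligned");
  return size_in_bytes / kBytesPerWord;
}
```

The design point is that a misaligned size is treated as a caller bug (caught by an assert in debug builds) rather than as an input to be handled at runtime.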

test/hotspot/jtreg/runtime/Metaspace/elastic/MetaspaceTestContext.java line 232:

> 230:         //
> 231:         long expectedMaxCommitted = usageMeasured;
> 232:         expectedMaxCommitted += Settings.ROOT_CHUNK_WORD_SIZE;

Needs scaling up by BytesPerWord now

test/hotspot/jtreg/runtime/Metaspace/elastic/TestMetaspaceAllocationMT1.java line 98:

> 96: 
> 97:         final long wordSize = Settings.WORD_SIZE;
> 98:         final long testAllocationCeiling = 1024 * 1024 * 8 * wordSize; // 8m words = 64M on 64bit

Here, and in other places where I hardcode expected memory values: don't scale by word size, just hardcode the real byte values now. E.g. here, use 1024 * 1024 * 64. 

If possible, put a "KB" and "MB" define somewhere. Or even better, copy the "Unit" enum I added to TestTrimNative. Would be cool to have that somewhere central.
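
To make the arithmetic behind this suggestion concrete, a small sketch follows. It is written in C++ purely for illustration (the tests themselves are Java), and the names `KB`, `MB`, `kWordSize64`, and `words_to_bytes` are assumptions for this example, not existing test utilities.

```cpp
#include <cstdint>

// Hypothetical unit constants, in bytes.
constexpr std::uint64_t KB = 1024;
constexpr std::uint64_t MB = 1024 * KB;

// Assumed word size on a 64-bit build.
constexpr std::uint64_t kWordSize64 = 8;

// Scales a word count to bytes for a given word size.
constexpr std::uint64_t words_to_bytes(std::uint64_t words,
                                       std::uint64_t word_size) {
  return words * word_size;
}

// Old style: a word count scaled by the word size.
constexpr std::uint64_t kCeilingScaled = words_to_bytes(8 * MB, kWordSize64);

// Suggested style: the byte value written out directly.
constexpr std::uint64_t kCeilingBytes = 64 * MB;

static_assert(kCeilingScaled == kCeilingBytes,
              "8M words equal 64 MB when a word is 8 bytes");
```

Note that with a 4-byte word the scaled expression would give 32 MB, so a hardcoded 64 MB is not equivalent on a 32-bit build; whether that difference matters for the test is not stated in this review.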

-------------

Changes requested by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20039#pullrequestreview-2171122397
PR Review Comment: https://git.openjdk.org/jdk/pull/20039#discussion_r1673536814
PR Review Comment: https://git.openjdk.org/jdk/pull/20039#discussion_r1673538960
PR Review Comment: https://git.openjdk.org/jdk/pull/20039#discussion_r1673547131

From tanksherman27 at gmail.com  Thu Jul 11 08:12:40 2024
From: tanksherman27 at gmail.com (Julian Waters)
Date: Thu, 11 Jul 2024 16:12:40 +0800
Subject: [External] : Re: Where does VMError::print_native_stack and
 os::get_sender_for_C_frame load/use the frame pointer?
In-Reply-To: <34aeae4d-22bd-45bd-85e9-4922a368c4c1@oracle.com>
References: <CAP2b4GNfyhviAjRakViZ6nqjBev9S3hYE0ejJ+zb+CNpa_r4GA@mail.gmail.com>
 <34aeae4d-22bd-45bd-85e9-4922a368c4c1@oracle.com>
Message-ID: <CAP2b4GNkafc7zpbXfXJf1OZJMt7wFM_GSF9cFZX4gajrujx+Zg@mail.gmail.com>

Hi Dean,

Thanks for the quick reply. At the risk of testing your patience, I
don't really follow, since that is how os::get_sender_for_C_frame is
implemented on other platforms (I copied it from Linux x86 in this
case). All I got from the comment is that the only reason we usually
have to use the StackWalk API on Windows is because the frame pointer
is not saved when using the Microsoft compiler, however in my case I'm
not using the Microsoft compiler and have verified that the frame
pointer is saved in my custom JVMs. I'm not sure how
VMError::print_native_stack on other platforms manages to work when
they also do

return frame(fr->sender_sp(), fr->link(), fr->sender_pc());

in os::get_sender_for_C_frame like I did here
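
For reference, a minimal sketch of the frame-pointer chain convention under discussion, run on a simulated stack. It assumes the conventional x86-64 layout in which the word at the frame pointer holds the caller's saved frame pointer and the next word holds the return address. `SimFrame`, `sender_of`, `walk_pcs`, and `make_demo_stack` are hypothetical names for this example and are not HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A frame in the simulated stack; sp and fp are word indices, not addresses.
struct SimFrame {
  std::size_t sp;
  std::size_t fp;
  std::uint64_t pc;
};

// Computes the caller's frame from the two words stored at fr.fp.
inline SimFrame sender_of(const std::vector<std::uint64_t>& stack,
                          const SimFrame& fr) {
  assert(fr.fp + 1 < stack.size() && "fp must lie inside the simulated stack");
  SimFrame sender;
  sender.sp = fr.fp + 2;                               // past saved fp and return pc
  sender.fp = static_cast<std::size_t>(stack[fr.fp]);  // caller's saved fp
  sender.pc = stack[fr.fp + 1];                        // return address into caller
  return sender;
}

// Collects the pc of every frame until the chain ends (saved fp == 0),
// leaves the simulated stack, or max_depth is reached.
inline std::vector<std::uint64_t> walk_pcs(
    const std::vector<std::uint64_t>& stack, SimFrame fr,
    std::size_t max_depth) {
  std::vector<std::uint64_t> pcs;
  for (std::size_t depth = 0; depth < max_depth; ++depth) {
    pcs.push_back(fr.pc);
    if (fr.fp == 0 || fr.fp + 1 >= stack.size()) break;
    fr = sender_of(stack, fr);
  }
  return pcs;
}

// Builds three chained frames at word indices 2, 6 and 10.
inline std::vector<std::uint64_t> make_demo_stack() {
  std::vector<std::uint64_t> stack(16, 0);
  stack[2] = 6;   stack[3] = 0x300;   // innermost frame, called from 0x300
  stack[6] = 10;  stack[7] = 0x200;   // middle frame, called from 0x200
  stack[10] = 0;  stack[11] = 0x100;  // outermost frame ends the chain
  return stack;
}
```

In this model the walk is only as good as the frame pointer it starts from: if the initial `fp` does not point at a saved-fp/return-address pair, every later frame is garbage.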

Thanks for your time and patience!

best regards,
Julian

On Thu, Jul 11, 2024 at 3:17 PM <dean.long at oracle.com> wrote:
>
> Using fr->link() in get_sender_for_C_frame() gives the wrong answer because it refers to the current frame, not the sender frame. There is no frame::sender_fp() because the information we need could be anywhere in the frame or even nowhere in the frame. This is what the comment about StackWalk() API is hinting at. Even debuggers can have trouble giving an accurate stack trace if external debug information is missing and frames do not contain the needed information themselves.
>
> dl
>
> On 7/10/24 10:52 PM, Julian Waters wrote:
>
> Hi Dean,
>
> I eventually did find frame::link(), but ultimately it didn't seem to help as VMError::print_native_stack still doesn't work properly on Windows. It seems as though frame::link() calls addr_at on x86, which in turn calls frame::fp(), which returns _fp. I think whatever sets _fp for VMError::print_native_stack is the missing link here, but unfortunately I don't know where it's set
>
> The code that I tried on Windows x64 is attached below
>
> best regards,
> Julian
>
> // VC++ does not save frame pointer on stack in optimized build. It
> // can be turned off by -Oy-. If we really want to walk C frames,
> // we can use the StackWalk() API.
> frame os::get_sender_for_C_frame(frame* fr) {
> #ifdef __GNUC__
>   return frame(fr->sender_sp(), fr->link(), fr->sender_pc());
> #elif defined(_MSC_VER)
>   ShouldNotReachHere();
>   return frame();
> #endif
> }
>
> frame os::current_frame() {
> #ifdef __GNUC__
>   frame f(reinterpret_cast<intptr_t*>(os::current_stack_pointer()),
>           reinterpret_cast<intptr_t*>(__builtin_frame_address(1)),
>           CAST_FROM_FN_PTR(address, &os::current_frame));
>   if (os::is_first_C_frame(&f)) {
>     // stack is not walkable
>     return frame();
>   } else {
>     return os::get_sender_for_C_frame(&f);
>   }
> #elif defined(_MSC_VER)
>   return frame();  // cannot walk Windows frames this way.  See os::get_native_stack
>                    // and os::platform_print_native_stack
> #endif
> }

From luhenry at openjdk.org  Thu Jul 11 08:40:02 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Thu, 11 Jul 2024 08:40:02 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v24]
In-Reply-To: <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com>
Message-ID: <7fhRknozHSB9GrctVa-AReMYCo1Wgh8cMUoMDAP9J2E=.0bf684bc-5b06-4b8e-ba57-5274c58b6ec5@github.com>

On Wed, 10 Jul 2024 20:31:27 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi all, please consider!
>> 
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> With a very small application, or one running for a very short time, we have fast patchable calls.
>> But any normal application running longer will increase the code size and code churn/fragmentation.
>> So whether or not you get hot fast calls relies on luck.
>> 
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>> 
>> Code stream:
>> JAL <trampo>
>> Stubs:
>> AUIPC
>> LD
>> JALR
>> <DEST>
>> 
>> 
>> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem, having a call to a jump is not the fastest way.
>> Loading the address avoids the pitfalls of cmodx.
>> 
>> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>> 
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>> <DEST>
>> 
>> An experimental option for turning trampolines back on exists.
>> 
>> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop'ing out the auipc+ld when in reach, and vice versa (the current proposal of Zjid forces the I-fetcher to fetch instructions in order, meaning we will avoid a lot of the issues which arm has).
>> 
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>> 
>> fop                                        (msec)    2239       |  2128       =  0.950424
>> h2                                         (msec)    18660      |  16594      =  0.889282
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> luindex                                    (msec)    2866       |  2842       =  0.991626
>> lusearch                                   (msec)    4108       |  4311       =  1.04942
>> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
>> pmd                                        (msec)    5976       |  5897       =  0.98678
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> Avg:                                       0.974112                              
>> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits:
> 
>  - Skip qualify ins
>  - Merge branch 'master' into 8332689
>  - _ld to ld
>  - Merge branch 'master' into 8332689
>  - Rename to reloc_call
>  - Merge branch 'master' into 8332689
>  - Rename lc
>  - Merge branch 'master' into 8332689
>  - Merge branch 'master' into 8332689
>  - Comments
>  - ... and 24 more: https://git.openjdk.org/jdk/compare/242f1133...242c3790

Marked as reviewed by luhenry (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19453#pullrequestreview-2171272750

From shade at openjdk.org  Thu Jul 11 08:49:12 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 11 Jul 2024 08:49:12 GMT
Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers
Message-ID: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>

All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor (instance initializer), not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. Often we get lucky by never being exposed to static initializers on particular paths.

I would like to sharpen this. I went back and forth, and ultimately decided to remove `is_initializer` completely to avoid future confusion, and rewrite the uses appropriately.

Additional testing:
 - [x] Linux AArch64 server fastdebug, `all` (includes Fuzzer and CTW tests)
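
To illustrate the distinction, here is a simplified, self-contained model. `MethodSketch` and the three predicates are hypothetical stand-ins written for this example; the real HotSpot `Method` compares symbols rather than strings and has further details that this sketch ignores.

```cpp
#include <string>

// Hypothetical, simplified model of a method: just a name and a static flag.
struct MethodSketch {
  std::string name;
  bool is_static;
};

// Instance initializer (constructor): named <init>.
inline bool is_object_initializer(const MethodSketch& m) {
  return m.name == "<init>";
}

// Static initializer: named <clinit> and static.
inline bool is_static_initializer(const MethodSketch& m) {
  return m.name == "<clinit>" && m.is_static;
}

// The broad check: true for either kind, which is why callers that
// only care about constructors can be surprised by <clinit>.
inline bool is_initializer_broad(const MethodSketch& m) {
  return is_object_initializer(m) || is_static_initializer(m);
}
```

A caller that means "constructor" would use the first predicate directly, instead of the broad check combined with an ad hoc `!is_static` filter.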

-------------

Commit messages:
 - Minor touchups
 - Relax MemberName, accept clinit as Method
 - Fix

Changes: https://git.openjdk.org/jdk/pull/20120/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20120&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8336103
  Stats: 46 lines in 15 files changed: 10 ins; 17 del; 19 mod
  Patch: https://git.openjdk.org/jdk/pull/20120.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20120/head:pull/20120

PR: https://git.openjdk.org/jdk/pull/20120

From shade at openjdk.org  Thu Jul 11 08:49:12 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 11 Jul 2024 08:49:12 GMT
Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers
In-Reply-To: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
Message-ID: <M7cb-w9MyVB-dVZnJSHzKtn2IPVZYFqlZhjwXfFik9c=.d2d75ea3-67cf-42f9-9513-e1c29a758d8a@github.com>

On Wed, 10 Jul 2024 17:15:49 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor (instance initializer), not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. Often we get lucky by never being exposed to static initializers on particular paths.
> 
> I would like to sharpen this. I went back and forth, and ultimately decided to remove `is_initializer` completely to avoid future confusion, and rewrite the uses appropriately.
> 
> Additional testing:
>  - [x] Linux AArch64 server fastdebug, `all` (includes Fuzzer and CTW tests)

Caught some test failures, back to draft.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20120#issuecomment-2221062542

From aturbanov at openjdk.org  Thu Jul 11 08:49:13 2024
From: aturbanov at openjdk.org (Andrey Turbanov)
Date: Thu, 11 Jul 2024 08:49:13 GMT
Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers
In-Reply-To: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
Message-ID: <4b2tcmkJFGFQ3X9Uu_mg2UE_3vV_dqlP3tx24R5m7JY=.55d37e64-d4f7-40e1-913f-e757042b3629@github.com>

On Wed, 10 Jul 2024 17:15:49 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor (instance initializer), not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. Often we get lucky by never being exposed to static initializers on particular paths.
> 
> I would like to sharpen this. I went back and forth, and ultimately decided to remove `is_initializer` completely to avoid future confusion, and rewrite the uses appropriately.
> 
> Additional testing:
>  - [x] Linux AArch64 server fastdebug, `all` (includes Fuzzer and CTW tests)

src/hotspot/share/oops/klassVtable.cpp line 1233:

> 1231: 
> 1232: inline bool interface_method_needs_itable_index(Method* m) {
> 1233:   if (m->is_static())           return false;   // e.g., Stream.empty

code alignment is now inconsistent

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1673644568

From jrose at openjdk.org  Thu Jul 11 08:50:57 2024
From: jrose at openjdk.org (John R Rose)
Date: Thu, 11 Jul 2024 08:50:57 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
Message-ID: <pfFWmbs1q_M-WQIDyBw15ctVdRcAudSrdJ6BEQRx41E=.762c100f-7650-47fd-bfe3-ac620913384f@github.com>

On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See bug for more discussion.
> 
> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
> 
> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice:
> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
> 
> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
> 
> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
> 
> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in the one path that matters, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
> 
> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
> 
> Additional testing:
>  - [x] New IR tests
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`

I like this compromise.  Let me see if I got it right:  A stable write in a constructor is treated like a final write; it triggers a barrier at the end of the constructor.  That's a cheap move.  No other barriers are added automatically, for reads or other writes, saving us from doing less cheap moves.  The burden would be on users of stable vars (in fancy access patterns) to add more fences if needed, but we don't see any important cases of that, at the moment.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2222374934

From tanksherman27 at gmail.com  Thu Jul 11 08:57:23 2024
From: tanksherman27 at gmail.com (Julian Waters)
Date: Thu, 11 Jul 2024 16:57:23 +0800
Subject: [External] : Re: Where does VMError::print_native_stack and
 os::get_sender_for_C_frame load/use the frame pointer?
In-Reply-To: <CAP2b4GNkafc7zpbXfXJf1OZJMt7wFM_GSF9cFZX4gajrujx+Zg@mail.gmail.com>
References: <CAP2b4GNfyhviAjRakViZ6nqjBev9S3hYE0ejJ+zb+CNpa_r4GA@mail.gmail.com>
 <34aeae4d-22bd-45bd-85e9-4922a368c4c1@oracle.com>
 <CAP2b4GNkafc7zpbXfXJf1OZJMt7wFM_GSF9cFZX4gajrujx+Zg@mail.gmail.com>
Message-ID: <CAP2b4GPnh9C5VR84KM3c9wgkYiLDf80BMJ5fK26qWsvobGXvkA@mail.gmail.com>

Seems like I found an old gem where the issue with the frame pointer was
first discovered

https://bugs.openjdk.org/browse/JDK-8022335
https://github.com/openjdk/jdk/commit/1c2a7eea85ea261102687190d6b2e92c560770b8

best regards,
Julian


On Thu, Jul 11, 2024 at 4:12 PM Julian Waters <tanksherman27 at gmail.com>
wrote:

> Hi Dean,
>
> Thanks for the quick reply. At the risk of testing your patience, I
> don't really follow, since that is how os::get_sender_for_C_frame is
> implemented on other platforms (I copied it from Linux x86 in this
> case). All I got from the comment is that the only reason we usually
> have to use the StackWalk API on Windows is because the frame pointer
> is not saved when using the Microsoft compiler, however in my case I'm
> not using the Microsoft compiler and have verified that the frame
> pointer is saved in my custom JVMs. I'm not sure how
> VMError::print_native_stack on other platforms manages to work when
> they also do
>
> return frame(fr->sender_sp(), fr->link(), fr->sender_pc());
>
> in os::get_sender_for_C_frame like I did here
>
> Thanks for your time and patience!
>
> best regards,
> Julian
>
> On Thu, Jul 11, 2024 at 3:17 PM <dean.long at oracle.com> wrote:
> >
> > Using fr->link() in get_sender_for_C_frame() gives the wrong answer because it refers to the current frame, not the sender frame. There is no frame::sender_fp() because the information we need could be anywhere in the frame or even nowhere in the frame. This is what the comment about StackWalk() API is hinting at. Even debuggers can have trouble giving an accurate stack trace if external debug information is missing and frames do not contain the needed information themselves.
> >
> > dl
> >
> > On 7/10/24 10:52 PM, Julian Waters wrote:
> >
> > Hi Dean,
> >
> > I eventually did find frame::link(), but ultimately it didn't seem to help as VMError::print_native_stack still doesn't work properly on Windows. It seems as though frame::link() calls addr_at on x86, which in turn calls frame::fp(), which returns _fp. I think whatever sets _fp for VMError::print_native_stack is the missing link here, but unfortunately I don't know where it's set
> >
> > The code that I tried on Windows x64 is attached below
> >
> > best regards,
> > Julian
> >
> > // VC++ does not save frame pointer on stack in optimized build. It
> > // can be turned off by -Oy-. If we really want to walk C frames,
> > // we can use the StackWalk() API.
> > frame os::get_sender_for_C_frame(frame* fr) {
> > #ifdef __GNUC__
> >   return frame(fr->sender_sp(), fr->link(), fr->sender_pc());
> > #elif defined(_MSC_VER)
> >   ShouldNotReachHere();
> >   return frame();
> > #endif
> > }
> >
> > frame os::current_frame() {
> > #ifdef __GNUC__
> >   frame f(reinterpret_cast<intptr_t*>(os::current_stack_pointer()),
> >           reinterpret_cast<intptr_t*>(__builtin_frame_address(1)),
> >           CAST_FROM_FN_PTR(address, &os::current_frame));
> >   if (os::is_first_C_frame(&f)) {
> >     // stack is not walkable
> >     return frame();
> >   } else {
> >     return os::get_sender_for_C_frame(&f);
> >   }
> > #elif defined(_MSC_VER)
> >   return frame();  // cannot walk Windows frames this way.  See os::get_native_stack
> >                    // and os::platform_print_native_stack
> > #endif
> > }
>

From shade at openjdk.org  Thu Jul 11 08:58:11 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 11 Jul 2024 08:58:11 GMT
Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers
 [v2]
In-Reply-To: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
Message-ID: <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com>

> All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor (instance initializer), not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. Often we get lucky by never being exposed to static initializers on particular paths.
> 
> I would like to sharpen this. I went back and forth, and ultimately decided to remove `is_initializer` completely to avoid future confusion, and rewrite the uses appropriately.
> 
> Additional testing:
>  - [x] Linux AArch64 server fastdebug, `all` (includes Fuzzer and CTW tests)

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Indenting

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20120/files
  - new: https://git.openjdk.org/jdk/pull/20120/files/f586e4db..c5da5ebd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20120&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20120&range=00-01

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20120.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20120/head:pull/20120

PR: https://git.openjdk.org/jdk/pull/20120

From shade at openjdk.org  Thu Jul 11 08:58:12 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 11 Jul 2024 08:58:12 GMT
Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers
 [v2]
In-Reply-To: <4b2tcmkJFGFQ3X9Uu_mg2UE_3vV_dqlP3tx24R5m7JY=.55d37e64-d4f7-40e1-913f-e757042b3629@github.com>
References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
 <4b2tcmkJFGFQ3X9Uu_mg2UE_3vV_dqlP3tx24R5m7JY=.55d37e64-d4f7-40e1-913f-e757042b3629@github.com>
Message-ID: <zH5gC1wCdxLMrL4dCiFEB3YfvoAjnaMcSEYOudZGZmw=.846d8bbc-a9ea-46fb-81cb-b32073d253e5@github.com>

On Thu, 11 Jul 2024 08:46:38 GMT, Andrey Turbanov <aturbanov at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Indenting
>
> src/hotspot/share/oops/klassVtable.cpp line 1233:
> 
>> 1231: 
>> 1232: inline bool interface_method_needs_itable_index(Method* m) {
>> 1233:   if (m->is_static())           return false;   // e.g., Stream.empty
> 
> code alignment is now inconsistent

Fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1673657253

From shade at openjdk.org  Thu Jul 11 08:59:55 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 11 Jul 2024 08:59:55 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
Message-ID: <pdP9zTaLI_DskFxG11PMF1qTDRC-1mbmdfhrmHnkJ-o=.179539f1-ffc4-4faf-a1b8-12940a498650@github.com>

On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See bug for more discussion.
> 
> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
> 
> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice:
> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
> 
> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
> 
> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
> 
> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in the one path that matters, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
> 
> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
> 
> Additional testing:
>  - [x] New IR tests
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`

> If we merge the stable and final flags, won't this be regarded the same as any final-writing constructor?

Nope. With this patch, we only care about stable field barriers in constructors. Methods are not affected. There are new IR tests that verify this directly.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2222394296

From shade at openjdk.org  Thu Jul 11 09:02:56 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 11 Jul 2024 09:02:56 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <tyXcVLFV6t00-jDYMMHuaRPfScHuvsuGbPsJaCO1ALc=.c002e92d-92aa-4030-a320-a0dcfe7dc5d1@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
 <tyXcVLFV6t00-jDYMMHuaRPfScHuvsuGbPsJaCO1ALc=.c002e92d-92aa-4030-a320-a0dcfe7dc5d1@github.com>
Message-ID: <seQquUMTfKdTEbU7TeM5IRuZe12PeoUxnznHyA_VXTA=.e7d8c514-39f9-4980-8b9d-a9563bd54df9@github.com>

On Wed, 10 Jul 2024 23:35:01 GMT, Dean Long <dlong at openjdk.org> wrote:

> Do we still need separate _wrote_stable and _wrote_final flags, or could we combine them into _wrote_stable_or_final? Then we are almost back to pre-8031818, when _wrote_final was overloaded to mean write to final or stable field.

One of my previous iterations did this combination, but I thought it was: a) uglier; b) not future-proof, in case someone (probably me, later) would like to check the parser state for final fields writes specifically. So I thought to track final and stable field writes separately.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2222400103

From shade at openjdk.org  Thu Jul 11 09:09:55 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 11 Jul 2024 09:09:55 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <pfFWmbs1q_M-WQIDyBw15ctVdRcAudSrdJ6BEQRx41E=.762c100f-7650-47fd-bfe3-ac620913384f@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
 <pfFWmbs1q_M-WQIDyBw15ctVdRcAudSrdJ6BEQRx41E=.762c100f-7650-47fd-bfe3-ac620913384f@github.com>
Message-ID: <o_chvBrK3sgS76_FnNSCGkujpNJzMde80_Yl3PD8bX8=.ba3c80f0-ac63-4f4a-8684-90a350ad453b@github.com>

On Thu, 11 Jul 2024 08:47:59 GMT, John R Rose <jrose at openjdk.org> wrote:

> I like this compromise. Let me see if I got it right: A stable write in a constructor is treated like a final write; it triggers a barrier at the end of the constructor. That's a cheap move. No other barriers are added automatically, for reads or other writes, saving us from doing less cheap moves. The burden would be on users of stable vars (in fancy access patterns) to add more fences if needed, but we don't see any important cases of that, at the moment.

Yes, pretty much.

Looking at this another way, after this patch, there is no performance or safety cost for simple changes in user code like:
 1. Changing the previously `final` field into `@Stable` field with value overwrite outside of constructor. This looks like a useful pattern in `java.lang.invoke`.
 2. Changing the previously `@Stable` field into `final` field, if the only stores are in constructor. Basically, the reversal of (1).
 3. Putting a `@Stable` over `final` field. This is where current `String` constructor gets a bad deal today.
 4. Putting a `@Stable` over any field that is written outside of constructor. This is where lazy caches like `Enum.hashCode` get a bad deal today.
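
The before/after behavior described in this thread can be modeled as a small decision function. This is a deliberately simplified sketch of the policy as stated here, not the C2 implementation: `ExitBarrier`, `MethodWrites`, `barrier_before`, and `barrier_after` are hypothetical names, and other conditions the real parser may consider are left out.

```cpp
// Which barrier (if any) is emitted at method exit in this simplified model.
enum class ExitBarrier { None, StoreStore, Release };

// What the compiler saw while parsing one method (hypothetical model).
struct MethodWrites {
  bool is_constructor;
  bool wrote_final;
  bool wrote_stable;
};

// Before the patch, as described above: any method that writes a
// @Stable field gets a Release barrier at exit; final writes in
// constructors need only StoreStore.
inline ExitBarrier barrier_before(const MethodWrites& m) {
  if (m.wrote_stable) return ExitBarrier::Release;
  if (m.is_constructor && m.wrote_final) return ExitBarrier::StoreStore;
  return ExitBarrier::None;
}

// After the patch, as described above: a stable write in a constructor
// is treated like a final write; other methods are not affected.
inline ExitBarrier barrier_after(const MethodWrites& m) {
  if (m.is_constructor && (m.wrote_final || m.wrote_stable)) {
    return ExitBarrier::StoreStore;
  }
  return ExitBarrier::None;
}
```

In this model, case (3) corresponds to `{true, true, true}`, which goes from `Release` to `StoreStore`, and case (4) corresponds to `{false, false, true}`, which goes from `Release` to `None`.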

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2222413725

From sgehwolf at openjdk.org  Thu Jul 11 09:15:58 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Thu, 11 Jul 2024 09:15:58 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v3]
In-Reply-To: <uWlQ046HkkvQZ6nmMUCFMYxlgoeqG296pNj6vBTS2uA=.2199f887-e0da-46da-831b-53fb8c5868aa@github.com>
References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com>
 <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com>
 <uWlQ046HkkvQZ6nmMUCFMYxlgoeqG296pNj6vBTS2uA=.2199f887-e0da-46da-831b-53fb8c5868aa@github.com>
Message-ID: <PDQay8tXsDTCW1HMDCNk8WcfB_ZNgLctDGe5J-GwHhY=.3d9841b5-5250-4d68-a775-0e7d45e612bd@github.com>

On Thu, 11 Jul 2024 03:39:37 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote:

>> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>> 
>>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>>  - Fix comments
>>  - 8333446: Add tests for hierarchical container support
>
> test/hotspot/jtreg/ProblemList.txt line 119:
> 
>> 117: containers/docker/TestMemoryAwareness.java 8303470 linux-all
>> 118: containers/docker/TestJFREvents.java 8327723 linux-x64
>> 119: containers/systemd/SystemdMemoryAwarenessTest.java 8322420 linux-all
> 
> This line should be removed as long as it gets applied after [17198](https://github.com/openjdk/jdk/pull/17198).

Sure. We need to see which one goes in first and I'll adjust accordingly.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19530#discussion_r1673683205

From sgehwolf at openjdk.org  Thu Jul 11 09:26:55 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Thu, 11 Jul 2024 09:26:55 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v3]
In-Reply-To: <kSbubsK2cEF-sY-GX4AYliW9dMXZ8IYGBcKIZaalDcU=.b82004c9-2633-4996-8c61-18d3ff9b0fd0@github.com>
References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com>
 <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com>
 <kSbubsK2cEF-sY-GX4AYliW9dMXZ8IYGBcKIZaalDcU=.b82004c9-2633-4996-8c61-18d3ff9b0fd0@github.com>
Message-ID: <0U-PWBKKJ7mHmK_GQ77s_gZ0tPRbRIsQcdjJRWdVmGg=.8b319781-3628-400b-b9d7-c0750a2a8637@github.com>

On Thu, 11 Jul 2024 03:42:27 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote:

> [test.patch.txt](https://github.com/user-attachments/files/16171122/test.patch.txt)
> 
>     * `CPUQuota` (changed it to `AllowedCPUs`) does not work for me - it properly distributes the load but JDK still sees all available CPU cores (4 of my VM).

Could you elaborate on that? What does not work? It's relying on the JVM properly detecting the set limit. `CPUQuota` sets the values in `cpu.max` on unified hierarchy for the `cpu` controller. See the [systemd doc](https://www.freedesktop.org/software/systemd/man/latest/systemd.resource-control.html). It's available since systemd 213. RHEL 7 has 219 which should be good enough. `AllowedCPUs` on the other hand uses the `cpuset` controller, which is a different thing. For the purpose of this test, we should use `CPUQuota`.
 
>     * the change 2 -> 1 cores: // We could check 2 cores ("0-1") but then it would fail on single-core nodes / virtual machines.

Yeah, we have a chicken/egg problem there. Assuming 2 cores seemed reasonable. We could query the number of unrestricted CPUs (of the physical system) using the WB API and take the minimum of the two. Let me work on that.
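
As a rough illustration of that plan (a sketch only, not the test or HotSpot code): on cgroup v2, `cpu.max` holds `<quota> <period>`, or `max <period>` when no quota is set, and the expected count would be the configured limit capped by what the machine actually has. The helper names `cpus_from_cpu_max` and `expected_cpus` are made up for this example, and rounding the quota up to whole CPUs is an assumption about how the limit is interpreted.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not HotSpot code): derive a CPU count from the contents
// of a cgroup v2 cpu.max file. Returns -1 when no quota is set or the input is
// empty. A non-numeric quota other than "max" makes std::stol throw.
int cpus_from_cpu_max(const std::string& cpu_max) {
  std::istringstream in(cpu_max);
  std::string quota;
  long period = 0;
  in >> quota >> period;
  if (quota == "max" || period <= 0) {
    return -1;  // unlimited, or nothing usable was read
  }
  long q = std::stol(quota);
  return static_cast<int>((q + period - 1) / period);  // round up to whole CPUs
}

// Hypothetical helper: the value a test could expect, i.e. the requested limit
// capped by the number of CPUs the host really has.
int expected_cpus(int requested, int host_cpus) {
  assert(requested > 0 && host_cpus > 0);
  return std::min(requested, host_cpus);
}
```

With `CPUQuota=200%` the file would read `200000 100000`, which this sketch maps to 2; on a single-core VM `expected_cpus(2, 1)` yields 1, so the check would no longer depend on the host having two cores.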

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2222448285

From yzheng at openjdk.org  Thu Jul 11 09:29:00 2024
From: yzheng at openjdk.org (Yudi Zheng)
Date: Thu, 11 Jul 2024 09:29:00 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v5]
In-Reply-To: <nho0iQHJu__oLvxJF3oE1qBlFiSvUoZ6dLEIc139KqA=.5ba0e931-a6c4-443d-b9d7-715da000d045@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <nho0iQHJu__oLvxJF3oE1qBlFiSvUoZ6dLEIc139KqA=.5ba0e931-a6c4-443d-b9d7-715da000d045@github.com>
Message-ID: <GxCetSdxU-CzhK7QGtwUC2lxoREeieXA10J8zbizClw=.60084b5d-1f22-4592-9d6e-a99e52df478e@github.com>

On Wed, 10 Jul 2024 20:10:07 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with four additional commits since the last revision:
> 
>  - Add extra comments in LightweightSynchronizer::inflate_fast_locked_object
>  - Fix typos
>  - Remove unused variable
>  - Add missing inline qualifiers

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 843:

> 841:       movptr(monitor, Address(box, BasicLock::object_monitor_cache_offset_in_bytes()));
> 842:       // null check with ZF == 0, no valid pointer below alignof(ObjectMonitor*)
> 843:       cmpptr(monitor, alignof(ObjectMonitor*));

Is this only for keeping `ZF == 0` and can be replaced by `test; je` if we are not using `jne` to jump to the slow path? Or is there any performance concern? Btw, I think `ZF` is always rewritten before entering into the slow path https://github.com/openjdk/jdk/blob/b32e4a68bca588d908bd81a398eb3171a6876dc5/src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp#L98-L102

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1673704482

From rehn at openjdk.org  Thu Jul 11 10:00:59 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 11 Jul 2024 10:00:59 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v23]
In-Reply-To: <SN7gZ_XJWn2jG_DXGmzHWqVfV1xz_vG-BTkotAbuzkM=.c48958e0-a982-4e38-bb0c-fac37d4de7f1@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com>
 <gBaz5XlGA4DywDyB2NIlCqY4A1zbkN5y7zhXTvEgFbM=.9fd83e49-0a5a-4530-a87d-321e05b66016@github.com>
 <SN7gZ_XJWn2jG_DXGmzHWqVfV1xz_vG-BTkotAbuzkM=.c48958e0-a982-4e38-bb0c-fac37d4de7f1@github.com>
Message-ID: <dj_6CgB5MNJuRLxsbuko3KB6P5EBnsNecuPyUzBzzZs=.ff095ccc-8f0b-4de9-b776-178f9df524f1@github.com>

On Wed, 10 Jul 2024 20:39:24 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Also performed tier1-3 and hotspot:tier4 on my unmatched boards. Result looks fine.
>> Just witnessed several unnecessary uses of namespace `Assembler`. Guess you might want to clean it up? Still good otherwise.
>> 
>> 
>> diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
>> index b39ac79be6b..e349eab3177 100644
>> --- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
>> +++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
>> @@ -983,9 +983,9 @@ void MacroAssembler::load_link_jump(const address source, Register temp) {
>>    assert_cond(source != nullptr);
>>    int64_t distance = source - pc();
>>    assert(is_simm32(distance), "Must be");
>> -  Assembler::auipc(temp, (int32_t)distance + 0x800);
>> -  Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20);
>> -  Assembler::jalr(x1, temp, 0);
>> +  auipc(temp, (int32_t)distance + 0x800);
>> +  ld(temp, Address(temp, ((int32_t)distance << 20) >> 20));
>> +  jalr(temp);
>>  }
>> 
>>  void MacroAssembler::jump_link(const address dest, Register temp) {
>> @@ -994,7 +994,7 @@ void MacroAssembler::jump_link(const address dest, Register temp) {
>>    int64_t distance = dest - pc();
>>    assert(is_simm21(distance), "Must be");
>>    assert((distance % 2) == 0, "Must be");
>> -  Assembler::jal(x1, distance);
>> +  jal(x1, distance);
>>  }
>> 
>>  void MacroAssembler::j(const address dest, Register temp) {
>
>> I have not seen (new) issues in testing. I would have preferred one or two more reviewers, but since RV is not the biggest platform I'll settle with just passing the bar. I'll go ahead and integrate if @RealFYang and @Hamlin-Li re-review (as the new rules are in effect which require latest rev to be reviewed).
> 
> Still good to me. Thanks!

Thanks! @Hamlin-Li please re-approve, background: https://mail.openjdk.org/pipermail/jdk-dev/2024-July/009199.html
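
For readers wondering about the `+ 0x800` in the quoted `load_link_jump` diff, here is a minimal sketch of the usual RISC-V offset split. `auipc` adds a 20-bit immediate shifted left by 12 to the pc, and the 12-bit displacement of `ld` is sign-extended, so the upper part has to be rounded up whenever the low 12 bits would be read as negative. `hi20` and `lo12` are hypothetical names for this example, not functions from `macroAssembler_riscv.cpp`, and the sketch assumes the assembler only keeps the upper 20 bits of the value passed to `auipc`.

```cpp
#include <cassert>
#include <cstdint>

// Low 12 bits of the offset, interpreted as a signed 12-bit displacement
// (what the ld instruction will add after sign extension).
constexpr int32_t lo12(int32_t offset) {
  return (offset & 0xfff) >= 0x800 ? (offset & 0xfff) - 0x1000
                                   : (offset & 0xfff);
}

// Upper part contributed by auipc: whatever is left once the signed low part
// is accounted for. Always a multiple of 4096. Computed in 64 bits so that
// offsets close to INT32_MAX do not overflow in this sketch.
constexpr int64_t hi20(int32_t offset) {
  return static_cast<int64_t>(offset) - lo12(offset);
}

// The two parts always add back up to the original offset.
static_assert(hi20(0x12345FFF) + lo12(0x12345FFF) == 0x12345FFF, "round trip");
// Low bits 0xFFF read as -1, so the upper part is rounded up by one page.
static_assert(hi20(0x12345FFF) == 0x12346000, "upper part rounded up");
static_assert(lo12(0x12345FFF) == -1, "low part is sign-extended");
// Same result as the (offset + 0x800) idiom followed by masking the low bits.
static_assert(hi20(0x12345FFF) == ((0x12345FFF + 0x800) & ~0xfff), "idiom");
```

The `(x << 20) >> 20` expression in the diff is the shift-based way of getting the same sign-extended low 12 bits.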

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2222520657

From mli at openjdk.org  Thu Jul 11 10:05:01 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 11 Jul 2024 10:05:01 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v24]
In-Reply-To: <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com>
Message-ID: <7iSqtZQW-vx36D_y9R5a-lWWRwOl0p0aGdTF6GhV6P0=.5065f3b7-d33c-4b1c-8a8d-b6666d251b9f@github.com>

On Wed, 10 Jul 2024 20:31:27 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi all, please consider!
>> 
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> Using a very small application or running for a very short time we have fast patchable calls.
>> But any normal application running longer will increase the code size and code churn/fragmentation.
>> So whether or not you get hot fast calls relies on luck.
>> 
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>> 
>> Code stream:
>> JAL <trampo>
>> Stubs:
>> AUIPC
>> LD
>> JALR
>> <DEST>
>> 
>> 
>> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem, having a call to a jump is not the fastest way.
>> Loading the address avoids the pitfalls of cmodx.
>> 
>> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>> 
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>> <DEST>
>> 
>> An experimental option for turning trampolines back on exists.
>> 
>> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
>> 
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>> 
>> fop                                        (msec)    2239       |  2128       =  0.950424
>> h2                                         (msec)    18660      |  16594      =  0.889282
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> luindex                                    (msec)    2866       |  2842       =  0.991626
>> lusearch                                   (msec)    4108       |  4311       =  1.04942
>> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
>> pmd                                        (msec)    5976       |  5897       =  0.98678
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> Avg:                                       0.974112                              
>> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits:
> 
>  - Skip qualify ins
>  - Merge branch 'master' into 8332689
>  - _ld to ld
>  - Merge branch 'master' into 8332689
>  - Rename to reloc_call
>  - Merge branch 'master' into 8332689
>  - Rename lc
>  - Merge branch 'master' into 8332689
>  - Merge branch 'master' into 8332689
>  - Comments
>  - ... and 24 more: https://git.openjdk.org/jdk/compare/242f1133...242c3790

Marked as reviewed by mli (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19453#pullrequestreview-2171471540

From mli at openjdk.org  Thu Jul 11 10:05:02 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 11 Jul 2024 10:05:02 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v23]
In-Reply-To: <SN7gZ_XJWn2jG_DXGmzHWqVfV1xz_vG-BTkotAbuzkM=.c48958e0-a982-4e38-bb0c-fac37d4de7f1@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <CJzw2cha3OyqX9jnxeFj9se8z4V6alfhaTAHxj_R63k=.86e35c57-9bf9-4d22-a350-45d10c4e307b@github.com>
 <gBaz5XlGA4DywDyB2NIlCqY4A1zbkN5y7zhXTvEgFbM=.9fd83e49-0a5a-4530-a87d-321e05b66016@github.com>
 <SN7gZ_XJWn2jG_DXGmzHWqVfV1xz_vG-BTkotAbuzkM=.c48958e0-a982-4e38-bb0c-fac37d4de7f1@github.com>
Message-ID: <RXAvVwAc0X_sVyyL89RAMsmVe1UDdUFIZjLnjB8ybng=.69f8bbb4-7de7-4ad9-9c2b-5d3e251866fc@github.com>

On Wed, 10 Jul 2024 20:39:24 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Also performed tier1-3 and hotspot:tier4 on my unmatched boards. Result looks fine.
>> Just witnessed several unnecessary uses of namespace `Assembler`. Guess you might want to clean it up? Still good otherwise.
>> 
>> 
>> diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
>> index b39ac79be6b..e349eab3177 100644
>> --- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
>> +++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
>> @@ -983,9 +983,9 @@ void MacroAssembler::load_link_jump(const address source, Register temp) {
>>    assert_cond(source != nullptr);
>>    int64_t distance = source - pc();
>>    assert(is_simm32(distance), "Must be");
>> -  Assembler::auipc(temp, (int32_t)distance + 0x800);
>> -  Assembler::ld(temp, temp, ((int32_t)distance << 20) >> 20);
>> -  Assembler::jalr(x1, temp, 0);
>> +  auipc(temp, (int32_t)distance + 0x800);
>> +  ld(temp, Address(temp, ((int32_t)distance << 20) >> 20));
>> +  jalr(temp);
>>  }
>> 
>>  void MacroAssembler::jump_link(const address dest, Register temp) {
>> @@ -994,7 +994,7 @@ void MacroAssembler::jump_link(const address dest, Register temp) {
>>    int64_t distance = dest - pc();
>>    assert(is_simm21(distance), "Must be");
>>    assert((distance % 2) == 0, "Must be");
>> -  Assembler::jal(x1, distance);
>> +  jal(x1, distance);
>>  }
>> 
>>  void MacroAssembler::j(const address dest, Register temp) {
>
>> I have not seen (new) issues in testing. I would have preferred one or two more reviewers, but since RV is not the biggest platform I'll settle with just passing the bar. I'll go ahead and integrate if @RealFYang and @Hamlin-Li re-review (as the new rules are in effect which require latest rev to be reviewed).
> 
> Still good to me. Thanks!

> Thanks! @Hamlin-Li please re-approve, background: https://mail.openjdk.org/pipermail/jdk-dev/2024-July/009199.html

Done.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2222529821

From rehn at openjdk.org  Thu Jul 11 10:27:04 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 11 Jul 2024 10:27:04 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v24]
In-Reply-To: <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
 <5ejRWsbRIP1r1H0oOENrsDrHaMebfqfNGrIMc-UjogQ=.7ccd8152-311d-4164-8a4a-17a110561cac@github.com>
Message-ID: <AsNSDOsQCpCshYp0nhTVmVESMGJ9lkF4svg_aO30cfo=.84a6c348-0a52-4d39-9cb5-b1c5bc7cea4e@github.com>

On Wed, 10 Jul 2024 20:31:27 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hi all, please consider!
>> 
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> Using a very small application or running for a very short time we have fast patchable calls.
>> But any normal application running longer will increase the code size and code churn/fragmentation.
>> So whether or not you get hot fast calls relies on luck.
>> 
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>> 
>> Code stream:
>> JAL <trampo>
>> Stubs:
>> AUIPC
>> LD
>> JALR
>> <DEST>
>> 
>> 
>> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem, having a call to a jump is not the fastest way.
>> Loading the address avoids the pitfalls of cmodx.
>> 
>> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>> 
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>> <DEST>
>> 
>> An experimental option for turning trampolines back on exists.
>> 
>> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
>> 
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>> 
>> fop                                        (msec)    2239       |  2128       =  0.950424
>> h2                                         (msec)    18660      |  16594      =  0.889282
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> luindex                                    (msec)    2866       |  2842       =  0.991626
>> lusearch                                   (msec)    4108       |  4311       =  1.04942
>> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
>> pmd                                        (msec)    5976       |  5897       =  0.98678
>> jython                                     (msec)    22022      |  21925      =  0.995595
>> Avg:                                       0.974112                              
>> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits:
> 
>  - Skip qualify ins
>  - Merge branch 'master' into 8332689
>  - _ld to ld
>  - Merge branch 'master' into 8332689
>  - Rename to reloc_call
>  - Merge branch 'master' into 8332689
>  - Rename lc
>  - Merge branch 'master' into 8332689
>  - Merge branch 'master' into 8332689
>  - Comments
>  - ... and 24 more: https://git.openjdk.org/jdk/compare/242f1133...242c3790

Thank you all for sticking with it!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2222568556

From rehn at openjdk.org  Thu Jul 11 10:27:05 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 11 Jul 2024 10:27:05 GMT
Subject: Integrated: 8332689: RISC-V: Use load instead of trampolines
In-Reply-To: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
References: <mELboqOrnQtwPK5ygTdrcwnRqFrrn2u8E6WaXxALXNo=.0f3ef0f7-1b36-449f-84ed-5faff3571335@github.com>
Message-ID: <jOoD3_TrxTqic6fQICCoyljdD4q7zdabxS3QKltg1Ok=.f03f9f1b-7b42-47f2-a61e-67e4e817b709@github.com>

On Wed, 29 May 2024 12:40:05 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

> Hi all, please consider!
> 
> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
> Using a very small application or running for a very short time we have fast patchable calls.
> But any normal application running longer will increase the code size and code churn/fragmentation.
> So whether or not you get hot fast calls relies on luck.
> 
> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
> This would be the common case for a patchable call.
> 
> Code stream:
> JAL <trampo>
> Stubs:
> AUIPC
> LD
> JALR
> <DEST>
> 
> 
> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
> Even if you don't have that problem, having a call to a jump is not the fastest way.
> Loading the address avoids the pitfalls of cmodx.
> 
> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
> and instead do by default:
> 
> Code stream:
> AUIPC
> LD
> JALR
> Stubs:
> <DEST>
> 
> An experimental option for turning trampolines back on exists.
> 
> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
> 
> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
> 
> fop                                        (msec)    2239       |  2128       =  0.950424
> h2                                         (msec)    18660      |  16594      =  0.889282
> jython                                     (msec)    22022      |  21925      =  0.995595
> luindex                                    (msec)    2866       |  2842       =  0.991626
> lusearch                                   (msec)    4108       |  4311       =  1.04942
> lusearch-fix                               (msec)    4406       |  4116       =  0.934181
> pmd                                        (msec)    5976       |  5897       =  0.98678
> jython                                     (msec)    22022      |  21925      =  0.995595
> Avg:                                       0.974112                              
> fop(xcomp)                                 (msec)    2721       |  2714       =  0.997427
> h2(xcomp)                                  (msec)    37719      |  38004      =  1.00756
> jython(xcomp)        ...

This pull request has now been integrated.

Changeset: 5c612c23
Author:    Robbin Ehn <rehn at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/5c612c230b0a852aed5fd36e58b82ebf2e1838af
Stats:     897 lines in 16 files changed: 622 ins; 177 del; 98 mod

8332689: RISC-V: Use load instead of trampolines

Reviewed-by: fyang, mli, luhenry

-------------

PR: https://git.openjdk.org/jdk/pull/19453

From aboldtch at openjdk.org  Thu Jul 11 10:41:57 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 11 Jul 2024 10:41:57 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v5]
In-Reply-To: <GxCetSdxU-CzhK7QGtwUC2lxoREeieXA10J8zbizClw=.60084b5d-1f22-4592-9d6e-a99e52df478e@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <nho0iQHJu__oLvxJF3oE1qBlFiSvUoZ6dLEIc139KqA=.5ba0e931-a6c4-443d-b9d7-715da000d045@github.com>
 <GxCetSdxU-CzhK7QGtwUC2lxoREeieXA10J8zbizClw=.60084b5d-1f22-4592-9d6e-a99e52df478e@github.com>
Message-ID: <50LctfChrqd3_HlWrKZKsq4gADeTHWqY1SuFMSwzpL4=.6c6df72a-9bdd-4390-a38d-5d1ee95b8543@github.com>

On Thu, 11 Jul 2024 09:25:52 GMT, Yudi Zheng <yzheng at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with four additional commits since the last revision:
>> 
>>  - Add extra comments in LightweightSynchronizer::inflate_fast_locked_object
>>  - Fix typos
>>  - Remove unused variable
>>  - Add missing inline qualifiers
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 843:
> 
>> 841:       movptr(monitor, Address(box, BasicLock::object_monitor_cache_offset_in_bytes()));
>> 842:       // null check with ZF == 0, no valid pointer below alignof(ObjectMonitor*)
>> 843:       cmpptr(monitor, alignof(ObjectMonitor*));
> 
> Is this only for keeping `ZF == 0` and can be replaced by `test; je` if we are not using `jne` to jump to the slow path? Or is there any performance concern? Btw, I think `ZF` is always rewritten before entering into the slow path https://github.com/openjdk/jdk/blob/b32e4a68bca588d908bd81a398eb3171a6876dc5/src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp#L98-L102

You are correct, the condition flag is not important here.

At some point we had more than just `nullptr` and `ObjectMonitor*` values, but also some small signal values which allowed us to move some slow path code into the runtime. When this was removed I just made the checks do the same on both aarch64 and x86. (Where aarch64 does not have a stub and jumps directly to the continuation, requiring the correct condition flags after the branch.)

_Side note: This might be something that will be explored further in the future, allowing a lot of the LM_LIGHTWEIGHT slow path code to move away from the lock node and code stub into the runtime._
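
To make the compare against `alignof(ObjectMonitor*)` a bit more concrete, here is a minimal sketch of the idea in plain C++ rather than assembler. It is an illustration under assumptions, not the code in `c2_MacroAssembler_x86.cpp`: `ObjectMonitorSketch`, `cache_hit` and `zf_after_cmp` are made-up names, and the only property relied on is the one stated in the quoted comment, that no valid pointer lies below the pointer alignment.

```cpp
#include <cassert>
#include <cstdint>

struct ObjectMonitorSketch;  // stand-in type, only used for alignof of a pointer to it

// True if the cached word can be used as a monitor pointer on the fast path.
// A single unsigned compare rejects null and any small non-pointer value.
inline bool cache_hit(uintptr_t cached) {
  return cached >= alignof(ObjectMonitorSketch*);
}

// Models the x86 zero flag after cmp(cached, alignof): set only on equality.
// For a null cache entry the operands can never be equal, so ZF stays 0.
inline bool zf_after_cmp(uintptr_t cached) {
  return cached == alignof(ObjectMonitorSketch*);
}
```

A plain `test reg, reg` would also catch null, but it would set ZF for the null case and would not reject small non-null values of the kind mentioned above.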

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1673800250

From coleenp at openjdk.org  Thu Jul 11 13:11:04 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 11 Jul 2024 13:11:04 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v3]
In-Reply-To: <P4vwJuFYdy9C2GugO5UgMllMPgrFZyjQkRPCW1d3NxM=.13a88311-ce1e-4be8-8b14-b48177a75960@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <5CNKzDumOf1MJQXM9OBHQh0Mj7eLv2ONio1V-AXeSJI=.54302b45-2dd2-4f18-a094-6b2c6a59517c@github.com>
 <-hS6aTxhzI_HzVegg0EziUtGxdq6orpF9s1rF3l2hZY=.0c4296b2-d27a-4578-a160-d17b65163655@github.com>
 <P4vwJuFYdy9C2GugO5UgMllMPgrFZyjQkRPCW1d3NxM=.13a88311-ce1e-4be8-8b14-b48177a75960@github.com>
Message-ID: <aEhln1AWzy1her4u3ffOamJl3Tz9eZaBb4ujhh4catg=.e5101f5f-f007-49d8-aae8-eb01b8a7fd25@github.com>

On Wed, 10 Jul 2024 09:41:08 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> src/hotspot/share/runtime/arguments.cpp line 1830:
>> 
>>> 1828:     FLAG_SET_CMDLINE(LockingMode, LM_LIGHTWEIGHT);
>>> 1829:     warning("UseObjectMonitorTable requires LM_LIGHTWEIGHT");
>>> 1830:   }
>> 
>> Maybe we want this to have the opposite sense - turn off UseObjectMonitorTable if not LM_LIGHTWEIGHT?
>
> Maybe. It boils down to what to do when the JVM receives `-XX:LockingMode={LM_LEGACY,LM_MONITOR} -XX:+UseObjectMonitorTable` 
> The options I see are
> 1. Select `LockingMode=LM_LIGHTWEIGHT`
> 2. Select `UseObjectMonitorTable=false`
> 3. Do not start the VM
> 
> Between 1. and 2. it is impossible to know what the real intentions were. But with `-XX:+UseObjectMonitorTable` being the newer option, it somehow seems the more likely intention.
> 
> Option 3. is probably the sensible solution, but it is hard to determine. We tend to not close the VM because of incompatible options, rather fix them. But I believe there is precedent for both. If we do this however we will have to figure out all the interactions with our testing framework. And probably add some safeguards.

UseObjectMonitorTable is a Diagnostic option and LockingMode is (Deprecated) but a full-fledged product option, so I think the product option should override.  So I pick 2.  They might have changed to Legacy to compare performance or something like that, and missed that the table is only for lightweight locking when switching the command line.
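
A minimal sketch of what picking option 2 would mean, assuming a simplified flag model: the product option `LockingMode` is left alone and the diagnostic `UseObjectMonitorTable` is switched off when the two conflict. `FlagsSketch` and `resolve_object_monitor_table` are hypothetical names, and the enum is only a stand-in for the real locking mode constants, not the code in `arguments.cpp`.

```cpp
#include <cassert>

// Stand-ins for the locking modes discussed in this thread.
enum LockingModeSketch { LM_MONITOR = 0, LM_LEGACY = 1, LM_LIGHTWEIGHT = 2 };

// Simplified model of the two flags involved.
struct FlagsSketch {
  LockingModeSketch locking_mode;
  bool use_object_monitor_table;
};

// Option 2: the product flag wins. Returns true if the diagnostic flag had to
// be turned off, so the caller can print a warning.
inline bool resolve_object_monitor_table(FlagsSketch& flags) {
  if (flags.use_object_monitor_table && flags.locking_mode != LM_LIGHTWEIGHT) {
    flags.use_object_monitor_table = false;
    return true;
  }
  return false;
}
```

With `-XX:LockingMode=1 -XX:+UseObjectMonitorTable` modelled as `{LM_LEGACY, true}`, the sketch keeps `LM_LEGACY` and clears the table flag, which is the opposite of the behaviour in the quoted snippet.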

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1673989707

From coleenp at openjdk.org  Thu Jul 11 13:13:57 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 11 Jul 2024 13:13:57 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v3]
In-Reply-To: <P4vwJuFYdy9C2GugO5UgMllMPgrFZyjQkRPCW1d3NxM=.13a88311-ce1e-4be8-8b14-b48177a75960@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <5CNKzDumOf1MJQXM9OBHQh0Mj7eLv2ONio1V-AXeSJI=.54302b45-2dd2-4f18-a094-6b2c6a59517c@github.com>
 <-hS6aTxhzI_HzVegg0EziUtGxdq6orpF9s1rF3l2hZY=.0c4296b2-d27a-4578-a160-d17b65163655@github.com>
 <P4vwJuFYdy9C2GugO5UgMllMPgrFZyjQkRPCW1d3NxM=.13a88311-ce1e-4be8-8b14-b48177a75960@github.com>
Message-ID: <7_99_ouQ3MAtPFgIQzG01AlOOgFLGPrK1h1LMzzhK60=.0578b32c-8f8b-460c-a5c1-e5686369aba5@github.com>

On Wed, 10 Jul 2024 09:41:37 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 763:
>> 
>>> 761:   assert(mark.has_monitor(), "must be");
>>> 762:   // The monitor exists
>>> 763:   ObjectMonitor* monitor = ObjectSynchronizer::read_monitor(current, object, mark);
>> 
>> This looks in the table for the monitor in UseObjectMonitorTable, but could it first check the BasicLock?  Or we got here because BasicLock.metadata was not the ObjectMonitor?
>
>> This looks in the table for the monitor in UseObjectMonitorTable, but could it first check the BasicLock?
> 
> We could. 
> 
>> Or we got here because BasicLock.metadata was not the ObjectMonitor?
> 
> That is one reason we got here. We also get here from C1/interpreter as well as if there are other threads on the entry queues. 
> 
> I think there was an assumption that it would not be that crucial in those cases.
> 
> One of the reasons we do not read the `BasicLock` cache from the runtime is that we are not as careful with keeping the `BasicLock` initialised on platforms without `UseObjectMonitorTable`. The idea was that as long as they call into the VM, we do not need to keep it invariant. 
> 
> But this made me realise `BasicLock::print_on` will be broken on non x86/aarch64 platforms if running with `UseObjectMonitorTable`. 
> 
> Rather than fix all platforms I will condition BasicLock::object_monitor_cache to return nullptr on unsupported platforms. 
> 
> Could add this then. Should probably add an overload to `ObjectSynchronizer::read_monitor` which takes the lock and push it all the way here.

I think I'd prefer not another overloading of read_monitor.  It's kind of confusing as is.  This is okay and we'll see if there's any performance benefit to checking BasicLock instead later.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1673993770

From jkratochvil at openjdk.org  Thu Jul 11 14:28:55 2024
From: jkratochvil at openjdk.org (Jan Kratochvil)
Date: Thu, 11 Jul 2024 14:28:55 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v3]
In-Reply-To: <0U-PWBKKJ7mHmK_GQ77s_gZ0tPRbRIsQcdjJRWdVmGg=.8b319781-3628-400b-b9d7-c0750a2a8637@github.com>
References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com>
 <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com>
 <kSbubsK2cEF-sY-GX4AYliW9dMXZ8IYGBcKIZaalDcU=.b82004c9-2633-4996-8c61-18d3ff9b0fd0@github.com>
 <0U-PWBKKJ7mHmK_GQ77s_gZ0tPRbRIsQcdjJRWdVmGg=.8b319781-3628-400b-b9d7-c0750a2a8637@github.com>
Message-ID: <VHCtyVakvKnmu2bUd9tMRb0TbOIN0PvxllFqpJJX28g=.5e946e9c-8e69-4785-bdd2-f65485d22cd8@github.com>

On Thu, 11 Jul 2024 09:23:58 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

> > ```
> > * `CPUQuota` (changed it to `AllowedCPUs`) does not work for me - it properly distributes the load but JDK still sees all available CPU cores (4 of my VM).
> > ```
> 
> Could you elaborate on that? What does not work?

In the log there is (`/proc/cpuinfo` has 4 entries on this system):

[0.139s][trace][os,container] OSContainer::active_processor_count: 4 

and therefore it fails with:
```
java.lang.RuntimeException: 'OSContainer::active_processor_count: 2' missing from stdout/stderr
        at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:252)
        at SystemdMemoryAwarenessTest.testHelloSystemd(SystemdMemoryAwarenessTest.java:58)
        at SystemdMemoryAwarenessTest.main(SystemdMemoryAwarenessTest.java:43)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
        at java.base/java.lang.Thread.run(Thread.java:1575)
```

It is on Fedora 40 x86_64 (`systemd-255.8-1.fc40.x86_64`).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2223083324

From sgehwolf at openjdk.org  Thu Jul 11 14:39:55 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Thu, 11 Jul 2024 14:39:55 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v3]
In-Reply-To: <VHCtyVakvKnmu2bUd9tMRb0TbOIN0PvxllFqpJJX28g=.5e946e9c-8e69-4785-bdd2-f65485d22cd8@github.com>
References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com>
 <t_jUv9-mkIFcGRInYKmcnfP0W8VwXEtflahjSUiK8zI=.d524b51c-1963-4024-87e0-b12911d475d0@github.com>
 <kSbubsK2cEF-sY-GX4AYliW9dMXZ8IYGBcKIZaalDcU=.b82004c9-2633-4996-8c61-18d3ff9b0fd0@github.com>
 <0U-PWBKKJ7mHmK_GQ77s_gZ0tPRbRIsQcdjJRWdVmGg=.8b319781-3628-400b-b9d7-c0750a2a8637@github.com>
 <VHCtyVakvKnmu2bUd9tMRb0TbOIN0PvxllFqpJJX28g=.5e946e9c-8e69-4785-bdd2-f65485d22cd8@github.com>
Message-ID: <PM9EbHpQiv_K9fkWrZNP5OlqdpOoXuoSWMQbSBEAbHM=.33629d4c-e0d2-4f54-a588-2d4b3599bffb@github.com>

On Thu, 11 Jul 2024 14:26:23 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote:

> > > ```
> > > * `CPUQuota` (changed it to `AllowedCPUs`) does not work for me - it properly distributes the load but JDK still sees all available CPU cores (4 of my VM).
> > > ```
> > 
> > 
> > Could you elaborate on that? What does not work?
> 
> In the log there is (`/proc/cpuinfo` has 4 entries on this system):
> 
> ```
> [0.139s][trace][os,container] OSContainer::active_processor_count: 4 
> ```
> 
> and therefore it fails with:
> 
> ```
> java.lang.RuntimeException: 'OSContainer::active_processor_count: 2' missing from stdout/stderr
>         at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:252)
>         at SystemdMemoryAwarenessTest.testHelloSystemd(SystemdMemoryAwarenessTest.java:58)
>         at SystemdMemoryAwarenessTest.main(SystemdMemoryAwarenessTest.java:43)
>         at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:580)
>         at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
>         at java.base/java.lang.Thread.run(Thread.java:1575)
> ```
> 
> It is on Fedora 40 x86_64 (`systemd-255.8-1.fc40.x86_64`).

Well yes, because the limit isn't properly detected (needs a JVM change that does that; imo https://github.com/openjdk/jdk/pull/17198).
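To illustrate what "properly detected" means for a hierarchy, here is a minimal sketch. It assumes a limit may be set on any ancestor (for example on a systemd slice) while the leaf cgroup itself reports no limit, so the effective value is the smallest limit found on the way up. The names `kUnlimited` and `effective_limit` are hypothetical and this is not the HotSpot implementation.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical marker for a level that sets no limit of its own
// (cgroup v2 interface files report this as "max").
constexpr int64_t kUnlimited = -1;

// limits_leaf_to_root holds the limit read at each level of the hierarchy,
// starting at the process's own cgroup and ending at the root.
// Returns the smallest configured limit, or kUnlimited if no level sets one.
int64_t effective_limit(const std::vector<int64_t>& limits_leaf_to_root) {
  int64_t best = std::numeric_limits<int64_t>::max();
  bool found = false;
  for (int64_t limit : limits_leaf_to_root) {
    if (limit == kUnlimited) {
      continue;  // this level imposes nothing; keep looking upwards
    }
    if (limit < best) {
      best = limit;
    }
    found = true;
  }
  return found ? best : kUnlimited;
}
```

Reading only the leaf level would return "unlimited" in the first case of the test, which is the kind of mismatch the failing assertion above points at.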

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2223109654

From sgehwolf at openjdk.org  Thu Jul 11 16:46:13 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Thu, 11 Jul 2024 16:46:13 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v4]
In-Reply-To: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com>
References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com>
Message-ID: <ZUmCX2Tqmw_48beJOefsyDEgjElCZWV6IVl7SMZi4r0=.37d3a4ee-2740-4745-ae47-766da3b7fb6e@github.com>

> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest`, currently passes on cgroups v1 and fails on cgroups v2 because of how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented back in the JDK 13 timeframe. It is therefore problem-listed immediately and should be unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges.
> 
> I'm adding those tests in order to not regress another time.
> 
> Testing:
> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1.
> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to  JDK-8322420)
> - [x] GHA

Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:

 - Add Whitebox check for host cpu
 - Merge branch 'master' into jdk-8333446-systemd-slice-tests
 - Merge branch 'master' into jdk-8333446-systemd-slice-tests
 - Merge branch 'master' into jdk-8333446-systemd-slice-tests
 - Fix comments
 - 8333446: Add tests for hierarchical container support

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19530/files
  - new: https://git.openjdk.org/jdk/pull/19530/files/22141a48..139a9069

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=02-03

  Stats: 13132 lines in 454 files changed: 8669 ins; 2561 del; 1902 mod
  Patch: https://git.openjdk.org/jdk/pull/19530.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19530/head:pull/19530

PR: https://git.openjdk.org/jdk/pull/19530

From sgehwolf at openjdk.org  Thu Jul 11 16:59:56 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Thu, 11 Jul 2024 16:59:56 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v4]
In-Reply-To: <ZUmCX2Tqmw_48beJOefsyDEgjElCZWV6IVl7SMZi4r0=.37d3a4ee-2740-4745-ae47-766da3b7fb6e@github.com>
References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com>
 <ZUmCX2Tqmw_48beJOefsyDEgjElCZWV6IVl7SMZi4r0=.37d3a4ee-2740-4745-ae47-766da3b7fb6e@github.com>
Message-ID: <h4NcwefKxH-wDTz-VekY135tQneTS5ti8HcnzXqOP2M=.096ee7bc-25ee-43ba-85ae-af8aace12e1d@github.com>

On Thu, 11 Jul 2024 16:46:13 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest`, currently passes on cgroups v1 and fails on cgroups v2 because of how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented back in the JDK 13 timeframe. It is therefore problem-listed immediately and should be unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges.
>> 
>> I'm adding those tests in order to not regress another time.
>> 
>> Testing:
>> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1.
>> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to  JDK-8322420)
>> - [x] GHA
>
> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
> 
>  - Add Whitebox check for host cpu
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Fix comments
>  - 8333446: Add tests for hierarchical container support

Example test run on cgv1 with a fixed JVM: https://cr.openjdk.org/~sgehwolf/webrevs/jdk-8333446-systemd-slice-tests/cgv1/SystemdMemoryAwarenessTest.jtr
Example test run on cgv2 with a fixed JVM: https://cr.openjdk.org/~sgehwolf/webrevs/jdk-8333446-systemd-slice-tests/cgv2/SystemdMemoryAwarenessTest.jtr

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2223432957

From gli at openjdk.org  Thu Jul 11 17:38:15 2024
From: gli at openjdk.org (Guoxiong Li)
Date: Thu, 11 Jul 2024 17:38:15 GMT
Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and
 VM_ParallelGCSystemGC [v3]
In-Reply-To: <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com>
References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
 <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com>
Message-ID: <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com>

On Wed, 10 Jul 2024 20:29:37 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` share much similarity.
>> 
>> The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect` and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts.
>> 
>> Test: tier1-6; perf-neutral for dacapo, specjvm2008, specjbb2015 and cachestresser.
>
> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - review
>  - Merge branch 'master' into pgc-vm-operation
>  - pgc-vm-operation

Nice refactor.

src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 434:

> 432: void ParallelScavengeHeap::do_full_collection_no_gc_locker(bool clear_all_soft_refs) {
> 433:   bool maximum_compaction = clear_all_soft_refs;
> 434:   PSParallelCompact::invoke(maximum_compaction);

The parameter `maximum_heap_compaction` of the method `PSParallelCompact::invoke` was changed to `clear_all_soft_refs` in [JDK-8334445](https://git.openjdk.org/jdk/pull/19763), so the variable `maximum_compaction` seems unnecessary here.

src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 443:

> 441:   if (result == nullptr && !is_tlab) {
> 442:     // auto expand inside
> 443:     result = old_gen()->allocate(size);

If we expand the generation in the method `PSOldGen::allocate`, I think it would be good to rename the method to `expand_and_allocate` (just like `TenuredGeneration::expand_and_allocate` in SerialGC). That is better left to a follow-up issue.

src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 446:

> 444:   }
> 445:   return result;   // Could be null if we are out of space.
> 446: }

I notice the method `PSOldGen::allocate` can expand the size of the old gen, but the method `PSYoungGen::allocate` can't expand the size of the young gen. It is similar to a bug [1] in Serial. Fortunately, the size of the young generation can be resized during Parallel GC if the option `UseAdaptiveSizePolicy` is `true`. When the `UseAdaptiveSizePolicy` is set to `false` manually by the user, I suspect it is a bug in Parallel because of the unexpanded young generation size.

[1] https://bugs.openjdk.org/browse/JDK-8333386

src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 478:

> 476: 
> 477:     const bool clear_all_soft_refs = true;
> 478:     do_full_collection_no_gc_locker(clear_all_soft_refs);

If the young collection succeeded in method `collect_at_safepoint`, the normal full collection won't run in `collect_at_safepoint`. If the successful young collection didn't release any memory (or only released little memory but not enough for allocation), the allocation in line 462 will fail too. Then a full collection with maximum compaction will be run. That seems strange. In my opinion, the steps should look like this:

1. allocation
2. young collection
3. allocation
4. normal full collection
5. allocation
6. maximum full collection
7. allocation
8. OOM

But in the current patch, steps 4-5 may be skipped.
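The escalation listed above can be sketched as follows. This is a minimal illustration, not code from the patch: `HeapHooks` and `allocate_with_escalation` are hypothetical names, and the three callbacks merely stand in for the allocation attempt, the young collection and the full collection.

```cpp
#include <functional>

// Hypothetical hooks standing in for the heap operations discussed above.
struct HeapHooks {
  std::function<bool()> try_allocate;                            // true on success
  std::function<void()> young_collection;
  std::function<void(bool maximum_compaction)> full_collection;
};

// Walks through steps 1-7; a false return value corresponds to step 8 (OOM).
bool allocate_with_escalation(const HeapHooks& heap) {
  if (heap.try_allocate()) return true;   // 1. allocation
  heap.young_collection();                // 2. young collection
  if (heap.try_allocate()) return true;   // 3. allocation
  heap.full_collection(false);            // 4. normal full collection
  if (heap.try_allocate()) return true;   // 5. allocation
  heap.full_collection(true);             // 6. maximum full collection
  return heap.try_allocate();             // 7. allocation
}
```

Dropping the two lines for steps 4 and 5 gives the shorter sequence that the comment above says the current patch may follow after a successful young collection.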

src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 114:

> 112: 
> 113:   // Perform a full collection
> 114:   void do_full_collection(bool clear_all_soft_refs) override;

The comment seems redundant.

src/hotspot/share/gc/parallel/psScavenge.cpp line 232:

> 230: // Note that this method should only be called from the vm_thread while
> 231: // at a safepoint!
> 232: bool PSScavenge::invoke() {

Nice removal. It was strange to run a full collection in `PSScavenge` before.

-------------

Changes requested by gli (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20077#pullrequestreview-2172099153
PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674132961
PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674390370
PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674247874
PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674379967
PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674344233
PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674384804

From gli at openjdk.org  Thu Jul 11 17:38:15 2024
From: gli at openjdk.org (Guoxiong Li)
Date: Thu, 11 Jul 2024 17:38:15 GMT
Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and
 VM_ParallelGCSystemGC [v3]
In-Reply-To: <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com>
References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
 <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com>
 <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com>
Message-ID: <LYtIo9zp0PwC7-1PMJtaovc3MOTjNr9neZWgWgiA1IQ=.e4b4293d-f729-47c4-9df4-0be908726682@github.com>

On Thu, 11 Jul 2024 14:39:47 GMT, Guoxiong Li <gli at openjdk.org> wrote:

>> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - review
>>  - Merge branch 'master' into pgc-vm-operation
>>  - pgc-vm-operation
>
> src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 434:
> 
>> 432: void ParallelScavengeHeap::do_full_collection_no_gc_locker(bool clear_all_soft_refs) {
>> 433:   bool maximum_compaction = clear_all_soft_refs;
>> 434:   PSParallelCompact::invoke(maximum_compaction);
> 
> The parameter `maximum_heap_compaction` of the method `PSParallelCompact::invoke` was changed to `clear_all_soft_refs` in [JDK-8334445](https://git.openjdk.org/jdk/pull/19763), so the variable `maximum_compaction` seems unnecessary here.

If the variable `maximum_compaction` is removed, it may be better to use `PSParallelCompact::invoke` directly and remove the method `do_full_collection_no_gc_locker` (just like using `PSScavenge::invoke` directly).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674260428

From ayang at openjdk.org  Thu Jul 11 18:06:34 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Thu, 11 Jul 2024 18:06:34 GMT
Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and
 VM_ParallelGCSystemGC [v4]
In-Reply-To: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
Message-ID: <viPT0XNzMpheGP6HtlZ0RI1Gbi-nA9DkCraQSfo81rA=.481fe804-9c5c-441b-b069-7ad7baee772a@github.com>

> Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` share much similarity.
> 
> The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect` and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts.
> 
> Test: tier1-6; perf-neutral for dacapo, specjvm2008, specjbb2015 and cachestresser.

Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:

 - Merge branch 'master' into pgc-vm-operation
 - review
 - review
 - Merge branch 'master' into pgc-vm-operation
 - pgc-vm-operation

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20077/files
  - new: https://git.openjdk.org/jdk/pull/20077/files/1d10dd5b..974b6b08

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20077&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20077&range=02-03

  Stats: 1640 lines in 65 files changed: 1034 ins; 342 del; 264 mod
  Patch: https://git.openjdk.org/jdk/pull/20077.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20077/head:pull/20077

PR: https://git.openjdk.org/jdk/pull/20077

From ayang at openjdk.org  Thu Jul 11 18:14:37 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Thu, 11 Jul 2024 18:14:37 GMT
Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and
 VM_ParallelGCSystemGC [v3]
In-Reply-To: <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com>
References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
 <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com>
 <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com>
Message-ID: <JZIUoUDNkyU49Kkaso_UnStuexRVA-yCmrT2Dt-rfsY=.873c408d-6478-4f7f-87d5-7cbeccb20714@github.com>

On Thu, 11 Jul 2024 17:10:58 GMT, Guoxiong Li <gli at openjdk.org> wrote:

> If the successful young collection didn't release any memory (or only released little memory but not enough for allocation),

A successful young-gc often leaves young-gen completely empty. Otherwise, max-compaction full-gc should be run -- there is little benefit in running non-max-compaction full-gc if old-gen is too packed to hold all young-gen objs.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674452022

From shade at openjdk.org  Thu Jul 11 19:19:20 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 11 Jul 2024 19:19:20 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear
Message-ID: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>

[JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks.

Additional testing:
 - [x] Linux x86_64 server fastdebug, `all`
 - [ ] Linux AArch64 server fastdebug, `all`

-------------

Commit messages:
 - Move the membar at the end
 - Revert C1 parts
 - Work

Changes: https://git.openjdk.org/jdk/pull/20139/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8329597
  Stats: 132 lines in 7 files changed: 132 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20139.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139

PR: https://git.openjdk.org/jdk/pull/20139

From shade at openjdk.org  Thu Jul 11 19:19:20 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 11 Jul 2024 19:19:20 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear
In-Reply-To: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
Message-ID: <PkDvYvRdAryzDFqEAUgGNspKH6gsl3Kjp4a_4_6lJos=.fe9d87a5-b15b-47a9-8bcc-9e287ee70944@github.com>

On Thu, 11 Jul 2024 15:28:37 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks.
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [ ] Linux AArch64 server fastdebug, `all`

On Mac AArch64, which suffers from both native call and WX transition:


Benchmark                   Mode  Cnt  Score   Error  Units

# Intrinsic OFF
ReferenceClear.phantom      avgt    9  52,297 ± 0,294  ns/op
ReferenceClear.phantom_new  avgt    9  57,075 ± 0,296  ns/op
ReferenceClear.soft         avgt    9  52,567 ± 0,393  ns/op
ReferenceClear.soft_new     avgt    9  57,640 ± 0,264  ns/op
ReferenceClear.weak         avgt    9  53,018 ± 1,285  ns/op
ReferenceClear.weak_new     avgt    9  57,227 ± 0,483  ns/op

# Intrinsic ON (default)
ReferenceClear.phantom      avgt    9   0,780 ± 0,017  ns/op
ReferenceClear.soft         avgt    9   0,784 ± 0,022  ns/op
ReferenceClear.weak         avgt    9   0,793 ± 0,033  ns/op
ReferenceClear.phantom_new  avgt    9   3,018 ± 0,015  ns/op
ReferenceClear.soft_new     avgt    9   3,268 ± 0,014  ns/op
ReferenceClear.weak_new     avgt    9   3,004 ± 0,057  ns/op


On x86_64 m7a.16xlarge, which only suffers from the native call:


Benchmark                   Mode  Cnt  Score   Error  Units

# Intrinsic OFF
ReferenceClear.phantom      avgt    9  14.643 ± 0.049  ns/op
ReferenceClear.soft         avgt    9  14.939 ± 0.438  ns/op
ReferenceClear.weak         avgt    9  14.648 ± 0.081  ns/op
ReferenceClear.phantom_new  avgt    9  19.859 ± 2.405  ns/op
ReferenceClear.soft_new     avgt    9  20.208 ± 1.805  ns/op
ReferenceClear.weak_new     avgt    9  20.385 ± 2.570  ns/op

# Intrinsic ON (default)
ReferenceClear.phantom      avgt    9   0.821 ± 0.010  ns/op
ReferenceClear.soft         avgt    9   0.817 ± 0.007  ns/op
ReferenceClear.weak         avgt    9   0.819 ± 0.010  ns/op
ReferenceClear.phantom_new  avgt    9   4.195 ± 0.729  ns/op
ReferenceClear.soft_new     avgt    9   4.315 ± 0.599  ns/op
ReferenceClear.weak_new     avgt    9   3.986 ± 0.596  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2223248114

From pchilanomate at openjdk.org  Thu Jul 11 20:11:50 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Thu, 11 Jul 2024 20:11:50 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v5]
In-Reply-To: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
Message-ID: <BL9VfiV5JAdKbG-6FyvPjxE20A8zd6W6xJbbIgphzvc=.f5af6373-f7bf-4df8-bd31-a3a801c373ad@github.com>

> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler.
> 
> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also run the patch through mach5 tiers1-3.
> 
> Thanks,
> Patricio

Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:

  add vm.compMode != Xcomp

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20016/files
  - new: https://git.openjdk.org/jdk/pull/20016/files/1cf425dd..2a8b7076

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20016.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20016/head:pull/20016

PR: https://git.openjdk.org/jdk/pull/20016

From pchilanomate at openjdk.org  Thu Jul 11 20:11:50 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Thu, 11 Jul 2024 20:11:50 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v3]
In-Reply-To: <KKXg1PeYIYOr45p4L6lBqNrjMIdMoQI-aydEGygCJZM=.785a668d-9f8a-4211-877b-8fd93f52a835@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
 <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com>
 <4SmCasO8fGVxb0wnRWQcMDUM63yub0jqnDbVyRr-xBs=.042f56b8-d4f1-4460-95b9-ed09df545b3e@github.com>
 <RWb7Mt_BMrYVBR3UwJvh7tRR504wpP0RNwvfC5H1R4E=.440e6564-74fb-4758-a4ad-6d2938243893@github.com>
 <KKXg1PeYIYOr45p4L6lBqNrjMIdMoQI-aydEGygCJZM=.785a668d-9f8a-4211-877b-8fd93f52a835@github.com>
Message-ID: <PgG16h9CBBYOBbokn0AY0NsW2xmHKYKczvaAzmqlzk8=.5ebde8ff-0cf9-4128-8429-26cdf6b97aa3@github.com>

On Thu, 11 Jul 2024 02:40:08 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> The test should never fail even with external flags, so if anything it's just extra testing. But I can add vm.flagless if you prefer.
>
> flagless might be going too far as we won't test with other GC's etc. Can we just use `@requires vm.compMode != "Xcomp"` to exclude it from the Xcomp specific testing which is redundant.

Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1674629390

From dean.long at oracle.com  Thu Jul 11 20:51:46 2024
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Thu, 11 Jul 2024 13:51:46 -0700
Subject: [External] : Re: Where does VMError::print_native_stack and
 os::get_sender_for_C_frame load/use the frame pointer?
In-Reply-To: <CAP2b4GNkafc7zpbXfXJf1OZJMt7wFM_GSF9cFZX4gajrujx+Zg@mail.gmail.com>
References: <CAP2b4GNfyhviAjRakViZ6nqjBev9S3hYE0ejJ+zb+CNpa_r4GA@mail.gmail.com>
 <34aeae4d-22bd-45bd-85e9-4922a368c4c1@oracle.com>
 <CAP2b4GNkafc7zpbXfXJf1OZJMt7wFM_GSF9cFZX4gajrujx+Zg@mail.gmail.com>
Message-ID: <5663ffff-8924-40a9-b5be-e3dacb86381c@oracle.com>

Sorry, I responded too quickly.  For some reason I was thinking link() 
was the same as fp().

If link() works with your compiler, then that is indeed the correct choice.
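As a rough illustration of why that works when the frame pointer is saved, here is a small simulation of a frame-pointer walk. It assumes the conventional x86 layout (saved caller fp at the frame pointer, return address one word above it); the stack is a plain vector of words, and `Frame`, `sender_of` and `walk` are hypothetical names, not HotSpot code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Word offsets relative to the frame pointer, assuming the conventional x86
// layout: [fp + 0] = saved caller fp, [fp + 1] = return address.
constexpr std::size_t kLinkOffset = 0;
constexpr std::size_t kReturnAddrOffset = 1;
constexpr std::size_t kSenderSpOffset = 2;

struct Frame {
  std::size_t sp;      // index into the simulated stack
  std::size_t fp;      // index of the saved-fp slot; 0 ends the chain
  std::uintptr_t pc;
};

// Simulated counterpart of
//   frame(fr->sender_sp(), fr->link(), fr->sender_pc())
Frame sender_of(const std::vector<std::uintptr_t>& stack, const Frame& fr) {
  return Frame{fr.fp + kSenderSpOffset,
               static_cast<std::size_t>(stack[fr.fp + kLinkOffset]),
               stack[fr.fp + kReturnAddrOffset]};
}

// Collects one pc per frame until the saved-fp chain ends in 0.
std::vector<std::uintptr_t> walk(const std::vector<std::uintptr_t>& stack,
                                 Frame fr) {
  std::vector<std::uintptr_t> pcs;
  while (fr.fp != 0) {
    pcs.push_back(fr.pc);
    fr = sender_of(stack, fr);
  }
  return pcs;
}
```

If a compiler omits the frame pointer, the slot at `kLinkOffset` holds unrelated data and the walk goes astray, which is the situation the VC++ comment in the quoted code describes.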

dl

On 7/11/24 1:12 AM, Julian Waters wrote:
> Hi Dean,
>
> Thanks for the quick reply. At the risk of testing your patience, I
> don't really follow, since that is how os::get_sender_for_C_frame is
> implemented on other platforms (I copied it from Linux x86 in this
> case). All I got from the comment is that the only reason we usually
> have to use the StackWalk API on Windows is because the frame pointer
> is not saved when using the Microsoft compiler, however in my case I'm
> not using the Microsoft compiler and have verified that the frame
> pointer is saved in my custom JVMs. I'm not sure how
> VMError::print_native_stack on other platforms manages to work when
> they also do
>
> return frame(fr->sender_sp(), fr->link(), fr->sender_pc());
>
> in os::get_sender_for_C_frame like I did here
>
> Thanks for your time and patience!
>
> best regards,
> Julian
>
> On Thu, Jul 11, 2024 at 3:17 PM <dean.long at oracle.com> wrote:
>> Using fr->link() in get_sender_for_C_frame() gives the wrong answer because it refers to the current frame, not the sender frame. There is no frame::sender_fp() because the information we need could be anywhere in the frame or even nowhere in the frame. This is what the comment about StackWalk() API is hinting at. Even debuggers can have trouble giving an accurate stack trace if external debug information is missing and frames do not contain the needed information themselves.
>>
>> dl
>>
>> On 7/10/24 10:52 PM, Julian Waters wrote:
>>
>> Hi Dean,
>>
>> I eventually did find frame::link(), but ultimately it didn't seem to help as VMError::print_native_stack still doesn't work properly on Windows. It seems as though frame::link() calls addr_at on x86, which in turn calls frame::fp(), which returns _fp. I think whatever sets _fp for VMError::print_native_stack is the missing link here, but unfortunately I don't know where it's set
>>
>> The code that I tried on Windows x64 is attached below
>>
>> best regards,
>> Julian
>>
>> // VC++ does not save frame pointer on stack in optimized build. It
>> // can be turned off by -Oy-. If we really want to walk C frames,
>> // we can use the StackWalk() API.
>> frame os::get_sender_for_C_frame(frame* fr) {
>> #ifdef __GNUC__
>>    return frame(fr->sender_sp(), fr->link(), fr->sender_pc());
>> #elif defined(_MSC_VER)
>>    ShouldNotReachHere();
>>    return frame();
>> #endif
>> }
>>
>> frame os::current_frame() {
>> #ifdef __GNUC__
>>    frame f(reinterpret_cast<intptr_t*>(os::current_stack_pointer()),
>>            reinterpret_cast<intptr_t*>(__builtin_frame_address(1)),
>>            CAST_FROM_FN_PTR(address, &os::current_frame));
>>    if (os::is_first_C_frame(&f)) {
>>      // stack is not walkable
>>      return frame();
>>    } else {
>>      return os::get_sender_for_C_frame(&f);
>>    }
>> #elif defined(_MSC_VER)
>>    return frame();  // cannot walk Windows frames this way.  See os::get_native_stack
>>                     // and os::platform_print_native_stack
>> #endif
>> }

From dean.long at oracle.com  Thu Jul 11 21:02:27 2024
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Thu, 11 Jul 2024 14:02:27 -0700
Subject: [External] : Re: Where does VMError::print_native_stack and
 os::get_sender_for_C_frame load/use the frame pointer?
In-Reply-To: <CAP2b4GPnh9C5VR84KM3c9wgkYiLDf80BMJ5fK26qWsvobGXvkA@mail.gmail.com>
References: <CAP2b4GNfyhviAjRakViZ6nqjBev9S3hYE0ejJ+zb+CNpa_r4GA@mail.gmail.com>
 <34aeae4d-22bd-45bd-85e9-4922a368c4c1@oracle.com>
 <CAP2b4GNkafc7zpbXfXJf1OZJMt7wFM_GSF9cFZX4gajrujx+Zg@mail.gmail.com>
 <CAP2b4GPnh9C5VR84KM3c9wgkYiLDf80BMJ5fK26qWsvobGXvkA@mail.gmail.com>
Message-ID: <0d5a728f-1dac-4a48-85f1-5d3cba200917@oracle.com>

Right.  It shows an alternative to print_native_stack to use with the 
Microsoft compiler.  If you are using a different compiler that stores 
the caller FP at frame::link_offset then following the example of other 
platforms, like Linux, should work.

dl
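
A simplified model of the frame-pointer chain discussed in this thread, written in Java purely for illustration (the real code is HotSpot C++). It assumes a layout in which each frame stores the caller's FP in the slot that FP points to and the return address in the next slot. The `long[]` stack, the slot offsets, and all names here are hypothetical and are not HotSpot's `frame` API.

```java
import java.util.ArrayList;
import java.util.List;

final class FramePointerWalk {
    // Hypothetical layout: stack[fp] holds the caller's FP, stack[fp + 1] the return address.
    static List<Long> walk(long[] stack, int fp) {
        List<Long> returnPcs = new ArrayList<>();
        while (fp != 0) {                       // 0 marks the outermost frame in this model
            returnPcs.add(stack[fp + 1]);       // return address into the caller
            int callerFp = (int) stack[fp];     // the "link": saved FP of the caller
            if (callerFp != 0 && callerFp <= fp) {
                break;                          // chain must move toward older frames; stop if it does not
            }
            fp = callerFp;
        }
        return returnPcs;
    }

    // Three nested frames: innermost at slot 2, its caller at 6, outermost at 10.
    static long[] sampleStack() {
        long[] stack = new long[16];
        stack[2] = 6;   stack[3] = 0xB00;
        stack[6] = 10;  stack[7] = 0xA00;
        stack[10] = 0;  stack[11] = 0x100;
        return stack;
    }

    public static void main(String[] args) {
        System.out.println(walk(sampleStack(), 2));
    }
}
```

If the compiler does not keep the saved FP in that slot, the value read as `callerFp` is arbitrary and the walk stops early or goes wrong, which is the situation the StackWalk() comment refers to.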

On 7/11/24 1:57 AM, Julian Waters wrote:
> Seems like I found an old gem where the issue with the frame pointer 
> was first discovered
>
> https://bugs.openjdk.org/browse/JDK-8022335
> https://github.com/openjdk/jdk/commit/1c2a7eea85ea261102687190d6b2e92c560770b8
>
> best regards,
> Julian
>
>
> On Thu, Jul 11, 2024 at 4:12 PM Julian Waters 
> <tanksherman27 at gmail.com> wrote:
>
>     Hi Dean,
>
>     Thanks for the quick reply. At the risk of testing your patience, I
>     don't really follow, since that is how os::get_sender_for_C_frame is
>     implemented on other platforms (I copied it from Linux x86 in this
>     case). All I got from the comment is that the only reason we usually
>     have to use the StackWalk API on Windows is because the frame pointer
>     is not saved when using the Microsoft compiler, however in my case I'm
>     not using the Microsoft compiler and have verified that the frame
>     pointer is saved in my custom JVMs. I'm not sure how
>     VMError::print_native_stack on other platforms manages to work when
>     they also do
>
>     return frame(fr->sender_sp(), fr->link(), fr->sender_pc());
>
>     in os::get_sender_for_C_frame like I did here
>
>     Thanks for your time and patience!
>
>     best regards,
>     Julian
>
>     On Thu, Jul 11, 2024 at 3:17 PM <dean.long at oracle.com> wrote:
>     >
>     > Using fr->link() in get_sender_for_C_frame() gives the wrong
>     answer because it refers to the current frame, not the sender
>     frame. There is no frame::sender_fp() because the information we
>     need could be anywhere in the frame or even nowhere in the frame.
>     This is what the comment about StackWalk() API is hinting at. Even
>     debuggers can have trouble giving an accurate stack trace if
>     external debug information is missing and frames do not contain
>     the needed information themselves.
>     >
>     > dl
>     >
>     > On 7/10/24 10:52 PM, Julian Waters wrote:
>     >
>     > Hi Dean,
>     >
>     > I eventually did find frame::link(), but ultimately it didn't
>     seem to help as VMError::print_native_stack still doesn't work
>     properly on Windows. It seems as though frame::link() calls
>     addr_at on x86, which in turn calls frame::fp(), which returns
>     _fp. I think whatever sets _fp for VMError::print_native_stack is
>     the missing link here, but unfortunately I don't know where it's set
>     >
>     > The code that I tried on Windows x64 is attached below
>     >
>     > best regards,
>     > Julian
>     >
>     > // VC++ does not save frame pointer on stack in optimized build. It
>     > // can be turned off by -Oy-. If we really want to walk C frames,
>     > // we can use the StackWalk() API.
>     > frame os::get_sender_for_C_frame(frame* fr) {
>     > #ifdef __GNUC__
>     >    return frame(fr->sender_sp(), fr->link(), fr->sender_pc());
>     > #elif defined(_MSC_VER)
>     >    ShouldNotReachHere();
>     >    return frame();
>     > #endif
>     > }
>     >
>     > frame os::current_frame() {
>     > #ifdef __GNUC__
>     >    frame f(reinterpret_cast<intptr_t*>(os::current_stack_pointer()),
>     >            reinterpret_cast<intptr_t*>(__builtin_frame_address(1)),
>     >            CAST_FROM_FN_PTR(address, &os::current_frame));
>     >    if (os::is_first_C_frame(&f)) {
>     >      // stack is not walkable
>     >      return frame();
>     >    } else {
>     >      return os::get_sender_for_C_frame(&f);
>     >    }
>     > #elif defined(_MSC_VER)
>     >    return frame();  // cannot walk Windows frames this way.  See os::get_native_stack
>     >                     // and os::platform_print_native_stack
>     > #endif
>     > }
>

From dnsimon at openjdk.org  Thu Jul 11 21:27:51 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Thu, 11 Jul 2024 21:27:51 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v5]
In-Reply-To: <BL9VfiV5JAdKbG-6FyvPjxE20A8zd6W6xJbbIgphzvc=.f5af6373-f7bf-4df8-bd31-a3a801c373ad@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
 <BL9VfiV5JAdKbG-6FyvPjxE20A8zd6W6xJbbIgphzvc=.f5af6373-f7bf-4df8-bd31-a3a801c373ad@github.com>
Message-ID: <FCUs5PXLhm2lw2QudWmlM0_ilbPSxNJc9UojiA3wnYg=.bb2d4030-34b7-4011-b773-a4b91c51f8bc@github.com>

On Thu, 11 Jul 2024 20:11:50 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler.
>> 
>> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also ran the patch through mach5 tiers1-3.
>> 
>> Thanks,
>> Patricio
>
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add vm.compMode != Xcomp

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1582:

> 1580:     freeze_result res = entry->is_pinned() ? freeze_pinned_cs : freeze_pinned_monitor;
> 1581:     log_develop_trace(continuations)("=== end of freeze (fail %d)", res);
> 1582:     // Avoid Thread.yield() loops without safepoint polls (see 8335269).

Is an explicit reference to a JBS issue id like this still recommended practice? After all, it will be a prefix in the merged commit message.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1674711298

From pchilanomate at openjdk.org  Thu Jul 11 21:35:27 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Thu, 11 Jul 2024 21:35:27 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v6]
In-Reply-To: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
Message-ID: <lrcx4n_WnfGbmtYORBWqjQzBuDscQdyr5OFTmMLexko=.babb7658-ebba-49c2-ae5d-fc3d158ea7db@github.com>

> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler.
> 
> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also ran the patch through mach5 tiers1-3.
> 
> Thanks,
> Patricio

Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:

  remove JBS id reference

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20016/files
  - new: https://git.openjdk.org/jdk/pull/20016/files/2a8b7076..1ea1a06a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20016&range=04-05

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20016.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20016/head:pull/20016

PR: https://git.openjdk.org/jdk/pull/20016

From pchilanomate at openjdk.org  Thu Jul 11 21:35:28 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Thu, 11 Jul 2024 21:35:28 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v5]
In-Reply-To: <FCUs5PXLhm2lw2QudWmlM0_ilbPSxNJc9UojiA3wnYg=.bb2d4030-34b7-4011-b773-a4b91c51f8bc@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
 <BL9VfiV5JAdKbG-6FyvPjxE20A8zd6W6xJbbIgphzvc=.f5af6373-f7bf-4df8-bd31-a3a801c373ad@github.com>
 <FCUs5PXLhm2lw2QudWmlM0_ilbPSxNJc9UojiA3wnYg=.bb2d4030-34b7-4011-b773-a4b91c51f8bc@github.com>
Message-ID: <TbLM-2xeEDUaFNbsBASWoStGUudUVVDn_I9TJ1ZXwic=.90a502c9-e58f-4d08-bd79-cf5dc1471da4@github.com>

On Thu, 11 Jul 2024 21:25:12 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   add vm.compMode != Xcomp
>
> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1582:
> 
>> 1580:     freeze_result res = entry->is_pinned() ? freeze_pinned_cs : freeze_pinned_monitor;
>> 1581:     log_develop_trace(continuations)("=== end of freeze (fail %d)", res);
>> 1582:     // Avoid Thread.yield() loops without safepoint polls (see 8335269).
> 
> Is an explicit reference to a JBS issue id like this still recommended practice? After all, it will be a prefix in the merged commit message.

Right, I removed it from the comment.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20016#discussion_r1674716191

From vlivanov at openjdk.org  Thu Jul 11 22:33:51 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 11 Jul 2024 22:33:51 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and
 Math.min(long,long)
In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID: <yVFbVwZTQCEgaPhy1gw8MPd3RaqlMxxwPFNpxm2SCfs=.aef2f2d3-527d-4218-b307-d49bd217f59e@github.com>

On Tue, 9 Jul 2024 12:07:37 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
> 
> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
> 
> 
> VLoop::check_preconditions: failed: control flow in loop not allowed
> 
> 
> The control flow is due to the java implementation for these methods, e.g.
> 
> 
> public static long max(long a, long b) {
>     return (a >= b) ? a : b;
> }
> 
> 
> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively.
> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
> E.g.
> 
> 
> SuperWord::transform_loop:
>     Loop: N518/N126  counted [int,int),+4 (1025 iters)  main has_sfpt strip_mined
>  518  CountedLoop  === 518 246 126  [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
> 
> 
> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
> 
> long min   1155
> long max   1173
> 
> 
> After the patch, on darwin/aarch64 (M1):
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
> 
> long min   1042
> long max   1042
> 
> 
> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
> Therefore, it still relies on the macro expansion to transform those into CMoveL.
> 
> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg:tier1                     2500  2500     0     0
>>> jtreg:test/jdk:tier1                     ...

Overall, looks fine.

So, there will be `inline_min_max`, `inline_fp_min_max`, and `inline_long_min_max`, which vary only slightly. I'd prefer to see them unified (or, at least, `inline_min_max` enhanced to cover the `minL`/`maxL` cases).

Also, it's a bit confusing to see the int variants named without a basic type (`_min`/`_minL` vs `_minI`/`_minL`). Please clean that up along the way. (FTR, I'm also fine with handling the renaming as a separate change.)
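
For reference, a minimal Java sketch of the kind of reduction loop the quoted description is about. The class and method names are hypothetical. Whether C2 actually vectorizes either loop depends on the JDK build and platform, so the sketch only checks that the `Math.max(long, long)` form and the explicit compare-and-select form (the control flow the patch aims to hide from the vectorizer) compute the same value.

```java
final class LongMaxReduction {
    // Reduction written with Math.max(long, long), the call the patch intrinsifies.
    static long maxWithMathMax(long[] values) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = Math.max(result, values[i]);
        }
        return result;
    }

    // The same reduction with the comparison spelled out, as in the Java
    // implementation of Math.max quoted above.
    static long maxWithBranch(long[] values) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = (result >= values[i]) ? result : values[i];
        }
        return result;
    }

    public static void main(String[] args) {
        long[] values = {3L, -7L, 9L, 2L};
        System.out.println(maxWithMathMax(values) == maxWithBranch(values));
    }
}
```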

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2224062122

From vlivanov at openjdk.org  Fri Jul 12 00:01:56 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 12 Jul 2024 00:01:56 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter
In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
Message-ID: <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>

On Tue, 2 Jul 2024 14:52:09 GMT, Andrew Haley <aph at openjdk.org> wrote:

> This patch expands the use of a hash table for secondary superclasses
> to the interpreter, C1, and runtime. It also adds a C2 implementation
> of hashed lookup in cases where the superclass isn't known at compile
> time.
> 
> HotSpot shared runtime
> ----------------------
> 
> Building hashed secondary tables is now unconditional. It takes very
> little time, and now that the shared runtime always has the tables, it
> might as well take advantage of them. The shared code is easier to
> follow now, I think.
> 
> There might be a performance issue with x86-64 in that we build
> HotSpot for a default x86-64 target that does not support popcount.
> This means that HotSpot C++ runtime on x86 always uses a software
> emulation for popcount, even though the vast majority of machines made
> for the past 20 years can do popcount in a single instruction. It
> wouldn't be terribly hard to do something about that.
> 
> Having said that, the software popcount is really not bad.
> 
> x86
> ---
> 
> x86 is rather tricky, because we still support
> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
> well as 32- and 64-bit ports. There's some further complication in
> that only `RCX` can be used as a shift count, so there's some register
> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
> 
> AArch64
> -------
> 
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
> 
> Generally
> ---------
> 
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
> 
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
> 
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
> 
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.

Looks very good, Andrew!

Some comments on minor things follow.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1433:

> 1431: 
> 1432:   // Don't check secondary_super_cache
> 1433:   if (super_check_offset.is_register()

Do you see any effects from this particular change?

It adds a runtime check on the fast path for all subtype checks (irrespective of whether it checks primary or secondary super). Moreover, the very same check is performed after primary super slot is checked.

Unless `_secondary_super_cache` field is removed, unconditionally checking the slot at `super_check_offset` is benign.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1040:

> 1038: 
> 1039:   // Secondary subtype checking
> 1040:   void lookup_secondary_supers_table(Register sub_klass,

While browsing the code, I noticed that it's far from evident at call sites which overload is used (especially with so many arguments). Does it make sense to avoid method overloads here and use distinct method names instead?

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 1981:

> 1979:       __ load_klass(r19_klass, copied_oop);// query the object klass
> 1980: 
> 1981:       BLOCK_COMMENT("type_check:");

Why don't you move it inside `generate_type_check`?

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4781:

> 4779:                                                    Label* L_success,
> 4780:                                                    Label* L_failure) {
> 4781:   if (! UseSecondarySupersTable) {

Any particular reason to keep the condition negated? 

(Here and in general. There are multiple places where `!UseSecondarySupersTable` is used.)

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4810:

> 4808:                                                          Label* L_success,
> 4809:                                                          Label* L_failure) {
> 4810:   // NB! Callers may assume that, when temp2_reg is a valid register,

Oh, that's a subtle point... Can we make it more evident at call sites?

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5062:

> 5060: 
> 5061: #ifdef DEBUG
> 5062:   call_VM_leaf_base((address)&poo, /*number_of_arguments*/0);

A leftover from debugging?

src/hotspot/share/memory/universe.cpp line 443:

> 441: 
> 442:     {
> 443:       Universe::_the_array_interfaces_bitmap = Klass::compute_secondary_supers_bitmap(_the_array_interfaces_array);

Cleanup idea: remove `Universe::` prefixes. The rest of the method doesn't use qualified names for class members.

src/hotspot/share/oops/instanceKlass.cpp line 1410:

> 1408:     return nullptr;
> 1409:   } else if (num_extra_slots == 0) {
> 1410:     if (num_extra_slots == 0 && interfaces->length() <= 1) {

Since `secondary_supers` are hashed unconditionally now,  is `interfaces->length() <= 1` check still needed?

src/hotspot/share/oops/instanceKlass.cpp line 3524:

> 3522: 
> 3523:   st->print(BULLET"secondary supers: "); secondary_supers()->print_value_on(st); st->cr();
> 3524:   {

Any particular reason to keep brackets around `hash_slot` and `bitmap`?

src/hotspot/share/oops/klass.cpp line 175:

> 173:     if (secondary_supers()->at(i) == k) {
> 174:       if (UseSecondarySupersCache) {
> 175:         ((Klass*)this)->set_secondary_super_cache(k);

Does it make sense to assert `UseSecondarySupersCache` in `Klass::set_secondary_super_cache()`?

src/hotspot/share/oops/klass.cpp line 284:

> 282: // which doesn't zero out the memory before calling the constructor.
> 283: Klass::Klass(KlassKind kind) : _kind(kind),
> 284:                                _bitmap(SECONDARY_SUPERS_BITMAP_FULL),

I like the idea, but what are the benefits of initializing `_bitmap` separately from `_secondary_supers`?

src/hotspot/share/oops/klass.cpp line 469:

> 467: #endif
> 468: 
> 469:   bitmap = hash_secondary_supers(secondary_supers, /*rewrite=*/true); // rewrites freshly allocated array

I like that hashing is performed unconditionally now. 

Looks like you can remove `UseSecondarySupersTable`-specific CDS support (in `filemap.cpp`). CDS archive should unconditionally contain hashed tables.

src/hotspot/share/oops/klass.inline.hpp line 117:

> 115: }
> 116: 
> 117: inline bool Klass::search_secondary_supers(Klass *k) const {

I see you moved `Klass::search_secondary_supers` into `klass.inline.hpp`, but I'm not sure how it interacts with `Klass::is_subtype_of` (the sole caller) being declared in `klass.hpp`. 

Will the inlining still happen if `Klass::is_subtype_of()` callers include `klass.hpp`?

src/hotspot/share/oops/klass.inline.hpp line 122:

> 120:     return true;
> 121: 
> 122:   bool result = lookup_secondary_supers_table(k);

Should `UseSecondarySupersTable` affect `Klass::search_secondary_supers` as well?
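
To make the bitmap-plus-popcount idea in this review easier to follow, here is a deliberately simplified Java sketch. It is not HotSpot's algorithm: the 64-slot size, the multiplicative hash, plain linear probing, and every name below are assumptions chosen for illustration. It only shows how a 64-bit occupancy bitmap and a population count can index into a packed array.

```java
final class PackedHashLookup {
    static final int SLOTS = 64;
    private final long bitmap;   // bit s set means logical slot s is occupied
    private final int[] packed;  // occupied slots only, in slot order

    PackedHashLookup(int[] keys) {
        if (keys.length >= SLOTS) {
            throw new IllegalArgumentException("this sketch needs at least one empty slot");
        }
        int[] table = new int[SLOTS];
        long bits = 0;
        for (int key : keys) {
            int slot = homeSlot(key);
            while ((bits & (1L << slot)) != 0) {
                slot = (slot + 1) % SLOTS;      // linear probing with wrap-around
            }
            table[slot] = key;
            bits |= 1L << slot;
        }
        packed = new int[keys.length];
        int n = 0;
        for (int slot = 0; slot < SLOTS; slot++) {
            if ((bits & (1L << slot)) != 0) {
                packed[n++] = table[slot];
            }
        }
        bitmap = bits;
    }

    // Hypothetical hash: top 6 bits of a multiplicative hash, giving 0..63.
    static int homeSlot(int key) {
        return (key * 0x9E3779B9) >>> 26;
    }

    boolean contains(int key) {
        int slot = homeSlot(key);
        while ((bitmap & (1L << slot)) != 0) {
            // Number of occupied slots below this one is the index into the packed array.
            int index = Long.bitCount(bitmap & ((1L << slot) - 1));
            if (packed[index] == key) {
                return true;
            }
            slot = (slot + 1) % SLOTS;
        }
        return false;                           // reached an empty slot: definite miss
    }

    public static void main(String[] args) {
        PackedHashLookup table = new PackedHashLookup(new int[] {3, 67, 131, 1000, -5});
        System.out.println(table.contains(131) + " " + table.contains(4));
    }
}
```

A miss on the home slot's bit answers the query without touching the array at all, which is the cheap negative lookup that makes this layout attractive; `Long.bitCount` plays the role of the popcount instruction mentioned in the quoted description.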

-------------

PR Review: https://git.openjdk.org/jdk/pull/19989#pullrequestreview-2161098896
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674812600
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674823519
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674790160
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667194729
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674832710
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667037608
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667194339
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674792021
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667193916
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667190974
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667192207
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667192783
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674806107
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1667193323

From vlivanov at openjdk.org  Fri Jul 12 00:01:56 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 12 Jul 2024 00:01:56 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter
In-Reply-To: <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
Message-ID: <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com>

On Thu, 11 Jul 2024 23:17:10 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1433:
> 
>> 1431: 
>> 1432:   // Don't check secondary_super_cache
>> 1433:   if (super_check_offset.is_register()
> 
> Do you see any effects from this particular change?
> 
> It adds a runtime check on the fast path for all subtype checks (irrespective of whether it checks primary or secondary super). Moreover, the very same check is performed after primary super slot is checked.
> 
> Unless `_secondary_super_cache` field is removed, unconditionally checking the slot at `super_check_offset` is benign.

BTW `MacroAssembler::check_klass_subtype_fast_path` deserves a cleanup: `super_check_offset` can be safely turned into `Register` thus eliminating the code guarded by `super_check_offset.is_register() == false`.

> src/hotspot/share/oops/instanceKlass.cpp line 1410:
> 
>> 1408:     return nullptr;
>> 1409:   } else if (num_extra_slots == 0) {
>> 1410:     if (num_extra_slots == 0 && interfaces->length() <= 1) {
> 
> Since `secondary_supers` are hashed unconditionally now,  is `interfaces->length() <= 1` check still needed?

Also, `num_extra_slots == 0` check is redundant.

> src/hotspot/share/oops/klass.cpp line 284:
> 
>> 282: // which doesn't zero out the memory before calling the constructor.
>> 283: Klass::Klass(KlassKind kind) : _kind(kind),
>> 284:                                _bitmap(SECONDARY_SUPERS_BITMAP_FULL),
> 
> I like the idea, but what are the benefits of initializing `_bitmap` separately from `_secondary_supers`?

Another observation while browsing the code: `_secondary_supers_bitmap` would be a better name. (Same considerations apply to `_hash_slot`.)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674815196
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674798719
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674828164

From tanksherman27 at gmail.com  Fri Jul 12 00:57:39 2024
From: tanksherman27 at gmail.com (Julian Waters)
Date: Fri, 12 Jul 2024 08:57:39 +0800
Subject: Where does VMError::print_native_stack and
 os::get_sender_for_C_frame load/use the frame pointer?
Message-ID: <CAP2b4GM0Ka6BD6nVts7yhs7RRpFNXbZx22egmvw4q0BRRN3r9A@mail.gmail.com>

Yep, gcc does indeed save the frame pointer as expected by Java, but it
still isn't working (it prints only 1 frame and then quits). I would start
debugging, but the debugger on my Windows device is down at the moment.
Sigh...

best regards,
Julian

From gli at openjdk.org  Fri Jul 12 01:18:00 2024
From: gli at openjdk.org (Guoxiong Li)
Date: Fri, 12 Jul 2024 01:18:00 GMT
Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and
 VM_ParallelGCSystemGC [v3]
In-Reply-To: <JZIUoUDNkyU49Kkaso_UnStuexRVA-yCmrT2Dt-rfsY=.873c408d-6478-4f7f-87d5-7cbeccb20714@github.com>
References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
 <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com>
 <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com>
 <JZIUoUDNkyU49Kkaso_UnStuexRVA-yCmrT2Dt-rfsY=.873c408d-6478-4f7f-87d5-7cbeccb20714@github.com>
Message-ID: <2u2YExr_N6N4lae-i_FV8JVbEOT6cYzOHAftkb2BOmY=.f87ac878-9d9a-45c0-a20d-a5ffb1cabcad@github.com>

On Thu, 11 Jul 2024 18:09:24 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 478:
>> 
>>> 476: 
>>> 477:     const bool clear_all_soft_refs = true;
>>> 478:     do_full_collection_no_gc_locker(clear_all_soft_refs);
>> 
>> If the young collection succeeds in method `collect_at_safepoint`, the normal full collection won't run in `collect_at_safepoint`. If the successful young collection didn't release any memory (or only released little memory but not enough for allocation), the allocation in line 462 will fail too. Then a full collection with maximum compaction will be run, which seems strange. In my opinion, the steps should look like this:
>> 
>> 1. allocation
>> 2. young collection
>> 3. allocation
>> 4. normal full collection
>> 5. allocation
>> 6. maximum full collection
>> 7. allocation
>> 8. OOM
>> 
>> But in the current patch, steps 4-5 may be skipped.
>
>> If the successful young collection didn't release any memory (or only released little memory but not enough for allocation),
> 
> A successful young-gc often leaves young-gen completely empty. Otherwise, max-compaction full-gc should be run -- there is little benefit in running non-max-compaction full-gc if old-gen is too packed to hold all young-gen objs.

Thanks for your explanation. I am OK with the current solution now.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674923459

From gli at openjdk.org  Fri Jul 12 01:24:58 2024
From: gli at openjdk.org (Guoxiong Li)
Date: Fri, 12 Jul 2024 01:24:58 GMT
Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and
 VM_ParallelGCSystemGC [v3]
In-Reply-To: <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com>
References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
 <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com>
 <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com>
Message-ID: <0bOEaMQ75JB0T22pbSsMEP1UV7lh1pUHoGjgTkold-w=.bfdbb2f7-010d-4d24-ab79-7fc40aadc929@github.com>

On Thu, 11 Jul 2024 15:40:01 GMT, Guoxiong Li <gli at openjdk.org> wrote:

>> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - review
>>  - Merge branch 'master' into pgc-vm-operation
>>  - pgc-vm-operation
>
> src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 446:
> 
>> 444:   }
>> 445:   return result;   // Could be null if we are out of space.
>> 446: }
> 
> I notice the method `PSOldGen::allocate` can expand the size of the old gen, but the method `PSYoungGen::allocate` can't expand the size of the young gen. It is similar to a bug [1] in Serial. Fortunately, the size of the young generation can be resized during Parallel GC if the option `UseAdaptiveSizePolicy` is `true`. When the `UseAdaptiveSizePolicy` is set to `false` manually by the user, I suspect it is a bug in Parallel because of the unexpanded young generation size.
> 
> [1] https://bugs.openjdk.org/browse/JDK-8333386

@albertnetymk Do you think we need to expand the young generation during allocation (both Serial and Parallel)? In Serial, `UseAdaptiveSizePolicy` is not used, so it is indeed a bug in Serial (the young generation can't be resized and is always the initial size).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1674933630

From duke at openjdk.org  Fri Jul 12 01:43:14 2024
From: duke at openjdk.org (Shaojin Wen)
Date: Fri, 12 Jul 2024 01:43:14 GMT
Subject: RFR: 8336278: Micro-optimize Replace String.format("%n") to
 System.lineSeparator
Message-ID: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com>

There are three places in the JDK code where String.format("%n") is used. This is actually equivalent to System.lineSeparator and does not require the implementation of String.format.
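
A minimal Java sketch of the equivalence, for illustration only (the class and method names are hypothetical and are not taken from the patch):

```java
// Illustrative only: both methods return the platform line separator.
final class LineSeparatorDemo {

    // Old form: goes through the Formatter machinery just to expand "%n".
    static String viaFormat() {
        return String.format("%n");
    }

    // New form: returns the platform line separator directly.
    static String viaSystem() {
        return System.lineSeparator();
    }

    public static void main(String[] args) {
        System.out.println(viaFormat().equals(viaSystem()));
    }
}
```

The `java.util.Formatter` documentation defines `%n` as the separator returned by `System.lineSeparator()`, so the two forms should be interchangeable wherever the format string is exactly `"%n"`.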

-------------

Commit messages:
 - replace String.format("%n") to System.lineSeparator()

Changes: https://git.openjdk.org/jdk/pull/20149/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20149&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8336278
  Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/20149.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20149/head:pull/20149

PR: https://git.openjdk.org/jdk/pull/20149

From dholmes at openjdk.org  Fri Jul 12 02:23:06 2024
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 12 Jul 2024 02:23:06 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed
Message-ID: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>

Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log.

The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 0 to O_BUFLEN.

The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.

If a string's length exceeds `max_length` then we print it as follows:

"< first max_length/2 characters> ... <last max_length/2 characters>" (abridged)

For example if we print "ABCDE" with a max_length of 4 then the output is literally:

"AB ... DE" (abridged)

The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
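
As a rough illustration of the abridging rule described above, here is a Java sketch. It is not the HotSpot C++ implementation: the method name `abridge` is hypothetical, and the handling of odd `maxLength` values (integer division) and the quoting of unabridged strings are assumptions.

```java
// Illustrative model of the abridged output format; not the VM code.
final class AbridgeDemo {

    // Returns the quoted string, or the first and last maxLength/2 characters
    // joined by " ... " and tagged "(abridged)" when the string is too long.
    static String abridge(String s, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
        if (s.length() <= maxLength) {
            return "\"" + s + "\"";
        }
        int half = maxLength / 2; // assumption: integer division for odd limits
        String head = s.substring(0, half);
        String tail = s.substring(s.length() - half);
        return "\"" + head + " ... " + tail + "\" (abridged)";
    }

    public static void main(String[] args) {
        System.out.println(abridge("ABCDE", 4));
    }
}
```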

For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`.

Testing:
 - new test added for validation purposes
 - tiers 1 - 3 as sanity testing

Thanks

-------------

Commit messages:
 - Improve flag description
 - Fix indent
 - Merge branch 'master' into 8325945-print-string
 - 8325945: Error reporting should limit the number of String characters printed

Changes: https://git.openjdk.org/jdk/pull/20150/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20150&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8325945
  Stats: 154 lines in 6 files changed: 151 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20150.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20150/head:pull/20150

PR: https://git.openjdk.org/jdk/pull/20150

From dholmes at openjdk.org  Fri Jul 12 02:59:50 2024
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 12 Jul 2024 02:59:50 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v6]
In-Reply-To: <lrcx4n_WnfGbmtYORBWqjQzBuDscQdyr5OFTmMLexko=.babb7658-ebba-49c2-ae5d-fc3d158ea7db@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
 <lrcx4n_WnfGbmtYORBWqjQzBuDscQdyr5OFTmMLexko=.babb7658-ebba-49c2-ae5d-fc3d158ea7db@github.com>
Message-ID: <7vypV2vgUGYMeKbyw5--Vhe7p0bxby0eH5j1sthpZso=.3fcf07a0-7680-4b22-ab73-0e77147d7a08@github.com>

On Thu, 11 Jul 2024 21:35:27 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler.
>> 
>> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also ran the patch through mach5 tiers1-3.
>> 
>> Thanks,
>> Patricio
>
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove JBS id reference

Nothing further from me. Seems reasonable.

Thanks

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20016#pullrequestreview-2173576817

From jwaters at openjdk.org  Fri Jul 12 03:24:58 2024
From: jwaters at openjdk.org (Julian Waters)
Date: Fri, 12 Jul 2024 03:24:58 GMT
Subject: RFR: 8316930: HotSpot should use noexcept instead of throw() [v5]
In-Reply-To: <9k00GYxtEiNBgrtIsIYJUIdwwPjynEm6aONdchZreP4=.0ad54916-180e-4317-8385-e339595a340a@github.com>
References: <kc_cq_sBCqn-iAwHCEaTqgMVYrnT6tKsk3SZnD_qP-s=.1b5d24dd-a925-4f6d-aefb-67b4df6bddac@github.com>
 <9k00GYxtEiNBgrtIsIYJUIdwwPjynEm6aONdchZreP4=.0ad54916-180e-4317-8385-e339595a340a@github.com>
Message-ID: <EHOth-ipPAwv50rX0JRcBl3rP_z8mpPbY3TOvEnMHyU=.02579b23-fab1-4b49-9f84-35f95d603efe@github.com>

On Tue, 6 Feb 2024 07:04:00 GMT, Julian Waters <jwaters at openjdk.org> wrote:

>> throw() has been deprecated since C++11 alongside dynamic exception specifications. We should replace all instances of it with noexcept to prepare HotSpot for later versions of C++.
>
> Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:
> 
>  - Merge branch 'openjdk:master' into noexcept
>  - Merge branch 'openjdk:master' into noexcept
>  - Typo in GensrcAdlc.gmk
>  - Merge branch 'openjdk:master' into noexcept
>  - Merge branch 'master' into noexcept
>  - ic in compiledIC.hpp
>  - Revert compiledIC.cpp
>  - Revert compiledIC.hpp
>  - Partially Revert parse.hpp
>  - Merge branch 'master' into noexcept
>  - ... and 4 more: https://git.openjdk.org/jdk/compare/9ee9f288...b73a6882

I would like to address this soon, but will probably need help writing noexcept for the Style Guide

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15910#issuecomment-2224351303

From darcy at openjdk.org  Fri Jul 12 04:07:50 2024
From: darcy at openjdk.org (Joe Darcy)
Date: Fri, 12 Jul 2024 04:07:50 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and
 Math.min(long,long)
In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID: <cDo0IswnQEKfYYkgGcx1DlmNCb25Kg7EUeqPQEosyS8=.e92172e2-a4f3-481f-9c90-c04dbf3558fb@github.com>

On Tue, 9 Jul 2024 12:07:37 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
> 
> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
> 
> 
> VLoop::check_preconditions: failed: control flow in loop not allowed
> 
> 
> The control flow is due to the java implementation for these methods, e.g.
> 
> 
> public static long max(long a, long b) {
>     return (a >= b) ? a : b;
> }
> 
> 
> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively.
> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
> E.g.
> 
> 
> SuperWord::transform_loop:
>     Loop: N518/N126  counted [int,int),+4 (1025 iters)  main has_sfpt strip_mined
>  518  CountedLoop  === 518 246 126  [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
> 
> 
> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
> 
> long min   1155
> long max   1173
> 
> 
> After the patch, on darwin/aarch64 (M1):
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
> 
> long min   1042
> long max   1042
> 
> 
> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
> Therefore, it still relies on the macro expansion to transform those into CMoveL.
> 
> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg:tier1                     2500  2500     0     0
>>> jtreg:test/jdk:tier1                     ...

Marked as reviewed by darcy (Reviewer).

Core libs changes look fine; bumping review count for the remainder of the PR.
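
For readers without the benchmark at hand, the source-level shape of the loops in question can be sketched as follows. This is an illustrative Java example, not the `ReductionPerf` test; the class and method names are hypothetical, and whether C2 actually vectorizes such a loop depends on the JDK build and platform.

```java
// Illustrative only: a long max reduction written two ways.
final class LongMaxLoopDemo {

    // Reduction using the library call that the quoted patch intrinsifies.
    static long maxWithMathMax(long[] values) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = Math.max(result, values[i]);
        }
        return result;
    }

    // Same reduction with the ternary from the quoted Java implementation
    // written out by hand; this is the control flow mentioned in the quoted text.
    static long maxWithTernary(long[] values) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = (result >= values[i]) ? result : values[i];
        }
        return result;
    }

    public static void main(String[] args) {
        long[] values = {3L, -7L, 9L, 2L};
        System.out.println(maxWithMathMax(values) == maxWithTernary(values));
    }
}
```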

-------------

PR Review: https://git.openjdk.org/jdk/pull/20098#pullrequestreview-2173771454
PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2224456985

From dholmes at openjdk.org  Fri Jul 12 04:15:56 2024
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 12 Jul 2024 04:15:56 GMT
Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers
 [v2]
In-Reply-To: <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com>
References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
 <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com>
Message-ID: <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com>

On Thu, 11 Jul 2024 08:58:11 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor (instance initializer), not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. Often we get lucky by never being exposed to static initializers on particular paths.
>> 
>> I would like to sharpen this. I went back and forth, and ultimately decided to remove `is_initializer` completely to avoid future confusion, and rewrite the uses appropriately.
>> 
>> Additional testing:
>>  - [x] Linux AArch64 server fastdebug, `all` (includes Fuzzer and CTW tests)
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Indenting

It is evident that people have been unfamiliar/sloppy with this API. This change should help prevent that in future. I have a concern about one change.

Thanks

src/hotspot/share/classfile/javaClasses.cpp line 3018:

> 3016:   int flags = (jushort)( m->access_flags().as_short() & JVM_RECOGNIZED_METHOD_MODIFIERS );
> 3017:   if (m->is_object_initializer()) {
> 3018:     flags |= java_lang_invoke_MemberName::MN_IS_CONSTRUCTOR;

I'm going to assume that `clinit` would already get filtered out at some point otherwise this would be a change in behaviour.

src/hotspot/share/runtime/reflection.cpp line 772:

> 770:   assert(!method()->is_object_initializer() &&
> 771:          (for_constant_pool_access || !method()->is_static_initializer()),
> 772:          "should call new_constructor instead");

Nit: existing - the assert message isn't really correct

-------------

PR Review: https://git.openjdk.org/jdk/pull/20120#pullrequestreview-2173741407
PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1675207989
PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1675249560

From dholmes at openjdk.org  Fri Jul 12 04:18:50 2024
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 12 Jul 2024 04:18:50 GMT
Subject: RFR: 8336163: Remove declarations of some debug-only methods in
 release build [v2]
In-Reply-To: <4BJhfsLXcD6wvT8iZNXOeQ3IL2llZcqCPlCbesjlH4U=.34316abb-b4b0-44f9-a4d2-0ed1c1800ea2@github.com>
References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com>
 <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com>
 <4BJhfsLXcD6wvT8iZNXOeQ3IL2llZcqCPlCbesjlH4U=.34316abb-b4b0-44f9-a4d2-0ed1c1800ea2@github.com>
Message-ID: <seGC926uzjtTonUUes139jASOT8QaEGo38y9UkDLgRI=.52e038d9-3564-4deb-b272-7d978a744fff@github.com>

On Thu, 11 Jul 2024 07:26:52 GMT, Qizheng Xing <qxing at openjdk.org> wrote:

>> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update copyright.
>
> Thanks for the review.

@MaxXSoft  hotspot changes require two reviewers.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20131#issuecomment-2224509238

From aboldtch at openjdk.org  Fri Jul 12 05:57:30 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 12 Jul 2024 05:57:30 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6]
In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
Message-ID: <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>

> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere.
> 
> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code.
> 
> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
> 
> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
> 
> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.
> 
> # Cleanups
> 
> Cleaned up displaced header usage for:
>   * BasicLock
>     * Contains some Zero changes
>     * Renames one exported JVMCI field
>   * ObjectMonitor
>     * Updates comments and tests consistencies
> 
> # Refactoring
> 
> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
> 
> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
> 
> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
> 
> # LightweightSynchronizer
> 
> Working on adapting and incorporating the following section as a comment in the source code
> 
> ## Fast Locking
> 
>   CAS on locking bits in markWord. 
>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
> 
>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
> 
>   If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...

Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:

  Update arguments.cpp
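
The fast-locking transition from the quoted description (a CAS between `0b01` Unlocked and `0b00` Fast Locked on the low bits of the mark word) can be modelled in Java roughly as below. This is a simplified sketch under stated assumptions, not HotSpot code: the class and method names are hypothetical, and it ignores owner tracking, recursion, spinning and inflation entirely.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of the lock-bit CAS; it does not track which thread owns the lock.
final class FastLockDemo {
    static final long LOCK_MASK = 0b11;
    static final long FAST_LOCKED = 0b00;
    static final long UNLOCKED = 0b01;

    private final AtomicLong markWord;

    // upperBits stands in for whatever else the mark word holds (hash, age, ...).
    FastLockDemo(long upperBits) {
        markWord = new AtomicLong((upperBits & ~LOCK_MASK) | UNLOCKED);
    }

    // Unlocked -> Fast Locked; only the two lock bits change.
    boolean tryFastLock() {
        long mark = markWord.get();
        if ((mark & LOCK_MASK) != UNLOCKED) {
            return false;
        }
        return markWord.compareAndSet(mark, (mark & ~LOCK_MASK) | FAST_LOCKED);
    }

    // Fast Locked -> Unlocked; again only the lock bits change.
    boolean tryFastUnlock() {
        long mark = markWord.get();
        if ((mark & LOCK_MASK) != FAST_LOCKED) {
            return false;
        }
        return markWord.compareAndSet(mark, (mark & ~LOCK_MASK) | UNLOCKED);
    }

    long mark() {
        return markWord.get();
    }

    public static void main(String[] args) {
        FastLockDemo lock = new FastLockDemo(0xABCD00L);
        System.out.println(lock.tryFastLock());
        System.out.println(lock.tryFastLock());
        System.out.println(lock.tryFastUnlock());
    }
}
```

The point of the model is that the upper bits of the word are never displaced, which is the property the quoted description relies on.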

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20067/files
  - new: https://git.openjdk.org/jdk/pull/20067/files/a207544b..15997bc3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=04-05

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20067.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067

PR: https://git.openjdk.org/jdk/pull/20067

From qxing at openjdk.org  Fri Jul 12 07:02:52 2024
From: qxing at openjdk.org (Qizheng Xing)
Date: Fri, 12 Jul 2024 07:02:52 GMT
Subject: RFR: 8336163: Remove declarations of some debug-only methods in
 release build [v2]
In-Reply-To: <seGC926uzjtTonUUes139jASOT8QaEGo38y9UkDLgRI=.52e038d9-3564-4deb-b272-7d978a744fff@github.com>
References: <r3075sVKxO34FohH4gtlidTGqmu5y_0qL4_TU3DdbG8=.fc8b604d-bc8a-48a5-a8a7-8fecbd5d3c4f@github.com>
 <mqelp_tuVfW5bHGxY7EDkEPZYB6PR3Ogyi2OVXscA60=.e8610c4a-4352-4b5d-af18-1fbf49cfd7dd@github.com>
 <4BJhfsLXcD6wvT8iZNXOeQ3IL2llZcqCPlCbesjlH4U=.34316abb-b4b0-44f9-a4d2-0ed1c1800ea2@github.com>
 <seGC926uzjtTonUUes139jASOT8QaEGo38y9UkDLgRI=.52e038d9-3564-4deb-b272-7d978a744fff@github.com>
Message-ID: <o73U9AZ5aYUFRuptiTid2ygpbEDMXZRhY2V87_0lAO0=.8eba6994-91c3-4b36-9aec-0fcc79cb11a7@github.com>

On Fri, 12 Jul 2024 04:15:48 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Thanks for the review.
>
> @MaxXSoft  hotspot changes require two reviewers.

@dholmes-ora Sorry.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20131#issuecomment-2224950909

From dnsimon at openjdk.org  Fri Jul 12 07:37:50 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Fri, 12 Jul 2024 07:37:50 GMT
Subject: RFR: 8336278: Micro-optimize Replace String.format("%n") to
 System.lineSeparator
In-Reply-To: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com>
References: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com>
Message-ID: <pgFdiXYSe8Y3DE9EQl1dl0z-xd9YODEu9we1VcqJULM=.f853b262-2554-42aa-8261-f02a27cb2ab3@github.com>

On Thu, 11 Jul 2024 22:45:47 GMT, Shaojin Wen <duke at openjdk.org> wrote:

> There are three places in the JDK code where String.format("%n") is used. This is actually equivalent to System.lineSeparator and does not require the implementation of String.format.

Looks good and trivial to me.

-------------

Marked as reviewed by dnsimon (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20149#pullrequestreview-2174075257

From shade at openjdk.org  Fri Jul 12 07:57:50 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 12 Jul 2024 07:57:50 GMT
Subject: RFR: 8336278: Micro-optimize Replace String.format("%n") to
 System.lineSeparator
In-Reply-To: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com>
References: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com>
Message-ID: <iewRfB_a0Sy1rtMNowp945-8lkGaiuPms-KeQBHLlEo=.d79da08b-7614-4f51-b41d-139ef024ad53@github.com>

On Thu, 11 Jul 2024 22:45:47 GMT, Shaojin Wen <duke at openjdk.org> wrote:

> There are three places in the JDK code where String.format("%n") is used. This is actually equivalent to System.lineSeparator and does not require the implementation of String.format.

Hah! Yes, it makes no sense to call into `String.format` to just get the line separator.

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20149#pullrequestreview-2174110063

From shade at openjdk.org  Fri Jul 12 09:17:22 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 12 Jul 2024 09:17:22 GMT
Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers
 [v3]
In-Reply-To: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
Message-ID: <swBWpqAm_k6hHjGcwdNBowWfdBpksxtD63PiGp0KI1c=.ad02279c-ed66-40a0-9b01-379d4410a16c@github.com>

> All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor (instance initializer), not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. Often we get lucky by never being exposed to static initializers on particular paths.
> 
> I would like to sharpen this. I went back and forth, and ultimately decided to remove `is_initializer` completely to avoid future confusion, and rewrite the uses appropriately.
> 
> Additional testing:
>  - [x] Linux AArch64 server fastdebug, `all` (includes Fuzzer and CTW tests)

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Touch up assert messages

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20120/files
  - new: https://git.openjdk.org/jdk/pull/20120/files/c5da5ebd..a18f7a46

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20120&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20120&range=01-02

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20120.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20120/head:pull/20120

PR: https://git.openjdk.org/jdk/pull/20120

From shade at openjdk.org  Fri Jul 12 09:17:23 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 12 Jul 2024 09:17:23 GMT
Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers
 [v2]
In-Reply-To: <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com>
References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
 <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com>
 <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com>
Message-ID: <0j_XZ2e84ADGz8jxk21pFyF0QNhubV0i7sVi5sxnSyg=.7281e6d1-bf24-49f1-96a6-8284c4c9f90d@github.com>

On Fri, 12 Jul 2024 03:59:06 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Indenting
>
> src/hotspot/share/classfile/javaClasses.cpp line 3018:
> 
>> 3016:   int flags = (jushort)( m->access_flags().as_short() & JVM_RECOGNIZED_METHOD_MODIFIERS );
>> 3017:   if (m->is_object_initializer()) {
>> 3018:     flags |= java_lang_invoke_MemberName::MN_IS_CONSTRUCTOR;
> 
> I'm going to assume that `clinit` would already get filtered out at some point otherwise this would be a change in behaviour.

No, it is not filtered, we still have `clinit`-s on this path. In the initial version https://github.com/openjdk/jdk/pull/20120/commits/6769cfe609849aa9ed0985dcbecb2b0aa24bca03 I caught the assert in many tests, mostly in stack traces generation. 

Yes, this changes the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior.
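
The behavior change can be illustrated with a small Java model. This is a sketch only: the constant values and method names are hypothetical stand-ins, and the name-only predicates simplify the real `is_object_initializer` / `is_static_initializer` checks.

```java
// Illustrative model of how <init> and <clinit> are classified before and after.
final class InitializerFlagsDemo {
    static final int IS_METHOD = 0x1;      // hypothetical stand-in for MN_IS_METHOD
    static final int IS_CONSTRUCTOR = 0x2; // hypothetical stand-in for MN_IS_CONSTRUCTOR

    static boolean isObjectInitializer(String name) {
        return name.equals("<init>");
    }

    static boolean isStaticInitializer(String name) {
        return name.equals("<clinit>");
    }

    // Old behavior: the combined check treats both initializers as constructors.
    static int oldKind(String name) {
        boolean initializer = isObjectInitializer(name) || isStaticInitializer(name);
        return initializer ? IS_CONSTRUCTOR : IS_METHOD;
    }

    // New behavior: only <init> is a constructor; <clinit> is recorded as a method.
    static int newKind(String name) {
        return isObjectInitializer(name) ? IS_CONSTRUCTOR : IS_METHOD;
    }

    public static void main(String[] args) {
        System.out.println(oldKind("<clinit>") == IS_CONSTRUCTOR);
        System.out.println(newKind("<clinit>") == IS_METHOD);
    }
}
```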

> src/hotspot/share/runtime/reflection.cpp line 772:
> 
>> 770:   assert(!method()->is_object_initializer() &&
>> 771:          (for_constant_pool_access || !method()->is_static_initializer()),
>> 772:          "should call new_constructor instead");
> 
> Nit: existing - the assert message isn't really correct

Yeah, it is a bit odd. I thought to leave the messages alone, but we can massage them as well. Should be done in new commit.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1675564908
PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1675565003

From rkennke at openjdk.org  Fri Jul 12 09:55:52 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 12 Jul 2024 09:55:52 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6]
In-Reply-To: <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
Message-ID: <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com>

On Fri, 12 Jul 2024 05:57:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update arguments.cpp

I've reviewed and tested some earlier versions of this change in the context of Lilliput, and haven't encountered any showstopping problems. Very nice work!

When you say 'This patch has been evaluated to be performance neutral when UseObjectMonitorTable is turned off (the default).' - what does the performance look like with +UOMT? How does it compare to -UOMT?

I've only reviewed the platform-specific changes so far. Mostly looks good, I only have some relatively minor remarks.

Will review the shared code changes separately.

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 318:

> 316: 
> 317:       // Loop after unrolling, advance iterator.
> 318:       increment(t3_t, in_bytes(OMCache::oop_to_oop_difference()));

Maybe I am misreading this but... in the unroll loop you avoid emitting the increment on the last iteration, but then you emit it explicitly here? Wouldn't it be cleaner to always do it in the unroll loop and elide the explicit increment after the loop?

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 343:

> 341:     const Register t3_owner = t3;
> 342:     const ByteSize monitor_tag = in_ByteSize(UseObjectMonitorTable ? 0 : checked_cast<int>(markWord::monitor_value));
> 343:     const Address owner_address{t1_monitor, ObjectMonitor::owner_offset() - monitor_tag};

That may be just me, but I found that syntax weird. I first needed to look up what the {}-initializer actually means. Hiccups like this reduce readability, IMO. I'd prefer the normal ()-init for the Address like we seem to do everywhere else.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 674:

> 672: 
> 673:       // Loop after unrolling, advance iterator.
> 674:       increment(t, in_bytes(OMCache::oop_to_oop_difference()));

Same issue as in aarch64 code.

-------------

Changes requested by rkennke (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2174300266
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675587650
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675597362
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675605009

From ayang at openjdk.org  Fri Jul 12 09:56:51 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Fri, 12 Jul 2024 09:56:51 GMT
Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and
 VM_ParallelGCSystemGC [v3]
In-Reply-To: <0bOEaMQ75JB0T22pbSsMEP1UV7lh1pUHoGjgTkold-w=.bfdbb2f7-010d-4d24-ab79-7fc40aadc929@github.com>
References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
 <CTc1SUPyk4eTQPSB-vU374oKCCvcgLvaM-cPm9qFilk=.67d7d034-5055-429a-948a-d9ec1e834324@github.com>
 <qkTnSCS8GpxLSZsJrN0_QpK4HGeDscPHVHspATH923M=.56c82c79-a314-41b6-b7c6-ca1178e66152@github.com>
 <0bOEaMQ75JB0T22pbSsMEP1UV7lh1pUHoGjgTkold-w=.bfdbb2f7-010d-4d24-ab79-7fc40aadc929@github.com>
Message-ID: <IjVuddD0IemO58P8xHSnFquVaDehOl3OA-0r9kDZjh8=.30b5dcef-ac21-4a3d-a782-d7659b159229@github.com>

On Fri, 12 Jul 2024 01:22:28 GMT, Guoxiong Li <gli at openjdk.org> wrote:

>> src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 446:
>> 
>>> 444:   }
>>> 445:   return result;   // Could be null if we are out of space.
>>> 446: }
>> 
>> I notice the method `PSOldGen::allocate` can expand the size of the old gen, but the method `PSYoungGen::allocate` can't expand the size of the young gen. It is similar to a bug [1] in Serial. Fortunately, the size of the young generation can be resized during Parallel GC if the option `UseAdaptiveSizePolicy` is `true`. When the `UseAdaptiveSizePolicy` is set to `false` manually by the user, I suspect it is a bug in Parallel because of the unexpanded young generation size.
>> 
>> [1] https://bugs.openjdk.org/browse/JDK-8333386
>
> @albertnetymk Do you think we need to expand the young generation during allocation (in both Serial and Parallel)? In Serial, `UseAdaptiveSizePolicy` is not used, so it is indeed a bug in Serial (the young generation can't be resized and is always the initial size).

Due to the internal structure (eden/survivor) of young-gen, it's not super easy to expand young-gen during allocation the way old-gen is expanded. This needs a dedicated ticket to properly evaluate its cost/benefit.

> Serial (the young generation can't be resized and is always the initial size).

That sounds like a definite bug; at least young-gen should be resizable during young-gc/full-gc.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20077#discussion_r1675613933

From rkennke at openjdk.org  Fri Jul 12 10:14:52 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 12 Jul 2024 10:14:52 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6]
In-Reply-To: <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
Message-ID: <ZrdkZBka-bE3JWmIIozLSyuncmU2cAi8MB-sdZE0ue0=.f8f5c66a-da77-4bad-b8f4-842158312cb4@github.com>

On Fri, 12 Jul 2024 05:57:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor, the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update arguments.cpp

Is there a plan to get rid of the UseObjectMonitorTable flag in a future release? Ideally we would have one fast-locking implementation (LW locking) with one OM mapping (+UOMT), right?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2225260710

From eosterlund at openjdk.org  Fri Jul 12 10:18:51 2024
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Fri, 12 Jul 2024 10:18:51 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear
In-Reply-To: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
Message-ID: <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com>

On Thu, 11 Jul 2024 15:28:37 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks.
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [ ] Linux AArch64 server fastdebug, `all`

The reason we did not do this before is that this is not a strong reference store. Strong reference stores with a SATB collector will keep the referent alive, which is typically the exact opposite of what a user wants when they clear a Reference.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2225266939

From shade at openjdk.org  Fri Jul 12 10:24:57 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 12 Jul 2024 10:24:57 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear
In-Reply-To: <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com>
References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
 <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com>
Message-ID: <K2EJ43EXkTgJE0pjwzy50s3BoTAhF1Y2trwHtDzhojQ=.e837c7bc-717c-4826-8cc3-82a2232bc928@github.com>

On Fri, 12 Jul 2024 10:16:13 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> The reason we did not do this before is that this is not a strong reference store. Strong reference stores with a SATB collector will keep the referent alive, which is typically the exact opposite of what a user wants when they clear a Reference.

You mean not doing this store just on the Java side? Yes, I agree, it would be awkward. In the intrinsic, we are storing with the same decorators that `JVM_ReferenceClear` is using, which should be good with SATB collectors. Perhaps I am misunderstanding the comment.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2225277261

From aboldtch at openjdk.org  Fri Jul 12 10:54:23 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 12 Jul 2024 10:54:23 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v7]
In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
Message-ID: <PgX4o-qTT_4gDiZ94towRtB7xs7zkYMcoTpp51iz5vM=.4085c8c0-679a-4e84-8cb0-20bfb9ec80bf@github.com>

> When inflating a monitor, the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere.
> 
> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code.
> 
> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
> 
> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
> 
> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.
> 
> # Cleanups
> 
> Cleaned up displaced header usage for:
>   * BasicLock
>     * Contains some Zero changes
>     * Renames one exported JVMCI field
>   * ObjectMonitor
>     * Updates comments and tests consistencies
> 
> # Refactoring
> 
> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
> 
> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
> 
> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
> 
> # LightweightSynchronizer
> 
> Working on adapting and incorporating the following section as a comment in the source code
> 
> ## Fast Locking
> 
>   CAS on locking bits in markWord. 
>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
> 
>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
> 
>   If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...
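
As a reading aid for the "Fast Locking" paragraph above, here is a minimal standalone sketch of a CAS on the two low lock bits of a mark word. It is not the HotSpot implementation: the names `kLockMask`, `kFastLocked`, `kUnlocked`, `try_fast_lock` and `try_fast_unlock` are hypothetical, the mark word is modelled as a plain `std::atomic<std::uintptr_t>`, and the per-thread bookkeeping that records which thread owns a fast lock is left out entirely.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical constants mirroring the bit patterns named in the description.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;

// Try to move the lock bits from 0b01 (Unlocked) to 0b00 (Fast Locked).
// All other bits of the mark word are left untouched.
inline bool try_fast_lock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kUnlocked) {
    return false;  // already fast locked or inflated; caller must spin or inflate
  }
  const std::uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed);
}

// Try to move the lock bits back from 0b00 (Fast Locked) to 0b01 (Unlocked).
inline bool try_fast_unlock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kFastLocked) {
    return false;  // not in the fast-locked state
  }
  const std::uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}
```

A second `try_fast_lock` on the same word fails until `try_fast_unlock` succeeds, which is the point at which real code would decide between spinning and inflating.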

Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:

  Cleanup c2 cache lookup

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20067/files
  - new: https://git.openjdk.org/jdk/pull/20067/files/15997bc3..e1eb8c95

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=05-06

  Stats: 12 lines in 2 files changed: 0 ins; 10 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20067.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067

PR: https://git.openjdk.org/jdk/pull/20067

From aboldtch at openjdk.org  Fri Jul 12 10:54:24 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 12 Jul 2024 10:54:24 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6]
In-Reply-To: <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
 <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com>
Message-ID: <qk3m4DFlOPmoz1Ke2dUv74782IsXXrzJAlz-Axlvy4o=.29df14df-e3bf-4649-bc36-7d9082c9fdb8@github.com>

On Fri, 12 Jul 2024 09:32:44 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update arguments.cpp
>
> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 318:
> 
>> 316: 
>> 317:       // Loop after unrolling, advance iterator.
>> 318:       increment(t3_t, in_bytes(OMCache::oop_to_oop_difference()));
> 
> Maybe I am misreading this but... in the unroll loop you avoid emitting the increment on the last iteration, but then you emit it explicitly here? Wouldn't it be cleaner to always do it in the unroll loop and elide the explicit increment after the loop?

You are correct. It is a leftover from when it was possible to tweak the number of unrolled lookups as well as whether it should loop the tail. Fixed.
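
To make the shape of the suggestion concrete, here is a small standalone sketch of a code emitter that advances the iterator in every unrolled probe, so no separate increment is needed before the loop tail. This is only one way to read the review comment; `DemoEmitter`, `emit_probe_and_advance`, `emit_cache_lookup` and the pseudo-instruction strings are all hypothetical and do not correspond to the MacroAssembler API or to the actual fix in the PR.

```cpp
#include <string>
#include <vector>

// Hypothetical recorder of pseudo-instructions; stands in for an assembler.
struct DemoEmitter {
  std::vector<std::string> code;
  void emit(const std::string& insn) { code.push_back(insn); }
};

// One cache probe: compare, branch on hit, then always advance the iterator.
inline void emit_probe_and_advance(DemoEmitter& masm) {
  masm.emit("compare obj, [iter]");
  masm.emit("branch_if_equal found");
  masm.emit("increment iter, stride");
}

// Emits `unroll` straight-line probes followed by a looping tail.
// Because every probe ends with the increment, the iterator already points
// at the next entry when the loop starts, so no fix-up is emitted in between.
inline void emit_cache_lookup(DemoEmitter& masm, int unroll) {
  for (int i = 0; i < unroll; i++) {
    emit_probe_and_advance(masm);
  }
  masm.emit("loop:");
  masm.emit("branch_if_null [iter], miss");
  emit_probe_and_advance(masm);
  masm.emit("branch loop");
}
```

The trade-off is one possibly redundant increment when the tail loop is never entered, in exchange for a uniform loop body.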

> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 343:
> 
>> 341:     const Register t3_owner = t3;
>> 342:     const ByteSize monitor_tag = in_ByteSize(UseObjectMonitorTable ? 0 : checked_cast<int>(markWord::monitor_value));
>> 343:     const Address owner_address{t1_monitor, ObjectMonitor::owner_offset() - monitor_tag};
> 
> That may be just me, but I found that syntax weird. I first needed to look-up what the {}-initializer actually means. Hiccups like this reduce readability, IMO. I'd prefer the normal ()-init for the Address like we seem to do everywhere else.

I see. I tend to prefer uniform initialization as it makes narrowing conversions illegal. 

I remember `uniform initialization` coming up in some previous PR as well. It is really only necessary for some types of templated code, but it also makes it easier to avoid mistakes in the general case (as long as you avoid `std::initializer_list`, which I think we explicitly forbid in our coding guidelines).

I do not recall what the conclusion of that discussion was. But maybe it was that this feature is too exotic and foreign for hotspot.

I prefer it though, even if I fail to use it consistently.
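
For readers unfamiliar with the narrowing rule mentioned here, a minimal standalone sketch follows (it assumes C++17 for `std::void_t`). `DemoAddress`, `brace_initializable` and `offset_via_parens` are hypothetical names for illustration; `DemoAddress` is not HotSpot's `Address` class. The trait checks at compile time whether `To{from}` is well-formed for a non-constant source value.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// True when `To{from}` is well-formed for a non-constant value of type From.
template <class To, class From, class = void>
struct brace_initializable : std::false_type {};

template <class To, class From>
struct brace_initializable<To, From,
    std::void_t<decltype(To{std::declval<From>()})>> : std::true_type {};

// Widening is accepted by list-initialization, narrowing is ill-formed.
static_assert(brace_initializable<std::int64_t, std::int32_t>::value,
              "int32 -> int64 is not narrowing");
static_assert(!brace_initializable<std::int32_t, std::int64_t>::value,
              "int64 -> int32 is narrowing");

// Hypothetical stand-in for an address operand with a 32-bit displacement.
class DemoAddress {
 public:
  DemoAddress(int base_reg, std::int32_t offset)
      : _base_reg(base_reg), _offset(offset) {}
  int base_reg() const { return _base_reg; }
  std::int32_t offset() const { return _offset; }
 private:
  int _base_reg;
  std::int32_t _offset;
};

inline std::int32_t offset_via_parens(std::int64_t wide_offset) {
  DemoAddress addr(1, wide_offset);     // accepted: argument is converted implicitly
  // DemoAddress addr{1, wide_offset};  // ill-formed: narrowing in list-initialization
  return addr.offset();
}
```

With the parenthesized form, a 64-bit value is silently converted to the 32-bit parameter; the braced form turns the same call into a compile error.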

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 674:
> 
>> 672: 
>> 673:       // Loop after unrolling, advance iterator.
>> 674:       increment(t, in_bytes(OMCache::oop_to_oop_difference()));
> 
> Same issue as in aarch64 code.

Fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675676768
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675676879
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675677068

From aboldtch at openjdk.org  Fri Jul 12 11:08:53 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 12 Jul 2024 11:08:53 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6]
In-Reply-To: <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
 <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com>
Message-ID: <n7QM8yrj5JF-VZjcfLG9OnSXGb9Kbtt4uFMrXNMDkJw=.2a91529b-7bcb-4d89-a21f-0917fc0b129d@github.com>

On Fri, 12 Jul 2024 09:53:11 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> When you say 'This patch has been evaluated to be performance neutral when UseObjectMonitorTable is turned off (the default).' - what does the performance look like with +UOMT? How does it compare to -UOMT?

Most benchmarks are unaffected as they do not use any contended locking or wait/notify. Some see improvements and some show regressions. 

The most significant regressions are in `DaCapo-xalan` which is very sensitive to the timing of enter. It seems to rely quite heavily on how fast you can get to `ObjectMonitor::TrySpin` as well as the exact behaviour of this spinning. 

Then there are all the workloads which have not been tested in all these benchmark suites. The hope is to be able to incrementally iterate on the performance of the worst outliers.

> Is there a plan to get rid of the UseObjectMonitorTable flag in a future release? Ideally we would have one fast-locking implementation (LW locking) with one OM mapping (+UOMT), right?

My understanding (and shared hope) is that this is the ambition.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20067#issuecomment-2225341285

From eosterlund at openjdk.org  Fri Jul 12 12:00:56 2024
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Fri, 12 Jul 2024 12:00:56 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear
In-Reply-To: <K2EJ43EXkTgJE0pjwzy50s3BoTAhF1Y2trwHtDzhojQ=.e837c7bc-717c-4826-8cc3-82a2232bc928@github.com>
References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
 <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com>
 <K2EJ43EXkTgJE0pjwzy50s3BoTAhF1Y2trwHtDzhojQ=.e837c7bc-717c-4826-8cc3-82a2232bc928@github.com>
Message-ID: <iFxcPJTPGoxZgIaQKYtEbtg06xXYJewHfSA-f7nbofQ=.37070a3a-681b-4ccb-8857-91be898fd3c9@github.com>

On Fri, 12 Jul 2024 10:22:42 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> > The reason we did not do this before is that this is not a strong reference store. Strong reference stores with a SATB collector will keep the referent alive, which is typically the exact opposite of what a user wants when they clear a Reference.
> 
> You mean not doing this store just on the Java side? Yes, I agree, it would be awkward. In intrinsic, we are storing with the same decorators that `JVM_ReferenceClear` is using, which should be good with SATB collectors. Perhaps I am misunderstanding the comment.

The runtime use of the Access API knows how to resolve an unknown oop ref strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. However, we do not have support for that in the C2 backend. In fact, it does not understand non-strong oop stores at all, because there hasn't really been a use case for them other than clearing a Reference. That's the precise reason why we do not have a clear intrinsic; it would have to add that infrastructure.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2225430174

From aph at openjdk.org  Fri Jul 12 12:08:55 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 12 Jul 2024 12:08:55 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6]
In-Reply-To: <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
 <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com>
Message-ID: <PIXggQ3dHB2fzB-FFKqZv88c0HFbbgj8U3uMEH9LJWM=.7521eb79-276a-4afc-9081-bd99dadac5c6@github.com>

On Fri, 12 Jul 2024 09:40:45 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update arguments.cpp
>
> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 343:
> 
>> 341:     const Register t3_owner = t3;
>> 342:     const ByteSize monitor_tag = in_ByteSize(UseObjectMonitorTable ? 0 : checked_cast<int>(markWord::monitor_value));
>> 343:     const Address owner_address{t1_monitor, ObjectMonitor::owner_offset() - monitor_tag};
> 
> That may be just me, but I found that syntax weird. I first needed to look-up what the {}-initializer actually means. Hiccups like this reduce readability, IMO. I'd prefer the normal ()-init for the Address like we seem to do everywhere else.

I agree with @rkennke. When we wrote the AArch64 MacroAssembler we were concentrating on readability and familiarity, and this separate declaration and use, with unusual syntax, IMO makes life harder for the reader.
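
Independent of the syntax question, the arithmetic in the quoted lines may be worth spelling out. The sketch below assumes that the monitor register holds a pointer with the inflated tag (0b10) set in its low bits when the table is off, and an untagged pointer when it is on; subtracting the tag from the field offset then yields the field address without untagging first. `DemoMonitor`, `kMonitorTag` and `owner_slot` are hypothetical names, and the struct layout is invented for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical monitor layout; only the existence of an `owner` field matters.
struct DemoMonitor {
  void* header;
  void* owner;
};

// Assumed to mirror the 0b10 (Inflated) tag from the PR description.
constexpr std::uintptr_t kMonitorTag = 0b10;

// Computes the address of `owner` from possibly tagged monitor bits.
// With the table enabled the pointer is assumed untagged, so the tag is 0.
inline void** owner_slot(std::uintptr_t monitor_bits, bool use_table) {
  const std::uintptr_t tag = use_table ? 0 : kMonitorTag;
  return reinterpret_cast<void**>(
      monitor_bits + offsetof(DemoMonitor, owner) - tag);
}
```

Folding the tag into the displacement saves an explicit masking instruction on the tagged path.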

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675751592

From gli at openjdk.org  Fri Jul 12 12:17:52 2024
From: gli at openjdk.org (Guoxiong Li)
Date: Fri, 12 Jul 2024 12:17:52 GMT
Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and
 VM_ParallelGCSystemGC [v4]
In-Reply-To: <viPT0XNzMpheGP6HtlZ0RI1Gbi-nA9DkCraQSfo81rA=.481fe804-9c5c-441b-b069-7ad7baee772a@github.com>
References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
 <viPT0XNzMpheGP6HtlZ0RI1Gbi-nA9DkCraQSfo81rA=.481fe804-9c5c-441b-b069-7ad7baee772a@github.com>
Message-ID: <s6pj4WO2tSlHfBq_95tbuSP3u3d5IBP9N7WWEk_fvNQ=.b8ef026f-3110-4c23-8726-f83d47fc5722@github.com>

On Thu, 11 Jul 2024 18:06:34 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` share much similarity.
>> 
>> The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect` and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts.
>> 
>> Test: tier1-6; perf-neutral for dacapo, specjvm2008, specjbb2015 and cachestresser.
>
> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - Merge branch 'master' into pgc-vm-operation
>  - review
>  - review
>  - Merge branch 'master' into pgc-vm-operation
>  - pgc-vm-operation

Marked as reviewed by gli (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20077#pullrequestreview-2174586077

From jkratochvil at openjdk.org  Fri Jul 12 12:30:52 2024
From: jkratochvil at openjdk.org (Jan Kratochvil)
Date: Fri, 12 Jul 2024 12:30:52 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v4]
In-Reply-To: <ZUmCX2Tqmw_48beJOefsyDEgjElCZWV6IVl7SMZi4r0=.37d3a4ee-2740-4745-ae47-766da3b7fb6e@github.com>
References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com>
 <ZUmCX2Tqmw_48beJOefsyDEgjElCZWV6IVl7SMZi4r0=.37d3a4ee-2740-4745-ae47-766da3b7fb6e@github.com>
Message-ID: <VJ5KHzRz789LNMySGGlflRLtFi1wXHuLyh56ifMK19c=.9f49db56-674c-4a70-8e86-294b6d041f59@github.com>

On Thu, 11 Jul 2024 16:46:13 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest`, currently passes on cgroups v1 and fails on cgroups v2 because of how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented back when JDK 13 was current. It is therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges.
>> 
>> I'm adding those tests in order to not regress another time.
>> 
>> Testing:
>> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1.
>> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to  JDK-8322420)
>> - [x] GHA
>
> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
> 
>  - Add Whitebox check for host cpu
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>  - Fix comments
>  - 8333446: Add tests for hierarchical container support

With #17198 and this updated patch I still get a FAIL due to:

[0.333s][trace][os,container] OSContainer::active_processor_count: 4

But let's resolve it after #17198 gets final/approved.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2225475953

From sgehwolf at openjdk.org  Fri Jul 12 12:46:50 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Fri, 12 Jul 2024 12:46:50 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v4]
In-Reply-To: <VJ5KHzRz789LNMySGGlflRLtFi1wXHuLyh56ifMK19c=.9f49db56-674c-4a70-8e86-294b6d041f59@github.com>
References: <gu9zW7xFuwfD7EyhkHQYadnHoB0DlCtSlkg8ddja9lQ=.523cfe54-5b05-44a2-9030-1dbc78797e7e@github.com>
 <ZUmCX2Tqmw_48beJOefsyDEgjElCZWV6IVl7SMZi4r0=.37d3a4ee-2740-4745-ae47-766da3b7fb6e@github.com>
 <VJ5KHzRz789LNMySGGlflRLtFi1wXHuLyh56ifMK19c=.9f49db56-674c-4a70-8e86-294b6d041f59@github.com>
Message-ID: <i6J1-zEhPuCqHs8lXwpjlNxfvX7lnCIkSuATD5U3S9M=.b43cf310-f870-40be-acfc-2889861183e4@github.com>

On Fri, 12 Jul 2024 12:28:16 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote:

> With #17198 and this updated patch I still get a FAIL due to:
> 
> ```
> [0.333s][trace][os,container] OSContainer::active_processor_count: 4
> ```
> 
> But let's resolve it after #17198 gets final/approved.

That is because #17198 is incomplete. As mentioned in the review:

> We ought to also trim the path for the CPU controller. This patch only fixes the memory controller.

That's exactly why the test is failing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2225501718

From coleenp at openjdk.org  Fri Jul 12 12:50:52 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 12 Jul 2024 12:50:52 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v7]
In-Reply-To: <PgX4o-qTT_4gDiZ94towRtB7xs7zkYMcoTpp51iz5vM=.4085c8c0-679a-4e84-8cb0-20bfb9ec80bf@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <PgX4o-qTT_4gDiZ94towRtB7xs7zkYMcoTpp51iz5vM=.4085c8c0-679a-4e84-8cb0-20bfb9ec80bf@github.com>
Message-ID: <6Aa4oWKwpgo9Br75tCLj3AGQLxP9Rw2dgjzOXJQ6CTo=.e92e83f5-e4b3-43d8-8e89-3349de99524d@github.com>

On Fri, 12 Jul 2024 10:54:23 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor, the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Cleanup c2 cache lookup

Thank you for making the argument change.

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2174647655

From ayang at openjdk.org  Fri Jul 12 13:01:59 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Fri, 12 Jul 2024 13:01:59 GMT
Subject: RFR: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and
 VM_ParallelGCSystemGC [v4]
In-Reply-To: <viPT0XNzMpheGP6HtlZ0RI1Gbi-nA9DkCraQSfo81rA=.481fe804-9c5c-441b-b069-7ad7baee772a@github.com>
References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
 <viPT0XNzMpheGP6HtlZ0RI1Gbi-nA9DkCraQSfo81rA=.481fe804-9c5c-441b-b069-7ad7baee772a@github.com>
Message-ID: <nmAzpCfQlgjRjRZhg89_QRFj0PebhJ85w9OrQfHZ9RU=.27e474f6-a59d-4b1a-a548-8cc2d5f2dcac@github.com>

On Thu, 11 Jul 2024 18:06:34 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` share much similarity.
>> 
>> The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect` and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts.
>> 
>> Test: tier1-6; perf-neural for dacapo, specjvm2008, specjbb2015 and cachestresser.
>
> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - Merge branch 'master' into pgc-vm-operation
>  - review
>  - review
>  - Merge branch 'master' into pgc-vm-operation
>  - pgc-vm-operation

Thanks for review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20077#issuecomment-2225527435

From ayang at openjdk.org  Fri Jul 12 13:02:00 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Fri, 12 Jul 2024 13:02:00 GMT
Subject: Integrated: 8335902: Parallel: Refactor VM_ParallelGCFailedAllocation
 and VM_ParallelGCSystemGC
In-Reply-To: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
References: <vG2CPHrdE7Q8yAsBuS1IagvRplyRdAe3UcAtORGk1lE=.d5b2329b-1eb5-4241-ad16-83b3ea651f00@github.com>
Message-ID: <2rPj9VK03GeaMLDlBBlNBKwNQCLPWNk-cbsLN_G3ymA=.a932a4b7-185d-46b3-acac-4a27ff4a1ee8@github.com>

On Mon, 8 Jul 2024 16:18:22 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

> Similar cleanup as https://github.com/openjdk/jdk/pull/19056 but in Parallel. As a result, the corresponding code in `SerialHeap` and `ParallelScavengeHeap` share much similarity.
> 
> The easiest way to review is to start from these two VM operations, `VM_ParallelCollectForAllocation` and `VM_ParallelGCCollect` and follow the new code directly, where one can see how allocation-failure triggers various GCs with different collection efforts.
> 
> Test: tier1-6; perf-neural for dacapo, specjvm2008, specjbb2015 and cachestresser.

This pull request has now been integrated.

Changeset: 34d8562a
Author:    Albert Mingkun Yang <ayang at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/34d8562a913b8382601e4c0c31ad34a663b9ec0a
Stats:     342 lines in 14 files changed: 88 ins; 169 del; 85 mod

8335902: Parallel: Refactor VM_ParallelGCFailedAllocation and VM_ParallelGCSystemGC

Reviewed-by: gli, zgu

-------------

PR: https://git.openjdk.org/jdk/pull/20077

From aboldtch at openjdk.org  Fri Jul 12 13:06:40 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 12 Jul 2024 13:06:40 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v8]
In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
Message-ID: <VM1cqaoV2Fx9tbdONunfk4TMc8NUHbjdVFCDy5ySDuE=.9eafd768-5e31-48c7-b0fb-e676e801ddc4@github.com>

> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere.
> 
> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code.
> 
> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
> 
> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
> 
> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
> 
> # Cleanups
> 
> Cleaned up displaced header usage for:
>   * BasicLock
>     * Contains some Zero changes
>     * Renames one exported JVMCI field
>   * ObjectMonitor
>     * Updates comments and tests consistencies
> 
> # Refactoring
> 
> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
> 
> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
> 
> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
> 
> # LightweightSynchronizer
> 
> Working on adapting and incorporating the following section as a comment in the source code
> 
> ## Fast Locking
> 
>   CAS on locking bits in markWord. 
>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
> 
>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
> 
>   If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...
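
For readers skimming the description above, here is a minimal standalone sketch of the fast-locking transition on the two lock bits. It is a toy model, not HotSpot code: the names `try_fast_lock`, `try_fast_unlock`, `LockResult` and the `k...` constants are made up for illustration, the mark word is modelled as a plain `std::atomic<std::uintptr_t>`, and the per-thread bookkeeping that real lightweight locking needs to know who owns the lock is left out entirely.

```cpp
#include <atomic>
#include <cstdint>

// Toy encoding of the two low lock bits quoted above (hypothetical names).
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

enum class LockResult { Acquired, Contended, Inflated };

// One attempt at 0b01 (Unlocked) -> 0b00 (Fast Locked). The caller decides
// whether to spin and retry on Contended or to take the inflated path.
inline LockResult try_fast_lock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  const std::uintptr_t bits = old_mark & kLockMask;
  if (bits == kInflated) return LockResult::Inflated;
  if (bits != kUnlocked) return LockResult::Contended;
  // Only the lock bits change; the rest of the word is preserved.
  const std::uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
  if (mark.compare_exchange_strong(old_mark, new_mark,
                                   std::memory_order_acquire,
                                   std::memory_order_relaxed)) {
    return LockResult::Acquired;
  }
  return LockResult::Contended;
}

// One attempt at 0b00 (Fast Locked) -> 0b01 (Unlocked).
inline bool try_fast_unlock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kFastLocked) return false;
  const std::uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}
```

The point of the sketch is only that the upper bits of the word survive both transitions, which is what lets inflation stop displacing the `markWord`.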

Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:

  Avoid uniform initialization

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20067/files
  - new: https://git.openjdk.org/jdk/pull/20067/files/e1eb8c95..cccffeda

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=06-07

  Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/20067.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067

PR: https://git.openjdk.org/jdk/pull/20067

From aboldtch at openjdk.org  Fri Jul 12 13:06:40 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 12 Jul 2024 13:06:40 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6]
In-Reply-To: <PIXggQ3dHB2fzB-FFKqZv88c0HFbbgj8U3uMEH9LJWM=.7521eb79-276a-4afc-9081-bd99dadac5c6@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
 <TZcCyU0Zgrw6UwJ6-v_k0W06ChzxniusrEiK1UPErt0=.a028b5e9-dd11-4f2c-94d2-e427ad85a8ee@github.com>
 <PIXggQ3dHB2fzB-FFKqZv88c0HFbbgj8U3uMEH9LJWM=.7521eb79-276a-4afc-9081-bd99dadac5c6@github.com>
Message-ID: <FxdbAVZ0cLhyLrpgp-3Nwj7pSm_PVli71my96vMU_b8=.f131763f-576c-476f-8146-d805f3f074cf@github.com>

On Fri, 12 Jul 2024 12:06:05 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 343:
>> 
>>> 341:     const Register t3_owner = t3;
>>> 342:     const ByteSize monitor_tag = in_ByteSize(UseObjectMonitorTable ? 0 : checked_cast<int>(markWord::monitor_value));
>>> 343:     const Address owner_address{t1_monitor, ObjectMonitor::owner_offset() - monitor_tag};
>> 
>> That may be just me, but I found that syntax weird. I first needed to look-up what the {}-initializer actually means. Hiccups like this reduce readability, IMO. I'd prefer the normal ()-init for the Address like we seem to do everywhere else.
>
> I agree with @rkennke . When we wrote the AArch64 MacroAssembler we were concentrating on readability and familiarity, and this separate declaration and use, with unusual syntax, IMO makes life harder for the reader.

Fair enough. 

_To me, uniform initialization is just a safer, less problematic way of expressing the same thing._
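
As a small standalone illustration of the safety argument (not HotSpot code; the function names are hypothetical), parenthesised initialization accepts an implicit narrowing conversion, while brace initialization rejects it at compile time and forces the cast to be written out:

```cpp
// Parenthesised (direct) initialization: the double is silently
// converted to int and the fractional part is dropped.
inline int paren_init(double d) {
  int n(d);
  return n;
}

// Brace (list) initialization: `int n{d};` would be ill-formed because it
// is a narrowing conversion, so the conversion has to be spelled out.
inline int brace_init(double d) {
  int n{static_cast<int>(d)};
  return n;
}
```

Both functions return the same value; the difference is only that the second form makes the narrowing visible at the call site.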

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675828324

From shade at openjdk.org  Fri Jul 12 13:21:50 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 12 Jul 2024 13:21:50 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear
In-Reply-To: <iFxcPJTPGoxZgIaQKYtEbtg06xXYJewHfSA-f7nbofQ=.37070a3a-681b-4ccb-8857-91be898fd3c9@github.com>
References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
 <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com>
 <K2EJ43EXkTgJE0pjwzy50s3BoTAhF1Y2trwHtDzhojQ=.e837c7bc-717c-4826-8cc3-82a2232bc928@github.com>
 <iFxcPJTPGoxZgIaQKYtEbtg06xXYJewHfSA-f7nbofQ=.37070a3a-681b-4ccb-8857-91be898fd3c9@github.com>
Message-ID: <WOpJEGXtCPcCZv7YFhUT2ZOHe8j3mnavPrLjbbFD0Ns=.e514c8c3-ee1f-4e0d-a9ae-a83171959a0e@github.com>

On Fri, 12 Jul 2024 11:57:56 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> The runtime use of the Access API knows how to resolve an unknown oop ref strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. However, we do not have support for that in the C2 backend. In fact, it does not understand non-strong oop stores at all. 

Aw, nice usability landmine. I thought C2 barrier set would assert on me if it cannot deliver. Apparently not, I see it just does pre-barriers when it is not sure what strongness the store is. Hrmpf. OK, let me see what can be done here.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2225577027

From fgao at openjdk.org  Fri Jul 12 13:52:04 2024
From: fgao at openjdk.org (Fei Gao)
Date: Fri, 12 Jul 2024 13:52:04 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting from
 long to pointer
Message-ID: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>

In the cases like:

  UNSAFE.putLong(address + off1 + 1030, lseed);
  UNSAFE.putLong(address + 1023, lseed);
  UNSAFE.putLong(address + off2 + 1001, lseed);


Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:

  ldr  R10, [R15, #120]    # int ! Field: address
  ldr  R11, [R16, #136]    # int ! Field: off1
  ldr  R12, [R16, #144]    # int ! Field: off2
  add  R11, R11, R10
  mov R11, R11    # long -> ptr
  add  R12, R12, R10
  mov R10, R10    # long -> ptr
  add R11, R11, #1030    # ptr
  str  R17, [R11]    # int
  add R10, R10, #1023    # ptr
  str  R17, [R10]    # int
  mov R10, R12    # long -> ptr
  add R10, R10, #1001    # ptr
  str  R17, [R10]    # int


On AArch64, the conversion from long to pointer could be a nop, but C2 doesn't know that. The existing code elides `mov dst, src` only when `dst` == `src` [1], so we get the following assembly:

  ldr    x10, [x15,#120]
  ldp    x11, x12, [x16,#136]
  add    x11, x11, x10
  add    x12, x12, x10
  add    x11, x11, #0x406
  str    x17, [x11]
  add    x10, x10, #0x3ff
  str    x17, [x10]
  mov    x10, x12  <--- extra register copy
  add    x10, x10, #0x3e9
  str    x17, [x10]


There is still one extra register copy, which we're trying to remove in this patch.

This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
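
To make the existing behaviour concrete, here is a toy model of the "skip the move when both registers are the same" rule, assuming a made-up `ToyAssembler` type that just records instruction text. It is not the actual `aarch64.ad` encoding, only a sketch of why the third cast in the example still costs an instruction:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an assembler; registers are plain integers.
struct ToyAssembler {
  std::vector<std::string> code;

  // Models the long -> ptr cast: nothing is emitted when the register
  // allocator happened to pick the same register for source and destination.
  void cast_x2p(int dst, int src) {
    if (dst == src) {
      return;
    }
    code.push_back("mov x" + std::to_string(dst) + ", x" + std::to_string(src));
  }
};
```

Feeding it the three casts from the optoassembly above (`R11 <- R11`, `R10 <- R10`, `R10 <- R12`) leaves exactly one `mov x10, x12`, which is the copy the new operands are meant to avoid.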

Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so

[1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906

-------------

Commit messages:
 - 8336245: AArch64: remove extra register copy when converting from long to pointer

Changes: https://git.openjdk.org/jdk/pull/20157/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20157&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8336245
  Stats: 320 lines in 5 files changed: 297 ins; 3 del; 20 mod
  Patch: https://git.openjdk.org/jdk/pull/20157.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20157/head:pull/20157

PR: https://git.openjdk.org/jdk/pull/20157

From fgao at openjdk.org  Fri Jul 12 13:52:05 2024
From: fgao at openjdk.org (Fei Gao)
Date: Fri, 12 Jul 2024 13:52:05 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
Message-ID: <2Ln6-ZIklVFgsBWZmmyOU2G-wZmknxjsoT1xcTKSXDc=.54473598-6e15-43d1-9e5f-95c796d11066@github.com>

On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote:

> In the cases like:
> 
>   UNSAFE.putLong(address + off1 + 1030, lseed);
>   UNSAFE.putLong(address + 1023, lseed);
>   UNSAFE.putLong(address + off2 + 1001, lseed);
> 
> 
> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
> 
>   ldr  R10, [R15, #120]    # int ! Field: address
>   ldr  R11, [R16, #136]    # int ! Field: off1
>   ldr  R12, [R16, #144]    # int ! Field: off2
>   add  R11, R11, R10
>   mov R11, R11    # long -> ptr
>   add  R12, R12, R10
>   mov R10, R10    # long -> ptr
>   add R11, R11, #1030    # ptr
>   str  R17, [R11]    # int
>   add R10, R10, #1023    # ptr
>   str  R17, [R10]    # int
>   mov R10, R12    # long -> ptr
>   add R10, R10, #1001    # ptr
>   str  R17, [R10]    # int
> 
> 
> On AArch64, the conversion from long to pointer could be a nop, but C2 doesn't know that. The existing code elides `mov dst, src` only when `dst` == `src` [1], so we get the following assembly:
> 
>   ldr    x10, [x15,#120]
>   ldp    x11, x12, [x16,#136]
>   add    x11, x11, x10
>   add    x12, x12, x10
>   add    x11, x11, #0x406
>   str    x17, [x11]
>   add    x10, x10, #0x3ff
>   str    x17, [x10]
>   mov    x10, x12  <--- extra register copy
>   add    x10, x10, #0x3e9
>   str    x17, [x10]
> 
> 
> There is still one extra register copy, which we're trying to remove in this patch.
> 
> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
> 
> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so
> 
> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906

src/hotspot/share/opto/machnode.cpp line 400:

> 398: 
> 399:   if (t->isa_intptr_t() &&
> 400: #if !defined(AARCH64)

After applying the operand "IndirectX2P", we may have some patterns like:

str val, [CastX2P base]

The code path here will resolve the `base`, which is actually an `intptr`, not a `ptr`, and the offset is `0`.

I guess the code here was intended to support `[base, offset]`, where base can be an `intptr` but offset cannot be `0`. I'm not sure why there is such a limitation that the offset cannot be `0`; maybe for some old machines?

I don't think the limitation applies to aarch64 machines now, so I unblocked it for aarch64.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1675959482

From fgao at openjdk.org  Fri Jul 12 14:17:50 2024
From: fgao at openjdk.org (Fei Gao)
Date: Fri, 12 Jul 2024 14:17:50 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
Message-ID: <TDTev6CsRM2rnR1nNML50zEUQoPZ79l0y9Zg0CpAwgU=.7b792eac-cec5-4ced-b84c-704802ca9f57@github.com>

On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote:

> In the cases like:
> 
>   UNSAFE.putLong(address + off1 + 1030, lseed);
>   UNSAFE.putLong(address + 1023, lseed);
>   UNSAFE.putLong(address + off2 + 1001, lseed);
> 
> 
> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
> 
>   ldr  R10, [R15, #120]    # int ! Field: address
>   ldr  R11, [R16, #136]    # int ! Field: off1
>   ldr  R12, [R16, #144]    # int ! Field: off2
>   add  R11, R11, R10
>   mov R11, R11    # long -> ptr
>   add  R12, R12, R10
>   mov R10, R10    # long -> ptr
>   add R11, R11, #1030    # ptr
>   str  R17, [R11]    # int
>   add R10, R10, #1023    # ptr
>   str  R17, [R10]    # int
>   mov R10, R12    # long -> ptr
>   add R10, R10, #1001    # ptr
>   str  R17, [R10]    # int
> 
> 
> On AArch64, the conversion from long to pointer could be a nop, but C2 doesn't know that. The existing code elides `mov dst, src` only when `dst` == `src` [1], so we get the following assembly:
> 
>   ldr    x10, [x15,#120]
>   ldp    x11, x12, [x16,#136]
>   add    x11, x11, x10
>   add    x12, x12, x10
>   add    x11, x11, #0x406
>   str    x17, [x11]
>   add    x10, x10, #0x3ff
>   str    x17, [x10]
>   mov    x10, x12  <--- extra register copy
>   add    x10, x10, #0x3e9
>   str    x17, [x10]
> 
> 
> There is still one extra register copy, which we're trying to remove in this patch.
> 
> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
> 
> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so
> 
> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906

https://github.com/openjdk/jdk/pull/20159 also aims to fix the same issue. Please feel free to review the draft PR. Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2225679586

From aph at openjdk.org  Fri Jul 12 14:36:51 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 12 Jul 2024 14:36:51 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <2Ln6-ZIklVFgsBWZmmyOU2G-wZmknxjsoT1xcTKSXDc=.54473598-6e15-43d1-9e5f-95c796d11066@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
 <2Ln6-ZIklVFgsBWZmmyOU2G-wZmknxjsoT1xcTKSXDc=.54473598-6e15-43d1-9e5f-95c796d11066@github.com>
Message-ID: <BDLL94Te55nHGCUuLtN6qIQynYIWdux300wtYvdxbkU=.0bdaf9f1-cb7e-403c-96f8-3b3ba69f8484@github.com>

On Fri, 12 Jul 2024 13:49:32 GMT, Fei Gao <fgao at openjdk.org> wrote:

>> In the cases like:
>> 
>>   UNSAFE.putLong(address + off1 + 1030, lseed);
>>   UNSAFE.putLong(address + 1023, lseed);
>>   UNSAFE.putLong(address + off2 + 1001, lseed);
>> 
>> 
>> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
>> 
>>   ldr  R10, [R15, #120]    # int ! Field: address
>>   ldr  R11, [R16, #136]    # int ! Field: off1
>>   ldr  R12, [R16, #144]    # int ! Field: off2
>>   add  R11, R11, R10
>>   mov R11, R11    # long -> ptr
>>   add  R12, R12, R10
>>   mov R10, R10    # long -> ptr
>>   add R11, R11, #1030    # ptr
>>   str  R17, [R11]    # int
>>   add R10, R10, #1023    # ptr
>>   str  R17, [R10]    # int
>>   mov R10, R12    # long -> ptr
>>   add R10, R10, #1001    # ptr
>>   str  R17, [R10]    # int
>> 
>> 
>> On AArch64, the conversion from long to pointer could be a nop, but C2 doesn't know that. The existing code elides `mov dst, src` only when `dst` == `src` [1], so we get the following assembly:
>> 
>>   ldr    x10, [x15,#120]
>>   ldp    x11, x12, [x16,#136]
>>   add    x11, x11, x10
>>   add    x12, x12, x10
>>   add    x11, x11, #0x406
>>   str    x17, [x11]
>>   add    x10, x10, #0x3ff
>>   str    x17, [x10]
>>   mov    x10, x12  <--- extra register copy
>>   add    x10, x10, #0x3e9
>>   str    x17, [x10]
>> 
>> 
>> There is still one extra register copy, which we're trying to remove in this patch.
>> 
>> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
>> 
>> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so
>> 
>> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906
>
> src/hotspot/share/opto/machnode.cpp line 400:
> 
>> 398: 
>> 399:   if (t->isa_intptr_t() &&
>> 400: #if !defined(AARCH64)
> 
> After applying the operand "IndirectX2P", we may have some patterns like:
> 
> str val, [CastX2P base]
> 
> The code path here will resolve the `base`, which is actually an `intptr`, not a `ptr`, and the offset is `0`.
> 
> I guess the code here was intended to support `[base, offset]`, where base can be an `intptr` but offset cannot be `0`. I'm not sure why there is such a limitation that the offset cannot be `0`; maybe for some old machines?
> 
> I don't think the limitation applies to aarch64 machines now, so I unblocked it for aarch64.

I think it's the other way around. Isn't this code saying that if the address is an intptr + a nonzero offset, then the returned type is bottom, ie nothing? What effect does this change have?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1676024922

From jvernee at openjdk.org  Fri Jul 12 14:43:25 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Fri, 12 Jul 2024 14:43:25 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing shared
 arena
Message-ID: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>

This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.

Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.

In this PR:
- I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
- Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
- Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.

I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:


Benchmark                     Threads   Mode  Cnt     Score     Error  Units
ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
ConcurrentClose.sharedAccess        1   avgt   10    52.042 ±   0.630  us/op
ConcurrentClose.confinedAccess     32   avgt   10    25.517 ±   1.069  us/op
ConcurrentClose.confinedAccess      1   avgt   10    12.398 ±   0.098  us/op


(I manually added the `Threads` column btw)

Testing: tier 1-4

-------------

Commit messages:
 - polish
 - slightly improve comment
 - tweak comment
 - improve benchmark parameters
 - cleanup
 - add benchmark
 - add note about lacking session oop at safepoint
 - Only deopt if necessary
 - refactor close handshake
 - Return before deoptimizing of target thread already has async exception

Changes: https://git.openjdk.org/jdk/pull/20158/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335480
  Stats: 428 lines in 6 files changed: 339 ins; 19 del; 70 mod
  Patch: https://git.openjdk.org/jdk/pull/20158.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158

PR: https://git.openjdk.org/jdk/pull/20158

From shade at openjdk.org  Fri Jul 12 14:45:53 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 12 Jul 2024 14:45:53 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
Message-ID: <SGPbCyVziPj9rzNvesb3ME37e9-Ld4wSCuTTQYbGNWo=.a29bc9f6-974f-4dc2-960f-a4fbba474710@github.com>

On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See bug for more discussion.
> 
> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
> 
> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice:
> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
> 
> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
> 
> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
> 
> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
> 
> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
> 
> Additional testing:
>  - [x] New IR tests
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, jcstre...

Still waiting for formal reviews, thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2225744502

From liach at openjdk.org  Fri Jul 12 14:57:53 2024
From: liach at openjdk.org (Chen Liang)
Date: Fri, 12 Jul 2024 14:57:53 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
Message-ID: <_1clZ5gStZtmtewSn4IBK_hNMGirBETJC9Szgrw6xzE=.1ba387d8-317d-44f3-8fa9-2860f9d53242@github.com>

On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See bug for more discussion.
> 
> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
> 
> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice:
> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
> 
> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
> 
> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
> 
> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
> 
> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
> 
> Additional testing:
>  - [x] New IR tests
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, jcstre...

Marked as reviewed by liach (Reviewer).

src/hotspot/share/opto/parse1.cpp line 1040:

> 1038:     if (PrintOpto && (Verbose || WizardMode)) {
> 1039:       method()->print_name();
> 1040:       tty->print_cr(" writes @Stable and needs a memory barrier");

This removes the generic, non-constructor release barrier for stable writes, right?

-------------

PR Review: https://git.openjdk.org/jdk/pull/19635#pullrequestreview-2175116293
PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1676061372

From shade at openjdk.org  Fri Jul 12 15:07:51 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 12 Jul 2024 15:07:51 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <_1clZ5gStZtmtewSn4IBK_hNMGirBETJC9Szgrw6xzE=.1ba387d8-317d-44f3-8fa9-2860f9d53242@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
 <_1clZ5gStZtmtewSn4IBK_hNMGirBETJC9Szgrw6xzE=.1ba387d8-317d-44f3-8fa9-2860f9d53242@github.com>
Message-ID: <SPYs_DoEu46y0-9C7D45r5Oxfl8TCb3SoW_pi864DAQ=.5d8ff9d9-c2e3-4727-82a8-eb979cf71c0d@github.com>

On Fri, 12 Jul 2024 14:54:58 GMT, Chen Liang <liach at openjdk.org> wrote:

>> See bug for more discussion.
>> 
>> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
>> 
>> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice:
>> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
>> 
>> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
>> 
>> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
>> 
>> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in the one path that matters, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
>> 
>> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
>> 
>> Additional testing:
>>  - [x] New IR tests
>>  - [x] Linux x86_64 server fastdebug, `all`
>>  - [x] Linux AArch64 server fastdebug, `all`
>>  - [x...
>
> src/hotspot/share/opto/parse1.cpp line 1040:
> 
>> 1038:     if (PrintOpto && (Verbose || WizardMode)) {
>> 1039:       method()->print_name();
>> 1040:       tty->print_cr(" writes @Stable and needs a memory barrier");
> 
> This is the generic, non-constructor stable write release barrier removed, right?

Yes.
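
For readers following along, the lazy-initialization idiom that motivates dropping the generic barrier can be sketched in C++ as an analogy. This is not HotSpot or JDK code: `LazyHash` and its `compute()` are hypothetical, and relaxed atomics stand in for a plain `@Stable` field write. The point is that a racing reader sees either the default value (and recomputes) or the final value, so no barrier at method exit is needed for this pattern.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical analogue of a lazily computed, cached hash code.
// A value of 0 means "not computed yet" (the default value).
class LazyHash {
  std::atomic<unsigned> _hash{0};
  unsigned _seed;

  // Any deterministic function works here; it must never return 0,
  // because 0 is reserved as the "unset" marker.
  unsigned compute() const {
    unsigned h = _seed * 31u + 7u;
    return h == 0 ? 1u : h;
  }

 public:
  explicit LazyHash(unsigned seed) : _seed(seed) {}

  unsigned hash() {
    // Relaxed load: a racing reader observes either 0 or the final value.
    unsigned h = _hash.load(std::memory_order_relaxed);
    if (h == 0) {
      h = compute();
      // Relaxed store: the value is self-contained, so nothing else
      // has to be published together with it.
      _hash.store(h, std::memory_order_relaxed);
    }
    return h;
  }
};
```

Several threads may compute the value at the same time, but they all store the same result, so the race is benign under the assumptions stated above.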

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1676075164

From rkennke at openjdk.org  Fri Jul 12 16:18:02 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 12 Jul 2024 16:18:02 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6]
In-Reply-To: <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
Message-ID: <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com>

On Fri, 12 Jul 2024 05:57:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update arguments.cpp

Here comes my first-pass review of the shared code.
(Man, I hope we can get rid of UOMT soon, again...)

src/hotspot/share/oops/instanceKlass.cpp line 1090:

> 1088: 
> 1089:     // Step 2
> 1090:     // If we were to use wait() instead of waitUninterruptibly() then

This is a nice correction (even though, the actual call below is wait_uninterruptibly() ;-) ), but seems totally unrelated.

src/hotspot/share/oops/markWord.cpp line 27:

> 25: #include "precompiled.hpp"
> 26: #include "oops/markWord.hpp"
> 27: #include "runtime/basicLock.inline.hpp"

I don't think this include is needed (at least not by the changed code parts, I haven't checked existing code).

src/hotspot/share/runtime/arguments.cpp line 1820:

> 1818:     warning("New lightweight locking not supported on this platform");
> 1819:   }
> 1820:   if (UseObjectMonitorTable) {

Uhm, wait a second. That list of platforms covers all existing platforms anyway, so the whole block could be removed? Or is there a deeper meaning here that I don't understand?

src/hotspot/share/runtime/basicLock.cpp line 37:

> 35:     if (mon != nullptr) {
> 36:       mon->print_on(st);
> 37:     }

I am not sure if we wanted to do this, but we know the owner, therefore we could also look-up the OM from the table, and print it. It wouldn't have all that much to do with the BasicLock, though.

src/hotspot/share/runtime/basicLock.inline.hpp line 45:

> 43:   return reinterpret_cast<ObjectMonitor*>(get_metadata());
> 44: #else
> 45:   // Other platforms does not make use of the cache yet,

If it's not used, why does it matter to special case the code here?

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 28:

> 26: 
> 27: #include "classfile/vmSymbols.hpp"
> 28: #include "javaThread.inline.hpp"

This include is incorrect (and my IDE says it's not needed).

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 31:

> 29: #include "jfrfiles/jfrEventClasses.hpp"
> 30: #include "logging/log.hpp"
> 31: #include "logging/logStream.hpp"

Include of logStream.hpp not needed?

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 58:

> 56: 
> 57: //
> 58: // Lightweight synchronization.

This comment doesn't really say anything. Either remove it, or add a nice summary of how LW locking and OM table stuff works.

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 80:

> 78: 
> 79:   ConcurrentTable* _table;
> 80:   volatile size_t _table_count;

Looks like a misnomer to me. We only have one table, but we do have N entries/nodes. This is counted when new nodes are allocated or old nodes are freed. Consider renaming this to '_entry_count' or '_node_count'? I'm actually a bit surprised if ConcurrentHashTable doesn't already track this...

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 88:

> 86: 
> 87:   public:
> 88:     Lookup(oop obj) : _obj(obj) {}

Make explicit?
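
As a minimal sketch of what `explicit` would change here (the types below are hypothetical stand-ins, not the classes from the patch): a single-argument constructor without `explicit` also acts as an implicit conversion, so a raw pointer can silently become a lookup object at a call site.

```cpp
#include <cassert>
#include <type_traits>

struct Oop { int id; };  // hypothetical stand-in for an object reference

class LookupImplicit {
  Oop* _obj;
 public:
  LookupImplicit(Oop* obj) : _obj(obj) {}           // also an implicit conversion
  Oop* obj() const { return _obj; }
};

class LookupExplicit {
  Oop* _obj;
 public:
  explicit LookupExplicit(Oop* obj) : _obj(obj) {}  // direct construction only
  Oop* obj() const { return _obj; }
};

int id_via_implicit(const LookupImplicit& l) { return l.obj()->id; }
int id_via_explicit(const LookupExplicit& l) { return l.obj()->id; }

// An Oop* converts silently to LookupImplicit, but not to LookupExplicit.
static_assert(std::is_convertible<Oop*, LookupImplicit>::value, "implicit conversion allowed");
static_assert(!std::is_convertible<Oop*, LookupExplicit>::value, "implicit conversion rejected");
static_assert(std::is_constructible<LookupExplicit, Oop*>::value, "direct construction still works");
```

With the explicit variant, `id_via_explicit(&o)` would not compile; the caller has to write `id_via_explicit(LookupExplicit(&o))`.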

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 97:

> 95: 
> 96:     bool equals(ObjectMonitor** value) {
> 97:       // The entry is going to be removed soon.

What does this comment mean?

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 112:

> 110: 
> 111:   public:
> 112:     LookupMonitor(ObjectMonitor* monitor) : _monitor(monitor) {}

Make explicit?

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 159:

> 157:   static size_t min_log_size() {
> 158:     // ~= log(AvgMonitorsPerThreadEstimate default)
> 159:     return 10;

Uh wait - are we assuming that threads hold 1024 monitors *on average* ? Isn't this a bit excessive? I would have thought maybe 8 monitors/thread. Yes there are workloads that are bonkers. Or maybe the comment/flag name does not say what I think it says.

Or why not use AvgMonitorsPerThreadEstimate directly?
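
To make the arithmetic concrete, here is a small sketch of deriving the log size from the estimate instead of hard-coding 10. It assumes the flag's default is 1024, as the "1024 monitors" reading above implies; `floor_log2` and `kAvgMonitorsEstimate` are illustrative names, not HotSpot identifiers, and the loop in a `constexpr` function requires C++14.

```cpp
#include <cassert>
#include <cstddef>

// Floor of log2(n) for n >= 1.
constexpr std::size_t floor_log2(std::size_t n) {
  std::size_t r = 0;
  while (n > 1) {
    n >>= 1;
    ++r;
  }
  return r;
}

// Assumed default of the estimate; stands in for the real VM flag.
constexpr std::size_t kAvgMonitorsEstimate = 1024;

// 2^10 == 1024, which matches the hard-coded return value of 10.
static_assert(floor_log2(kAvgMonitorsEstimate) == 10, "log2(1024) is 10");
```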

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 349:

> 347:   assert(LockingMode == LM_LIGHTWEIGHT, "must be");
> 348: 
> 349:   if (try_read) {

All the callers seem to pass try_read = true. Why do we have the branch at all?

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 401:

> 399: 
> 400:   if (inserted) {
> 401:     // Hopefully the performance counters are allocated on distinct

It doesn't look like the counters are on distinct cache lines (see objectMonitor.hpp, lines 212ff). If this is a concern, file a bug to investigate it later? The comment here is a bit misplaced, IMO.

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 473:

> 471:   int _length;
> 472: 
> 473:   void do_oop(oop* o) final {

C++ always provides something to learn - C++ has got a final keyword! :-) Looks like a reasonable use of it here, though.

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 477:

> 475:     if (obj->mark_acquire().has_monitor()) {
> 476:       if (_length > 0 && _contended_oops[_length-1] == obj) {
> 477:         // assert(VM_Version::supports_recursive_lightweight_locking(), "must be");

Uncomment or remove assert?

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 554:

> 552:   bool _no_safepoint;
> 553:   union {
> 554:     struct {} _dummy;

Uhh ... Why does this need to be wrapped in a union and struct?

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 563:

> 561:     assert(locking_thread == current || locking_thread->is_obj_deopt_suspend(), "locking_thread may not run concurrently");
> 562:     if (_no_safepoint) {
> 563:       ::new (&_nsv) NoSafepointVerifier();

I'm thinking that it might be easier and cleaner to just re-do what the NoSafepointVerifier does? It just calls thread->inc/dec_no_safepoint_count().
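
The two options can be compared in a minimal sketch. All types here are hypothetical (`FakeThread`, `CountGuard`, and the two wrapper classes are not HotSpot code); the sketch only assumes that the verifier boils down to incrementing and decrementing a per-thread counter. Variant A keeps the guard in a union so that it is constructed by placement new only when needed; variant B manipulates the counter directly.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for the per-thread state.
struct FakeThread {
  int no_safepoint_count = 0;
};

// Hypothetical RAII guard: increments on construction, decrements on destruction.
class CountGuard {
  FakeThread* _t;
 public:
  explicit CountGuard(FakeThread* t) : _t(t) { _t->no_safepoint_count++; }
  ~CountGuard() { _t->no_safepoint_count--; }
};

struct Empty {};

// Variant A: the union defers construction of the guard.
class ConditionalGuard {
  bool _active;
  union {
    Empty _dummy;       // initial active member; never read
    CountGuard _guard;  // constructed only if _active is true
  };
 public:
  ConditionalGuard(FakeThread* t, bool active) : _active(active), _dummy() {
    if (_active) {
      ::new (&_guard) CountGuard(t);  // start the guard's lifetime in place
    }
  }
  ~ConditionalGuard() {
    if (_active) {
      _guard.~CountGuard();           // a union member must be destroyed manually
    }
  }
};

// Variant B: no union, just the counter updates.
class ManualCount {
  FakeThread* _t;  // nullptr when inactive
 public:
  ManualCount(FakeThread* t, bool active) : _t(active ? t : nullptr) {
    if (_t != nullptr) _t->no_safepoint_count++;
  }
  ~ManualCount() {
    if (_t != nullptr) _t->no_safepoint_count--;
  }
};

// Returns the counter value observed while the guard is alive.
int count_inside_scope(bool use_union, bool active) {
  FakeThread t;
  if (use_union) {
    ConditionalGuard g(&t, active);
    return t.no_safepoint_count;
  }
  ManualCount g(&t, active);
  return t.no_safepoint_count;
}
```

Both variants behave the same from the outside; variant B trades reuse of the existing guard class for less machinery.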

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 748:

> 746:   }
> 747: 
> 748:   // Fast-locking does not use the 'lock' argument.

I believe the comment is outdated.

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 969:

> 967: 
> 968:   for (;;) {
> 969:   // Fetch the monitor from the table

Wrong indentation.

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 1157:

> 1155:   // enter can block for safepoints; clear the unhandled object oop
> 1156:   PauseNoSafepointVerifier pnsv(&nsv);
> 1157:   object = nullptr;

What is the point of that statement? object is not an out-arg (afaict), and not used subsequently.

src/hotspot/share/runtime/lightweightSynchronizer.hpp line 68:

> 66:   static void exit(oop object, JavaThread* current);
> 67: 
> 68:   static ObjectMonitor* inflate_into_object_header(Thread* current, JavaThread* inflating_thread, oop object, const ObjectSynchronizer::InflateCause cause);

My IDE flags this with a warning 'Parameter 'cause' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions' *shrugs*
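
For what it's worth, the warning reflects a plain language rule, sketched below with a hypothetical function `twice`: a top-level `const` on a by-value parameter is not part of the function type, so a declaration with it and a definition without it name the same function.

```cpp
#include <cassert>
#include <type_traits>

// Declaration with a top-level const on a by-value parameter.
int twice(const int x);

// Definition without it: this defines the function declared above.
int twice(int x) { return x + x; }

// The two spellings denote the same function type.
static_assert(std::is_same<int(const int), int(int)>::value,
              "top-level const does not change the function type");
```

The qualifier only matters in a definition, where it prevents the body from modifying its local copy of the argument.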

src/hotspot/share/runtime/lockStack.inline.hpp line 232:

> 230:   oop obj = monitor->object_peek();
> 231:   assert(obj != nullptr, "must be alive");
> 232:   assert(monitor == LightweightSynchronizer::get_monitor_from_table(JavaThread::current(), obj), "must be exist in table");

"must be exist in table" -> "must exist in table"

src/hotspot/share/runtime/objectMonitor.cpp line 56:

> 54: #include "runtime/safepointMechanism.inline.hpp"
> 55: #include "runtime/sharedRuntime.hpp"
> 56: #include "runtime/synchronizer.hpp"

This include is not used.

src/hotspot/share/runtime/objectMonitor.hpp line 193:

> 191:   ObjectWaiter* volatile _WaitSet;  // LL of threads wait()ing on the monitor
> 192:   volatile int  _waiters;           // number of waiting threads
> 193:  private:

You can now also remove the 'private:' here

src/hotspot/share/runtime/synchronizer.cpp line 390:

> 388: 
> 389: static bool useHeavyMonitors() {
> 390: #if defined(X86) || defined(AARCH64) || defined(PPC64) || defined(RISCV64) || defined(S390)

Why are those if-defs here? Why is ARM excluded?

-------------

Changes requested by rkennke (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2174478048
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675695457
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675696406
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675704824
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675707735
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675711809
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675744474
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675745048
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1676111067
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675773683
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675747483
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675765460
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675766088
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675781420
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675791687
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675799897
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675803217
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675805690
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675824394
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675832868
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675854207
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675876915
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675932005
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675936943
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1676107048
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1676112375
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1676125325
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1676140201

From rkennke at openjdk.org  Fri Jul 12 16:18:03 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 12 Jul 2024 16:18:03 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v3]
In-Reply-To: <-hS6aTxhzI_HzVegg0EziUtGxdq6orpF9s1rF3l2hZY=.0c4296b2-d27a-4578-a160-d17b65163655@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <5CNKzDumOf1MJQXM9OBHQh0Mj7eLv2ONio1V-AXeSJI=.54302b45-2dd2-4f18-a094-6b2c6a59517c@github.com>
 <-hS6aTxhzI_HzVegg0EziUtGxdq6orpF9s1rF3l2hZY=.0c4296b2-d27a-4578-a160-d17b65163655@github.com>
Message-ID: <hK7cMXwnR14MPvDtZ08migcBjRmXqlXpFEI5BLyAA2M=.cec68237-c10b-4cdd-976f-495c6d25560b@github.com>

On Tue, 9 Jul 2024 20:43:06 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Add JVMCI symbol exports
>>  - Revert "More graceful JVMCI VM option interaction"
>>    
>>    This reverts commit 2814350370cf142e130fe1d38610c646039f976d.
>
> src/hotspot/share/opto/library_call.cpp line 4620:
> 
>> 4618:       Node *unlocked_val      = _gvn.MakeConX(markWord::unlocked_value);
>> 4619:       Node *chk_unlocked      = _gvn.transform(new CmpXNode(lmasked_header, unlocked_val));
>> 4620:       Node *test_not_unlocked = _gvn.transform(new BoolNode(chk_unlocked, BoolTest::ne));
> 
> I don't really know what this does.  Someone from the c2 compiler group should look at this.

Yes, that looks correct. I'm familiar with this code because I messed with it in my attempts to implement compact identity hashcode in Lilliput2.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675699672

From rkennke at openjdk.org  Fri Jul 12 16:18:03 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 12 Jul 2024 16:18:03 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6]
In-Reply-To: <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
 <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com>
Message-ID: <CkZ-Sr3ITmhrMyAhsjGUsf2LgyiU2QhaNdvbkoMWL1Y=.48abe896-7a09-4bf1-a236-d86ffd35fcdf@github.com>

On Fri, 12 Jul 2024 15:56:59 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update arguments.cpp
>
> src/hotspot/share/runtime/synchronizer.cpp line 390:
> 
>> 388: 
>> 389: static bool useHeavyMonitors() {
>> 390: #if defined(X86) || defined(AARCH64) || defined(PPC64) || defined(RISCV64) || defined(S390)
> 
> Why are those if-defs here? Why is ARM excluded?

Oh I see, you only moved this up. Still a bit puzzling.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1676142470

From coleenp at openjdk.org  Fri Jul 12 17:44:53 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 12 Jul 2024 17:44:53 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6]
In-Reply-To: <CkZ-Sr3ITmhrMyAhsjGUsf2LgyiU2QhaNdvbkoMWL1Y=.48abe896-7a09-4bf1-a236-d86ffd35fcdf@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
 <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com>
 <CkZ-Sr3ITmhrMyAhsjGUsf2LgyiU2QhaNdvbkoMWL1Y=.48abe896-7a09-4bf1-a236-d86ffd35fcdf@github.com>
Message-ID: <mYOetX5LzfVBYpl9xDGQlJJxxntXdKRfAYCs_g0L5_g=.4863065e-35c1-474e-abcc-cb19789ed6aa@github.com>

On Fri, 12 Jul 2024 15:58:56 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> src/hotspot/share/runtime/synchronizer.cpp line 390:
>> 
>>> 388: 
>>> 389: static bool useHeavyMonitors() {
>>> 390: #if defined(X86) || defined(AARCH64) || defined(PPC64) || defined(RISCV64) || defined(S390)
>> 
>> Why are those if-defs here? Why is ARM excluded?
>
> Oh I see, you only moved this up. Still a bit puzzling.

This code was just moved.  No idea why ARM is excluded.  I filed this to deal with this.
https://bugs.openjdk.org/browse/JDK-8336325

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1676253183

From jbhateja at openjdk.org  Fri Jul 12 18:31:59 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 12 Jul 2024 18:31:59 GMT
Subject: RFR: 8335860: compiler/vectorization/TestFloat16VectorConvChain.java
 fails with non-standard AVX/SSE settings
Message-ID: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>

Enabling test with explicit feature checks for x86 target.
Removing from test/hotspot/jtreg/ProblemList.txt

Best Regards,
Jatin

-------------

Commit messages:
 - 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings

Changes: https://git.openjdk.org/jdk/pull/20160/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20160&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335860
  Stats: 5 lines in 3 files changed: 1 ins; 2 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20160.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20160/head:pull/20160

PR: https://git.openjdk.org/jdk/pull/20160

From jvernee at openjdk.org  Fri Jul 12 20:59:26 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Fri, 12 Jul 2024 20:59:26 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
Message-ID: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>

> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
> 
> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
> 
> In this PR:
> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, in cases where we are not inside an `@Scoped` method, but are inside compiled code.
> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
> 
> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:
> 
> 
> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess        1   avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.confinedAccess     32   avgt   10    25.517 ±   1.069  ...

Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:

  track has_scoped_access for compiled methods

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20158/files
  - new: https://git.openjdk.org/jdk/pull/20158/files/34ff5fd8..d1266b53

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=00-01

  Stats: 42 lines in 15 files changed: 38 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/20158.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158

PR: https://git.openjdk.org/jdk/pull/20158

From jvernee at openjdk.org  Fri Jul 12 20:59:26 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Fri, 12 Jul 2024 20:59:26 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena
In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
Message-ID: <-ywJLT6LHavlxhuYXJQTh6xvvhz00oFECkpiCvz_Y4w=.a67a4c7d-b503-475f-aee0-0e042acbccc6@github.com>

On Fri, 12 Jul 2024 13:57:23 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
> 
> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
> 
> In this PR:
> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, in cases where we are not inside an `@Scoped` method, but are inside compiled code.
> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
> 
> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:
> 
> 
> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess        1   avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.confinedAccess     32   avgt   10    25.517 ±   1.069  ...

>  This could be narrowed down further by tracking for each compiled method if it has an (inlined) call to an `@Scoped` method, but I've left that out for now.

I decided to add this to the PR for completeness, so that we don't go and deoptimize frames that are not using scoped accesses at all.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2226341535

From duke at openjdk.org  Fri Jul 12 21:18:53 2024
From: duke at openjdk.org (duke)
Date: Fri, 12 Jul 2024 21:18:53 GMT
Subject: RFR: 8336278: Micro-optimize Replace String.format("%n") to
 System.lineSeparator
In-Reply-To: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com>
References: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com>
Message-ID: <xNX4TovrPZgAe2DdqIuUNO1RVgo3rVlGuGV47_r6tto=.934fa260-a0de-4579-9a3a-22dde9420a9d@github.com>

On Thu, 11 Jul 2024 22:45:47 GMT, Shaojin Wen <duke at openjdk.org> wrote:

> There are three places in the JDK code where String.format("%n") is used. This is actually equivalent to System.lineSeparator and does not require the implementation of String.format.

@wenshao 
Your change (at version 829da3e149eadedd22d81d22a2d025516c59c210) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20149#issuecomment-2226364868

From duke at openjdk.org  Fri Jul 12 21:52:04 2024
From: duke at openjdk.org (Shaojin Wen)
Date: Fri, 12 Jul 2024 21:52:04 GMT
Subject: Integrated: 8336278: Micro-optimize Replace String.format("%n") to
 System.lineSeparator
In-Reply-To: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com>
References: <Wq0CZfwc1zPhr-zfj7K2iSXSMbRtbr9mfvjBshZNpo0=.cd467619-c484-4167-a34c-516e05bbc67f@github.com>
Message-ID: <MAqa6C_mHMuQtCN7ka4-e_wGH1aPNziQaZt8LsuZALc=.e8b9d092-e89d-4c1c-bda6-62249403ca32@github.com>

On Thu, 11 Jul 2024 22:45:47 GMT, Shaojin Wen <duke at openjdk.org> wrote:

> There are three places in the JDK code where String.format("%n") is used. This is actually equivalent to System.lineSeparator and does not require the implementation of String.format.

This pull request has now been integrated.

Changeset: 4957145e
Author:    Shaojin Wen <shaojin.wensj at alibaba-inc.com>
Committer: Chen Liang <liach at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/4957145e6c823bfaa638a77457da5c031af978b9
Stats:     6 lines in 3 files changed: 0 ins; 0 del; 6 mod

8336278: Micro-optimize Replace String.format("%n") to System.lineSeparator

Reviewed-by: dnsimon, shade

-------------

PR: https://git.openjdk.org/jdk/pull/20149

From hboehm at google.com  Sat Jul 13 00:36:33 2024
From: hboehm at google.com (Hans Boehm)
Date: Fri, 12 Jul 2024 17:36:33 -0700
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <mailman.15905.1720688298.324.hotspot-dev@openjdk.org>
References: <mailman.15905.1720688298.324.hotspot-dev@openjdk.org>
Message-ID: <CAMOCf+i5Eb8xFMPw_+eeSpyXcFEiXeEacLk0ZqYQBxCpHkxDxg@mail.gmail.com>

> Message: 1
> Date: Thu, 11 Jul 2024 08:50:57 GMT
> From: John R Rose <jrose at openjdk.org>
> To: <hotspot-dev at openjdk.org>
> Subject: Re: RFR: 8333791: Fix memory barriers for @Stable fields
> Message-ID:
>         <pfFWmbs1q_M-WQIDyBw15ctVdRcAudSrdJ6BEQRx41E=.762c100f-7650-47fd-bfe3-ac620913384f at github.com>
>
> Content-Type: text/plain; charset=utf-8
>
> On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>
> > See bug for more discussion.
> >
> > Currently, C2 puts a `Release` barrier at exit of _every_ method that
> > writes a `@Stable` field. This is a problem for high-performance code that
> > initializes the stable field like this:
> > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
> >
> > A more egregious example is here, which means that every `String`
> > constructor actually does `Release` barrier for `@Stable` field write,
> > while only a `StoreStore` for `final` field store would suffice:
> >
> > https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
> >
> > AFAICS, the original intent for Release barrier in constructor for
> > stable fields was to match the memory semantics of final fields better.
> > `@Stable` are in some sense "super-finals": they are foldable like static
> > finals or non-static trusted finals, but can be written anywhere. The
> > `@Stable` machinery is intrinsically safe under races: either a compiler
> > sees a component of stable subgraph in initialized state and folds it, or
> > it sees a default value for the component and leaves it alone.
> >
> > I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000)
> > of current `@Stable` uses for fields that are not currently `final` or
> > `volatile`, and there are cases where we write into `@Stable` fields in
> > constructors. AFAICS, they are covered by final-field-like semantics by
> > accident of having adjacent `final` fields.
> >
> > Current PR implements Variant 2 from the discussion: makes sure stable
> > fields are as memory-safe as finals, and that's it. I believe this is
> > all-around a good compromise for both mainline and the backports: the
> > performance is improved in one the path that matter, and we still have some
> > safety margin in face of accidental removals of adjacent `final`-s, or in
> > case I missed some spots during the audit.
> >
> > C1 did not do anything special for `@Stable` fields at all, fixed those
> > to match C2. Both Zero and template interpreters for non-TSO arches put
> > barriers at every `return` (with notable exception of
> > [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in
> > an overkill manner.
> >
> > Additional testing:
> >  - [x] New IR tests
> >  - [x] Linux x86_64 server fastdebug, `all`
> >  - [x] Linux AArch64 server fastdebug, `all`
>
> I like this compromise.  Let me see if I got it right:  A stable write in
> a constructor is treated like a final write - it triggers a barrier at the
> end of the constructor.  That's a cheap move.  No other barriers are added
> automatically, for reads or other writes, saving us from doing less cheap
> moves.  The burden would be on users of stable vars (in fancy access
> patterns) to add more fences if needed, but we don't see any important
> cases of that, at the moment.
>

No opinion on the merits here. But IIUC, "as memory safe as finals" is a
slightly squishy notion here. The downside of not having the release fence
is that even with safe publication, a write to an @Stable field outside the
constructor can be seen by a read in the constructor, before the object is
published. That's arguably weirder than final field behavior, and not
something that can arise with final fields. But it still only happens in
the presence of data races, and thus probably not in code you should be
writing anyway.
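
For readers less familiar with the constructor-exit barrier under discussion, here is a rough user-level analogue. It is only a sketch under stated assumptions: `@Stable` is JDK-internal, so the `StableLikeBox` class below is hypothetical and uses a plain non-final field plus an explicit `VarHandle.storeStoreFence()` to imitate the kind of ordering the quoted text says would suffice for a `final` field store. It does not show what C2 actually emits.

```java
import java.lang.invoke.VarHandle;

public class StableLikeBox {
    // Hypothetical stand-in for a @Stable field: not final, written once.
    private int value;

    StableLikeBox(int v) {
        value = v;
        // Orders the store above before any later store, such as one that
        // publishes 'this' to another thread. This imitates a barrier placed
        // at the end of a constructor; it is an illustration, not JIT output.
        VarHandle.storeStoreFence();
    }

    int value() {
        return value;
    }

    public static void main(String[] args) {
        StableLikeBox box = new StableLikeBox(42);
        System.out.println(box.value());
    }
}
```

Note that the fence only orders stores made inside the constructor. A write to the field from outside the constructor is not covered by it, which is the case the message above is concerned with.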

From duke at openjdk.org  Sat Jul 13 05:19:03 2024
From: duke at openjdk.org (duke)
Date: Sat, 13 Jul 2024 05:19:03 GMT
Subject: Withdrawn: 8301464: Code in GenFullCP is still disabled after
 JDK-8079697 was fixed
In-Reply-To: <BAqvKpFjumZRFacqHsYUioKLlfPISiGcfCUbJtFyLA0=.4f580a3b-4b65-41cb-885e-1d945c380b1c@github.com>
References: <BAqvKpFjumZRFacqHsYUioKLlfPISiGcfCUbJtFyLA0=.4f580a3b-4b65-41cb-885e-1d945c380b1c@github.com>
Message-ID: <K1IyX2kviBwKDaCsIGi6Q8XjsCc8wRddGcMlb0AjTV4=.62814bf8-7dd8-4418-aa96-0ce2607371df@github.com>

On Tue, 14 May 2024 03:05:27 GMT, xiaotaonan <duke at openjdk.org> wrote:

> Code in GenFullCP is still disabled after JDK-8079697 was fixed
> Note: I have not found any relevant information on why ClassWriter.COMPUTE_FRAMES is disabled in JDK-8079697.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/19228

From forax at openjdk.org  Sat Jul 13 11:12:51 2024
From: forax at openjdk.org (=?UTF-8?B?UsOpbWk=?= Forax)
Date: Sat, 13 Jul 2024 11:12:51 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
Message-ID: <s1XasFdk32maXz3tFyJk5buq1tlHS5xV2GoETU3-Tys=.962cbff0-0271-4deb-9357-c7c4e26599f6@github.com>

On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside compiled code.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10    52.042 ±   0.630  us/op
>> ConcurrentClose.confine...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   track has_scoped_access for compiled methods

Nice work !

Thinking a bit about how to improve the benchmark, and given the semantics of Arena.close(), there is a trick that you can use. There are two kinds of memory segments: those that are visible only from Java and those that are also visible outside Java. For example, a memory segment created from an mmap, or a memory segment whose address has been passed to C code, is visible from outside Java; for those, you have no choice but to wait in Arena.close() until all threads have answered the handshakes. For all the other memory segments, because they are only visible from Java, the memory can be reclaimed asynchronously, i.e. the last thread to answer the handshakes can free the corresponding memory segments, so the thread that calls Arena.close() is free to keep running even if the memory is not yet reclaimed.

From my armchair, that seems an awful lot of engineering, so it may not be worth it, but now you know :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2226858328

From jvernee at openjdk.org  Sat Jul 13 14:29:55 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Sat, 13 Jul 2024 14:29:55 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
Message-ID: <t2RnOoTnKhtDfrsmF42_BRTwV-eWFcUobQ89P-VJjbM=.5081d3f4-2e2c-4a2a-9f03-08e25af0275d@github.com>

On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   track has_scoped_access for compiled methods

That is something we considered in the past as well (I think Maurizio even had a prototype at some point). The issue is that close should be deterministic, i.e. after the call to `close()` returns, all memory should be freed. That is an essential property for applications that have most of their virtual address space tied up, and then want to release and immediately re-use a big chunk of it. If it's not important that memory is freed deterministically, but memory should still be accessible from multiple threads, an automatic arena might be a better choice.
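
To make the contrast concrete, here is a minimal sketch of the two arena kinds mentioned above, assuming JDK 22 or later where `java.lang.foreign` is a final API. The class and method names are invented for illustration.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class ArenaCloseDemo {
    // Shared arena: once close() has returned, the segment's scope is no longer alive.
    static boolean aliveAfterSharedClose() {
        MemorySegment segment;
        try (Arena arena = Arena.ofShared()) {
            segment = arena.allocate(ValueLayout.JAVA_LONG);
            segment.set(ValueLayout.JAVA_LONG, 0, 42L);
        } // close() runs here and only returns after the memory has been released
        return segment.scope().isAlive();
    }

    // Automatic arena: explicit close is not supported; the GC decides when to free.
    static boolean autoArenaRejectsClose() {
        Arena auto = Arena.ofAuto();
        MemorySegment segment = auto.allocate(ValueLayout.JAVA_LONG);
        try {
            auto.close();
            return false; // not expected to be reached
        } catch (UnsupportedOperationException expected) {
            // The segment is still reachable here, so its scope is still alive.
            return segment.scope().isAlive();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared segment alive after close: " + aliveAfterSharedClose());
        System.out.println("auto arena rejects close: " + autoArenaRejectsClose());
    }
}
```

The shared arena gives the "freed when `close()` returns" guarantee at the cost of the handshake, while the automatic arena trades that guarantee away for GC-managed lifetime.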

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2226929736

From mgronlun at openjdk.org  Sat Jul 13 14:53:19 2024
From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=)
Date: Sat, 13 Jul 2024 14:53:19 GMT
Subject: RFR: 8334781: JFR crash:  assert(((((JfrTraceIdBits::load(klass)) &
 ((JfrTraceIdEpoch::this_epoch_method_and_class_bits()))) != 0))) failed:
 invariant
Message-ID: <aHCZUov46bOLAQiJBG-h65BUAKeOvc0Lz-Jkr39PQ98=.743a7e19-075b-40ee-b886-82a6717641a2@github.com>

Greetings,

Please help review this adjustment, which fixes rare situations where methods that have been retransformed or redefined can be perceived as being tagged by JFR when they, in fact, are not. The fix unconditionally sets the metatag clear bits on artefact initialization and adds assertions about the JFR bit tag state machine.

Testing: jdk_jfr, stress testing

Thanks
Markus

-------------

Commit messages:
 - 8334781

Changes: https://git.openjdk.org/jdk/pull/20171/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20171&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8334781
  Stats: 34 lines in 7 files changed: 17 ins; 3 del; 14 mod
  Patch: https://git.openjdk.org/jdk/pull/20171.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20171/head:pull/20171

PR: https://git.openjdk.org/jdk/pull/20171

From eosterlund at openjdk.org  Sat Jul 13 15:15:51 2024
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Sat, 13 Jul 2024 15:15:51 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
Message-ID: <JlANbo3VlMnnTFmbfBDKxQUjYy3PBX4JlzzQmFEjtjg=.34c00e97-7dcf-494e-8c07-2dabe6deb978@github.com>

On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   track has_scoped_access for compiled methods

Looks good.

-------------

Marked as reviewed by eosterlund (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20158#pullrequestreview-2176318150

From eosterlund at openjdk.org  Sat Jul 13 15:31:55 2024
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Sat, 13 Jul 2024 15:31:55 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
Message-ID: <OnqjJptKgPgbiZbFAHOraOyF5BgiP3dz_6o5Wz8OYxs=.d37f7763-7efe-4bdf-9523-52c4f733bb59@github.com>

On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   track has_scoped_access for compiled methods

@dougxc might want to have a look at Graal support for this one.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2226957995

From jvernee at openjdk.org  Sat Jul 13 16:08:50 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Sat, 13 Jul 2024 16:08:50 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <OnqjJptKgPgbiZbFAHOraOyF5BgiP3dz_6o5Wz8OYxs=.d37f7763-7efe-4bdf-9523-52c4f733bb59@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <OnqjJptKgPgbiZbFAHOraOyF5BgiP3dz_6o5Wz8OYxs=.d37f7763-7efe-4bdf-9523-52c4f733bb59@github.com>
Message-ID: <y2sUl8FsxgrwFuAcQg_w9CffblaZWgyn6RAopMSk7Z8=.1fc54eb8-aa1b-4fe8-9aae-12d86e3942b8@github.com>

On Sat, 13 Jul 2024 15:28:57 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> @dougxc might want to have a look at Graal support for this one.

Yes, I conservatively implemented `has_scoped_access()` for Graal (see `jvmciRuntime.cpp` changes). It won't regress anything, but there's still an opportunity for improvement.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2226974681

From forax at openjdk.org  Sat Jul 13 16:45:50 2024
From: forax at openjdk.org (=?UTF-8?B?UsOpbWk=?= Forax)
Date: Sat, 13 Jul 2024 16:45:50 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
Message-ID: <0zSpYFkv6lAR8G0FpPDyFP-uLqh92ZQ5uW5xVCRXmyg=.c14d0ee0-e0bf-4367-9dfa-c613489684c9@github.com>

On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   track has_scoped_access for compiled methods

Knowing that all the segments are freed during close() is something you may want.
But having the execution time of close() be linear in the number of threads is also problematic. Maybe it means that we need another kind of Arena that works like shared() but allows the freeing to be done asynchronously (ofSharedAsyncFree?).

Note that the semantics of ofSharedAsyncFree() would differ from those of ofAuto(). ofAuto() relies on the GC to free a segment, so the delay before a segment is freed is not time-bounded: if the application has enough memory, the memory of the segment may never be reclaimed. With ofSharedAsyncFree(), the segments are freed by the last thread, so while this mechanism is not deterministic, it is time-bounded.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2226992713

From uschindler at openjdk.org  Sun Jul 14 11:04:54 2024
From: uschindler at openjdk.org (Uwe Schindler)
Date: Sun, 14 Jul 2024 11:04:54 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
Message-ID: <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com>

On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   track has_scoped_access for compiled methods

Hi Jorn,

Many thanks for working on this! 

I have one problem with the benchmark: I think it does not measure the whole setup in a way that matches our workload. The basic problem is that we don't want to deoptimize threads that are unrelated to MemorySegments, so the throughput of those threads should not be affected. For threads currently in a memory-segment read there may be a small effect, but they should recover fast.

The given benchmark effectively measures only the following: it starts many threads; in each it opens a shared memory segment, does some work and closes it. So it measures the throughput of the whole "create shared/work on it/close shared" workload. The problems we see in Lucene are rather that we have many threads working on shared memory segments, or on other tasks not related to memory segments at all, while a few threads are concurrently closing and opening new arenas. With more threads concurrently closing arenas, the throughput of the other threads degrades as well.

So IMHO, the benchmark should be improved to have a few threads (configurable) that open/close memory segments, a list of other threads that do other work, and finally a list of threads reading from the memory segments opened by the first threads. The testcase you wrote fits the above workload better. Maybe the benchmark should be set up more like the test. If you have a benchmark with that workload, it should show an improvement more clearly.

The current benchmark has the problem that it measures the whole open/work/close cycle on shared segments. And closing a shared segment is always heavy, because it has to trigger and wait for the thread-local handshake.

Why is the test preventing inlining of the inner read method?

I may be able to benchmark a Lucene workload with a custom JDK build next week. It might be an idea to use the wrong DaCapoBenchmark (downgrade to older version before it has fixed https://github.com/dacapobench/dacapobench/issues/264 , specifically https://github.com/dacapobench/dacapobench/commit/76588b28d516ae19f51a80e7287d404385a2c146).

Uwe

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2227303884

From uschindler at openjdk.org  Sun Jul 14 11:10:00 2024
From: uschindler at openjdk.org (Uwe Schindler)
Date: Sun, 14 Jul 2024 11:10:00 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <0zSpYFkv6lAR8G0FpPDyFP-uLqh92ZQ5uW5xVCRXmyg=.c14d0ee0-e0bf-4367-9dfa-c613489684c9@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <0zSpYFkv6lAR8G0FpPDyFP-uLqh92ZQ5uW5xVCRXmyg=.c14d0ee0-e0bf-4367-9dfa-c613489684c9@github.com>
Message-ID: <Ii5WTv26SUHutPAvrKXxS-0pWJvE7roJ7DXQBStH3XI=.42b58d78-60be-4533-a60e-7693a3dfeed0@github.com>

On Sat, 13 Jul 2024 16:43:16 GMT, Rémi Forax <forax at openjdk.org> wrote:

> Knowing that all the segments are freed during close() is something you may want. But having the execution time of close() be linear with the number of threads is also problematic. Maybe, it means that we need another kind of Arena that works like shared() but allow the freed to be done asynchronously (ofSharedAsyncFree ?).
> 
> Note that the semantics of ofSharedAsyncFree() is different from ofAuto(), ofAuto() relies on the GC to free a segment so the delay before a segment is freed is not time bounded if the application has enough memory, the memory of the segment may never be reclaimed. With ofSharedAsyncFree(), the segments are freed by the last thread, so while this mechanism is not deterministic, it is time bounded.

That's a great suggestion! In our case we just want the index files open as soon as possible, but not on next GC (which will be horrible and brings us back into the times of DirectByteBuffer). The problem with GC is that the Arena/MemorySegments and so on are tiny objects which will live for a very long time, especially when they were used for quite some time (like an index segment of a Lucene index).

Of course for testing purposes in Lucene we could use `ofShared()` (to make sure all mmapped files are freed as soon as the index is closed, especially on Windows), but in production environments we could offer the option to use delayed close to improve throughput.

Uwe

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2227305407

From duke at openjdk.org  Sun Jul 14 13:17:20 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Sun, 14 Jul 2024 13:17:20 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v4]
In-Reply-To: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
Message-ID: <OoIrgTl5AYcdVT0LI29XBYklYtlfeyu8BmEwkw2dnss=.ce405fe8-7445-4408-933b-c89c5767bc53@github.com>

> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK.

ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision:

 - Add newlines after each of new functions
 - Use global x0 instead of alias for it

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19960/files
  - new: https://git.openjdk.org/jdk/pull/19960/files/8520bc3a..ede19103

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=02-03

  Stats: 5 lines in 1 file changed: 3 ins; 2 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19960.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19960/head:pull/19960

PR: https://git.openjdk.org/jdk/pull/19960

From duke at openjdk.org  Sun Jul 14 13:17:20 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Sun, 14 Jul 2024 13:17:20 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v2]
In-Reply-To: <T59CuchKVcFhqy7VAzIHxakveuo2bJFrORdrKQwoFLE=.1b43c0cb-d05e-45eb-b85c-026b44dea080@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
 <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com>
 <vknSXGLwqD-p-lOrVwzn8rU6mTY3o4NP3eRbp4smvoI=.33dba76f-cd79-4d55-9e87-58e37adfeaf8@github.com>
 <T59CuchKVcFhqy7VAzIHxakveuo2bJFrORdrKQwoFLE=.1b43c0cb-d05e-45eb-b85c-026b44dea080@github.com>
Message-ID: <x-FAWINJvvNt_Qg4EAmdtmfAsJZBDU7pbuP1QcvABrU=.7db48ca9-9dee-4b9c-829a-b1ef07e4271a@github.com>

On Tue, 9 Jul 2024 08:36:16 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2282:
>> 
>>> 2280:     __ vrev8_v(vtmp1, vtmp1);
>>> 2281:     __ vrev8_v(vtmp2, vtmp2);
>>> 2282:   }
>> 
>> Please leave a new line after each of these newly-added functions.
>
> BTW: Did you compare this with the openssl version which also makes use of `vaesz_vs` instruction from `Zvkned`  [1]? 
> 
> [1] https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aes-riscv64-zvkb-zvkned.pl

Done!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1677133443

From duke at openjdk.org  Sun Jul 14 13:17:21 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Sun, 14 Jul 2024 13:17:21 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v2]
In-Reply-To: <vknSXGLwqD-p-lOrVwzn8rU6mTY3o4NP3eRbp4smvoI=.33dba76f-cd79-4d55-9e87-58e37adfeaf8@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
 <eGRQlTfJGvdSd84lJn1MUGon75zsDTYTOhMbVqQryC8=.3cff42c0-7b5c-4870-929e-3acfa74e31bd@github.com>
 <vknSXGLwqD-p-lOrVwzn8rU6mTY3o4NP3eRbp4smvoI=.33dba76f-cd79-4d55-9e87-58e37adfeaf8@github.com>
Message-ID: <sjq-jSCSS8ZVs4OHKtAbFNQ4UTMWlvl6T0npQSjAbNs=.f2992394-755e-4acd-8d87-1c33f8635d82@github.com>

On Mon, 8 Jul 2024 14:50:00 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> ArsenyBochkarev has updated the pull request incrementally with three additional commits since the last revision:
>> 
>>  - Use t2 directly instead of temp2
>>  - Rename temp1 -> x0
>>  - Left a note on a side effect of generate_vle32_pack4
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2332:
> 
>> 2330:     const Register key         = c_rarg2;  // key array address
>> 2331:     const Register keylen      = c_rarg3;
>> 2332:     const Register x0          = c_rarg4;
> 
> I think you can use the global `x0` (aka the zero register) instead for `vsetivli`. It is very confusing to have register alias names like `x0` here.

Done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1677133451

From duke at openjdk.org  Sun Jul 14 15:00:04 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Sun, 14 Jul 2024 15:00:04 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v5]
In-Reply-To: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
Message-ID: <Yt-B895uR8jzQ6h90NG1ObK9-Dq1xk0dLAaz30Pi6gY=.6a78060e-f59f-47e8-9819-df255c8cee83@github.com>

> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK.

ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision:

  Multiversion encryption depending on keylen

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19960/files
  - new: https://git.openjdk.org/jdk/pull/19960/files/ede19103..8f1f98b5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=03-04

  Stats: 76 lines in 1 file changed: 22 ins; 25 del; 29 mod
  Patch: https://git.openjdk.org/jdk/pull/19960.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19960/head:pull/19960

PR: https://git.openjdk.org/jdk/pull/19960

From duke at openjdk.org  Sun Jul 14 15:00:04 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Sun, 14 Jul 2024 15:00:04 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v3]
In-Reply-To: <IATUuy7OYBIasXTq1KFmVEjeg2eQ9qFM2UP5B0UhoHw=.7a112155-e875-4752-b6f4-fbeb56248759@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
 <F1yms2X9VVITjLPANuQqABre5E199ILHQ4ywpS4cicY=.3e2c0af1-8070-497a-bfa0-5732eb199974@github.com>
 <IATUuy7OYBIasXTq1KFmVEjeg2eQ9qFM2UP5B0UhoHw=.7a112155-e875-4752-b6f4-fbeb56248759@github.com>
Message-ID: <gUcwCyjQ9PLL3JiB2PFaCzec5qfwHz2BUotiqLqGfJA=.ad375f3a-17a3-4058-9ee5-f07586d64e42@github.com>

On Tue, 9 Jul 2024 05:28:13 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Left a note on a side effect of generate_vle32_pack2
>
> Changes requested by fyang (Reviewer).

As for comparison with the openssl version: first of all, thanks for the sources, @RealFYang! The main difference that I see is that they introduced three different versions of encryption depending on the key sizes, which allows them to skip a couple of instructions, like when I did `vaesem_vv(res, vzero)` followed by `vxor_vv(res, res, vtemp1)`. So I thought it would be more efficient to replace the current version with something similar to the openssl one. The only problem I see is a slight increase in code size. Please let me know if we are not interested in this change for some reason.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19960#issuecomment-2227377554

From duke at openjdk.org  Sun Jul 14 15:09:04 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Sun, 14 Jul 2024 15:09:04 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v6]
In-Reply-To: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
Message-ID: <GPGRoOpsnwB4pgPTPjLAB_urcM6X8fhrQgRQwT6tMQY=.0b8f0d8d-9d63-4086-9378-cf4d36359a3d@github.com>

> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK.

ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision:

  Use one L_end for all AES key sizes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19960/files
  - new: https://git.openjdk.org/jdk/pull/19960/files/8f1f98b5..407b9af0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=04-05

  Stats: 12 lines in 1 file changed: 1 ins; 8 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/19960.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19960/head:pull/19960

PR: https://git.openjdk.org/jdk/pull/19960

From aboldtch at openjdk.org  Mon Jul 15 00:50:30 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 15 Jul 2024 00:50:30 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
Message-ID: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>

> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
> 
> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
> 
> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
> 
> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
> 
> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
> 
> # Cleanups
> 
> Cleaned up displaced header usage for:
>   * BasicLock
>     * Contains some Zero changes
>     * Renames one exported JVMCI field
>   * ObjectMonitor
>     * Updates comments and tests consistencies
> 
> # Refactoring
> 
> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
> 
> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
> 
> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
> 
> # LightweightSynchronizer
> 
> Working on adapting and incorporating the following section as a comment in the source code
> 
> ## Fast Locking
> 
>   CAS on locking bits in markWord. 
>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
> 
>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
> 
>   If 0b10 (Inflated) is observed or there is too much contention or too long critical sections for spinning to be feasible, inf...
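
The fast-locking transition quoted above (a CAS between 0b01 Unlocked and 0b00 Fast Locked on the low bits of the mark word) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `std::atomic<std::uintptr_t>` mark, the constant names and the two helper functions are hypothetical and are not HotSpot code.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the markWord: only the two low lock bits matter here.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

// Try to flip Unlocked -> Fast Locked while leaving all other bits untouched.
// Returns false if the lock bits are not Unlocked or if the CAS loses a race;
// a real implementation would then spin or inflate.
inline bool try_fast_lock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kUnlocked) {
    return false;
  }
  const std::uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed);
}

// Try to flip Fast Locked -> Unlocked, again preserving the remaining bits.
inline bool try_fast_unlock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kFastLocked) {
    return false;
  }
  const std::uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}
```

The point of the sketch is that only the lock bits change, so nothing else stored in the word has to be displaced.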

Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:

 - Remove try_read
 - Add explicit to single parameter constructors
 - Remove superfluous access specifier
 - Remove unused include
 - Update assert message OMCache::set_monitor
 - Fix indentation
 - Remove outdated comment LightweightSynchronizer::exit
 - Remove logStream include
 - Remove strange comment
 - Fix javaThread include

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20067/files
  - new: https://git.openjdk.org/jdk/pull/20067/files/cccffeda..ebf11542

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=07-08

  Stats: 25 lines in 5 files changed: 0 ins; 8 del; 17 mod
  Patch: https://git.openjdk.org/jdk/pull/20067.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067

PR: https://git.openjdk.org/jdk/pull/20067

From aboldtch at openjdk.org  Mon Jul 15 00:50:33 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 15 Jul 2024 00:50:33 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6]
In-Reply-To: <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
 <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com>
Message-ID: <u2VLk8hKBH5V6331fMIPCwusNARMd_v-q_wL_7r0AOA=.99b9b9f1-ac37-4cb6-9ad0-4e019fe3c1fe@github.com>

On Fri, 12 Jul 2024 11:09:35 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update arguments.cpp
>
> src/hotspot/share/oops/instanceKlass.cpp line 1090:
> 
>> 1088: 
>> 1089:     // Step 2
>> 1090:     // If we were to use wait() instead of waitUninterruptibly() then
> 
> This is a nice correction (even though, the actual call below is wait_uninterruptibly() ;-) ), but seems totally unrelated.

I was thinking it was referring to `ObjectSynchronizer::waitUninterruptibly`, which was added in the same commit as the comment: b3bf31a0a08da679ec2fd21613243fb17b1135a9

> src/hotspot/share/oops/markWord.cpp line 27:
> 
>> 25: #include "precompiled.hpp"
>> 26: #include "oops/markWord.hpp"
>> 27: #include "runtime/basicLock.inline.hpp"
> 
> I don't think this include is needed (at least not by the changed code parts, I haven't checked existing code).

It is probably included through some other transitive include. However all the metadata functions are now inlined. These are used here. `inline markWord BasicLock::displaced_header() const` and `inline void BasicLock::set_displaced_header(markWord header)`

> src/hotspot/share/runtime/arguments.cpp line 1820:
> 
>> 1818:     warning("New lightweight locking not supported on this platform");
>> 1819:   }
>> 1820:   if (UseObjectMonitorTable) {
> 
> Uhm, wait a second. That list of platforms covers all existing platforms anyway, so the whole block could be removed? Or is there a deeper meaning here that I don't understand?

Zero. It is used as a starting point for porting to new platforms.

> src/hotspot/share/runtime/basicLock.cpp line 37:
> 
>> 35:     if (mon != nullptr) {
>> 36:       mon->print_on(st);
>> 37:     }
> 
> I am not sure if we wanted to do this, but we know the owner, therefore we could also look-up the OM from the table, and print it. It wouldn't have all that much to do with the BasicLock, though.

Yeah, maybe it is unwanted. Not sure how we should treat these prints of the frames. My thinking was: if there is something in the cache, print it. But maybe it is better to just treat it as internal data and print "monitor { <Cached ObjectMonitor* address> }" or similar.

> src/hotspot/share/runtime/basicLock.inline.hpp line 45:
> 
>> 43:   return reinterpret_cast<ObjectMonitor*>(get_metadata());
>> 44: #else
>> 45:   // Other platforms does not make use of the cache yet,
> 
> If it's not used, why does it matter to special case the code here?

Because it is not used, there may be uninitialised values there.

See https://github.com/openjdk/jdk/pull/20067#discussion_r1671959763

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 28:
> 
>> 26: 
>> 27: #include "classfile/vmSymbols.hpp"
>> 28: #include "javaThread.inline.hpp"
> 
> This include is incorrect (and my IDE says it's not needed).

Correct, it should be `runtime/javaThread.inline.hpp`.  Fixed.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 31:
> 
>> 29: #include "jfrfiles/jfrEventClasses.hpp"
>> 30: #include "logging/log.hpp"
>> 31: #include "logging/logStream.hpp"
> 
> Include of logStream.hpp not needed?

Yeah we removed all log streams. Removed.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 80:
> 
>> 78: 
>> 79:   ConcurrentTable* _table;
>> 80:   volatile size_t _table_count;
> 
> Looks like a misnomer to me. We only have one table, but we do have N entries/nodes. This is counted when new nodes are allocated or old nodes are freed. Consider renaming this to '_entry_count' or '_node_count'? I'm actually a bit surprised if ConcurrentHashTable doesn't already track this...

I think I was thinking of the names as a prefix to refer to the `Count of the table` and `Size of the table`. And not the `Number of tables`. But I can see the confusion. 

`ConcurrentHashTable` tracks no statistics except for JFR, which added some counters directly into the implementation. All statistics are for the users to manage, even if there are helpers for gathering these statistics. 

The current implementation is based on what we do for the StringTable and SymbolTable

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 88:
> 
>> 86: 
>> 87:   public:
>> 88:     Lookup(oop obj) : _obj(obj) {}
> 
> Make explicit?

Done.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 97:
> 
>> 95: 
>> 96:     bool equals(ObjectMonitor** value) {
>> 97:       // The entry is going to be removed soon.
> 
> What does this comment mean?

Not sure where it came from. Removed.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 112:
> 
>> 110: 
>> 111:   public:
>> 112:     LookupMonitor(ObjectMonitor* monitor) : _monitor(monitor) {}
> 
> Make explicit?

Done.
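
For readers skimming the thread, the effect of marking such single-parameter constructors `explicit` can be sketched like this. The `Monitor` type and the reduced `LookupMonitor` class below are hypothetical stand-ins, not the classes from the patch.

```cpp
#include <type_traits>

// Hypothetical stand-in for ObjectMonitor.
struct Monitor {};

class LookupMonitor {
  Monitor* _monitor;

 public:
  // explicit: a Monitor* no longer converts to a LookupMonitor silently.
  explicit LookupMonitor(Monitor* monitor) : _monitor(monitor) {}

  Monitor* monitor() const { return _monitor; }
};

// Direct construction still works ...
static_assert(std::is_constructible<LookupMonitor, Monitor*>::value,
              "direct construction is allowed");
// ... but implicit conversion (e.g. when passing a Monitor* as an argument) does not.
static_assert(!std::is_convertible<Monitor*, LookupMonitor>::value,
              "implicit conversion is rejected");
```

Both properties are checked at compile time here, so the snippet documents the behaviour without needing to run anything.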

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 159:
> 
>> 157:   static size_t min_log_size() {
>> 158:     // ~= log(AvgMonitorsPerThreadEstimate default)
>> 159:     return 10;
> 
> Uh wait - are we assuming that threads hold 1024 monitors *on average* ? Isn't this a bit excessive? I would have thought maybe 8 monitors/thread. Yes there are workloads that are bonkers. Or maybe the comment/flag name does not say what I think it says.
> 
> Or why not use AvgMonitorsPerThreadEstimate directly?

Maybe that is reasonable. I believe I had that at some point, but it had to deal with how to handle extreme values of `AvgMonitorsPerThreadEstimate` as well as what to do when `AvgMonitorsPerThreadEstimate` was disabled (`=0`). One 4 / 8 KB allocation seems harmless.

But this was very arbitrary. This will probably be changed when/if the resizing of the table becomes more synchronised with deflation, allowing for shrinking the table.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 349:
> 
>> 347:   assert(LockingMode == LM_LIGHTWEIGHT, "must be");
>> 348: 
>> 349:   if (try_read) {
> 
> All the callers seem to pass try_read = true. Why do we have the branch at all?

I'll clean this up. In experiments it was never better to use `insert_get` over a `get; insert_get`, even if we tried to be clever about when we skipped the initial get.
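
The `get; insert_get` shape mentioned above (try a cheap read-only lookup first and only fall back to the inserting path on a miss) looks roughly like the sketch below. It is a simplified stand-in: the `MonitorTable` class, its lock-based implementation and the `int` payload are assumptions made for illustration, whereas the real table is HotSpot's `ConcurrentHashTable`.

```cpp
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical lock-based table used only to show the lookup-then-insert pattern.
class MonitorTable {
  std::shared_mutex _lock;
  std::unordered_map<const void*, int> _map;

 public:
  // Returns the value associated with key. The candidate is inserted only if
  // no entry exists yet; otherwise the existing value wins.
  int get_or_insert(const void* key, int candidate) {
    {
      // Fast path: read-only lookup, which can run concurrently with other readers.
      std::shared_lock<std::shared_mutex> read(_lock);
      auto it = _map.find(key);
      if (it != _map.end()) {
        return it->second;
      }
    }
    // Slow path: emplace keeps an entry that another thread may have added
    // between the two critical sections.
    std::unique_lock<std::shared_mutex> write(_lock);
    return _map.emplace(key, candidate).first->second;
  }
};
```

The common case (the entry already exists) never takes the exclusive path, which is the intuition behind always doing the initial get.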

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 401:
> 
>> 399: 
>> 400:   if (inserted) {
>> 401:     // Hopefully the performance counters are allocated on distinct
> 
> It doesn't look like the counters are on distinct cache lines (see objectMonitor.hpp, lines 212ff). If this is a concern, file a bug to investigate it later? The comment here is a bit misplaced, IMO.

It originates from https://github.com/openjdk/jdk/blob/15997bc3dfe9dddf21f20fa189f97291824892de/src/hotspot/share/runtime/synchronizer.cpp#L1543 

I think we just kept it and did not think more about it.

Not sure what it is referring to. Maybe @dcubed-ojdk knows more, they originated from him (9 years old comment).

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 477:
> 
>> 475:     if (obj->mark_acquire().has_monitor()) {
>> 476:       if (_length > 0 && _contended_oops[_length-1] == obj) {
>> 477:         // assert(VM_Version::supports_recursive_lightweight_locking(), "must be");
> 
> Uncomment or remove assert?

Yeah, not sure why it was ever commented out. To me it seems that the assert should be invariant. But will investigate.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 554:
> 
>> 552:   bool _no_safepoint;
>> 553:   union {
>> 554:     struct {} _dummy;
> 
> Uhh ... Why does this need to be wrapped in a union and struct?

A poor man's optional.
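
As a rough illustration of the pattern: a union with an empty dummy member lets an RAII object be constructed only when needed, via placement new, and destroyed explicitly. In the sketch below `CountingGuard` is a hypothetical stand-in for `NoSafepointVerifier`, and `Empty` and `MaybeGuarded` are invented names, so none of this is the actual patch code.

```cpp
#include <new>

// Hypothetical RAII guard: bumps a counter while it is alive.
struct CountingGuard {
  static int active;
  CountingGuard()  { ++active; }
  ~CountingGuard() { --active; }
};
int CountingGuard::active = 0;

struct Empty {};

class MaybeGuarded {
  bool _engaged;
  union {
    Empty         _dummy;  // active member when no guard is wanted
    CountingGuard _guard;  // constructed on demand
  };

 public:
  explicit MaybeGuarded(bool engage) : _engaged(engage), _dummy() {
    if (_engaged) {
      // Construct the guard in place; the union itself never does this for us.
      ::new (&_guard) CountingGuard();
    }
  }

  ~MaybeGuarded() {
    if (_engaged) {
      // The matching destructor call has to be written by hand as well.
      _guard.~CountingGuard();
    }
  }

  MaybeGuarded(const MaybeGuarded&) = delete;
  MaybeGuarded& operator=(const MaybeGuarded&) = delete;
};
```

With a newer language level, `std::optional<CountingGuard>` plus `emplace()` would express the same thing; the union form avoids depending on that.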

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 563:
> 
>> 561:     assert(locking_thread == current || locking_thread->is_obj_deopt_suspend(), "locking_thread may not run concurrently");
>> 562:     if (_no_safepoint) {
>> 563:       ::new (&_nsv) NoSafepointVerifier();
> 
> I'm thinking that it might be easier and cleaner to just re-do what the NoSafepointVerifier does? It just calls thread->inc/dec
> _no_safepoint_count().

I wanted to avoid having to add `NoSafepointVerifier` implementation details in the synchroniser code. I guess `ContinuationWrapper` already does this. 

Simply creating a `NoSafepointVerifier` when you expect no safepoint is more obvious to me, shows the intent better.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 748:
> 
>> 746:   }
>> 747: 
>> 748:   // Fast-locking does not use the 'lock' argument.
> 
> I believe the comment is outdated.

Removed.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 969:
> 
>> 967: 
>> 968:   for (;;) {
>> 969:   // Fetch the monitor from the table
> 
> Wrong indentation.

Fixed.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 1157:
> 
>> 1155:   // enter can block for safepoints; clear the unhandled object oop
>> 1156:   PauseNoSafepointVerifier pnsv(&nsv);
>> 1157:   object = nullptr;
> 
> What is the point of that statement? object is not an out-arg (afaict), and not used subsequently.

`CHECK_UNHANDLED_OOPS` + `-XX:+CheckUnhandledOops`

https://github.com/openjdk/jdk/blob/15997bc3dfe9dddf21f20fa189f97291824892de/src/hotspot/share/oops/oopsHierarchy.hpp#L53-L55

> src/hotspot/share/runtime/lightweightSynchronizer.hpp line 68:
> 
>> 66:   static void exit(oop object, JavaThread* current);
>> 67: 
>> 68:   static ObjectMonitor* inflate_into_object_header(Thread* current, JavaThread* inflating_thread, oop object, const ObjectSynchronizer::InflateCause cause);
> 
> My IDE flags this with a warning 'Parameter 'cause' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions' *shrugs*

Yeah. The only effect it has is that you cannot reassign the variable. It was the style taken from [synchronizer.hpp](https://github.com/openjdk/jdk/blob/15997bc3dfe9dddf21f20fa189f97291824892de/src/hotspot/share/runtime/synchronizer.hpp) where all `InflateCause` parameters are const.
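
A small sketch of what the IDE warning is about: top-level `const` on a by-value parameter is not part of the function's type, so it only matters inside the definition. The function names `bump` and `clamp_cause` are made up for the example.

```cpp
#include <type_traits>

// Declared with const, defined without: both refer to the same function.
int bump(const int cause);
int bump(int cause) {
  cause += 1;  // fine: the parameter is not const in this definition
  return cause;
}

// Declared without const, defined with: again the same function.
int clamp_cause(int cause);
int clamp_cause(const int cause) {
  // Here const prevents reassigning cause inside the body.
  return cause < 0 ? 0 : cause;
}

// The const qualifier does not show up in the function type at all.
static_assert(std::is_same<decltype(bump), decltype(clamp_cause)>::value,
              "top-level const on a parameter is dropped from the signature");
```

So keeping `const` in the declaration is purely a style choice, which matches the explanation given above.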

> src/hotspot/share/runtime/lockStack.inline.hpp line 232:
> 
>> 230:   oop obj = monitor->object_peek();
>> 231:   assert(obj != nullptr, "must be alive");
>> 232:   assert(monitor == LightweightSynchronizer::get_monitor_from_table(JavaThread::current(), obj), "must be exist in table");
> 
> "must be exist in table" -> "must exist in table"

Done.

> src/hotspot/share/runtime/objectMonitor.cpp line 56:
> 
>> 54: #include "runtime/safepointMechanism.inline.hpp"
>> 55: #include "runtime/sharedRuntime.hpp"
>> 56: #include "runtime/synchronizer.hpp"
> 
> This include is not used.

Removed.

> src/hotspot/share/runtime/objectMonitor.hpp line 193:
> 
>> 191:   ObjectWaiter* volatile _WaitSet;  // LL of threads wait()ing on the monitor
>> 192:   volatile int  _waiters;           // number of waiting threads
>> 193:  private:
> 
> You can now also remove the 'private:' here

Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240569
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240591
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240598
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240629
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240633
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240644
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240655
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240709
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240664
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240684
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240695
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240712
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240735
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240747
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240787
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240807
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677240936
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677241002
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677241011
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677241037
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677241082
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677241093
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677241121
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1677241145

From dholmes at openjdk.org  Mon Jul 15 05:28:51 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 15 Jul 2024 05:28:51 GMT
Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers
 [v2]
In-Reply-To: <0j_XZ2e84ADGz8jxk21pFyF0QNhubV0i7sVi5sxnSyg=.7281e6d1-bf24-49f1-96a6-8284c4c9f90d@github.com>
References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
 <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com>
 <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com>
 <0j_XZ2e84ADGz8jxk21pFyF0QNhubV0i7sVi5sxnSyg=.7281e6d1-bf24-49f1-96a6-8284c4c9f90d@github.com>
Message-ID: <G5EBaq25gdUcR-5HHsF3Bg8vvpXImOqwnKZbIht8LMI=.07dd543a-1442-495b-97cd-c2bffe268949@github.com>

On Fri, 12 Jul 2024 09:14:00 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> src/hotspot/share/classfile/javaClasses.cpp line 3018:
>> 
>>> 3016:   int flags = (jushort)( m->access_flags().as_short() & JVM_RECOGNIZED_METHOD_MODIFIERS );
>>> 3017:   if (m->is_object_initializer()) {
>>> 3018:     flags |= java_lang_invoke_MemberName::MN_IS_CONSTRUCTOR;
>> 
>> I'm going to assume that `clinit` would already get filtered out at some point otherwise this would be a change in behaviour.
>
> No, it is not filtered, we still have `clinit`-s on this path. In the initial version https://github.com/openjdk/jdk/pull/20120/commits/1a0d18f1333866ab2eceb02b30c0fe363473d4e6#diff-a8ed79cab8961103a78187704b7a14fd00b322da06e75518bcfd888d9b940040R3020 I caught the assert in many tests, mostly in stack traces generation. 
> 
> Yes, this changes the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior.

Okay, such a change in behaviour was unexpected for a "cleanup" PR. I'm looking into it now. Perhaps @mlchung can comment?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1677324623

From dnsimon at openjdk.org  Mon Jul 15 08:30:51 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Mon, 15 Jul 2024 08:30:51 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
Message-ID: <xS-SSkFYhZMr5bOOD766HSBI2qeZwCap1zxpH8FakX8=.c3488826-41a7-40b5-858e-531b0156a909@github.com>

On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, when we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   track has_scoped_access for compiled methods

src/hotspot/share/prims/scopedMemoryAccess.cpp line 179:

> 177:         //
> 178:         // The safepoint at which we're stopped may be in between the liveness check
> 179:         // and actual memory access, but is itself 'outside' of @Scoped code

what is `@Scoped code`? I don't see that annotation mentioned here: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/ScopedValue.html

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677474756

From dnsimon at openjdk.org  Mon Jul 15 08:43:53 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Mon, 15 Jul 2024 08:43:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
Message-ID: <ptpaZ2nDeFW4XCq-qpHEWSCXxYQRhvvpO8Ol2Zo0fyE=.83707754-6088-465c-85bd-ea1ac96af034@github.com>

On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, when we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   track has_scoped_access for compiled methods

src/hotspot/share/jvmci/jvmciRuntime.cpp line 2186:

> 2184:         nm->set_has_wide_vectors(has_wide_vector);
> 2185:         nm->set_has_monitors(has_monitors);
> 2186:         nm->set_has_scoped_access(true); // conservative

What does "conservative" imply here? That is, what performance penalty will be incurred for Graal compiled code until it completely supports this "scoped access" bit?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677490130

From uschindler at openjdk.org  Mon Jul 15 08:43:54 2024
From: uschindler at openjdk.org (Uwe Schindler)
Date: Mon, 15 Jul 2024 08:43:54 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <xS-SSkFYhZMr5bOOD766HSBI2qeZwCap1zxpH8FakX8=.c3488826-41a7-40b5-858e-531b0156a909@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <xS-SSkFYhZMr5bOOD766HSBI2qeZwCap1zxpH8FakX8=.c3488826-41a7-40b5-858e-531b0156a909@github.com>
Message-ID: <iZUCNDNa_4a6TemwjTTiax82KYrB8ZVnmN4csEC58Ek=.858f62d2-a36a-405d-95a9-31b04ec0ac00@github.com>

On Mon, 15 Jul 2024 08:28:16 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   track has_scoped_access for compiled methods
>
> src/hotspot/share/prims/scopedMemoryAccess.cpp line 179:
> 
>> 177:         //
>> 178:         // The safepoint at which we're stopped may be in between the liveness check
>> 179:         // and actual memory access, but is itself 'outside' of @Scoped code
> 
> what is `@Scoped code`? I don't see that annotation mentioned here: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/ScopedValue.html

This is the whole magic around the shared arena. It is not public API; it is internal to the HotSpot VM:
- https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/java.base/share/classes/jdk/internal/misc/X-ScopedMemoryAccess.java.template#L117-L119
- https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/hotspot/share/prims/scopedMemoryAccess.cpp#L143-L149

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677486942

From alanb at openjdk.org  Mon Jul 15 08:43:54 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Mon, 15 Jul 2024 08:43:54 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <iZUCNDNa_4a6TemwjTTiax82KYrB8ZVnmN4csEC58Ek=.858f62d2-a36a-405d-95a9-31b04ec0ac00@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <xS-SSkFYhZMr5bOOD766HSBI2qeZwCap1zxpH8FakX8=.c3488826-41a7-40b5-858e-531b0156a909@github.com>
 <iZUCNDNa_4a6TemwjTTiax82KYrB8ZVnmN4csEC58Ek=.858f62d2-a36a-405d-95a9-31b04ec0ac00@github.com>
Message-ID: <GzjVKiahxVjaGdUKwOkTOFDkGSexn-alBu30Igy4SDA=.027c7042-4777-4943-84dd-f018dd038a45@github.com>

On Mon, 15 Jul 2024 08:38:59 GMT, Uwe Schindler <uschindler at openjdk.org> wrote:

>> src/hotspot/share/prims/scopedMemoryAccess.cpp line 179:
>> 
>>> 177:         //
>>> 178:         // The safepoint at which we're stopped may be in between the liveness check
>>> 179:         // and actual memory access, but is itself 'outside' of @Scoped code
>> 
>> what is `@Scoped code`? I don't see that annotation mentioned here: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/ScopedValue.html
>
> This is the whole magic around the shared arena. It is not public API and internal to Hotspot/VM:
> - https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/java.base/share/classes/jdk/internal/misc/X-ScopedMemoryAccess.java.template#L117-L119
> - https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/hotspot/share/prims/scopedMemoryAccess.cpp#L143-L149

> what is `@Scoped code`? I don't see that annotation mentioned here: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/ScopedValue.html

This has nothing to do with scoped values; instead, it is an annotation declared in `jdk.internal.misc.ScopedMemoryAccess` that is known to the VM.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677489415

From uschindler at openjdk.org  Mon Jul 15 08:53:52 2024
From: uschindler at openjdk.org (Uwe Schindler)
Date: Mon, 15 Jul 2024 08:53:52 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <GzjVKiahxVjaGdUKwOkTOFDkGSexn-alBu30Igy4SDA=.027c7042-4777-4943-84dd-f018dd038a45@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <xS-SSkFYhZMr5bOOD766HSBI2qeZwCap1zxpH8FakX8=.c3488826-41a7-40b5-858e-531b0156a909@github.com>
 <iZUCNDNa_4a6TemwjTTiax82KYrB8ZVnmN4csEC58Ek=.858f62d2-a36a-405d-95a9-31b04ec0ac00@github.com>
 <GzjVKiahxVjaGdUKwOkTOFDkGSexn-alBu30Igy4SDA=.027c7042-4777-4943-84dd-f018dd038a45@github.com>
Message-ID: <L1lKorlvjPi33N65zUk7wYlXZtSqxIoVUNY-o66Q6dw=.6a05ef06-56fb-42d7-8fa1-f2d4ecf769b9@github.com>

On Mon, 15 Jul 2024 08:41:01 GMT, Alan Bateman <alanb at openjdk.org> wrote:

>> This is the whole magic around the shared arena. It is not public API and internal to Hotspot/VM:
>> - https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/java.base/share/classes/jdk/internal/misc/X-ScopedMemoryAccess.java.template#L117-L119
>> - https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/hotspot/share/prims/scopedMemoryAccess.cpp#L143-L149
>
>> what is `@Scoped code`? I don't see that annotation mentioned here: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/ScopedValue.html
> 
> This is nothing to do with scoped values, instead this is an annotation declared in jdk.internal.misc.ScopedMemoryAccess that is known to the VM.

Basically, if the VM is inside a `@Scoped` method and it starts a thread-local handshake, it will deoptimize the top-most frame of all those threads so they can do the "isAlive" check.
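
Seen from the Java side, the visible result of that liveness check can be sketched as below. This is only an illustration, assuming JDK 22 or later (where `java.lang.foreign` is final); the class and method names are made up for the example and are not part of the PR.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class SharedArenaCloseDemo {

    /** Returns true if a read after close() is rejected with IllegalStateException. */
    static boolean accessAfterCloseFails() {
        Arena arena = Arena.ofShared();
        MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT);

        // While the arena is alive, scoped accesses succeed.
        segment.set(ValueLayout.JAVA_INT, 0, 42);
        int before = segment.get(ValueLayout.JAVA_INT, 0);

        // Closing a shared arena is the operation that triggers the handshake discussed above.
        arena.close();

        try {
            segment.get(ValueLayout.JAVA_INT, 0);
            return false; // should not be reached: the segment is no longer alive
        } catch (IllegalStateException expected) {
            return before == 42;
        }
    }

    public static void main(String[] args) {
        System.out.println("access after close rejected: " + accessAfterCloseFails());
    }
}
```

The sketch runs on a single thread, so it shows only the outcome of the check, not the handshake or deoptimization themselves.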

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677501161

From mcimadamore at openjdk.org  Mon Jul 15 08:56:54 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 08:56:54 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com>
Message-ID: <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com>

On Sun, 14 Jul 2024 11:01:58 GMT, Uwe Schindler <uschindler at openjdk.org> wrote:

> I have one problem with the benchmark: I think it is not measuring the whole setup in a way that is our workload: The basic problem is that we don't want to deoptimize threads which are not related to MemorySegments. So basically, the throughput of those threads should not be affected. For threads currently in a memory-segment read it should have a bit of effect, but it should recover fast.

IMHO there is a bit of confusion in this discussion. When we say that a shared arena close operation is slow, we might mean one of two things:

1. calling the `close()` method itself is slow (this is what the benchmark effectively measures)
2. throughput of unrelated threads is affected (I think this is what Lucene is seeing)

Addressing (2) seems more important than (1) (in the sense that, if you sign up for a shared arena close, you know it's going to be deterministic, but expensive, as the javadoc itself admits).

For this reason, I'm unsure about some of the "delaying tactics" I see mentioned here: if we delay the underlying "free"/"unmap" operation, this is only going to affect (1). You still need some global operation (e.g. handshake) to make sure all threads agree on the segment state. Moving the cost of the free/unmap from one place to another is not really going to do much for (2).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228002760

From luhenry at openjdk.org  Mon Jul 15 08:58:53 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Mon, 15 Jul 2024 08:58:53 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v4]
In-Reply-To: <FZMjsZWO9NKx4v5svo8qQPE5HKqvoiM-lc0oiDCah80=.2d250429-524a-4e93-a453-bf1db0238626@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
 <FZMjsZWO9NKx4v5svo8qQPE5HKqvoiM-lc0oiDCah80=.2d250429-524a-4e93-a453-bf1db0238626@github.com>
Message-ID: <yRaSxAW_ivL6zaoRe4-MDVu1yockrIPgGh8EhghgeHM=.832b8741-972c-4725-9b8e-f0e2597228f6@github.com>

On Tue, 2 Jul 2024 14:16:35 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> 
>> I'm also working on a Base64 decode intrinsic, but there is a performance regression in some cases. Since decode and encode are totally independent of each other, I will send out the decode review in another PR once I fix that regression.
>> 
>> Thanks.
>> 
>> ## Test
>> benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32)
>> 
>> I've tried several implementations, respectively with vector group
>> * m2+m1+scalar
>> * m2+scalar
>> * m1+scalar
>> * pure scalar
>> The best one is the combination of m2+m1; it has the best performance across all source sizes.
>> 
>> ### K230
>> 
>> this implementation (m2+m1)
>> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
>> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
>> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
>> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
>> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
>> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
>> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
>> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
>> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
>> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
>> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>> 
>> vector with only m2
>> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 4...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   move label

Marked as reviewed by luhenry (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19973#pullrequestreview-2177185304

From mcimadamore at openjdk.org  Mon Jul 15 09:00:52 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 09:00:52 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
Message-ID: <1x3PmjfjQXR2h3k8UlLT0N9_yvLbNw_cn3O7NRLDt_U=.68c48202-af31-40ed-836b-ecafd051113f@github.com>

On Fri, 12 Jul 2024 20:59:26 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, when we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   track has_scoped_access for compiled methods

test/micro/org/openjdk/bench/java/lang/foreign/ConcurrentClose.java line 34:

> 32: import static java.lang.foreign.ValueLayout.*;
> 33: 
> 34: @BenchmarkMode(Mode.AverageTime)

Doesn't the existing bench `MemorySessionClose` already cover this? That benchmark has three stress modes, and one of them spawns many unrelated threads (but there is only one thread doing a close).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677508532

From uschindler at openjdk.org  Mon Jul 15 09:00:53 2024
From: uschindler at openjdk.org (Uwe Schindler)
Date: Mon, 15 Jul 2024 09:00:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <1x3PmjfjQXR2h3k8UlLT0N9_yvLbNw_cn3O7NRLDt_U=.68c48202-af31-40ed-836b-ecafd051113f@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <1x3PmjfjQXR2h3k8UlLT0N9_yvLbNw_cn3O7NRLDt_U=.68c48202-af31-40ed-836b-ecafd051113f@github.com>
Message-ID: <yLIZx6_mKpBkcqNL2CnMkDAhHwN0XbP85IHCyPZl__w=.79858167-e0c3-427f-a5d8-435b5745e6c9@github.com>

On Mon, 15 Jul 2024 08:57:08 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   track has_scoped_access for compiled methods
>
> test/micro/org/openjdk/bench/java/lang/foreign/ConcurrentClose.java line 34:
> 
>> 32: import static java.lang.foreign.ValueLayout.*;
>> 33: 
>> 34: @BenchmarkMode(Mode.AverageTime)
> 
> Doesn't the existing bench `MemorySessionClose` already cover this? That benchmark has three stress modes, and one of them spawns many unrelated threads (but there is only one thread doing a close).

It should also run threads not doing any scoped accesses to verify that other threads are not affected.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677509975

From uschindler at openjdk.org  Mon Jul 15 09:04:54 2024
From: uschindler at openjdk.org (Uwe Schindler)
Date: Mon, 15 Jul 2024 09:04:54 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com>
 <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com>
Message-ID: <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com>

On Mon, 15 Jul 2024 08:54:11 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> > I have one problem with the benchmark: I think it is not measuring the whole setup in a way that is our workload: The basic problem is that we don't want to deoptimize threads which are not related to MemorySegments. So basically, the throughput of those threads should not be affected. For threads currently in a memory-segment read it should have a bit of effect, but it should recover fast.
> 
> IMHO there is a bit of confusion in this discussion. When we say that a shared arena close operation is slow, we might mean one of two things:
> 
> 1. calling the `close()` method itself is slow (this is what the benchmark effectively measures)
> 2. throughput of unrelated threads is affected (I think this is what Lucene is seeing)
> 
> Addressing (2) seems more important than (1) (in the sense that, if you sign up for a shared arena close, you know it's going to be deterministic, but expensive, as the javadoc itself admits).

I fully agree, we mixed two different approaches. The problem is that the benchmark measures both (1) and (2) per thread. To see an effect of this change, the benchmark should have 3 types of threads: one group only closing arenas, another that consumes scoped memory, and a third doing totally unrelated work.
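
A rough sketch of such a three-group setup, written as a plain Java program, not as JMH code, could look like the following. It assumes JDK 22 or later; the class name, counters, and run length are hypothetical and only illustrate the shape of the measurement, they are not the benchmark from the PR.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.LongAdder;

public class ThreeGroupSketch {
    private static final int ELEMENTS = 1024;

    /** Runs the three thread groups for about runMillis; returns {closes, scopedReads, unrelatedOps}. */
    static long[] run(long runMillis) {
        AtomicBoolean stop = new AtomicBoolean();
        LongAdder closes = new LongAdder();
        LongAdder scopedReads = new LongAdder();
        LongAdder unrelatedOps = new LongAdder();

        try (Arena longLived = Arena.ofShared()) {
            // 8-byte aligned, zero-initialized block that the accessor thread keeps reading.
            MemorySegment data = longLived.allocate(8L * ELEMENTS, 8);

            List<Thread> threads = new ArrayList<>();
            // Group 1: only opens and closes short-lived shared arenas.
            threads.add(new Thread(() -> {
                do {
                    Arena shortLived = Arena.ofShared();
                    shortLived.allocate(64);
                    shortLived.close();
                    closes.increment();
                } while (!stop.get());
            }, "closer"));
            // Group 2: performs scoped reads on the long-lived segment.
            threads.add(new Thread(() -> {
                long i = 0;
                do {
                    long value = data.get(ValueLayout.JAVA_LONG, (i++ % ELEMENTS) * 8);
                    scopedReads.add(1 + value); // value is 0; adding it keeps the read live
                } while (!stop.get());
            }, "accessor"));
            // Group 3: stand-in for work that never touches memory segments.
            threads.add(new Thread(() -> {
                do {
                    unrelatedOps.increment();
                } while (!stop.get());
            }, "unrelated"));

            threads.forEach(Thread::start);
            try {
                Thread.sleep(runMillis);
                stop.set(true);
                // Join before the long-lived arena is closed by try-with-resources.
                for (Thread t : threads) {
                    t.join();
                }
            } catch (InterruptedException e) {
                stop.set(true);
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] { closes.sum(), scopedReads.sum(), unrelatedOps.sum() };
    }

    public static void main(String[] args) {
        long[] r = run(1000);
        System.out.printf("closes=%d scopedReads=%d unrelatedOps=%d%n", r[0], r[1], r[2]);
    }
}
```

Comparing the `unrelatedOps` count with and without the closer thread running would give a first impression of effect (2); a real measurement would still need JMH-style warmup and repetition.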

> For this reason, I'm unsure about some of the "delaying tactics" I see mentioned here: if we delay the underlying "free"/"unmap" operation, this is only going to affect (1). You still need some global operation (e.g. handshake) to make sure all threads agree on the segment state. Moving the cost of the free/unmap from one place to another is not really going to do much for (2).

This is indeed unrelated. It is just an idea I also thought of. In Apache Lucene we are mostly interested in closing the shared arena as soon as possible. We don't need it to be closed by the time the "close" call returns (we don't care), but we can't wait until GC closes the arena, possibly after hours or even days. The reason for the latter is that the Arena is a small, long-living instance and GC does not want to free it, as there is no pressure.

So basically for us it would be best to trigger the close and then do other stuff.

Of course we can do that in a separate thread (this is my idea for how to improve the closes in Lucene). The only problem is that Lucene does not have its own thread pools, so it would be the responsibility of the caller to possibly close our indexes in a separate thread (and a single one only).
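
As a minimal sketch of that "trigger the close and then do other stuff" idea, assuming JDK 22 or later and a caller-owned helper: `AsyncArenaCloser` and `closeAsync` are hypothetical names, not Lucene or JDK API.

```java
import java.lang.foreign.Arena;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncArenaCloser implements AutoCloseable {
    // A single dedicated daemon thread, so that closes never run concurrently with each other.
    private final ExecutorService closer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "arena-closer");
        t.setDaemon(true);
        return t;
    });

    /** Schedules arena.close() on the closer thread and returns immediately. */
    public CompletableFuture<Void> closeAsync(Arena arena) {
        return CompletableFuture.runAsync(arena::close, closer);
    }

    /** Stops accepting new closes; closes that were already submitted still run. */
    @Override
    public void close() {
        closer.shutdown();
    }

    public static void main(String[] args) {
        try (AsyncArenaCloser asyncCloser = new AsyncArenaCloser()) {
            Arena arena = Arena.ofShared();
            arena.allocate(64);
            CompletableFuture<Void> done = asyncCloser.closeAsync(arena);
            // The caller can do other work here; join() is only used to show the outcome.
            done.join();
            System.out.println("alive after async close: " + arena.scope().isAlive());
        }
    }
}
```

This only moves the cost of `close()` off the calling thread; the handshake with the other threads still happens when the closer thread runs.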

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228018619

From mcimadamore at openjdk.org  Mon Jul 15 09:14:53 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 09:14:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com>
 <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com>
 <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com>
Message-ID: <LVyAn4d_sDCGkBiHz3Xyo7phPDe9zvNsHAw81TetnSw=.d18de3cb-8b55-4de2-93ce-964d30ac1dd4@github.com>

On Mon, 15 Jul 2024 09:02:29 GMT, Uwe Schindler <uschindler at openjdk.org> wrote:

> One only closing arenas, another set that consumes scoped memory and a third group doing totally unrelated stuff.

Exactly. My general feeling is that the cost of handshaking a thread dominates everything else, so doing improvements around e.g. avoiding unnecessary deoptimization (as in this PR) is not going to help much, even for threads doing unrelated stuff, but I'd be happy to be proven wrong.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228035799

From shade at openjdk.org  Mon Jul 15 09:17:55 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 15 Jul 2024 09:17:55 GMT
Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers
 [v2]
In-Reply-To: <G5EBaq25gdUcR-5HHsF3Bg8vvpXImOqwnKZbIht8LMI=.07dd543a-1442-495b-97cd-c2bffe268949@github.com>
References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
 <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com>
 <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com>
 <0j_XZ2e84ADGz8jxk21pFyF0QNhubV0i7sVi5sxnSyg=.7281e6d1-bf24-49f1-96a6-8284c4c9f90d@github.com>
 <G5EBaq25gdUcR-5HHsF3Bg8vvpXImOqwnKZbIht8LMI=.07dd543a-1442-495b-97cd-c2bffe268949@github.com>
Message-ID: <5Xt9rNCHwYnwvFMglf_Yp5ZzwKEDNrmRecR_NrFLGMA=.7aa1fef1-a977-4244-ad24-df9897bb2743@github.com>

On Mon, 15 Jul 2024 05:26:30 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> No, it is not filtered, we still have `clinit`-s on this path. In the initial version https://github.com/openjdk/jdk/pull/20120/commits/1a0d18f1333866ab2eceb02b30c0fe363473d4e6#diff-a8ed79cab8961103a78187704b7a14fd00b322da06e75518bcfd888d9b940040R3020 I caught the assert in many tests, mostly in stack traces generation. 
>> 
>> Yes, this changes the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior.
>
> Okay, such a change in behaviour was unexpected for a "cleanup" PR. I'm looking into it now. Perhaps @mlchung can comment?

Yeah, this is not really a cleanup change (one where behaviors stay the same). For this particular hunk, keeping the old behavior seems to be unnecessary work. Note that we are also changing the behavior in C2: in `do_exits` we no longer emit the barriers for `static final` stores in `clinits`, and EA no longer cares about `clinits` either. Those are also behavioral changes.

If you prefer, I can turn this PR into a behaviorally similar cleanup, and do the behavior changes separately.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1677530301

From mcimadamore at openjdk.org  Mon Jul 15 09:21:53 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 09:21:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <LVyAn4d_sDCGkBiHz3Xyo7phPDe9zvNsHAw81TetnSw=.d18de3cb-8b55-4de2-93ce-964d30ac1dd4@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com>
 <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com>
 <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com>
 <LVyAn4d_sDCGkBiHz3Xyo7phPDe9zvNsHAw81TetnSw=.d18de3cb-8b55-4de2-93ce-964d30ac1dd4@github.com>
Message-ID: <7_MKD2O70VqPmWUn5_TcL3AZ-yT8iB6uv7zk8s9xIDQ=.57d7a025-d31c-4f32-a55c-c919490218e3@github.com>

On Mon, 15 Jul 2024 09:11:53 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> avoiding unnecessary deoptimization (as in this PR) is not going to help much,

What would definitely help is to somehow reduce the number of threads to handshake when calling close - e.g. have an arena that is shared but only with a *group* of threads. We can do that easily using structured concurrency. But for unstructured code there's not a lot that can be done, as there's no way for the runtime to guess which threads can access segments created by a given arena.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228046170

From uschindler at openjdk.org  Mon Jul 15 09:21:53 2024
From: uschindler at openjdk.org (Uwe Schindler)
Date: Mon, 15 Jul 2024 09:21:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <7_MKD2O70VqPmWUn5_TcL3AZ-yT8iB6uv7zk8s9xIDQ=.57d7a025-d31c-4f32-a55c-c919490218e3@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com>
 <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com>
 <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com>
 <LVyAn4d_sDCGkBiHz3Xyo7phPDe9zvNsHAw81TetnSw=.d18de3cb-8b55-4de2-93ce-964d30ac1dd4@github.com>
 <7_MKD2O70VqPmWUn5_TcL3AZ-yT8iB6uv7zk8s9xIDQ=.57d7a025-d31c-4f32-a55c-c919490218e3@github.com>
Message-ID: <A_-nlvRr4xrHguQhAL5cMzT7IDkySpqbAKHvHWDjfm8=.9727b308-9350-4315-8d35-8836650884e0@github.com>

On Mon, 15 Jul 2024 09:17:31 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>>> One only closing arenas, another set that consumes scoped memory and a third group doing totally unrelated stuff.
>> 
>> Exactly. My general feeling is that the cost of handshaking a thread dominates everything else, so doing improvements around e.g. avoiding unnecessary deoptimization (as in this PR) is not going to help much, even for threads doing unrelated stuff, but I'd be happy to be proven wrong.
>
>> avoiding unnecessary deoptimization (as in this PR) is not going to help much,
> 
> What would definitely help is to somehow reduce the number of threads to handshake when calling close - e.g. have an arena that is shared but only with a *group* of threads. We can do that easily using structured concurrency. But for unstructured code there's not a lot that can be done, as there's no way for the runtime to guess which threads can access segments created by a given arena.

@mcimadamore: FYI, at the moment we are working on grouping mmapped files together (by their index segment file pattern) and using the same arena for multiple index files. Because those are closed together we can use a refcounted approach. All files of a group (the index segment name) share the same arena, and this one is closed after the last file in the group is closed: https://github.com/apache/lucene/pull/13570
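
A minimal sketch of that ref-counted grouping, for illustration only. `RefCountedGroup` and its methods are hypothetical names, not Lucene or JDK API; the close action is passed in as a `Runnable`, which would be `arena::close` for an arena obtained from `Arena.ofShared()`:

```java
/** Hypothetical helper: runs a close action once the last user of a group releases it. */
final class RefCountedGroup {
    private final Runnable closeAction; // e.g. arena::close for a shared arena (assumption)
    private int refs;                   // number of currently open files in the group
    private boolean closed;             // set once the close action has run

    RefCountedGroup(Runnable closeAction) {
        this.closeAction = closeAction;
    }

    /** Called when a file of the group is opened. */
    synchronized void acquire() {
        if (closed) {
            throw new IllegalStateException("group already closed");
        }
        refs++;
    }

    /** Called when a file is closed; returns true if this call ran the close action. */
    synchronized boolean release() {
        if (refs == 0) {
            throw new IllegalStateException("release without acquire");
        }
        if (--refs > 0) {
            return false; // other files in the group are still open
        }
        closed = true;
        closeAction.run();
        return true;
    }
}
```

Each file in a group would call `acquire()` when opened and `release()` when closed, so the shared arena is closed once per group rather than once per file.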

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228049242

From shipilev at amazon.de  Mon Jul 15 10:00:48 2024
From: shipilev at amazon.de (Aleksey Shipilev)
Date: Mon, 15 Jul 2024 12:00:48 +0200
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <CAMOCf+i5Eb8xFMPw_+eeSpyXcFEiXeEacLk0ZqYQBxCpHkxDxg@mail.gmail.com>
References: <mailman.15905.1720688298.324.hotspot-dev@openjdk.org>
 <CAMOCf+i5Eb8xFMPw_+eeSpyXcFEiXeEacLk0ZqYQBxCpHkxDxg@mail.gmail.com>
Message-ID: <fa562ac5-4d0d-4d15-ad4e-b03e97eef5b0@amazon.de>

Hi Hans,

On 13.07.24 02:36, Hans Boehm wrote:
> No opinion on the merits here. But IIUC, "as memory safe as finals" is a slightly squishy notion
> here. The downside of not having the release fence is that even with safe publication, a write to
> an @Stable field outside the constructor can be seen by a read in the constructor, before the object
> is published. That's arguably weirder than final field behavior, and not something that can arise
> with final fields. But it still only happens in the presence of data races, and thus probably not in
> code you should be writing anyway.

Agreed. "Hans Boehm's argument for doing release in initializers" [1] lives in my head rent-free :)

In the presence of @Stable writes outside of constructors, users of @Stable are basically on their 
own, and are responsible for proper fencing if data races are not benign. Including putting the 
release/seqcst fences in constructors if constructors read the @Stable fields back.

I think not giving in to handling these corner cases by default gives us a reasonable 
performance/safety model for @Stable: https://github.com/openjdk/jdk/pull/19635#issuecomment-2222413725

-Aleksey

[1] https://www.hboehm.info/c++mm/no_write_fences.html




From jvernee at openjdk.org  Mon Jul 15 10:28:52 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Mon, 15 Jul 2024 10:28:52 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <ptpaZ2nDeFW4XCq-qpHEWSCXxYQRhvvpO8Ol2Zo0fyE=.83707754-6088-465c-85bd-ea1ac96af034@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <ptpaZ2nDeFW4XCq-qpHEWSCXxYQRhvvpO8Ol2Zo0fyE=.83707754-6088-465c-85bd-ea1ac96af034@github.com>
Message-ID: <1MbFi_08NlZRB0wF-sBB_JnNzHr4DjDdOa6hkGmXjjY=.ba6c7bcb-88d9-4fb6-b817-2b2527934931@github.com>

On Mon, 15 Jul 2024 08:41:38 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   track has_scoped_access for compiled methods
>
> src/hotspot/share/jvmci/jvmciRuntime.cpp line 2186:
> 
>> 2184:         nm->set_has_wide_vectors(has_wide_vector);
>> 2185:         nm->set_has_monitors(has_monitors);
>> 2186:         nm->set_has_scoped_access(true); // conservative
> 
> What does "conservative" imply here? That is, what performance penalty will be incurred for Graal compiled code until it completely supports this "scoped access" bit?

It means that, when closing a shared arena, we will always deoptimize the top-most frame of any thread if that frame is compiled by Graal. (This is a one-off deoptimization though. The compiled code is not thrown away.) It essentially matches the current behavior before this PR.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1677613801

From fgao at openjdk.org  Mon Jul 15 10:52:53 2024
From: fgao at openjdk.org (Fei Gao)
Date: Mon, 15 Jul 2024 10:52:53 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <BDLL94Te55nHGCUuLtN6qIQynYIWdux300wtYvdxbkU=.0bdaf9f1-cb7e-403c-96f8-3b3ba69f8484@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
 <2Ln6-ZIklVFgsBWZmmyOU2G-wZmknxjsoT1xcTKSXDc=.54473598-6e15-43d1-9e5f-95c796d11066@github.com>
 <BDLL94Te55nHGCUuLtN6qIQynYIWdux300wtYvdxbkU=.0bdaf9f1-cb7e-403c-96f8-3b3ba69f8484@github.com>
Message-ID: <h9Apat3UK9nQkjKkieLkCq2ZhNNk73JI8sELbHsEHZk=.086a01f3-b0e5-4d8f-9e34-ca88084a07d8@github.com>

On Fri, 12 Jul 2024 14:34:20 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> src/hotspot/share/opto/machnode.cpp line 400:
>> 
>>> 398: 
>>> 399:   if (t->isa_intptr_t() &&
>>> 400: #if !defined(AARCH64)
>> 
>> After applying the operand "IndirectX2P", we may have some patterns like:
>> 
>> str val, [CastX2P base]
>> 
>> The code path here will resolve the `base`, which is actually an `intptr`, not a `ptr`, and the offset is `0`.
>> 
>> I guess the code here was intended to support `[base, offset]`, where base can be an `intptr` but offset cannot be `0`. I'm not sure why there is such a limitation that offset cannot be `0`, maybe for some old machines?
>> 
>> I don't think the limitation is applied to aarch64 machines now. So I unblock it for aarch64.
>
> I think it's the other way around. Isn't this code saying that if the address is an intptr + a nonzero offset, then the returned type is bottom, ie nothing? What effect does this change have?

Thanks for the review! Yeah, this code says that if the address is an `intptr` + a nonzero offset, then return `TypeRawPtr::BOTTOM`. Then it continues [the verification](https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/hotspot/share/opto/matcher.cpp#L1834). 

Without the change here, an `intptr` + a `zero` offset would fail the assert on the next lines, https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/hotspot/share/opto/machnode.cpp#L409-L413

AFAIK, the return value `TypeRawPtr::BOTTOM` represents `raw access to memory` here. And an `intptr` + a `zero` offset is also a valid `raw access`, so I unblock it here. WDYT? Thanks.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1677638951

From jvernee at openjdk.org  Mon Jul 15 10:52:53 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Mon, 15 Jul 2024 10:52:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com>
 <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com>
 <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com>
Message-ID: <GAhBy0q0U31weyO24JHCXijauSe2bk_J3kQ07qZ5s70=.19332107-6225-49a3-a8ee-d0bf79a65873@github.com>

On Mon, 15 Jul 2024 09:02:29 GMT, Uwe Schindler <uschindler at openjdk.org> wrote:

> Of course we can do that in a separate thread (this is my idea how to improve the closes in lucene).

This is what I was thinking of as well. `close()` on a shared arena can be called by any thread, so it would be possible to have an executor service with 1-n threads that is dedicated to closing memory.
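
A rough sketch of such a dedicated closer, under stated assumptions. `AsyncCloser` and `closeLater` are hypothetical names, not an existing API; the close action is a `Runnable`, which would be `arena::close` for a shared arena:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical helper: hands close actions to a small pool of dedicated threads. */
final class AsyncCloser implements AutoCloseable {
    private final ExecutorService pool;

    AsyncCloser(int threads) {
        // Daemon threads, so an idle closer pool does not keep the JVM alive.
        pool = Executors.newFixedThreadPool(threads, task -> {
            Thread t = new Thread(task, "arena-closer");
            t.setDaemon(true);
            return t;
        });
    }

    /** Schedules the close action; the returned future completes once it has run. */
    CompletableFuture<Void> closeLater(Runnable closeAction) {
        return CompletableFuture.runAsync(closeAction, pool);
    }

    /** Stops accepting new close actions; already submitted ones still run. */
    @Override
    public void close() {
        pool.shutdown();
    }
}
```

A caller would write something like `closer.closeLater(arena::close)` and carry on, joining the returned future only where it needs to know that the memory has actually been released.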

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228215031

From aph at openjdk.org  Mon Jul 15 11:03:50 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 15 Jul 2024 11:03:50 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
Message-ID: <A7LqCA84i3ml2kFafMJr2_ENuyn9yW-KjBViIryuKBU=.8efd29b0-3636-4ef7-aa2c-dc92228cefc5@github.com>

On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote:

> In the cases like:
> 
>   UNSAFE.putLong(address + off1 + 1030, lseed);
>   UNSAFE.putLong(address + 1023, lseed);
>   UNSAFE.putLong(address + off2 + 1001, lseed);
> 
> 
> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
> 
>   ldr  R10, [R15, #120]    # int ! Field: address
>   ldr  R11, [R16, #136]    # int ! Field: off1
>   ldr  R12, [R16, #144]    # int ! Field: off2
>   add  R11, R11, R10
>   mov R11, R11    # long -> ptr
>   add  R12, R12, R10
>   mov R10, R10    # long -> ptr
>   add R11, R11, #1030    # ptr
>   str  R17, [R11]    # int
>   add R10, R10, #1023    # ptr
>   str  R17, [R10]    # int
>   mov R10, R12    # long -> ptr
>   add R10, R10, #1001    # ptr
>   str  R17, [R10]    # int
> 
> 
> In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. With the existing code, we emit nothing for `mov dst, src` only when `dst` == `src` [1], so we get this assembly:
> 
>   ldr    x10, [x15,#120]
>   ldp    x11, x12, [x16,#136]
>   add    x11, x11, x10
>   add    x12, x12, x10
>   add    x11, x11, #0x406
>   str    x17, [x11]
>   add    x10, x10, #0x3ff
>   str    x17, [x10]
>   mov    x10, x12  <--- extra register copy
>   add    x10, x10, #0x3e9
>   str    x17, [x10]
> 
> 
> There is still one extra register copy, which we're trying to remove in this patch.
> 
> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
> 
> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so
> 
> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906

Marked as reviewed by aph (Reviewer).

This will need quite a lot of testing, perhaps higher tiers and jcstress. You can test these two PRs together.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20157#pullrequestreview-2177415307
PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2228232139

From aph at openjdk.org  Mon Jul 15 11:03:51 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 15 Jul 2024 11:03:51 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <h9Apat3UK9nQkjKkieLkCq2ZhNNk73JI8sELbHsEHZk=.086a01f3-b0e5-4d8f-9e34-ca88084a07d8@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
 <2Ln6-ZIklVFgsBWZmmyOU2G-wZmknxjsoT1xcTKSXDc=.54473598-6e15-43d1-9e5f-95c796d11066@github.com>
 <BDLL94Te55nHGCUuLtN6qIQynYIWdux300wtYvdxbkU=.0bdaf9f1-cb7e-403c-96f8-3b3ba69f8484@github.com>
 <h9Apat3UK9nQkjKkieLkCq2ZhNNk73JI8sELbHsEHZk=.086a01f3-b0e5-4d8f-9e34-ca88084a07d8@github.com>
Message-ID: <573t-GxA0Cutej_7Ei1rOdp90v3tpBsEgyMXr3STUPk=.d6791101-87d2-4fab-964d-a620a2be4d24@github.com>

On Mon, 15 Jul 2024 10:50:32 GMT, Fei Gao <fgao at openjdk.org> wrote:

>> I think it's the other way around. Isn't this code saying that if the address is an intptr + a nonzero offset, then the returned type is bottom, ie nothing? What effect does this change have?
>
> Thanks for the review! Yeah, this code says that if the address is an `intptr` + a nonzero offset, then return `TypeRawPtr::BOTTOM`. Then it continues [the verification](https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/hotspot/share/opto/matcher.cpp#L1834). 
> 
> Without the change here, an `intptr` + a `zero` offset would fail the assert on the next lines, https://github.com/openjdk/jdk/blob/a96de6d8d273d75a6500e10ed06faab9955f893b/src/hotspot/share/opto/machnode.cpp#L409-L413
> 
> AFAIK, the return value `TypeRawPtr::BOTTOM` represents `raw access to memory` here. And an `intptr` + a `zero` offset is also a valid `raw access`, so I unblock it here. WDYT? Thanks.

I learn something every day, I guess. It's been a long while since I looked, but I expected "pointer to anything" to be TOP, not BOTTOM. Thinking some more, a `TypeRawPtr` must be cast to a usable physical type in order to use it, so "pointer to nothing" makes more sense than "pointer to anything".

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1677648436

From jvernee at openjdk.org  Mon Jul 15 11:33:30 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Mon, 15 Jul 2024 11:33:30 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
Message-ID: <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>

> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
> 
> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
> 
> In this PR:
> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
> 
> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:
> 
> 
> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess        1   avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.conf...

Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:

  improve benchmark

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20158/files
  - new: https://git.openjdk.org/jdk/pull/20158/files/d1266b53..6d0b9b57

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=01-02

  Stats: 28 lines in 1 file changed: 14 ins; 1 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/20158.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158

PR: https://git.openjdk.org/jdk/pull/20158

From forax at openjdk.org  Mon Jul 15 11:33:30 2024
From: forax at openjdk.org (Rémi Forax)
Date: Mon, 15 Jul 2024 11:33:30 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <GAhBy0q0U31weyO24JHCXijauSe2bk_J3kQ07qZ5s70=.19332107-6225-49a3-a8ee-d0bf79a65873@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com>
 <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com>
 <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com>
 <GAhBy0q0U31weyO24JHCXijauSe2bk_J3kQ07qZ5s70=.19332107-6225-49a3-a8ee-d0bf79a65873@github.com>
Message-ID: <IaCxpypcq5BsE5qkCekQQfHpbvkrcU60kDW8_y8bgn8=.64109d21-8177-4477-a950-1bb5a3317eec@github.com>

On Mon, 15 Jul 2024 10:50:34 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

> This is what I was thinking of as well. close() on a shared arena can be called by any thread, so it would be possible to have an executor service with 1-n threads that is dedicated to closing memory.

This delays both the closing of the Arena and the freeing of the segments, so bugs may not be discovered if the arena is accessed between the time the thread pool is notified and the time close() is effectively called.

And you lose the structured part of the API: you cannot use a try-with-resources anymore. I think that part can be fixed using a wrapper on top of Arena.ofShared().
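
One way such a wrapper could look, as a sketch only. `DeferredClose` is a hypothetical name and is written against `AutoCloseable` so that it does not depend on the FFM API; with `Arena.ofShared()` as the resource, `get()` would return the arena:

```java
import java.util.concurrent.Executor;

/** Hypothetical wrapper: usable in try-with-resources, but defers the real close to an executor. */
final class DeferredClose<T extends AutoCloseable> implements AutoCloseable {
    private final T resource;      // e.g. an arena from Arena.ofShared() (assumption)
    private final Executor closer; // e.g. a pool of threads dedicated to closing

    DeferredClose(T resource, Executor closer) {
        this.resource = resource;
        this.closer = closer;
    }

    /** The wrapped resource, for use inside the try block. */
    T get() {
        return resource;
    }

    /** Returns immediately; the wrapped resource is closed later on the executor. */
    @Override
    public void close() {
        closer.execute(() -> {
            try {
                resource.close();
            } catch (Exception e) {
                throw new IllegalStateException("deferred close failed", e);
            }
        });
    }
}
```

This only restores the try-with-resources shape. The first concern still applies: an access that happens after the try block but before the executor runs the real close would not be detected.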

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228280373

From jvernee at openjdk.org  Mon Jul 15 11:49:53 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Mon, 15 Jul 2024 11:49:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
Message-ID: <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>

On Mon, 15 Jul 2024 11:33:30 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   improve benchmark

I've updated the benchmark to run with 3 separate threads: 1 thread that is just creating and closing shared arenas in a loop, 1 that is accessing memory using the FFM API, and 1 that is accessing a `byte[]`.

Current:

Benchmark                                        Mode  Cnt   Score    Error  Units
ConcurrentClose.sharedClose                      avgt   10  50.093 ±  6.200  us/op
ConcurrentClose.sharedClose:closing              avgt   10  46.269 ±  0.786  us/op
ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  98.072 ± 19.061  us/op
ConcurrentClose.sharedClose:otherAccess          avgt   10   5.938 ±  0.058  us/op


I do see a pretty big difference on the memory segment accessing thread when I remove deoptimization altogether:


Benchmark                                        Mode  Cnt   Score   Error  Units
ConcurrentClose.sharedClose                      avgt   10  22.664 ± 0.409  us/op
ConcurrentClose.sharedClose:closing              avgt   10  45.351 ± 1.554  us/op
ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  16.671 ± 0.251  us/op
ConcurrentClose.sharedClose:otherAccess          avgt   10   5.969 ± 0.089  us/op


When I remove the `has_scoped_access()` check before the deopt, I expect the `otherAccess` thread to be affected, but the effect isn't nearly as big as with the FFM thread. I think this is likely due to the `otherAccess` benchmark being less sensitive to optimization (i.e. it already runs fairly fast in the interpreter). I also tried using `MethodHandles::arrayElementGetter` for the access, but the numbers I got were pretty much the same:


Benchmark                                        Mode  Cnt    Score   Error  Units
ConcurrentClose.sharedClose                      avgt   10   52.745 ± 1.071  us/op
ConcurrentClose.sharedClose:closing              avgt   10   46.670 ± 0.453  us/op
ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  102.663 ± 3.430  us/op
ConcurrentClose.sharedClose:otherAccess          avgt   10    8.901 ± 0.109  us/op


I think, to really test the effect of the `has_scoped_access` check, we need to look at a more realistic scenario.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228311368

From mcimadamore at openjdk.org  Mon Jul 15 12:02:51 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 12:02:51 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
Message-ID: <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>

On Mon, 15 Jul 2024 11:47:43 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

> I've updated the benchmark to run with 3 separate threads: 1 thread that is just creating and closing shared arenas in a loop, 1 that is accessing memory using the FFM API, and 1 that is accessing a `byte[]`.
> 
> Current:
> 
> ```
> Benchmark                                        Mode  Cnt   Score    Error  Units
> ConcurrentClose.sharedClose                      avgt   10  50.093 ±  6.200  us/op
> ConcurrentClose.sharedClose:closing              avgt   10  46.269 ±  0.786  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  98.072 ± 19.061  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10   5.938 ±  0.058  us/op
> ```
> 
> I do see a pretty big difference on the memory segment accessing thread when I remove deoptimization altogether:
> 
> ```
> Benchmark                                        Mode  Cnt   Score   Error  Units
> ConcurrentClose.sharedClose                      avgt   10  22.664 ± 0.409  us/op
> ConcurrentClose.sharedClose:closing              avgt   10  45.351 ± 1.554  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  16.671 ± 0.251  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10   5.969 ± 0.089  us/op
> ```
> 
> When I remove the `has_scoped_access()` check before the deopt, I expect the `otherAccess` thread to be affected, but the effect isn't nearly as big as with the FFM thread. I think this is likely due to the `otherAccess` benchmark being less sensitive to optimization (i.e. it already runs fairly fast in the interpreter). I also tried using `MethodHandles::arrayElementGetter` for the access, but the numbers I got were pretty much the same:
> 
> ```
> Benchmark                                        Mode  Cnt    Score   Error  Units
> ConcurrentClose.sharedClose                      avgt   10   52.745 ± 1.071  us/op
> ConcurrentClose.sharedClose:closing              avgt   10   46.670 ± 0.453  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  102.663 ± 3.430  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10    8.901 ± 0.109  us/op
> ```
> 
> I think, to really test the effect of the `has_scoped_access` check, we need to look at a more realistic scenario.

Interesting benchmark. What is the baseline here? E.g. can we also compare against the same benchmark using a confined arena to do the closing?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228335857

From mcimadamore at openjdk.org  Mon Jul 15 12:12:53 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 12:12:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
 <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>
Message-ID: <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com>

On Mon, 15 Jul 2024 12:00:31 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> When I remove the `has_scoped_access()` check before the deopt, I expect the `otherAccess` thread to be affected, but the effect isn't nearly as big as with the FFM thread. I think this is likely due to the `otherAccess` benchmark being less sensitive to optimization (i.e. it already runs fairly fast in the interpreter). I also tried using `MethodHandles::arrayElementGetter` for the access, but the numbers I got were pretty much the same:

To put this into perspective, once the underlying bug with reachability fences is addressed, then we should see the numbers for this benchmark align with the ones where you removed deopt completely (as we won't deopt threads that don't have the target arena in their oopmap)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228353326

From mcimadamore at openjdk.org  Mon Jul 15 12:17:51 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 12:17:51 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
 <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>
 <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com>
Message-ID: <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com>

On Mon, 15 Jul 2024 12:10:02 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> I also tried using `MethodHandles::arrayElementGetter` for the access, but the numbers I got were pretty much the same:

This is quite strange, as the code involved should be quite similar to those with memory segments (e.g. you go through a method handle pointing to some helper class). I would have said this would have provided a fairly good comparison.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228359381

From mcimadamore at openjdk.org  Mon Jul 15 12:17:52 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 12:17:52 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
 <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>
 <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com>
 <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com>
Message-ID: <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com>

On Mon, 15 Jul 2024 12:13:23 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> > I also tried using `MethodHandles::arrayElementGetter` for the access, but the numbers I got were pretty much the same:
> 
> This is quite strange, as the code involved should be quite similar to those with memory segments (e.g. you go through a method handle pointing to some helper class). I would have said this would have provided a fairly good comparison.

Ah! I had `arrayElementVarHandle` in mind - maybe you can try that?
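
For reference, a minimal sketch of the two array access paths being compared, assuming the access is a plain `byte[]` read in a loop. The class and method names are made up for illustration and are not taken from the benchmark in the PR; only `MethodHandles.arrayElementGetter` and `MethodHandles.arrayElementVarHandle` are real JDK APIs.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical illustration, not code from the PR.
class ArrayAccessSketch {
    // Method handle of type (byte[], int) -> byte.
    static final MethodHandle GETTER =
            MethodHandles.arrayElementGetter(byte[].class);

    // Var handle with coordinates (byte[], int) and variable type byte.
    static final VarHandle ELEMENTS =
            MethodHandles.arrayElementVarHandle(byte[].class);

    // Sums the array by reading each element through the method handle.
    static int sumWithGetter(byte[] data) {
        int sum = 0;
        try {
            for (int i = 0; i < data.length; i++) {
                sum += (byte) GETTER.invokeExact(data, i);
            }
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
        return sum;
    }

    // Sums the array by reading each element through the var handle.
    static int sumWithVarHandle(byte[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += (byte) ELEMENTS.get(data, i);
        }
        return sum;
    }
}
```

Both handles are held in `static final` fields so that the JIT can treat them as constants, which is the usual way such handles are used in benchmarks.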

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228361950

From jvernee at openjdk.org  Mon Jul 15 12:36:53 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Mon, 15 Jul 2024 12:36:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
Message-ID: <pLIRTEBVE6RFNwR0N7hZ3eRrqmAPcFB3v2PkPnDQUg0=.0e7ea973-6c9b-41e0-abb0-5d975108fbc5@github.com>

On Mon, 15 Jul 2024 11:33:30 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, when we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   improve benchmark

This is the baseline if I change `closing` to use a confined arena:


Benchmark                                        Mode  Cnt   Score    Error  Units
ConcurrentClose.sharedClose                      avgt   10   8.089 ±  0.006  us/op
ConcurrentClose.sharedClose:closing              avgt   10   0.001 ±  0.001  us/op
ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  20.046 ±  0.019  us/op
ConcurrentClose.sharedClose:otherAccess          avgt   10   4.220 ±  0.002  us/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228401517

From jvernee at openjdk.org  Mon Jul 15 12:49:56 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Mon, 15 Jul 2024 12:49:56 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
 <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>
 <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com>
 <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com>
 <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com>
Message-ID: <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com>

On Mon, 15 Jul 2024 12:14:52 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> Ah! I had `arrayElementVarHandle` in mind - maybe you can try that?

Even with `arrayElementVarHandle` it's about the same

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228425705

From mcimadamore at openjdk.org  Mon Jul 15 13:01:52 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 13:01:52 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <pLIRTEBVE6RFNwR0N7hZ3eRrqmAPcFB3v2PkPnDQUg0=.0e7ea973-6c9b-41e0-abb0-5d975108fbc5@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <pLIRTEBVE6RFNwR0N7hZ3eRrqmAPcFB3v2PkPnDQUg0=.0e7ea973-6c9b-41e0-abb0-5d975108fbc5@github.com>
Message-ID: <_PDgnriMr5GoRUoTpxJnhZjIqEcjdF2kscNx94ScPlc=.b035d8ac-e218-46ed-86d9-a08368c63dc5@github.com>

On Mon, 15 Jul 2024 12:34:37 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

> This is the baseline if I change `closing` to use a confined arena:
> 
> ```
> Benchmark                                        Mode  Cnt   Score    Error  Units
> ConcurrentClose.sharedClose                      avgt   10   8.089 ±  0.006  us/op
> ConcurrentClose.sharedClose:closing              avgt   10   0.001 ±  0.001  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  20.046 ±  0.019  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10   4.220 ±  0.002  us/op
> ```

This is promising. Effectively, once all the issues surrounding reachability fences are addressed, we should be able to achieve numbers similar to the above even in the case of shared close. The only thing slower in that case would be the closing thread itself.
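
To make the confined/shared distinction concrete, here is a rough sketch of the API-level difference, assuming JDK 22 or later (where the FFM API is final). The class and helper names are hypothetical and are not part of the benchmark: a shared arena can be closed by any thread, which is why closing it has to coordinate with other threads, while a confined arena can only be closed by its owner thread.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical illustration, not code from the PR. Assumes JDK 22 or later.
class ArenaCloseSketch {

    // Runs the given action on a fresh platform thread and waits for it.
    static void runOnOtherThread(Runnable action) {
        Thread t = new Thread(action);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    // A shared arena may be closed by any thread; later access fails.
    static boolean sharedArenaClosedByOtherThread() {
        Arena arena = Arena.ofShared();
        MemorySegment segment = arena.allocate(16);
        runOnOtherThread(arena::close);
        try {
            segment.get(ValueLayout.JAVA_BYTE, 0);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // A confined arena rejects close() from a thread that does not own it.
    static boolean confinedArenaRejectsOtherThread() {
        boolean[] rejected = new boolean[1];
        try (Arena arena = Arena.ofConfined()) {
            runOnOtherThread(() -> {
                try {
                    arena.close();
                } catch (WrongThreadException e) {
                    rejected[0] = true;
                }
            });
        }
        return rejected[0];
    }
}
```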

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228448722

From mcimadamore at openjdk.org  Mon Jul 15 13:11:52 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 13:11:52 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
 <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>
 <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com>
 <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com>
 <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com>
 <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com>
Message-ID: <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com>

On Mon, 15 Jul 2024 12:47:30 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

> Even with `arrayElementVarHandle` it's about the same

This is very odd, and I don't have a good explanation as to why that is the case. What does the baseline (confined arena) look like for `arrayElementVarHandle` ?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228469162

From mcimadamore at openjdk.org  Mon Jul 15 13:24:20 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 13:24:20 GMT
Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of
 JNI [v9]
In-Reply-To: <CZGbki5iFdCGPPagE-ya-_L8Nkgf7OunywApOsxC548=.2cee82fa-a5d2-4165-b012-30982a85030a@github.com>
References: <CZGbki5iFdCGPPagE-ya-_L8Nkgf7OunywApOsxC548=.2cee82fa-a5d2-4165-b012-30982a85030a@github.com>
Message-ID: <SNuYItiKx3q_g0ZUVt50iHQw67J-RmNEAP2c5M-kGeY=.bef589ad-fb18-40ff-b291-5da29d4d7451@github.com>

> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways:
> 
> * `System::load` and `System::loadLibrary` are now restricted methods
> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods
> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation
> 
> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` flag was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access` flag is used at least once.
> 
> Here, a new flag is introduced, namely `--illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`.
> 
> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above.
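
As a rough illustration of what the policy means for calling code, the sketch below attempts a restricted call (`System.loadLibrary`) and reports the outcome. The class name, method name and result strings are made up for this example. Under `deny`, the description above says `IllegalCallerException` is thrown; otherwise the call proceeds and, for a library that does not exist, fails with `UnsatisfiedLinkError`.

```java
// Hypothetical illustration, not code from the PR.
class RestrictedLoadSketch {

    // Tries to load a native library and reports what happened.
    static String tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return "loaded";
        } catch (IllegalCallerException e) {
            // Native access was denied for the calling module.
            return "denied";
        } catch (UnsatisfiedLinkError e) {
            // The access check passed, but no such library was found.
            return "not found";
        }
    }
}
```

Calling `tryLoad` with a library name that does not exist should yield either "not found" or "denied", depending on how the JVM was launched.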

Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision:

 - Merge branch 'master' into restricted_jni
 - Address review comments
 - Add note on --illegal-native-access default value in the launcher help
 - Address review comment
 - Refine warning text for JNI method binding
 - Address review comments
   Improve warning for JNI methods, similar to what's described in JEP 472
   Beef up tests
 - Address review comments
 - Fix another typo
 - Fix typo
 - Add more comments
 - ... and 2 more: https://git.openjdk.org/jdk/compare/2ced23fe...ff51ac6a

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19213/files
  - new: https://git.openjdk.org/jdk/pull/19213/files/789bdf48..ff51ac6a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=07-08

  Stats: 168976 lines in 3271 files changed: 114666 ins; 38249 del; 16061 mod
  Patch: https://git.openjdk.org/jdk/pull/19213.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19213/head:pull/19213

PR: https://git.openjdk.org/jdk/pull/19213

From mcimadamore at openjdk.org  Mon Jul 15 13:24:20 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 13:24:20 GMT
Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of
 JNI [v8]
In-Reply-To: <vWr2PdgTv6vEllxW8820KO3aZ3tR3xvMhCxD2k7QpS0=.8bfd6c47-6c93-4fc3-aced-7079889aa6a2@github.com>
References: <CZGbki5iFdCGPPagE-ya-_L8Nkgf7OunywApOsxC548=.2cee82fa-a5d2-4165-b012-30982a85030a@github.com>
 <vWr2PdgTv6vEllxW8820KO3aZ3tR3xvMhCxD2k7QpS0=.8bfd6c47-6c93-4fc3-aced-7079889aa6a2@github.com>
Message-ID: <1f6cPvfYhyTzqeYoeA6uQi2WULB_Bq49AhF_RoEVWDQ=.9577a65e-b626-43fd-ab03-09783b978d94@github.com>

On Fri, 17 May 2024 13:38:25 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways:
>> 
>> * `System::load` and `System::loadLibrary` are now restricted methods
>> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods
>> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation
>> 
>> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` flag was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access` flag is used at least once.
>> 
>> Here, a new flag is introduced, namely `--illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`.
>> 
>> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above.
>
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Address review comments

keep alive

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19213#issuecomment-2228489298

From jvernee at openjdk.org  Mon Jul 15 13:52:53 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Mon, 15 Jul 2024 13:52:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
 <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>
 <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com>
 <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com>
 <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com>
 <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com>
 <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com>
Message-ID: <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com>

On Mon, 15 Jul 2024 13:09:21 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> > Even with `arrayElementVarHandle` it's about the same
> 
> This is very odd, and I don't have a good explanation as to why that is the case. What does the baseline (confined arena) look like for `arrayElementVarHandle` ?

Pretty much exactly the same

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228555214

From mcimadamore at openjdk.org  Mon Jul 15 14:04:52 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 14:04:52 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
 <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>
 <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com>
 <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com>
 <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com>
 <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com>
 <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com>
 <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com>
Message-ID: <SDZzJPMEmQsSOaDtkC7g10HN4XPM_Q1CmdlCsAZYcKM=.465d6eca-3af2-4d39-8d33-f4f8a026834e@github.com>

On Mon, 15 Jul 2024 13:49:57 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

> > > Even with `arrayElementVarHandle` it's about the same
> > 
> > 
> > This is very odd, and I don't have a good explanation as to why that is the case. What does the baseline (confined arena) look like for `arrayElementVarHandle` ?
> 
> Pretty much exactly the same

So, that means that `arrayElementVarHandle` is ~4x faster than memory segment? Isn't that a bit odd?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228582926

From mli at openjdk.org  Mon Jul 15 14:45:54 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 15 Jul 2024 14:45:54 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
 <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com>
 <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com>
Message-ID: <iaoi0o--txlXDpM7hHfpbn_wQWD9DxBlRDwXaQ8V9RQ=.e59b601f-3c14-4560-aec1-ba3bce070c01@github.com>

On Wed, 10 Jul 2024 10:48:19 GMT, Andrew Haley <aph at openjdk.org> wrote:

> I can't tell what problem we're trying to solve by not simply checking in the source code, in its preferred form, to the OpenJDK tree. This has practical advantages to do with traceability and security, and in-principle reasons to do with basic Open Source practice too. On the other side, there are no disadvantages.

Are you suggesting that we copy the whole SLEEF source repo into the JDK?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2228672993

From pchilanomate at openjdk.org  Mon Jul 15 14:56:58 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 15 Jul 2024 14:56:58 GMT
Subject: RFR: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom [v3]
In-Reply-To: <WuesF9Q5ft_qBS-SToSKAHFbJKj_LXZkUp-bEfmoUcQ=.a0952d22-9988-45dc-82e3-e4c0cb69e250@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
 <xcZfnPE5iPxfz9WTSkNWCamtfVSXhpg5UNojhYBsW30=.72bf8fbc-60bc-4250-9284-79b2d75150fb@github.com>
 <WuesF9Q5ft_qBS-SToSKAHFbJKj_LXZkUp-bEfmoUcQ=.a0952d22-9988-45dc-82e3-e4c0cb69e250@github.com>
Message-ID: <nJcAz6i5sFgUQ9F5r_GkBBYEKu3lAG-f5WndZYakCXA=.38a29967-99c9-497b-9c38-028128394298@github.com>

On Tue, 9 Jul 2024 16:46:02 GMT, Alan Bateman <alanb at openjdk.org> wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Rename test to ThreadPollOnYield.java
>
> Marked as reviewed by alanb (Reviewer).

Thanks for the reviews and comments @AlanBateman, @dholmes-ora and @dougxc!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20016#issuecomment-2228695102

From pchilanomate at openjdk.org  Mon Jul 15 14:56:59 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 15 Jul 2024 14:56:59 GMT
Subject: Integrated: 8335269: [Graal] occasional timeout in
 java/lang/StringBuffer/TestSynchronization.java with loom
In-Reply-To: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
References: <GwtD_8F0F-wOnGz2XvoM3dscT4jr32ebpmF2nD697VQ=.d31d699a-5f5a-4e2d-94a1-a240966ec7de@github.com>
Message-ID: <JtTWrVvbZ2mDPUZpS140uz5grsKiSzx8Kl9Z7LF-k1E=.20bc6ef6-b8fa-44a0-a117-c6b7e27174b1@github.com>

On Wed, 3 Jul 2024 19:54:46 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

> Please review the following simple fix. A pinned virtual thread calling Thread.yield() in a loop might never poll for safepoints if the compiler relies on a poll in native method Continuation.doYield while optimizing. This is a special native method that doesn't always poll for safepoints, and in particular it doesn't if the virtual thread is pinned due to owning monitors. Currently this scenario can be reproduced with the Graal compiler.
> 
> I included a test which reproduces the issue with Graal (couldn't reproduce the issue with c2). The test times out without the fix and passes with it. I also ran the patch through mach5 tiers1-3.
> 
> Thanks,
> Patricio
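
The scenario described above can be sketched roughly as follows, assuming JDK 21 or later for virtual threads, and a release in which holding a monitor pins the virtual thread to its carrier. This is not the test added by the PR; the class name and the bounded loop are choices made here so the sketch always terminates.

```java
// Hypothetical illustration, not the test from the PR. Assumes JDK 21 or later.
class PinnedYieldSketch {

    // Starts a virtual thread that calls Thread.yield() in a loop while
    // holding a monitor, and returns how many iterations it completed.
    static int yieldWhilePinned(int iterations) {
        Object lock = new Object();
        int[] completed = new int[1];
        Thread vthread = Thread.ofVirtual().start(() -> {
            synchronized (lock) {
                for (int i = 0; i < iterations; i++) {
                    Thread.yield();
                    completed[0]++;
                }
            }
        });
        try {
            vthread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return completed[0];
    }
}
```

In the failure mode described, the loop would be unbounded and another thread would be waiting for a safepoint; the bounded loop here only shows the shape of the code involved.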

This pull request has now been integrated.

Changeset: 000de306
Author:    Patricio Chilano Mateo <pchilanomate at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/000de306286bb75bbdad2f572ce6dafd4184680e
Stats:     84 lines in 2 files changed: 84 ins; 0 del; 0 mod

8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom

Reviewed-by: dholmes, alanb

-------------

PR: https://git.openjdk.org/jdk/pull/20016

From uschindler at openjdk.org  Mon Jul 15 15:08:54 2024
From: uschindler at openjdk.org (Uwe Schindler)
Date: Mon, 15 Jul 2024 15:08:54 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
Message-ID: <AMZu30w-TvEZXJCUyAOIwf1ul52k4wdFxNnJKSe-n1w=.7b4ac77f-3989-43d8-b586-ffacb4b52a5a@github.com>

On Mon, 15 Jul 2024 11:33:30 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, when we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   improve benchmark

This all looks promising! Together with making sure that Apache Solr and Elasticsearch/Opensearch close indexes one-by-one in a separate thread (with the PR https://github.com/apache/lucene/pull/13570 in place, too), the issues should be fixed.

What is the issue with memory fences?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228726262

From jvernee at openjdk.org  Mon Jul 15 15:24:54 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Mon, 15 Jul 2024 15:24:54 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <IaCxpypcq5BsE5qkCekQQfHpbvkrcU60kDW8_y8bgn8=.64109d21-8177-4477-a950-1bb5a3317eec@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com>
 <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com>
 <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com>
 <GAhBy0q0U31weyO24JHCXijauSe2bk_J3kQ07qZ5s70=.19332107-6225-49a3-a8ee-d0bf79a65873@github.com>
 <IaCxpypcq5BsE5qkCekQQfHpbvkrcU60kDW8_y8bgn8=.64109d21-8177-4477-a950-1bb5a3317eec@github.com>
Message-ID: <go4TYWdHjDfdqUBwKYn11NxzEsvupNIgu6CNLLvvcKg=.a18dca9d-078a-4f94-b4e0-b0a0652dcd2c@github.com>

On Mon, 15 Jul 2024 11:29:49 GMT, Rémi Forax <forax at openjdk.org> wrote:

> > This is what I was thinking of as well. close() on a shared arena can be called by any thread, so it would be possible to have an executor service with 1-n threads that is dedicated to closing memory.
> 
> This delays both the closing of the Arena and the freeing of the segments, so bugs may not be discovered if the arena is accessed in between the time the thread pool is notified and the time the close() is effectively called.

Closing the arena is what requires the handshake, which is where the majority of the cost is. I don't see the point in closing synchronously, but then freeing the memory asynchronously, since the latter is relatively cheap.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228756598

From uschindler at openjdk.org  Mon Jul 15 15:24:54 2024
From: uschindler at openjdk.org (Uwe Schindler)
Date: Mon, 15 Jul 2024 15:24:54 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v2]
In-Reply-To: <go4TYWdHjDfdqUBwKYn11NxzEsvupNIgu6CNLLvvcKg=.a18dca9d-078a-4f94-b4e0-b0a0652dcd2c@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <0j4dLtE61HH3gE0ptR-LufJuIOvKFgLJbSDAeXY3Ii4=.ced967f1-bbd1-4add-8484-88a84aabb5f3@github.com>
 <LjCucUevFLYVoUMkuwCFQVefc4XJOe4LhnKyzKgv7dc=.45bba479-3885-4c34-a9cf-d737d67cb432@github.com>
 <C1dTpyl6SMMzerAW2RzQMIKnEFfGbSR-b7Y3igJvfeQ=.be2c9b83-ac19-429c-bf4a-53075c6a4ec6@github.com>
 <kt2ziWwc2mDqXYZCUEdDxEnqAcdmjrymlUnbOZN4TJg=.b8f7f731-e8e0-43ee-8b13-5bd8ab8ef5b8@github.com>
 <GAhBy0q0U31weyO24JHCXijauSe2bk_J3kQ07qZ5s70=.19332107-6225-49a3-a8ee-d0bf79a65873@github.com>
 <IaCxpypcq5BsE5qkCekQQfHpbvkrcU60kDW8_y8bgn8=.64109d21-8177-4477-a950-1bb5a3317eec@github.com>
 <go4TYWdHjDfdqUBwKYn11NxzEsvupNIgu6CNLLvvcKg=.a18dca9d-078a-4f94-b4e0-b0a0652dcd2c@github.com>
Message-ID: <qcosrYz-ELNTFcuHX5ngZMCxrFAeNfcB1F4_VNDfMyM=.b8d19ff9-7ee9-4cff-829a-3f77706d500d@github.com>

On Mon, 15 Jul 2024 15:18:20 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

> > > This is what I was thinking of as well. close() on a shared arena can be called by any thread, so it would be possible to have an executor service with 1-n threads that is dedicated to closing memory.
> > 
> > 
> > This delays both the closing of the Arena and the freeing of the segments, so bugs may not be discovered if the arena is accessed in between the time the thread pool is notified and the time the close() is effectively called.
> 
> Closing the arena is what requires the handshake, which is where the majority of the cost is. I don't see the point in closing synchronously, but then freeing the memory asynchronously, since the latter is relatively cheap.

I think the idea is to trigger the handshake asynchronously and then close after the handshake (in a callback when the handshake finishes). This is only a problem if, for example, you want to delete a mmapped file on Windows: that won't work as long as the memory is still mapped, but it is fine in all other cases.

So there should be an option to allow an async close() [if the client supports it], but the default should be synchronous. I think this is what @forax suggested.

But anyway: using a separate extra thread is a good idea. I proposed this for Apache Solr, and the Elasticsearch people are checking their code at the moment.
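
A minimal sketch of the "dedicated closer thread" idea discussed above, not taken from Lucene, Solr or the JDK. It assumes the resource to close is handed over as an `AutoCloseable` (a shared `Arena` would qualify, since `Arena` implements `AutoCloseable`); the class name `AsyncCloser`, the method `closeAsync` and the thread name are hypothetical. Resources are closed one at a time on a single thread, so the callers never pay for the handshake themselves:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: closes resources one by one on a single dedicated thread.
class AsyncCloser implements AutoCloseable {
    // One thread only, so closes are serialized rather than issued concurrently.
    private final ExecutorService closer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "arena-closer"); // thread name is an assumption
        t.setDaemon(true);
        return t;
    });

    // Schedules resource.close() on the closer thread. The returned Future
    // completes once the close has run (or fails with the close exception).
    Future<?> closeAsync(AutoCloseable resource) {
        return closer.submit(() -> {
            resource.close();
            return null;
        });
    }

    // Stops accepting new close requests; already submitted ones still run.
    @Override
    public void close() {
        closer.shutdown();
    }
}
```

A caller that needs synchronous semantics (for example before deleting a mapped file on Windows) can still wait on the returned `Future`. Note the trade-off raised earlier in the thread: between `closeAsync` and the actual close, the resource is still accessible.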

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228760782

From mcimadamore at openjdk.org  Mon Jul 15 16:34:00 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 16:34:00 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <SDZzJPMEmQsSOaDtkC7g10HN4XPM_Q1CmdlCsAZYcKM=.465d6eca-3af2-4d39-8d33-f4f8a026834e@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
 <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>
 <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com>
 <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com>
 <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com>
 <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com>
 <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com>
 <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com>
 <SDZzJPMEmQsSOaDtkC7g10HN4XPM_Q1CmdlCsAZYcKM=.465d6eca-3af2-4d39-8d33-f4f8a026834e@github.com>
Message-ID: <N0Q0GTZ0BpZzjFdQ5_-tR0hV1ba_uPXjiEWYVE7SerE=.ce7681b0-a3f8-4f38-8ad7-717d42773aab@github.com>

On Mon, 15 Jul 2024 14:02:27 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> So, that means that `arrayElementVarHandle` is ~4x faster than memory segment? Isn't that a bit odd?

I did some more analysis of the benchmark. I first eliminated the closing thread, and started with two simple benchmarks:


@Benchmark
public int memorySegmentAccess() {
    int sum = 0;
    for (int i = 0; i < segment.byteSize(); i++) {
        sum += segment.get(JAVA_BYTE, i);
    }
    return sum;
}


and


@Benchmark
public int otherAccess() {
    int sum = 0;
    for (int i = 0; i < array.length; i++) {
        sum += (byte)BYTE_HANDLE.get(array, i);
    }
    return sum;
}


where the setup code is as follows:


static final int SIZE = 10_000;

MemorySegment segment;
byte[] array;

static final VarHandle BYTE_HANDLE = MethodHandles.arrayElementVarHandle(byte[].class);

@Setup
public void setup() {
    array = new byte[SIZE];
    segment = MemorySegment.ofArray(array);
}


With this, I obtained the following results:


Benchmark                            Mode  Cnt   Score   Error  Units
ConcurrentClose.memorySegmentAccess  avgt   10  13.879 ± 0.478  us/op
ConcurrentClose.otherAccess          avgt   10   2.256 ± 0.017  us/op


Ugh. It seems like C2 "blows up" at the third iteration:


# Run progress: 0.00% complete, ETA 00:05:00
# Fork: 1 of 1
# Warmup Iteration   1: 6.712 us/op
# Warmup Iteration   2: 5.756 us/op
# Warmup Iteration   3: 13.267 us/op
# Warmup Iteration   4: 13.267 us/op
# Warmup Iteration   5: 13.274 us/op


This might be a bug/regression. But, let's move on. I then tweaked the induction variable of the memory segment loop to be `long`, not `int` and I got:


Benchmark                            Mode  Cnt  Score   Error  Units
ConcurrentClose.memorySegmentAccess  avgt   10  2.764 ± 0.016  us/op
ConcurrentClose.otherAccess          avgt   10  2.240 ± 0.016  us/op


Far more respectable! And now we have a good baseline, since both workloads take about the same time, so we can use them to draw interesting comparisons. So, let's add back a thread that does a shared arena close:


Benchmark                                        Mode  Cnt   Score   Error  Units
ConcurrentClose.sharedClose                      avgt   10  12.001 ± 0.061  us/op
ConcurrentClose.sharedClose:closing              avgt   10  19.281 ± 0.323  us/op
ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10   9.802 ± 0.314  us/op
ConcurrentClose.sharedClose:otherAccess          avgt   10   6.921 ± 0.151  us/op


This is with vanilla JDK. If I apply the changes in this PR, I get this:


Benchmark                                        Mode  Cnt   Score   Error  Units
ConcurrentClose.sharedClose                      avgt   10  10.837 ± 0.241  us/op
ConcurrentClose.sharedClose:closing              avgt   10  20.337 ± 1.674  us/op
ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10   8.672 ± 0.993  us/op
ConcurrentClose.sharedClose:otherAccess          avgt   10   3.501 ± 0.162  us/op


This is good. Note how `otherAccess` improved almost 2x, as the code is no longer redundantly de-optimized. Now, we know that, even for memory segment access, we can avoid redundant deopt once JDK-8290892 is fixed. To simulate that, I've dropped the lines which apply the conservative deoptimization in `scopedMemoryAccess.cpp` and ran the bench again:


Benchmark                                        Mode  Cnt   Score   Error  Units
ConcurrentClose.sharedClose                      avgt   10   8.957 ± 0.089  us/op
ConcurrentClose.sharedClose:closing              avgt   10  18.898 ± 0.338  us/op
ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10   4.403 ± 0.054  us/op
ConcurrentClose.sharedClose:otherAccess          avgt   10   3.571 ± 0.042  us/op


Ok, now both accessor threads seem faster.

If I swap the shared arena close with a confined arena close I get this:


Benchmark                                        Mode  Cnt   Score    Error  Units
ConcurrentClose.sharedClose                      avgt   10   1.760 ±  0.008  us/op
ConcurrentClose.sharedClose:closing              avgt   10  ? 10??           us/op
ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10   2.912 ±  0.016  us/op
ConcurrentClose.sharedClose:otherAccess          avgt   10   2.367 ±  0.009  us/op


Summing up:
* there is some issue involving segment access with `int` induction variable which we should investigate separately
* this PR significantly improves performance of threads that are not touching memory segments, even under heavy shared arena close loads
* performance of unrelated memory segment access is still affected by concurrent shared arena close. This is due to conservative deoptimization which will be removed once JDK-8290892 is fixed
* once all fixes are applied, the performance of the accessing threads gets quite close to ideal, but not 100% there. The loss seems in the acceptable range - given that this benchmark is closing shared arenas in a loop, arguably the worst possible case.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228916752

From uschindler at openjdk.org  Mon Jul 15 16:37:53 2024
From: uschindler at openjdk.org (Uwe Schindler)
Date: Mon, 15 Jul 2024 16:37:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <N0Q0GTZ0BpZzjFdQ5_-tR0hV1ba_uPXjiEWYVE7SerE=.ce7681b0-a3f8-4f38-8ad7-717d42773aab@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
 <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>
 <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com>
 <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com>
 <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com>
 <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com>
 <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com>
 <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com>
 <SDZzJPMEmQsSOaDtkC7g10HN4XPM_Q1CmdlCsAZYcKM=.465d6eca-3af2-4d39-8d33-f4f8a026834e@github.com>
 <N0Q0GTZ0BpZzjFdQ5_-tR0hV1ba_uPXjiEWYVE7SerE=.ce7681b0-a3f8-4f38-8ad7-717d42773aab@github.com>
Message-ID: <U_6YS1DDbI2NSgpRxRyPAB0chDFlwDJLDzTG3bKn4kI=.b929ff6a-0a60-4b47-abc6-dfbc12b8b8b5@github.com>

On Mon, 15 Jul 2024 16:30:11 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>>> > > Even with `arrayElementVarHandle` it's about the same
>>> > 
>>> > 
>>> > This is very odd, and I don't have a good explanation as to why that is the case. What does the baseline (confined arena) look like for `arrayElementVarHandle` ?
>>> 
>>> Pretty much exactly the same
>> 
>> So, that means that `arrayElementVarHandle` is ~4x faster than memory segment? Isn't that a bit odd?
>
>> So, that means that `arrayElementVarHandle` is ~4x faster than memory segment? Isn't that a bit odd?
> 
> I did some more analysis of the benchmark. I first eliminated the closing thread, and started with two simple benchmarks:
> 
> 
> @Benchmark
> public int memorySegmentAccess() {
>         int sum = 0;
>         for (int i = 0; i < segment.byteSize(); i++) {
>             sum += segment.get(JAVA_BYTE, i);
>         }
>         return sum;
>     }
> 
> 
> and
> 
> 
> @Benchmark
> public int otherAccess() {
>         int sum = 0;
>         for (int i = 0; i < array.length; i++) {
>             sum += (byte)BYTE_HANDLE.get(array, i);
>         }
>         return sum;
>     }
> 
> 
> where the setup code is as follows:
> 
> 
> static final int SIZE = 10_000;
> 
>     MemorySegment segment;
>     byte[] array;
> 
>     static final VarHandle BYTE_HANDLE = MethodHandles.arrayElementVarHandle(byte[].class);
> 
>     @Setup
>     public void setup() {
>         array = new byte[SIZE];
>         segment = MemorySegment.ofArray(array);
>     }
> 
> 
> With this, I obtained the following results:
> 
> 
> Benchmark                            Mode  Cnt   Score   Error  Units
> ConcurrentClose.memorySegmentAccess  avgt   10  13.879 ± 0.478  us/op
> ConcurrentClose.otherAccess          avgt   10   2.256 ± 0.017  us/op
> 
> 
> Ugh. It seems like C2 "blows up" at the third iteration:
> 
> 
> # Run progress: 0.00% complete, ETA 00:05:00
> # Fork: 1 of 1
> # Warmup Iteration   1: 6.712 us/op
> # Warmup Iteration   2: 5.756 us/op
> # Warmup Iteration   3: 13.267 us/op
> # Warmup Iteration   4: 13.267 us/op
> # Warmup Iteration   5: 13.274 us/op
> 
> 
> This might be a bug/regression. But, let's move on. I then tweaked the induction variable of the memory segment loop to be `long`, not `int` and I got:
> 
> 
> Benchmark                            Mode  Cnt  Score   Error  Units
> ConcurrentClose.memorySegmentAccess  avgt   10  2.764 ± 0.016  us/op
> ConcurrentClose.otherAccess          avgt   10  2.240 ± 0.016  us/op
> 
> 
> Far more respectable! And now we have a good baseline, since both workloads take about the same time, so we can use them to draw interesting comparisons. So, let's add back a thread that does a shared arena close:
> 
> 
> Benchmark                                        Mode  Cnt   Score   Error  Units
> ConcurrentClose.sharedClose                      avgt   10  12.001 ± 0.061  us/op
> ConcurrentClose.sharedClose:closing              avgt   10  19.281 ± 0.323  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10   9.802 ± 0.314  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   1...

Thanks @mcimadamore, this sounds great! I am so happy that we at least reduced the overhead for non-memory segment threads. This will also be the case for Lucene/Solr because we do not read from segments all the time; we also have other code that is sometimes executed between reads from memory segments :-)

So +1 to merge this and hopefully backport it at least to 21? This would be great, but as it is not a bug, it is not strictly necessary.

We should open issues for the int problem and work on JDK-8290892.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228926251

From mcimadamore at openjdk.org  Mon Jul 15 16:44:54 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 16:44:54 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <U_6YS1DDbI2NSgpRxRyPAB0chDFlwDJLDzTG3bKn4kI=.b929ff6a-0a60-4b47-abc6-dfbc12b8b8b5@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
 <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>
 <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com>
 <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com>
 <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com>
 <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com>
 <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com>
 <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com>
 <SDZzJPMEmQsSOaDtkC7g10HN4XPM_Q1CmdlCsAZYcKM=.465d6eca-3af2-4d39-8d33-f4f8a026834e@github.com>
 <N0Q0GTZ0BpZzjFdQ5_-tR0hV1ba_uPXjiEWYVE7SerE=.ce7681b0-a3f8-4f38-8ad7-717d42773aab@github.com>
 <U_6YS1DDbI2NSgpRxRyPAB0chDFlwDJLDzTG3bKn4kI=.b929ff6a-0a60-4b47-abc6-dfbc12b8b8b5@github.com>
Message-ID: <O6xuVpHQIshoOQNyhroVBRHYI_6xhIFSx_HnH6s89Zg=.a2c7cfc3-c081-4c8f-a640-36bad0dc3d03@github.com>

On Mon, 15 Jul 2024 16:35:26 GMT, Uwe Schindler <uschindler at openjdk.org> wrote:

> So +1 to merge this and hopefully backport it at least to 21?

Backport to 21 is difficult, given the handshake code there is different (and FFM is preview there). But it might be more feasible for 22. I have notified Roland re. the `int` problem, will update once I know more about the nature of this issue.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228939812

From uschindler at openjdk.org  Mon Jul 15 16:44:56 2024
From: uschindler at openjdk.org (Uwe Schindler)
Date: Mon, 15 Jul 2024 16:44:56 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <O6xuVpHQIshoOQNyhroVBRHYI_6xhIFSx_HnH6s89Zg=.a2c7cfc3-c081-4c8f-a640-36bad0dc3d03@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
 <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>
 <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com>
 <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com>
 <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com>
 <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com>
 <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com>
 <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com>
 <SDZzJPMEmQsSOaDtkC7g10HN4XPM_Q1CmdlCsAZYcKM=.465d6eca-3af2-4d39-8d33-f4f8a026834e@github.com>
 <N0Q0GTZ0BpZzjFdQ5_-tR0hV1ba_uPXjiEWYVE7SerE=.ce7681b0-a3f8-4f38-8ad7-717d42773aab@github.com>
 <U_6YS1DDbI2NSgpRxRyPAB0chDFlwDJLDzTG3bKn4kI=.b929ff6a-0a60-4b47-abc6-dfbc12b8b8b5@github.com>
 <O6xuVpHQIshoOQNyhroVBRHYI_6xhIFSx_HnH6s89Zg=.a2c7cfc3-c081-4c8f-a640-36bad0dc3d03@github.com>
Message-ID: <1lPwGcVzUtreIzS-ieJpTrtRyHM9PElbHN31NqFCDNI=.c44ff158-4c82-4dc3-9166-01393330db40@github.com>

On Mon, 15 Jul 2024 16:40:06 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> > So +1 to merge this and hopefully backport it at least to 21?
> 
> Backport to 21 is difficult, given the handshake code there is different (and FFM is preview there). But it might be more feasible for 22. I have notified Roland re. the `int` problem, will update once I know more about the nature of this issue.

Ah I remember: the tristate! All fine.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228943554

From forax at openjdk.org  Mon Jul 15 17:02:55 2024
From: forax at openjdk.org (=?UTF-8?B?UsOpbWk=?= Forax)
Date: Mon, 15 Jul 2024 17:02:55 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
Message-ID: <YA4PQgaaD2hop3iZs3pV6hFjQwAps0cKjCf54NBl9eA=.a24956f6-84bf-4530-aeb8-e1d77621e9cf@github.com>

On Mon, 15 Jul 2024 11:33:30 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   improve benchmark

Even if the int vs long issue is fixed for this case, I think we should recommend calling `withInvokeExactBehavior()` after creating any VarHandle so all the auto-conversions are treated as runtime errors.

This is what I do with my students (when using compareAndSet) and it makes this kind of perf issue easy to find and easy to fix.
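
As an illustration of that recommendation, here is a small sketch (the class and method names are made up for the example) of what `withInvokeExactBehavior()` changes. With the default behavior, passing `int` arguments where the handle expects `long` is silently converted; with exact behavior, the same call fails with a `WrongMethodTypeException` at run time, which makes the mismatch easy to spot:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

// Hypothetical demo class, not part of the PR under review.
class ExactVarHandleDemo {
    // Element handle for long[] that rejects any call whose static types
    // do not match (long[], int, long, long)boolean exactly.
    static final VarHandle LONG_ELEMS =
            MethodHandles.arrayElementVarHandle(long[].class).withInvokeExactBehavior();

    // Exact call: both values are long, so the call-site type matches.
    static boolean casExact(long[] a, int index, long expected, long update) {
        return LONG_ELEMS.compareAndSet(a, index, expected, update);
    }

    // Inexact call: the literals 0 and 1 are ints, so the call-site type is
    // (long[], int, int, int)boolean and the exact handle refuses it.
    static boolean casWithIntArgsFails(long[] a) {
        try {
            LONG_ELEMS.compareAndSet(a, 0, 0, 1);
            return false; // not reached with invoke-exact behavior
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }
}
```

Without the `withInvokeExactBehavior()` call, the second method would perform the compare-and-set after widening the `int` arguments to `long`.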

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228979913

From aph at openjdk.org  Mon Jul 15 17:02:56 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 15 Jul 2024 17:02:56 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <iaoi0o--txlXDpM7hHfpbn_wQWD9DxBlRDwXaQ8V9RQ=.e59b601f-3c14-4560-aec1-ba3bce070c01@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
 <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com>
 <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com>
 <iaoi0o--txlXDpM7hHfpbn_wQWD9DxBlRDwXaQ8V9RQ=.e59b601f-3c14-4560-aec1-ba3bce070c01@github.com>
Message-ID: <5M8k0CGVXI79Dgu5BVVkEU6sHy7Z3jLvkqyTAg7TelU=.85707058-20a5-4574-86a4-b5c6ca05b4a7@github.com>

On Mon, 15 Jul 2024 14:42:45 GMT, Hamlin Li <mli at openjdk.org> wrote:

> > I can't tell what problem we're trying to solve by not simply checking in the source code, in its preferred form, to the OpenJDK tree. This has practical advantages to do with traceability and security, and in-principle reasons to do with basic Open Source practice too. On the other side, there are no disadvantages.
> 
> Do you suggest to copy the whole sleef source repo into jdk?

I think so, along with scripting that generates the preprocessed file we use. It might be the case that there are some sleef files that are not used at all and could be omitted, but I'm not sure it would be useful, and from a traceability point of view it's probably best to grab it all, unless it's really huge.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2228979596

From mcimadamore at openjdk.org  Mon Jul 15 17:09:53 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Mon, 15 Jul 2024 17:09:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <YA4PQgaaD2hop3iZs3pV6hFjQwAps0cKjCf54NBl9eA=.a24956f6-84bf-4530-aeb8-e1d77621e9cf@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <YA4PQgaaD2hop3iZs3pV6hFjQwAps0cKjCf54NBl9eA=.a24956f6-84bf-4530-aeb8-e1d77621e9cf@github.com>
Message-ID: <0a3-qVymtC5HI4wh8FQiNeebnG_-8Ax1seA80JhCQLY=.c24d9e87-b34f-45c7-aa7c-e63e294249d4@github.com>

On Mon, 15 Jul 2024 17:00:24 GMT, Rémi Forax <forax at openjdk.org> wrote:

> Even if the int vs long issue is fixed for this case, I think we should recommend calling `withInvokeExactBehavior()` after creating any VarHandle so all the auto-conversions are treated as runtime errors.
> 
> This is what I do with my students (when using compareAndSet) and it makes this kind of perf issue easy to find and easy to fix.

Note that this has nothing to do with implicit conversion, as the memory segment var handle is called by our implementation, with the correct type (a long). This is likely an issue with bound check elimination with "long loops".

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228992252

From luhenry at openjdk.org  Mon Jul 15 17:38:52 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Mon, 15 Jul 2024 17:38:52 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <5M8k0CGVXI79Dgu5BVVkEU6sHy7Z3jLvkqyTAg7TelU=.85707058-20a5-4574-86a4-b5c6ca05b4a7@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
 <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com>
 <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com>
 <iaoi0o--txlXDpM7hHfpbn_wQWD9DxBlRDwXaQ8V9RQ=.e59b601f-3c14-4560-aec1-ba3bce070c01@github.com>
 <5M8k0CGVXI79Dgu5BVVkEU6sHy7Z3jLvkqyTAg7TelU=.85707058-20a5-4574-86a4-b5c6ca05b4a7@github.com>
Message-ID: <UwSxg6BMklnndJlZGLVLgDgvcr-VrZbeuJyyBHMFrZ0=.eaad3ac9-b520-4abd-8f74-663f70e20f6d@github.com>

On Mon, 15 Jul 2024 17:00:13 GMT, Andrew Haley <aph at openjdk.org> wrote:

> > > I can't tell what problem we're trying to solve by not simply checking in the source code, in its preferred form, to the OpenJDK tree. This has practical advantages to do with traceability and security, and in-principle reasons to do with basic Open Source practice too. On the other side, there are no disadvantages.
> > 
> > 
> > Do you suggest to copy the whole sleef source repo into jdk?
> 
> I think so, along with scripting that generates the preprocessed file we use. It might be the case that there are some sleef files that are not used at all and could be omitted, but I'm not sure it would be useful, and from a traceability point of view it's probably best to grab it all, unless it's really huge.

Given the Sleef build system currently uses cmake, we would have two choices to build the header files as part of the OpenJDK build system:
1. take a dependency on cmake in order to build the Sleef headers
2. write a custom build system for Sleef to integrate into OpenJDK

Neither approach sounds good to me as a mandatory option.

However, if we are to allow the person building OpenJDK to _optionally_ generate the headers from a Sleef source checkout (provided by the user with a `--with-sleef-src=/path/to/sleef`), we can then more easily make the assumption that the user has installed the necessary dependencies. That would also be in line with how binutils is being built and integrated.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2229040615

From pchilanomate at openjdk.org  Mon Jul 15 18:36:17 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 15 Jul 2024 18:36:17 GMT
Subject: [jdk23] RFR: 8335409: Can't allocate and retain memory from resource
 area in frame::oops_interpreted_do oop closure after 8329665
Message-ID: <d-ziRnu1_RcjgWDVhYQYb4U0xIWyi5B-hljLzDwQlt4=.a53602c1-25b7-4c93-b468-d55201959846@github.com>

Hi all,

This pull request contains a backport of commit [7ab96c74](https://github.com/openjdk/jdk/commit/7ab96c74e2c39f430a5c2f65a981da7314a2385b) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.

The commit being backported was authored by Patricio Chilano Mateo on 10 Jul 2024 and was reviewed by David Holmes, Thomas Stuefe, Coleen Phillimore and Aleksey Shipilev.

Thanks

-------------

Commit messages:
 - Backport 7ab96c74e2c39f430a5c2f65a981da7314a2385b

Changes: https://git.openjdk.org/jdk/pull/20185/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20185&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335409
  Stats: 55 lines in 3 files changed: 6 ins; 20 del; 29 mod
  Patch: https://git.openjdk.org/jdk/pull/20185.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20185/head:pull/20185

PR: https://git.openjdk.org/jdk/pull/20185

From shade at openjdk.org  Mon Jul 15 18:36:17 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 15 Jul 2024 18:36:17 GMT
Subject: [jdk23] RFR: 8335409: Can't allocate and retain memory from
 resource area in frame::oops_interpreted_do oop closure after 8329665
In-Reply-To: <d-ziRnu1_RcjgWDVhYQYb4U0xIWyi5B-hljLzDwQlt4=.a53602c1-25b7-4c93-b468-d55201959846@github.com>
References: <d-ziRnu1_RcjgWDVhYQYb4U0xIWyi5B-hljLzDwQlt4=.a53602c1-25b7-4c93-b468-d55201959846@github.com>
Message-ID: <g51dGtyAfFu3y_uVGE7KBXcGPIkb1KICznIibbbDoLs=.202e218b-83ca-4b29-840c-0ae03949620a@github.com>

On Mon, 15 Jul 2024 18:13:53 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

> Hi all,
> 
> This pull request contains a backport of commit [7ab96c74](https://github.com/openjdk/jdk/commit/7ab96c74e2c39f430a5c2f65a981da7314a2385b) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
> 
> The commit being backported was authored by Patricio Chilano Mateo on 10 Jul 2024 and was reviewed by David Holmes, Thomas Stuefe, Coleen Phillimore and Aleksey Shipilev.
> 
> Thanks

Marked as reviewed by shade (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20185#pullrequestreview-2178383297

From szaldana at openjdk.org  Mon Jul 15 19:41:14 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 15 Jul 2024 19:41:14 GMT
Subject: RFR: 8300732: Whitebox functions for Metaspace test should use
 byte size [v2]
In-Reply-To: <7kTS7aOEGu5r0uCYvKrIb7nvf1-MBkuCngFWHxNzj2E=.1d2e2913-d442-429f-afc1-0732171cb514@github.com>
References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>
 <7kTS7aOEGu5r0uCYvKrIb7nvf1-MBkuCngFWHxNzj2E=.1d2e2913-d442-429f-afc1-0732171cb514@github.com>
Message-ID: <54XZwy3Z2ZIeHVMruRRbvsHd750jRJT7zvj-HVkojbM=.9d6b1185-2c9d-472c-aede-97a595d53ca0@github.com>

On Thu, 11 Jul 2024 07:36:06 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> It looks cautiously okay. Small nits remain.
> 
> Please make sure the tests pass for both 64-bit and 32-bit (to test the 32-bit build, the simplest way is to build on x64 Linux as normal, but specify `--with-target-bits=32` when configuring).

I made some updates based on feedback. Apologies for the delay - I was figuring out how to verify the 32-bit build.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20039#issuecomment-2229245477

From szaldana at openjdk.org  Mon Jul 15 19:41:13 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 15 Jul 2024 19:41:13 GMT
Subject: RFR: 8300732: Whitebox functions for Metaspace test should use
 byte size [v2]
In-Reply-To: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>
References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>
Message-ID: <it3fVJfbutmJVOuZy7XTjmzgHXPMC8TJHAGI1gxCKjs=.3f8c72d6-97c1-408d-8e27-20ea940d7f89@github.com>

> Hi all, 
> 
> This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. 
> 
> Testing: 
> - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. 
> 
> Thanks, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with two additional commits since the last revision:

 - Hard coding values and adding Unit class
 - whitebox changes based on feedback. Using is_aligned and asserts
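The idea behind the change, passing sizes in bytes and checking word alignment before converting, can be sketched roughly as below. This is only an illustration, not the code in the PR: `WordSize`, `BYTES_PER_WORD`, `isAligned`, `bytesToWords` and `wordsToBytes` are hypothetical names, and the word size of 8 assumes a 64-bit VM (a 32-bit build would use 4, which is why both builds need testing).

```java
// Illustrative sketch only: these names are hypothetical and are not the
// WhiteBox or HotSpot API.
final class WordSize {
    // Assumed word size for a 64-bit VM; a 32-bit VM would use 4.
    static final long BYTES_PER_WORD = 8;

    // True if byteSize is a multiple of alignment (alignment must be a power of two).
    static boolean isAligned(long byteSize, long alignment) {
        return (byteSize & (alignment - 1)) == 0;
    }

    // Converts a byte size to words, rejecting sizes that are not word-aligned.
    static long bytesToWords(long byteSize) {
        if (byteSize < 0 || !isAligned(byteSize, BYTES_PER_WORD)) {
            throw new IllegalArgumentException("byte size not word-aligned: " + byteSize);
        }
        return byteSize / BYTES_PER_WORD;
    }

    // Converts a word count to bytes, failing loudly on overflow.
    static long wordsToBytes(long wordSize) {
        return Math.multiplyExact(wordSize, BYTES_PER_WORD);
    }

    public static void main(String[] args) {
        System.out.println(bytesToWords(64));
        System.out.println(wordsToBytes(8));
    }
}
```

A caller that previously passed a word count would pass `wordsToBytes(n)` instead, and a misaligned byte size fails early rather than being silently rounded.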

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20039/files
  - new: https://git.openjdk.org/jdk/pull/20039/files/5dcc6c9e..7c0138ca

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20039&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20039&range=00-01

  Stats: 83 lines in 6 files changed: 42 ins; 19 del; 22 mod
  Patch: https://git.openjdk.org/jdk/pull/20039.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20039/head:pull/20039

PR: https://git.openjdk.org/jdk/pull/20039

From mikael at openjdk.org  Mon Jul 15 20:42:53 2024
From: mikael at openjdk.org (Mikael Vidstedt)
Date: Mon, 15 Jul 2024 20:42:53 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
Message-ID: <Ebg73qVwNWOzw_TEZ37GgRIeV2AIIWfsSa6EExexRtk=.43e71880-42c8-4d29-b333-76b91563d428@github.com>

On Tue, 9 Jul 2024 12:08:50 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
>> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers.
>> 
>> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external sleef artifacts (header or lib) at build or run time.
>> Besides this change, the previous changes are modified accordingly, e.g. removing some unnecessary files or changes, especially in the make dir of jdk.
>> 
>> Besides the code changes, one important task is to handle the legal process.
>> 
>> Thanks!
>> 
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>> 
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>> 
>> ## Performance
>> 
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>> 
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   skip TANH

If we want the traceability (which I agree is good) of the SLEEF source code but want to avoid having it in the jdk repo itself (adding unnecessary "bloat" for everybody), perhaps we can consider having it in a separate repository somewhere in/under `openjdk`?

It's not immediately clear to me that we need to have support in the JDK build system (configure/make) itself for building/updating the header files, as long as there's a simple, documented way of doing so. I like to think the `createSleef.sh` script is that, but I recognize that I'm biased because I wrote it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2229380066

From mli at openjdk.org  Mon Jul 15 21:00:54 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 15 Jul 2024 21:00:54 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <UwSxg6BMklnndJlZGLVLgDgvcr-VrZbeuJyyBHMFrZ0=.eaad3ac9-b520-4abd-8f74-663f70e20f6d@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
 <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com>
 <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com>
 <iaoi0o--txlXDpM7hHfpbn_wQWD9DxBlRDwXaQ8V9RQ=.e59b601f-3c14-4560-aec1-ba3bce070c01@github.com>
 <5M8k0CGVXI79Dgu5BVVkEU6sHy7Z3jLvkqyTAg7TelU=.85707058-20a5-4574-86a4-b5c6ca05b4a7@github.com>
 <UwSxg6BMklnndJlZGLVLgDgvcr-VrZbeuJyyBHMFrZ0=.eaad3ac9-b520-4abd-8f74-663f70e20f6d@github.com>
Message-ID: <Sbebz83QoFGDL33tqAZROLgJsrJCaH05-ic6q8B9Q_Q=.892d3a14-e0e8-4f31-8068-bda6c5891880@github.com>

On Mon, 15 Jul 2024 17:35:59 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

> I think so, along with scripting that generates the preprocessed file we use. It might be the case that some sleef files are not used at all and could be omitted, but I'm not sure that would be useful, and from a traceability point of view it's probably best to grab it all, unless it's really huge.

Currently:
* https://github.com/openjdk/jdk/pull/19185 generates the sleef inline headers from sleef 3.6.1, which is a tagged release in the sleef repo.
* With the script in https://github.com/openjdk/jdk/pull/19185, anyone with access to the sleef repo can regenerate these inline headers themselves. (In fact, anyone can generate the inline headers from sleef from scratch without using the scripts in that PR; our script just makes future maintenance easier.) So it's easy for anyone to verify the inline header files used in jdk.

With these 2 points, the traceability seems fine to me; please kindly point out if I missed something. Maybe we can add clearer and more specific information to the README or createSleef.sh in https://github.com/openjdk/jdk/pull/19185 to indicate which version of the sleef source we're using in jdk.
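To make "verify these inline header files" concrete, one possible check is to regenerate a header and compare its digest with the checked-in copy. The sketch below is a generic illustration, not part of either PR: `HeaderDigest`, `sha256Hex` and `sameContent` are hypothetical names, the two file paths are supplied by the caller, and `HexFormat` assumes JDK 17 or later. It only works if header generation is byte-for-byte reproducible, which is itself an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Illustrative sketch only: compares a checked-in file with a regenerated one.
final class HeaderDigest {
    // Returns the SHA-256 digest of the content as lowercase hex.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(content));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // True if both byte arrays hash to the same digest.
    static boolean sameContent(byte[] checkedIn, byte[] regenerated) {
        return sha256Hex(checkedIn).equals(sha256Hex(regenerated));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: HeaderDigest <checked-in-header> <regenerated-header>");
            return;
        }
        byte[] checkedIn = Files.readAllBytes(Path.of(args[0]));
        byte[] regenerated = Files.readAllBytes(Path.of(args[1]));
        System.out.println(sameContent(checkedIn, regenerated) ? "match" : "MISMATCH");
    }
}
```

A check like this ties the generated headers to a specific source snapshot, but it does not by itself settle where that snapshot should live.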


I'm also fine with your suggestion to add the whole sleef repo into jdk (maybe we can remove some files, but we can ignore that difference for now in this discussion). If we copy the sleef repo into jdk, we still need to pre-generate the inline header files and check them into jdk along with the sleef repo; I think you agree (without checking in these inline headers, we would have to bring extra dependencies into jdk and increase compilation time when building jdk). But from a traceability point of view, it seems to me that this brings no extra benefit over the current https://github.com/openjdk/jdk/pull/19185. If someone wants to verify the pre-generated inline headers in jdk, they still need to verify the sleef source in jdk, and then the pre-generated sleef inline headers.

What do you think about it?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2229421715

From mikael at openjdk.org  Mon Jul 15 21:19:55 2024
From: mikael at openjdk.org (Mikael Vidstedt)
Date: Mon, 15 Jul 2024 21:19:55 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
Message-ID: <uWeKJ7D4_DgMnBgU2o4KzvVU6lB4xVBds2-SAAPEthU=.cfa1b966-9791-4773-9f10-cb35f58871f0@github.com>

On Tue, 9 Jul 2024 12:08:50 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
>> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers.
>> 
>> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external sleef artifacts (header or lib) at build or run time.
>> Besides this change, the previous changes are modified accordingly, e.g. removing some unnecessary files or changes, especially in the make dir of jdk.
>> 
>> Besides the code changes, one important task is to handle the legal process.
>> 
>> Thanks!
>> 
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>> 
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>> 
>> ## Performance
>> 
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>> 
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   skip TANH

I think the key question is whether we're comfortable relying on/pointing at an external repository which may or may not be there tomorrow and/or where tags may change outside of our control.

The SLEEF source code looks to be around 7.5MB, give or take. That's not enormous, but it's not exactly small when keeping in mind that if we include it in the jdk repo it's going to be there for every cloned repo in every project/branch and very few will actually care about it. I agree that we'd still have to include the pre-generated header files.

Hence my suggestion to consider putting it under our control, but in a separate `openjdk` controlled repository.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2229457499

From psandoz at openjdk.org  Mon Jul 15 23:31:54 2024
From: psandoz at openjdk.org (Paul Sandoz)
Date: Mon, 15 Jul 2024 23:31:54 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
Message-ID: <YFP94FW91LrpdTMeak-ePVmpwlW788IBynq_qBZVves=.a6acb940-78b0-4fce-826a-fb065d8a41f6@github.com>

On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See bug for more discussion.
> 
> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
> 
> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice:
> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
> 
> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
> 
> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
> 
> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
> 
> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
> 
> Additional testing:
>  - [x] New IR tests
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, jcstre...

IIUC this means we can remove the explicit fence here:

    public ConstantCallSite(MethodHandle target) {
        super(target);
        isFrozen = true;
        UNSAFE.storeStoreFence(); // properly publish isFrozen update
    }

?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2229615130

From liach at openjdk.org  Tue Jul 16 03:07:58 2024
From: liach at openjdk.org (Chen Liang)
Date: Tue, 16 Jul 2024 03:07:58 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <YFP94FW91LrpdTMeak-ePVmpwlW788IBynq_qBZVves=.a6acb940-78b0-4fce-826a-fb065d8a41f6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
 <YFP94FW91LrpdTMeak-ePVmpwlW788IBynq_qBZVves=.a6acb940-78b0-4fce-826a-fb065d8a41f6@github.com>
Message-ID: <jkFv8H1REe7218LdmB3Bwa5k0r7Aj_fWqO_hd6VT3IE=.4ee32eac-ce76-4bdf-938c-26672366cd83@github.com>

On Mon, 15 Jul 2024 23:29:37 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

> IIUC this means we can remove the explicit fence here

`ConstantCallSite` is non-sealed, and we probably wish to read `isFrozen == true` when we can read anything initialized by the subclasses, especially if a malicious subclass leaks itself into some multithreaded environment before quitting the constructor.

That said, I think we can change this to a StoreStore or Release:
https://github.com/openjdk/jdk/blob/8feabc849ba2f617c8c6dbb2ec5074297beb6437/src/java.base/share/classes/java/lang/invoke/MutableCallSite.java#L277
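A minimal sketch of the scenario described above, assuming a base class shaped like `ConstantCallSite`: `FrozenBase`, `LeakyChild` and `FenceDemo` are hypothetical names, and `VarHandle.storeStoreFence()` stands in for the internal `UNSAFE.storeStoreFence()`. The fence in the base constructor orders the `frozen` store before any later store by the same thread, including a subclass publishing `this` before its own constructor returns. The demo is single-threaded, so it shows the shape of the problem, not the race itself.

```java
import java.lang.invoke.VarHandle;

// Hypothetical stand-in for a non-sealed base class with a plain flag field.
class FrozenBase {
    private boolean frozen; // plain (non-final) field, analogous to isFrozen

    FrozenBase() {
        frozen = true;
        // Keeps the store above from being reordered with later stores,
        // such as a subclass constructor publishing `this`.
        VarHandle.storeStoreFence();
    }

    final boolean isFrozen() {
        return frozen;
    }
}

// A subclass that lets `this` escape before its constructor finishes.
final class LeakyChild extends FrozenBase {
    static LeakyChild leaked; // deliberately unsafe publication point

    LeakyChild() {
        super();
        leaked = this; // escapes here; any barrier at constructor exit comes too late
    }
}

final class FenceDemo {
    public static void main(String[] args) {
        new LeakyChild();
        System.out.println(LeakyChild.leaked.isFrozen());
    }
}
```

A barrier emitted only at the end of the outermost constructor would not cover the `leaked = this` store, which is the case the explicit fence in the base class guards against.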

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2229913291

From liach at openjdk.org  Tue Jul 16 03:50:17 2024
From: liach at openjdk.org (Chen Liang)
Date: Tue, 16 Jul 2024 03:50:17 GMT
Subject: RFR: 8336275: Move common Method and Constructor fields to Executable
Message-ID: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com>

Move fields common to Method and Constructor to Executable, which simplifies the implementation. Removed useless transient modifiers, as Method and Constructor were never serializable.

-------------

Commit messages:
 - Inline some common ctor + method fields to executable

Changes: https://git.openjdk.org/jdk/pull/20188/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20188&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8336275
  Stats: 451 lines in 11 files changed: 77 ins; 238 del; 136 mod
  Patch: https://git.openjdk.org/jdk/pull/20188.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20188/head:pull/20188

PR: https://git.openjdk.org/jdk/pull/20188

From dholmes at openjdk.org  Tue Jul 16 04:58:52 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 16 Jul 2024 04:58:52 GMT
Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers
 [v2]
In-Reply-To: <5Xt9rNCHwYnwvFMglf_Yp5ZzwKEDNrmRecR_NrFLGMA=.7aa1fef1-a977-4244-ad24-df9897bb2743@github.com>
References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
 <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com>
 <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com>
 <0j_XZ2e84ADGz8jxk21pFyF0QNhubV0i7sVi5sxnSyg=.7281e6d1-bf24-49f1-96a6-8284c4c9f90d@github.com>
 <G5EBaq25gdUcR-5HHsF3Bg8vvpXImOqwnKZbIht8LMI=.07dd543a-1442-495b-97cd-c2bffe268949@github.com>
 <5Xt9rNCHwYnwvFMglf_Yp5ZzwKEDNrmRecR_NrFLGMA=.7aa1fef1-a977-4244-ad24-df9897bb2743@github.com>
Message-ID: <JbfdSfSkvw0v3-W6vH1_jeilVN47W1vtRGKCCLuBI-Q=.37ffdc4f-95e7-4b2d-b7a1-89895bb081d4@github.com>

On Mon, 15 Jul 2024 09:15:02 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Okay, such a change in behaviour was unexpected for a "cleanup" PR. I'm looking into it now. Perhaps @mlchung can comment?
>
> Yeah, this is not really a cleanup (behaviors stay the same) change. For this particular hunk, keeping the old behavior seems to be unnecessary work. Note that we are also changing the behavior in C2: both in `do_exits` we no longer emit the barriers for `static final` stores in `clinits`, plus EA does not care about `clinits` anymore as well. Those are also behavioral changes.
> 
> If you prefer, I can turn this PR into a behaviorally similar cleanup, and do the behavior changes separately.

I certainly would not want those C2 changes to be hidden behind what looks like a cleanup on the surface, so please do separate things out.

BTW Mandy is away for a while so we can't get her input on the original intent here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1678756832

From aph at openjdk.org  Tue Jul 16 07:50:55 2024
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 16 Jul 2024 07:50:55 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <UwSxg6BMklnndJlZGLVLgDgvcr-VrZbeuJyyBHMFrZ0=.eaad3ac9-b520-4abd-8f74-663f70e20f6d@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
 <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com>
 <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com>
 <iaoi0o--txlXDpM7hHfpbn_wQWD9DxBlRDwXaQ8V9RQ=.e59b601f-3c14-4560-aec1-ba3bce070c01@github.com>
 <5M8k0CGVXI79Dgu5BVVkEU6sHy7Z3jLvkqyTAg7TelU=.85707058-20a5-4574-86a4-b5c6ca05b4a7@github.com>
 <UwSxg6BMklnndJlZGLVLgDgvcr-VrZbeuJyyBHMFrZ0=.eaad3ac9-b520-4abd-8f74-663f70e20f6d@github.com>
Message-ID: <2YOVweTWkX1_HY8VRJktqfgY9gMsEqfEpov0qdhpTQM=.5472511d-7d82-4f32-98be-d998e2fee617@github.com>

On Mon, 15 Jul 2024 17:35:59 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

> Given the Sleef build system currently uses cmake, we would have two choices to build the header files as part of the OpenJDK build system

I don't think that anyone is proposing to do that, so we can discount it altogether.

> However, if we allow the person building OpenJDK to _optionally_ generate the headers from a Sleef source checkout (provided by the user with `--with-sleef-src=/path/to/sleef`), we can more safely assume that the user has installed the necessary dependencies. That would also be in line with how binutils is built and integrated.

Mmm, but we don't need to do that.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230245021

From aph at openjdk.org  Tue Jul 16 08:23:57 2024
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 16 Jul 2024 08:23:57 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <uWeKJ7D4_DgMnBgU2o4KzvVU6lB4xVBds2-SAAPEthU=.cfa1b966-9791-4773-9f10-cb35f58871f0@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
 <uWeKJ7D4_DgMnBgU2o4KzvVU6lB4xVBds2-SAAPEthU=.cfa1b966-9791-4773-9f10-cb35f58871f0@github.com>
Message-ID: <brYLslKpd_pgvj7HsK527bi1vXS-7FuzMBusHLpZ25I=.e205b93e-ac3b-4213-bda1-bab72946f206@github.com>

On Mon, 15 Jul 2024 21:17:03 GMT, Mikael Vidstedt <mikael at openjdk.org> wrote:

> I think the key question is whether we're comfortable relying on/pointing at an external repository which may or may not be there tomorrow and/or where tags may change outside of our control.

Right. We should adopt best practice, both from an Open Source compliance point of view and (from a security, traceability, and binary reproducibility point of view) with regard to the xz backdoor hack.

> The SLEEF source code looks to be around 7.5MB, give or take. That's not enormous, but it's not exactly small when keeping in mind that if we include it in the jdk repo it's going to be there for every cloned repo in every project/branch and very few will actually care about it. I agree that we'd still have to include the pre-generated header files.
> 
> Hence my suggestion to consider putting it under our control, but in a separate `openjdk` controlled repository.

That ticks many of the boxes, as long as we can be sure to tag everything. But from a space point of view I'm not sure it's compelling. After all, we've recently decided to use branches rather than separate repos for releases, which is a good idea because it keeps everything together, but it does increase the repo size for everyone.

It would be very nice if Git allowed a subset of the repo to be checked out, but as far as I can see it doesn't.

Before checkout, the OpenJDK repo is 1.4G. After checkout that's 2.1G. So, about 0.7G of that is the JDK source code, if you include the file system overhead.

7.5MB doesn't sound excessive when you consider that SLEEF potentially provides vectorized routines for many OpenJDK targets. It's not just about AArch64.

This is starting to sound like we need a policy decision, because we don't want to re-hash this discussion every time the question comes up, as it surely will.  For me, that supplying preprocessed source code without real source is known bad practice, even to the extent of being expressly forbidden in the open source definition, is a slam-dunk argument. But clearly that argument doesn't work for everyone.  Maybe something to be discussed at the workshop?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230304980

From aph at openjdk.org  Tue Jul 16 08:37:58 2024
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 16 Jul 2024 08:37:58 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
Message-ID: <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com>

On Tue, 9 Jul 2024 12:08:50 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
>> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers.
>> 
>> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time.
>> Besides of this change, also modify the previous changes accordingly, e.g. remove some uncessary files or changes especially in make dir of jdk.
>> 
>> Besides of the code changes, one important task is to handle the legal process.
>> 
>> Thanks!
>> 
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>> 
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>> 
>> ## Performance
>> 
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>> 
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   skip TANH

> Currently,
> 
>     * in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185) it generates the sleef inline headers from sleef 3.6.1, which is tagged in sleef repo.
> 
>     * And with the script in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), anyone with access to sleef repo can re-generate these inline headers by himself

Right, but think about package builders. This isn't about J Random Hacker doing it by hand.

When a package gets built, the builder machine unpacks source code. If SLEEF is included as part of JDK source, all the builder has to do is run the script and overwrite whatever preprocessed source is in there. The alternative is packaging the SLEEF source code tarball separately in the OpenJDK source package. Sure, all of this can be done, but it's a question of whether we do it once, here, now, or all the downstream builders have to do it themselves.

> ( in fact anyone can generate the inline headers from sleef from scratch without using scripts in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), our script just make it easy for the future maintenance), so it's easy for anyone to verify these inline header files used in jdk.

That script must be checked in to the OpenJDK tree.

> With these 2 points, seems the traceability is fine to me, please kindly point out if I missed some points. Maybe we can add some more clear and specific information in README or createSleef.sh in #19185 to indicate which version of sleef source we're using in jdk.
> 
> I'm also fine with your suggestion to add whole sleef repo into jdk (maybe we can remove some of files, but we can ignore the difference temporarily in the dicussion here). To copy the sleef repo into jdk, we still need to pre-generate the inline header files, and check them in jdk along with the sleef repo, I think you also think so too

Yes.

> (As without checking in these inline headers, we will have to bring some extra dependencies into jdk, and increase extra compilation time when building jdk). But from traceability point of view, seems to me it does not bring extra benefit than current #19185. For example, if someone want to verify the pre-generate inline headers in jdk, he need to first verify the sleef source in jdk, then the pre-generated sleef inline headers.

You don't need to verify the pre-generated inline headers, just overwrite them. The point is that the sleef source is digitally signed, not just by the SLEEF maintainers, _but by OpenJDK as well._ This is not a small thing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230332083

From mli at openjdk.org  Tue Jul 16 09:40:01 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 16 Jul 2024 09:40:01 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
 <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com>
Message-ID: <6LI53-1gh5fncS7RCdJvtGKUjiFtEj3v0quJmzZUbNw=.49250609-57b1-4901-a7ad-8323771f94c7@github.com>

On Tue, 16 Jul 2024 08:35:25 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   skip TANH
>
>> Currently,
>> 
>>     * in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185) it generates the sleef inline headers from sleef 3.6.1, which is tagged in sleef repo.
>> 
>>     * And with the script in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), anyone with access to sleef repo can re-generate these inline headers by himself
> 
> Right, but think about package builders. This isn't about J Random Hacker doing it by hand.
> 
> When a package gets built, the builder machine unpacks source code. If SLEEF is included as part of JDK source, all the builder has to do is run the script and overwrite whatever preprocessed source is in there. The alternative is packaging the SLEEF source code tarball separately in the OpenJDK source package. Sure, all of this can be done, but it's a question of whether we do it once, here, now, or all the downstream builders have to do it themselves.
> 
>> ( in fact anyone can generate the inline headers from sleef from scratch without using scripts in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), our script just make it easy for the future maintenance), so it's easy for anyone to verify these inline header files used in jdk.
> 
> That script must be checked in to the OpenJDK tree.
> 
>> With these 2 points, seems the traceability is fine to me, please kindly point out if I missed some points. Maybe we can add some more clear and specific information in README or createSleef.sh in #19185 to indicate which version of sleef source we're using in jdk.
>> 
>> I'm also fine with your suggestion to add whole sleef repo into jdk (maybe we can remove some of files, but we can ignore the difference temporarily in the dicussion here). To copy the sleef repo into jdk, we still need to pre-generate the inline header files, and check them in jdk along with the sleef repo, I think you also think so too
> 
> Yes.
> 
>> (As without checking in these inline headers, we will have to bring some extra dependencies into jdk, and increase extra compilation time when building jdk). But from traceability point of view, seems to me it does not bring extra benefit than current #19185. For example, if someone want to verify the pre-generate inline headers in jdk, he need to first verify the sleef source in jdk, then the pre-generated sleef inline headers.
> 
> You don't need to verify the pre-generated inline headers, just overwrite them. The point is that the sleef source is di...

@theRealAph Thanks for clarification.

I think there are several different parts involved in the above discussion; please kindly correct me if I have misunderstood.
1. package builders. This is about the release of jdk (both src and binary) by openjdk, adoptium, or any other downstream vendor.
2. jdk daily development. This is about jdk developers modifying, building, and running/testing jdk day to day.

For the package builders, the original sleef source is necessary; for jdk daily development, only the pre-generated sleef inline headers are necessary. The script to pre-generate the sleef inline headers is only triggered by package builders (and I think it involves some scripts which are not part of the jdk source? e.g. the script that triggers the pre-generating script), but for jdk daily development we just need the pre-generated sleef inline headers.
Is my understanding above correct?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230456463

From aph at openjdk.org  Tue Jul 16 09:51:57 2024
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 16 Jul 2024 09:51:57 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
 <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com>
Message-ID: <m3SuSFaXSrlS3hEl2vwD43JqUZFg0CbgXozRVliTa-Q=.26512d19-ecf9-4de5-9106-27794407c61d@github.com>

On Tue, 16 Jul 2024 08:35:25 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   skip TANH
>
>> Currently,
>> 
>>     * in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185) it generates the sleef inline headers from sleef 3.6.1, which is tagged in sleef repo.
>> 
>>     * And with the script in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), anyone with access to sleef repo can re-generate these inline headers by himself
> 
> Right, but think about package builders. This isn't about J Random Hacker doing it by hand.
> 
> When a package gets built, the builder machine unpacks source code. If SLEEF is included as part of JDK source, all the builder has to do is run the script and overwrite whatever preprocessed source is in there. The alternative is packaging the SLEEF source code tarball separately in the OpenJDK source package. Sure, all of this can be done, but it's a question of whether we do it once, here, now, or all the downstream builders have to do it themselves.
> 
>> ( in fact anyone can generate the inline headers from sleef from scratch without using scripts in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), our script just make it easy for the future maintenance), so it's easy for anyone to verify these inline header files used in jdk.
> 
> That script must be checked in to the OpenJDK tree.
> 
>> With these 2 points, seems the traceability is fine to me, please kindly point out if I missed some points. Maybe we can add some more clear and specific information in README or createSleef.sh in #19185 to indicate which version of sleef source we're using in jdk.
>> 
>> I'm also fine with your suggestion to add whole sleef repo into jdk (maybe we can remove some of files, but we can ignore the difference temporarily in the dicussion here). To copy the sleef repo into jdk, we still need to pre-generate the inline header files, and check them in jdk along with the sleef repo, I think you also think so too
> 
> Yes.
> 
>> (As without checking in these inline headers, we will have to bring some extra dependencies into jdk, and increase extra compilation time when building jdk). But from traceability point of view, seems to me it does not bring extra benefit than current #19185. For example, if someone want to verify the pre-generate inline headers in jdk, he need to first verify the sleef source in jdk, then the pre-generated sleef inline headers.
> 
> You don't need to verify the pre-generated inline headers, just overwrite them. The point is that the sleef source is di...

> @theRealAph Thanks for clarification.
> 
> I think there are several different parts involved in the above discussion, please kindly correct me if I misunderstood.
> 
>     1. package builders. This is about the release of jdk (both src and binary), by either openjdk, adoptium, or any other downstream vendors.
> 
>     2. jdk daily development. This is about to modify, build, run/test jdk daily by jdk developers.
> 
> For the package builders, original sleef source is 

may be

> necessary; for the jdk daily development, only pre-generated sleef inline headers are necessary.

Yes, most of the time. Some devs will want to be more thorough.

> The script to pre-generate sleef inline headers is only triggerred by package builders (and I think it involves some scripts which are not part of jdk source ? e.g. the script to trigger pre-generating script),

No: all of the scripts to generate the preprocessed source from the SLEEF source must be in the OpenJDK source.

> but for jdk daily development, we just need pre-generated sleef inline headers. Am I understanding correctly above?

Yes, most of the time.

Bear in mind that convenient daily development of OpenJDK is important, because we don't want to discourage developers. But we've never treated the size of the repo as one of our primary considerations.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230478845

From shade at openjdk.org  Tue Jul 16 09:56:57 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 16 Jul 2024 09:56:57 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <YFP94FW91LrpdTMeak-ePVmpwlW788IBynq_qBZVves=.a6acb940-78b0-4fce-826a-fb065d8a41f6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
 <YFP94FW91LrpdTMeak-ePVmpwlW788IBynq_qBZVves=.a6acb940-78b0-4fce-826a-fb065d8a41f6@github.com>
Message-ID: <xi5HafA-_m5iDGFCKkFvkqUtOe9vzbxc1Ix6m9EyPVU=.ac3930b3-7aa0-493e-9331-7706f2224f6a@github.com>

On Mon, 15 Jul 2024 23:29:37 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

> IIUC this means we can remove the explicit fence here:
> 
> ```
>     public ConstantCallSite(MethodHandle target) {
>         super(target);
>         isFrozen = true;
>         UNSAFE.storeStoreFence(); // properly publish isFrozen update
>     }
> ```

I think so, but there is more to it: there are other fences around the `CallSite`-s that might be simplified. I would prefer not to make any usage changes in this PR. Separately, I tried to benchmark `new ConstantCallSite(MH)` just to see if these barriers are merged, and quickly realized that a bunch of `MethodHandleNatives$CallSiteContext` instances with `Cleaners` get instantiated for every `CCS` created, which completely dominates any wins we get from removing this fence.
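
For readers who want the pattern in isolation, here is a minimal sketch of the "write a non-final field, then fence" idiom from the quoted constructor. It is an illustration only: `FrozenHolder` is a hypothetical class, not `ConstantCallSite`, and it uses the public `VarHandle.storeStoreFence()` rather than the internal `UNSAFE.storeStoreFence()`. It does not demonstrate whether the fence becomes redundant for `@Stable` fields; that is the subject of the PR.

```java
import java.lang.invoke.VarHandle;

// Hypothetical class for illustration; not the JDK's ConstantCallSite.
final class FrozenHolder {
    private final Object target;
    // Deliberately non-final, like isFrozen in the quoted constructor.
    private boolean isFrozen;

    FrozenHolder(Object target) {
        this.target = target;
        isFrozen = true;
        // Orders the stores above before any subsequent store,
        // e.g. a later store that publishes a reference to this object.
        VarHandle.storeStoreFence();
    }

    boolean isFrozen() {
        return isFrozen;
    }

    Object target() {
        return target;
    }

    public static void main(String[] args) {
        FrozenHolder holder = new FrozenHolder("mh");
        System.out.println(holder.isFrozen());
    }
}
```

The fence matters only when the object is later published through a data race; a single-threaded run like the `main` above cannot observe the difference.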

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2230489134

From shade at openjdk.org  Tue Jul 16 10:21:51 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 16 Jul 2024 10:21:51 GMT
Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers
 [v2]
In-Reply-To: <JbfdSfSkvw0v3-W6vH1_jeilVN47W1vtRGKCCLuBI-Q=.37ffdc4f-95e7-4b2d-b7a1-89895bb081d4@github.com>
References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
 <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com>
 <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com>
 <0j_XZ2e84ADGz8jxk21pFyF0QNhubV0i7sVi5sxnSyg=.7281e6d1-bf24-49f1-96a6-8284c4c9f90d@github.com>
 <G5EBaq25gdUcR-5HHsF3Bg8vvpXImOqwnKZbIht8LMI=.07dd543a-1442-495b-97cd-c2bffe268949@github.com>
 <5Xt9rNCHwYnwvFMglf_Yp5ZzwKEDNrmRecR_NrFLGMA=.7aa1fef1-a977-4244-ad24-df9897bb2743@github.com>
 <JbfdSfSkvw0v3-W6vH1_jeilVN47W1vtRGKCCLuBI-Q=.37ffdc4f-95e7-4b2d-b7a1-89895bb081d4@github.com>
Message-ID: <igvhqHUrjhPCzZuCcU3N27827N5KRF3m6N8PuqXJdAc=.5b174ff0-0044-4618-af75-5cbbf3021f7d@github.com>

On Tue, 16 Jul 2024 04:55:52 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Yeah, this is not really a cleanup (behaviors stay the same) change. For this particular hunk, keeping the old behavior seems to be unnecessary work. Note that we are also changing the behavior in C2: both in `do_exits` we no longer emit the barriers for `static final` stores in `clinits`, plus EA does not care about `clinits` anymore as well. Those are also behavioral changes.
>> 
>> If you prefer, I can turn this PR into a behaviorally similar cleanup, and do the behavior changes separately.
>
> I certainly would not want those C2 changes to be hidden behind what looks like a cleanup on the surface, so please do separate things out.
> 
> BTW Mandy is away for a while so we can't get her input on the original intent here.

Fine by me. I am splitting out C2 parts here:
https://bugs.openjdk.org/browse/JDK-8336465
https://bugs.openjdk.org/browse/JDK-8336466

I'll probably fork `get_flags` change as a separate bug as well. 

This PR would then be only the final non-behavioral cleanups that would eliminate the remnant uses of `is_initializer`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1679144326

From adinn at openjdk.org  Tue Jul 16 10:33:53 2024
From: adinn at openjdk.org (Andrew Dinn)
Date: Tue, 16 Jul 2024 10:33:53 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
Message-ID: <5jNrh7ZNo8EcvpWN0AFr73R-TpkZif8iRxM8zzgz458=.a408857e-1642-4a49-98a1-fc6322697115@github.com>

On Tue, 9 Jul 2024 12:08:50 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
>> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers.
>> 
>> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external sleef artifacts (header or lib) at build or run time.
>> Besides this change, the previous changes are also modified accordingly, e.g. removing some unnecessary files or changes, especially in the make dir of jdk.
>> 
>> Besides the code changes, one important task is to handle the legal process.
>> 
>> Thanks!
>> 
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>> 
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>> 
>> ## Performance
>> 
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>> 
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   skip TANH

Obviously we need to include pre-generated sources in the repo so that most people can just build the library using *sanctioned* code without needing to regenerate anything.

I absolutely agree with @theRealAph that we need to have all relevant SLEEF header build scripts in the OpenJDK repo so that anyone who wants to rebuild the headers can do so. I don't believe it is just packagers who will want to do that, and it is good open source practice to allow anyone to do so and, where possible, to make it easy.

Given the size of the original SLEEF sources I also agree with @theRealAph it is no great burden to include them in the jdk repo. However, I am not averse to @vidmik's alternative of putting the sources in an openjdk/sleef repo. That would be fine so long as the openjdk repo includes SLEEF build scripts that pull a determinate hash to generate the headers.

Likewise I agree with @vidmik's suggestion that omitting the extra packages the SLEEF generate step requires from the standard configure/make scripts would be fine, so long as the SLEEF build scripts prompt users on what to install. We don't want to force everyone to install packages that they don't need. But we do still need to make it straightforward for those who do want to regenerate the sources to achieve that goal.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230559757

From duke at openjdk.org  Tue Jul 16 10:38:58 2024
From: duke at openjdk.org (Stewart X Addison)
Date: Tue, 16 Jul 2024 10:38:58 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <brYLslKpd_pgvj7HsK527bi1vXS-7FuzMBusHLpZ25I=.e205b93e-ac3b-4213-bda1-bab72946f206@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
 <uWeKJ7D4_DgMnBgU2o4KzvVU6lB4xVBds2-SAAPEthU=.cfa1b966-9791-4773-9f10-cb35f58871f0@github.com>
 <brYLslKpd_pgvj7HsK527bi1vXS-7FuzMBusHLpZ25I=.e205b93e-ac3b-4213-bda1-bab72946f206@github.com>
Message-ID: <yWw7g4vjHgTC-zflIyxoc7_f18EcKeoAtJ8KHV1f76Y=.925a5e88-2fb7-4d74-b500-9d9d400b6dfb@github.com>

On Tue, 16 Jul 2024 08:21:04 GMT, Andrew Haley <aph at openjdk.org> wrote:

> This is starting to sound like we need a policy decision, because we don't want to re-hash this discussion every time the question comes up, as it surely will. 

+1 to this if we don't already have one

While I haven't read through every comment in this thread, in this specific case I generally agree with what @theRealAph has said in some of his earlier comments. My primary concern is that the generated code in there is currently effectively unreviewable in terms of checking for potential vulnerabilities, so I also feel it's best to check in the whole (reviewable) source if this PR is to be accepted. Much as I dislike repository bloat, I think it's a fairly easy decision in this case, with SLEEF being 7.5MB in size when the openjdk codebase is so large.

An alternative "absolute minimum" would be to reference the GitHub SHA of the SLEEF source and include the process for regenerating it reproducibly so that this information is available to anyone who wanted to verify it. With my distributor (Temurin) hat on either of those solutions would mean we have the original source referenced for inclusion in the product SBOM to track the supply chain. I'll also note that I'm also making an assumption here that the generated code from SLEEF is reproducible and not sensitive to the build environment like the CDS archives - I have not tried building them myself to verify but I feel that is important to understand before merging the generated code.
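
As an illustration of the verification step described above, here is a minimal sketch that compares the SHA-256 digest of a regenerated file against a pinned value. It rests on assumptions: `PinnedDigestCheck` and its methods are hypothetical names, the pinned digest would have to be recorded somewhere (for example in a README), and this checks file contents rather than a git commit SHA. It is only meaningful if the header generation really is reproducible, which, as noted, has not been verified. It needs Java 17 or later for `HexFormat`.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the JDK build.
final class PinnedDigestCheck {

    // Returns the lowercase hex SHA-256 digest of the given bytes.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(data));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // True if the data hashes to the pinned hex digest (case-insensitive).
    static boolean matchesPinned(byte[] data, String pinnedHex) {
        byte[] actual = sha256Hex(data).getBytes(StandardCharsets.US_ASCII);
        byte[] expected = pinnedHex.toLowerCase(Locale.ROOT)
                                   .getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(actual, expected);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: PinnedDigestCheck <file> <pinned-sha256-hex>");
            return;
        }
        byte[] contents = Files.readAllBytes(Path.of(args[0]));
        System.out.println(matchesPinned(contents, args[1]) ? "match" : "MISMATCH");
    }
}
```

A builder could run this over each regenerated header and fail the build on any mismatch; an SBOM entry could then reference both the source revision and the expected digests.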

As a project we should also consider the whole issue of ensuring that we have sufficient trust, from a supply-chain perspective, in the SLEEF source ... I have no specific reason to distrust it, but it might be good to understand how well reviewed it is before doing this, as it's not a project I'm personally familiar with.

_On a slightly separate note (and I see @luhenry is in this comment thread too and has contributed to SLEEF) it will be good if this can be used to enhance the performance on RISC-V too in the future ;-)_

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230569814

From mli at openjdk.org  Tue Jul 16 10:38:59 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 16 Jul 2024 10:38:59 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <m3SuSFaXSrlS3hEl2vwD43JqUZFg0CbgXozRVliTa-Q=.26512d19-ecf9-4de5-9106-27794407c61d@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
 <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com>
 <m3SuSFaXSrlS3hEl2vwD43JqUZFg0CbgXozRVliTa-Q=.26512d19-ecf9-4de5-9106-27794407c61d@github.com>
Message-ID: <Sd8-4UxWAFuL8h7xMKkLAwF1InMGxN-raDX6HjKNFSY=.4fb8f96a-cb91-417f-802d-837ed28e6266@github.com>

On Tue, 16 Jul 2024 09:48:55 GMT, Andrew Haley <aph at openjdk.org> wrote:

>>> Currently,
>>> 
>>>     * in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185) it generates the sleef inline headers from sleef 3.6.1, which is tagged in sleef repo.
>>> 
>>>     * And with the script in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), anyone with access to sleef repo can re-generate these inline headers by himself
>> 
>> Right, but think about package builders. This isn't about J Random Hacker doing it by hand.
>> 
>> When a package gets built, the builder machine unpacks source code. If SLEEF is included as part of JDK source, all the builder has to do is run the script and overwrite whatever preprocessed source is in there. The alternative is packaging the SLEEF source code tarball separately in the OpenJDK source package. Sure, all of this can be done, but it's a question of whether we do it once, here, now, or all the downstream builders have to do it themselves.
>> 
>>> ( in fact anyone can generate the inline headers from sleef from scratch without using scripts in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), our script just make it easy for the future maintenance), so it's easy for anyone to verify these inline header files used in jdk.
>> 
>> That script must be checked in to the OpenJDK tree.
>> 
>>> With these 2 points, seems the traceability is fine to me, please kindly point out if I missed some points. Maybe we can add some more clear and specific information in README or createSleef.sh in #19185 to indicate which version of sleef source we're using in jdk.
>>> 
>>> I'm also fine with your suggestion to add whole sleef repo into jdk (maybe we can remove some of files, but we can ignore the difference temporarily in the dicussion here). To copy the sleef repo into jdk, we still need to pre-generate the inline header files, and check them in jdk along with the sleef repo, I think you also think so too
>> 
>> Yes.
>> 
>>> (As without checking in these inline headers, we will have to bring some extra dependencies into jdk, and increase extra compilation time when building jdk). But from traceability point of view, seems to me it does not bring extra benefit than current #19185. For example, if someone want to verify the pre-generate inline headers in jdk, he need to first verify the sleef source in jdk, then the pre-generated sleef inline headers.
>> 
>> You don't need to verify the pre-generated inline headers, just overwrite them. The ...
>
>> @theRealAph Thanks for clarification.
>> 
>> I think there are several different parts involved in the above discussion, please kindly correct me if I misunderstood.
>> 
>>     1. package builders. This is about the release of jdk (both src and binary), by either openjdk, adoptium, or any other downstream vendors.
>> 
>>     2. jdk daily development. This is about to modify, build, run/test jdk daily by jdk developers.
>> 
>> For the package builders, original sleef source is 
> 
> may be
> 
>> necessary; for the jdk daily development, only pre-generated sleef inline headers are necessary.
> 
> Yes, most of the time. Some devs will want to be more thorough.
> 
>> The script to pre-generate sleef inline headers is only triggerred by package builders (and I think it involves some scripts which are not part of jdk source ? e.g. the script to trigger pre-generating script),
> 
> No: all of the scripts to generate the preprocessed source from the SLEEF source must be in the OpenJDK source.
> 
>> but for jdk daily development, we just need pre-generated sleef inline headers. Am I understanding correctly above?
> 
> Yes, most of the time.
> 
> Bear in mind that convenient daily development of OpenJDK is important, because we don't want to discourage developers. But we've never treated the size of the repo as one of our primary considerations.

@theRealAph I see, I think now I understand the whole picture of your concerns. Thanks!

> I think the key question is whether we're comfortable relying on/pointing at an external repository which may or may not be there tomorrow and/or where tags may change outside of our control.
> The SLEEF source code looks to be around 7.5MB, give or take. That's not enormous, but it's not exactly small when keeping in mind that if we #include it in the jdk repo it's going to be there for every cloned repo in every project/branch and very few will actually care about it. I agree that we'd still have to include the pre-generated header files.
> Hence my suggestion to consider putting it under our control, but in a separate openjdk controlled repository.

Based on @vidmik's previous comments, I think we all agree that the original sleef source should be added to jdk, along with the pre-generated sleef inline headers. The only difference of opinion is about how to include the sleef source: one option is to add it to the jdk repo itself, another is to put it in a separate repo which is under the control of jdk. Please kindly correct me if I have misunderstood.

I have no particular preference as to which option to take. My only concern is how long it will take to make that decision. If it could take a rather long time, can we take several incremental steps to achieve the final goal? e.g.
1. add pre-generated sleef inline headers into jdk, which is done by https://github.com/openjdk/jdk/pull/19185
2. support vector math in jdk, which is done by this pr.
3. add sleef source into either the jdk repo itself or another repo under the control of jdk.

I think we have plenty of time to achieve the final goal in jdk-24.

What do you think? @theRealAph @vidmik @luhenry @magicus @erikj79

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230571779

From mli at openjdk.org  Tue Jul 16 10:44:56 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 16 Jul 2024 10:44:56 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v9]
In-Reply-To: <UwSxg6BMklnndJlZGLVLgDgvcr-VrZbeuJyyBHMFrZ0=.eaad3ac9-b520-4abd-8f74-663f70e20f6d@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <oCz6z6Z7w3GxanCxt7zcGKl-VgMQlo_RLP7gDMBZ4nI=.0ada5ef0-adfb-4da7-9175-660b8b576dbd@github.com>
 <eT48AR-Up7CyMkuiFet-hoQtyaO_hifCSZUQ6LJrjnQ=.026071f1-de0f-4589-a247-c7fc2afe68c4@github.com>
 <2VnXjMF_4HQa-bHWEW0-VaXF9VtQUs92mnPyUlF8UY8=.b6d68aab-b0f5-4544-b543-046d12f92b1b@github.com>
 <iaoi0o--txlXDpM7hHfpbn_wQWD9DxBlRDwXaQ8V9RQ=.e59b601f-3c14-4560-aec1-ba3bce070c01@github.com>
 <5M8k0CGVXI79Dgu5BVVkEU6sHy7Z3jLvkqyTAg7TelU=.85707058-20a5-4574-86a4-b5c6ca05b4a7@github.com>
 <UwSxg6BMklnndJlZGLVLgDgvcr-VrZbeuJyyBHMFrZ0=.eaad3ac9-b520-4abd-8f74-663f70e20f6d@github.com>
Message-ID: <t8i-uWYm_zkQs-I5gD9oW0hXKwcyqv9Q3knUox47A6k=.2ec13bf7-a79d-4438-9cc6-1b840c65c29b@github.com>

On Mon, 15 Jul 2024 17:35:59 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

>>> > I can't tell what problem we're trying to solve by not simply checking in the source code, in its preferred form, to the OpenJDK tree. This has practical advantages to do with traceability and security, and in-principle reasons to do with basic Open Source practice too. On the other side, there are no disadvantages.
>>> 
>>> Do you suggest to copy the whole sleef source repo into jdk?
>> 
>> I think so, along with scripting that generates the preprocessed file we use. It might be the case that there are some sleef files not used at all they could be omitted, but I'm not sure it would be useful, and from a traceability point of view it's probably best to grab it all, unless it's really huge
>
> Given the Sleef build system currently uses cmake, we would have two choices to build the header files as part of the OpenJDK build system:
> 1. take a dependency on cmake in order to build the Sleef headers
> 2. write a custom build system for Sleef to integrate into OpenJDK
> 
> Neither approach sounds good to me as a mandatory option.
> 
> However, if we are to allow the person building OpenJDK to _optionally_ generate the headers from a Sleef source checkout (provided by the user with a `--with-sleef-src=/path/to/sleef`), we can then more easily assume that the user has installed the necessary dependencies. That would also be in line with how binutils is being built and integrated.

> _On a slightly separate note (and I see @luhenry is in this comment thread too and has contributed to SLEEF) it will be good if this can be used to enhance the performance on RISC-V too in the future ;-)_

We already have a prototype that depends on this PR, and the performance gain is promising.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230579500

From aph at openjdk.org  Tue Jul 16 10:44:57 2024
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 16 Jul 2024 10:44:57 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <m3SuSFaXSrlS3hEl2vwD43JqUZFg0CbgXozRVliTa-Q=.26512d19-ecf9-4de5-9106-27794407c61d@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
 <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com>
 <m3SuSFaXSrlS3hEl2vwD43JqUZFg0CbgXozRVliTa-Q=.26512d19-ecf9-4de5-9106-27794407c61d@github.com>
Message-ID: <GvXWotWLspc5hS8zeLHxCKLcFdqzeH4D-1r6Ju4lYIw=.c8070480-d31d-48c2-8d0f-9be57a0441de@github.com>

On Tue, 16 Jul 2024 09:48:55 GMT, Andrew Haley <aph at openjdk.org> wrote:

> @theRealAph Thanks for clarification.
> 
> I think there are several different parts involved in the above discussion, please kindly correct me if I misunderstood.
> 
>     1. package builders. This is about the release of jdk (both src and binary), by either openjdk, adoptium, or any other downstream vendors.
> 
>     2. jdk daily development. This is about jdk developers modifying, building, and running/testing jdk on a daily basis.
> 
> For the package builders, original sleef source is 

may be

> necessary; for the jdk daily development, only pre-generated sleef inline headers are necessary.

Yes, most of the time. Some devs will want to be more thorough.

> The script to pre-generate sleef inline headers is only triggered by package builders (and I think it involves some scripts which are not part of jdk source, e.g. the script to trigger the pre-generating script),

No: all of the scripts to generate the preprocessed source from the SLEEF source must be in the OpenJDK source.

> but for jdk daily development, we just need pre-generated sleef inline headers. Am I understanding correctly above?

Yes, most of the time.

Bear in mind that convenient daily development of OpenJDK is important, because we don't want to discourage developers. But we've never treated the size of the repo as one of our primary considerations.

> I have no particular preference as to which option to take. My only concern is how long it will take to make that decision. If it could take a rather long time, can we take several incremental steps to achieve the final goal? e.g.

We're only a couple of weeks away from the summit. What would be a long time?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230581736

From mli at openjdk.org  Tue Jul 16 10:50:55 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 16 Jul 2024 10:50:55 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <GvXWotWLspc5hS8zeLHxCKLcFdqzeH4D-1r6Ju4lYIw=.c8070480-d31d-48c2-8d0f-9be57a0441de@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
 <HIRpMmVn-DXZ0J6woZFgviDZEcVnNR2m6YecDBTiuPY=.13e18284-0aa7-45fb-8bd2-1c0ae0be1914@github.com>
 <m3SuSFaXSrlS3hEl2vwD43JqUZFg0CbgXozRVliTa-Q=.26512d19-ecf9-4de5-9106-27794407c61d@github.com>
 <GvXWotWLspc5hS8zeLHxCKLcFdqzeH4D-1r6Ju4lYIw=.c8070480-d31d-48c2-8d0f-9be57a0441de@github.com>
Message-ID: <XA-SfmwxqKp1L0Ca3fIalhVzhaukZAEgUEOQ1UDD8Cw=.1346d19a-5dfd-4146-bc45-c593e834d14f@github.com>

On Tue, 16 Jul 2024 10:42:24 GMT, Andrew Haley <aph at openjdk.org> wrote:

> We're only a couple of weeks away from the summit. What would be a long time?

OK, then let's wait for it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2230591233

From shade at openjdk.org  Tue Jul 16 11:31:55 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 16 Jul 2024 11:31:55 GMT
Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers
 [v2]
In-Reply-To: <igvhqHUrjhPCzZuCcU3N27827N5KRF3m6N8PuqXJdAc=.5b174ff0-0044-4618-af75-5cbbf3021f7d@github.com>
References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
 <WSVnDVWEq7cIaiEd2-pdWW4Il8Qi4wwvjF2yyveKcgM=.613045d7-a827-4f3d-bcf4-ba9200a2c8f4@github.com>
 <t3K5QhtFrCpM4EoXc_pskncDv72bSfKgUKfguzjVI0Q=.4e5b01d1-9cad-45ec-8d70-656615bee374@github.com>
 <0j_XZ2e84ADGz8jxk21pFyF0QNhubV0i7sVi5sxnSyg=.7281e6d1-bf24-49f1-96a6-8284c4c9f90d@github.com>
 <G5EBaq25gdUcR-5HHsF3Bg8vvpXImOqwnKZbIht8LMI=.07dd543a-1442-495b-97cd-c2bffe268949@github.com>
 <5Xt9rNCHwYnwvFMglf_Yp5ZzwKEDNrmRecR_NrFLGMA=.7aa1fef1-a977-4244-ad24-df9897bb2743@github.com>
 <JbfdSfSkvw0v3-W6vH1_jeilVN47W1vtRGKCCLuBI-Q=.37ffdc4f-95e7-4b2d-b7a1-89895bb081d4@github.com>
 <igvhqHUrjhPCzZuCcU3N27827N5KRF3m6N8PuqXJdAc=.5b174ff0-0044-4618-af75-5cbbf3021f7d@github.com>
Message-ID: <leWZFcvd_U0fheNtw2jg5UNqI3FWde3kWDouz-ysr_w=.cd791610-3d70-45c3-b7bd-1e21b62596d4@github.com>

On Tue, 16 Jul 2024 10:18:53 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> I'll probably fork get_flags change as a separate bug as well.

Now part of: https://bugs.openjdk.org/browse/JDK-8336468

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20120#discussion_r1679228645

From shade at openjdk.org  Tue Jul 16 11:37:51 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 16 Jul 2024 11:37:51 GMT
Subject: RFR: 8336103: Sharper checks for <init> and <clinit> initializers
 [v3]
In-Reply-To: <swBWpqAm_k6hHjGcwdNBowWfdBpksxtD63PiGp0KI1c=.ad02279c-ed66-40a0-9b01-379d4410a16c@github.com>
References: <bCys51DaXKl64gEdV10WAKffH5KEwwHZH3oIYBHmL38=.0568b7d5-1b38-40bd-8932-07050c69bd8d@github.com>
 <swBWpqAm_k6hHjGcwdNBowWfdBpksxtD63PiGp0KI1c=.ad02279c-ed66-40a0-9b01-379d4410a16c@github.com>
Message-ID: <oTI3X6WBWYwOAhQtv9O2saeOZUMZ0X1eo1cVPd2ojvw=.c531e3e5-dd9e-4793-8152-dafce6162b86@github.com>

On Fri, 12 Jul 2024 09:17:22 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for a constructor (instance initializer), not a static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. Often we get lucky by never being exposed to static initializers on particular paths.
>> 
>> I would like to sharpen this. I went back and forth, and ultimately decided to remove `is_initializer` completely to avoid future confusion, and rewrite the uses appropriately.
>> 
>> Additional testing:
>>  - [x] Linux AArch64 server fastdebug, `all` (includes Fuzzer and CTW tests)
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Touch up assert messages

Putting back to draft until the behavioral changes are done in separate sub-tasks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20120#issuecomment-2230671038

From shade at openjdk.org  Tue Jul 16 12:09:14 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 16 Jul 2024 12:09:14 GMT
Subject: RFR: 8336468: Reflection and MethodHandles should use more precise
 initializer checks
Message-ID: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com>

This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103).
 
There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior.

I also inlined the `select_method` definition, which allows for more straightforward local code, and obviates the need for wrapping things with `methodHandle`.
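
A minimal sketch of the behavior change described above, in plain C++ rather than the actual HotSpot code. `MethodInfo`, `MemberKind`, the `classify_*` helpers and `check_clinit_classification` are hypothetical names for illustration, and the sketch assumes the old code used the broad check to decide between "constructor" and "method":

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model of a method; not HotSpot's Method class.
struct MethodInfo {
    std::string name;
    bool is_static;
};

enum class MemberKind { Method, Constructor };

// Precise checks: one predicate per kind of initializer.
inline bool is_object_initializer(const MethodInfo& m) {
    return m.name == "<init>";
}

inline bool is_static_initializer(const MethodInfo& m) {
    return m.name == "<clinit>" && m.is_static;
}

// Broad check: true for both instance and static initializers.
inline bool is_initializer(const MethodInfo& m) {
    return is_object_initializer(m) || is_static_initializer(m);
}

// Assumed old behavior: anything that is an initializer counts as a constructor.
inline MemberKind classify_broad(const MethodInfo& m) {
    return is_initializer(m) ? MemberKind::Constructor : MemberKind::Method;
}

// Behavior with the precise check: only <init> counts as a constructor.
inline MemberKind classify_precise(const MethodInfo& m) {
    return is_object_initializer(m) ? MemberKind::Constructor : MemberKind::Method;
}

// Usage example: <clinit> changes category, <init> does not.
inline void check_clinit_classification() {
    const MethodInfo clinit{"<clinit>", true};
    const MethodInfo init{"<init>", false};
    assert(classify_broad(clinit) == MemberKind::Constructor);
    assert(classify_precise(clinit) == MemberKind::Method);
    assert(classify_broad(init) == classify_precise(init));
}
```

Only the classification of `<clinit>` differs between the two variants, which is the one behavior change this PR is expected to make.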

@mlchung, you probably want to look at this more closely.

Additional testing:
 - [x] Linux x86_64 server fastdebug, `tier1`
 - [ ] Linux x86_64 server fastdebug, `all`

-------------

Commit messages:
 - Remove unnecessary handle-izing
 - Fix
 - Fix

Changes: https://git.openjdk.org/jdk/pull/20192/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20192&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8336468
  Stats: 38 lines in 5 files changed: 14 ins; 10 del; 14 mod
  Patch: https://git.openjdk.org/jdk/pull/20192.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20192/head:pull/20192

PR: https://git.openjdk.org/jdk/pull/20192

From rkennke at openjdk.org  Tue Jul 16 12:46:55 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 16 Jul 2024 12:46:55 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
Message-ID: <1fs1zYHKJsoWuEpKNb1ZY_VQ7_i_gQrbmx4d2fJvQo0=.1e3cbf20-dedf-4113-95c2-444869a75d1d@github.com>

On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:
> 
>  - Remove try_read
>  - Add explicit to single parameter constructors
>  - Remove superfluous access specifier
>  - Remove unused include
>  - Update assert message OMCache::set_monitor
>  - Fix indentation
>  - Remove outdated comment LightweightSynchronizer::exit
>  - Remove logStream include
>  - Remove strange comment
>  - Fix javaThread include

Another review pass by me. It looks to me like the cache lookup can be improved, see comments below.

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 323:

> 321:       ldr(t1, Address(t3_t));
> 322:       cmp(obj, t1);
> 323:       br(Assembler::EQ, monitor_found);

I think the loop could be optimized a bit, if we start with the (cache_address) - 1 in t3, then increment t3 at the start of the loop, and let the success-case fall-through and only branch back to loop-start or to failure-path. Something like:

bind(loop);
increment(t3_t, in_bytes(OMCache::oop_to_oop_difference()));
ldr(t1, Address(t3_t));
cbnz(t1, loop);
cmp(obj, t1);
br(Assembler::NE, loop);
// Success

The advantage would be that we have no forward branch in the fast/expected case. CPU static branch prediction tends to not like that. I'm not sure if it makes a difference, though. Also, if you do that, then the unrolled loop also needs a corresponding adjustment.
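
In plain C++ terms, the two loop shapes would look roughly like the sketch below. This is not the macro-assembler code: `Cache`, `find_branch_out` and `find_fall_through` are hypothetical names, and the key array is assumed to always end in a null sentinel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-thread cache: a fixed array of keys
// whose final slot is never written, so it always holds a null sentinel.
struct Cache {
    static constexpr std::size_t kCapacity = 8;
    const void* keys[kCapacity + 1] = {};
};

// Current shape: compare first, leave the loop through a forward branch on a hit.
inline int find_branch_out(const Cache& c, const void* obj) {
    assert(obj != nullptr);
    for (std::size_t i = 0; ; ++i) {
        const void* k = c.keys[i];
        if (k == obj) return static_cast<int>(i);  // success branches out
        if (k == nullptr) return -1;               // sentinel reached: miss
    }
}

// Suggested shape: start one slot before the cache, advance at the top of the
// loop, and let the success case fall through the bottom of the loop.
inline int find_fall_through(const Cache& c, const void* obj) {
    assert(obj != nullptr);
    std::ptrdiff_t i = -1;
    const void* k = nullptr;
    do {
        ++i;
        k = c.keys[i];
        if (k == nullptr) return -1;  // miss leaves for the failure path
    } while (k != obj);               // backward branch while there is no match
    return static_cast<int>(i);       // success falls through
}
```

Both functions return the same index for any non-null `obj`; only the placement of the branches differs.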

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 674:

> 672: 
> 673:       // Search for obj in cache.
> 674:       bind(loop);

Same loop transformation would be possible here.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 776:

> 774:     movl(top, Address(thread, JavaThread::lock_stack_top_offset()));
> 775: 
> 776:     if (!UseObjectMonitorTable) {

Why is the mark loaded here in the !UOMT case, but later in the +UOMT case?

-------------

PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2179942149
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1679210139
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1679313050
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1679315158

From rkennke at openjdk.org  Tue Jul 16 12:46:55 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 16 Jul 2024 12:46:55 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <1fs1zYHKJsoWuEpKNb1ZY_VQ7_i_gQrbmx4d2fJvQo0=.1e3cbf20-dedf-4113-95c2-444869a75d1d@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
 <1fs1zYHKJsoWuEpKNb1ZY_VQ7_i_gQrbmx4d2fJvQo0=.1e3cbf20-dedf-4113-95c2-444869a75d1d@github.com>
Message-ID: <men3PDFTArcoKhAknFkS7kJ-OEcYDd8-6oyH4MOx36M=.0e9ee765-ed8e-4c9f-8faf-62a7b489f76e@github.com>

On Tue, 16 Jul 2024 12:37:43 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:
>> 
>>  - Remove try_read
>>  - Add explicit to single parameter constructors
>>  - Remove superfluous access specifier
>>  - Remove unused include
>>  - Update assert message OMCache::set_monitor
>>  - Fix indentation
>>  - Remove outdated comment LightweightSynchronizer::exit
>>  - Remove logStream include
>>  - Remove strange comment
>>  - Fix javaThread include
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 776:
> 
>> 774:     movl(top, Address(thread, JavaThread::lock_stack_top_offset()));
>> 775: 
>> 776:     if (!UseObjectMonitorTable) {
> 
> Why is the mark loaded here in the !UOMT case, but later in the +UOMT case?

Ah I see, it is because we don't have enough registers. Right?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1679316824

From stuefe at openjdk.org  Tue Jul 16 14:19:56 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 16 Jul 2024 14:19:56 GMT
Subject: RFR: 8330144: Revise os::free_memory() [v2]
In-Reply-To: <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
 <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>
Message-ID: <jeGYenDGeG2TuOZuIr_B2ZZBDt4f-HqSm3VmCnI05V0=.dbe1f1cd-641c-4819-85ff-5f5d0e356847@github.com>

On Wed, 10 Jul 2024 20:09:45 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> ### Summary
>> On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting.  The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. 
>> 
>> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory.  
>> 
>> **Transparent huge pages:**
>> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.
>> 
>> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (This function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well. 
>> 
>> **Explicit huge pages:**
>> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter.
>> 
>> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo.  After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use.  Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range` function instead.  After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an imp...
>
> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Minor cleanup and comments.
>  - rename to disclaim_memory and update test

Good. Thanks!
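
For reference, a minimal Linux-only sketch of the idea in the summary, assuming a private anonymous mapping with small pages and a page-aligned range. `disclaim_range` and `map_anonymous` are hypothetical helper names, not the functions in the PR:

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Maps `bytes` of private anonymous memory, or returns nullptr on failure.
inline char* map_anonymous(std::size_t bytes) {
    void* p = ::mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : static_cast<char*>(p);
}

// Hypothetical helper modelled on the description above. Tells the kernel the
// pages in [addr, addr + bytes) are no longer needed while leaving the mapping
// itself in place, so no recommit is required before the next access.
inline bool disclaim_range(void* addr, std::size_t bytes) {
    assert(addr != nullptr);
    return ::madvise(addr, bytes, MADV_DONTNEED) == 0;
}
```

After `disclaim_range` succeeds, the range is still mapped and can be written again directly; the sketch does not cover the explicit huge page case discussed in the PR description.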

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20080#pullrequestreview-2180397408

From stuefe at openjdk.org  Tue Jul 16 14:21:53 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 16 Jul 2024 14:21:53 GMT
Subject: RFR: 8300732: Whitebox functions for Metaspace test should use
 byte size [v2]
In-Reply-To: <it3fVJfbutmJVOuZy7XTjmzgHXPMC8TJHAGI1gxCKjs=.3f8c72d6-97c1-408d-8e27-20ea940d7f89@github.com>
References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>
 <it3fVJfbutmJVOuZy7XTjmzgHXPMC8TJHAGI1gxCKjs=.3f8c72d6-97c1-408d-8e27-20ea940d7f89@github.com>
Message-ID: <aHSDP47aGik9Qecjv90nVZqfvVoHjqDj-5BaiOaxU44=.c54d9138-58a3-4f63-92e1-8e9abeec7c21@github.com>

On Mon, 15 Jul 2024 19:41:13 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. 
>> 
>> Testing: 
>> - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. 
>> 
>> Thanks, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Hard coding values and adding Unit class
>  - whitebox changes based on feedback. Using is_aligned and asserts

Okay, if 32-bit passes. Thanks!

test/hotspot/jtreg/runtime/Metaspace/elastic/TestMetaspaceAllocation.java line 56:

> 54:         MetaspaceTestContext context = new MetaspaceTestContext();
> 55:         MetaspaceTestArena arena1 = context.createArena(false, 32L * Unit.valueOf("M").size());
> 56:         MetaspaceTestArena arena2 = context.createArena(true, 32L * Unit.valueOf("M").size());

Why not just `Unit.M.size()` ?

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20039#pullrequestreview-2180404772
PR Review Comment: https://git.openjdk.org/jdk/pull/20039#discussion_r1679500651

From szaldana at openjdk.org  Tue Jul 16 14:39:52 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Tue, 16 Jul 2024 14:39:52 GMT
Subject: RFR: 8300732: Whitebox functions for Metaspace test should use
 byte size [v2]
In-Reply-To: <aHSDP47aGik9Qecjv90nVZqfvVoHjqDj-5BaiOaxU44=.c54d9138-58a3-4f63-92e1-8e9abeec7c21@github.com>
References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>
 <it3fVJfbutmJVOuZy7XTjmzgHXPMC8TJHAGI1gxCKjs=.3f8c72d6-97c1-408d-8e27-20ea940d7f89@github.com>
 <aHSDP47aGik9Qecjv90nVZqfvVoHjqDj-5BaiOaxU44=.c54d9138-58a3-4f63-92e1-8e9abeec7c21@github.com>
Message-ID: <JcktcaPbfK8cJuEZ5J9ibD5BX0IpFlXY35wVb001QXI=.1550ab2f-8b35-43a7-8964-ff7b7de95249@github.com>

On Tue, 16 Jul 2024 14:19:42 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> Okay, if 32-bit passes. Thanks!

Correct! I made a GHA job to verify with the builds there.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20039#issuecomment-2231088158

From pchilanomate at openjdk.org  Tue Jul 16 14:40:55 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Tue, 16 Jul 2024 14:40:55 GMT
Subject: [jdk23] RFR: 8335409: Can't allocate and retain memory from
 resource area in frame::oops_interpreted_do oop closure after 8329665
In-Reply-To: <g51dGtyAfFu3y_uVGE7KBXcGPIkb1KICznIibbbDoLs=.202e218b-83ca-4b29-840c-0ae03949620a@github.com>
References: <d-ziRnu1_RcjgWDVhYQYb4U0xIWyi5B-hljLzDwQlt4=.a53602c1-25b7-4c93-b468-d55201959846@github.com>
 <g51dGtyAfFu3y_uVGE7KBXcGPIkb1KICznIibbbDoLs=.202e218b-83ca-4b29-840c-0ae03949620a@github.com>
Message-ID: <vvHbxZrKVLQPJKy78pXaJCQwncehdQFwtKoT1UnxLiQ=.084fe4ff-7ae1-4d6d-b9d8-8b726e814b34@github.com>

On Mon, 15 Jul 2024 18:23:44 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Hi all,
>> 
>> This pull request contains a backport of commit [7ab96c74](https://github.com/openjdk/jdk/commit/7ab96c74e2c39f430a5c2f65a981da7314a2385b) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
>> 
>> The commit being backported was authored by Patricio Chilano Mateo on 10 Jul 2024 and was reviewed by David Holmes, Thomas Stuefe, Coleen Phillimore and Aleksey Shipilev.
>> 
>> Thanks
>
> Marked as reviewed by shade (Reviewer).

Thanks for the review @shipilev!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20185#issuecomment-2231083921

From pchilanomate at openjdk.org  Tue Jul 16 14:40:56 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Tue, 16 Jul 2024 14:40:56 GMT
Subject: [jdk23] Integrated: 8335409: Can't allocate and retain memory from
 resource area in frame::oops_interpreted_do oop closure after 8329665
In-Reply-To: <d-ziRnu1_RcjgWDVhYQYb4U0xIWyi5B-hljLzDwQlt4=.a53602c1-25b7-4c93-b468-d55201959846@github.com>
References: <d-ziRnu1_RcjgWDVhYQYb4U0xIWyi5B-hljLzDwQlt4=.a53602c1-25b7-4c93-b468-d55201959846@github.com>
Message-ID: <LJe1370nGwLg6AfN-sP11NoXuo3gOw_DNBlL4IAGzuo=.aadc127e-0505-4c6f-9ed1-3721f4a5871f@github.com>

On Mon, 15 Jul 2024 18:13:53 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

> Hi all,
> 
> This pull request contains a backport of commit [7ab96c74](https://github.com/openjdk/jdk/commit/7ab96c74e2c39f430a5c2f65a981da7314a2385b) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
> 
> The commit being backported was authored by Patricio Chilano Mateo on 10 Jul 2024 and was reviewed by David Holmes, Thomas Stuefe, Coleen Phillimore and Aleksey Shipilev.
> 
> Thanks

This pull request has now been integrated.

Changeset: d7b7c172
Author:    Patricio Chilano Mateo <pchilanomate at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/d7b7c1724d87e611c854c73a9a6140a91f132125
Stats:     55 lines in 3 files changed: 6 ins; 20 del; 29 mod

8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665

Reviewed-by: shade
Backport-of: 7ab96c74e2c39f430a5c2f65a981da7314a2385b

-------------

PR: https://git.openjdk.org/jdk/pull/20185

From szaldana at openjdk.org  Tue Jul 16 14:42:52 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Tue, 16 Jul 2024 14:42:52 GMT
Subject: RFR: 8300732: Whitebox functions for Metaspace test should use
 byte size [v2]
In-Reply-To: <aHSDP47aGik9Qecjv90nVZqfvVoHjqDj-5BaiOaxU44=.c54d9138-58a3-4f63-92e1-8e9abeec7c21@github.com>
References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>
 <it3fVJfbutmJVOuZy7XTjmzgHXPMC8TJHAGI1gxCKjs=.3f8c72d6-97c1-408d-8e27-20ea940d7f89@github.com>
 <aHSDP47aGik9Qecjv90nVZqfvVoHjqDj-5BaiOaxU44=.c54d9138-58a3-4f63-92e1-8e9abeec7c21@github.com>
Message-ID: <0K8_t5xuwvqbF5FI0J_LDcSEj6tXJ6m6GEkUXfJHZJw=.a1bce5d3-4a2f-4efb-ac75-bce29583a6a1@github.com>

On Tue, 16 Jul 2024 14:18:27 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Sonia Zaldana Calles has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Hard coding values and adding Unit class
>>  - whitebox changes based on feedback. Using is_aligned and asserts
>
> test/hotspot/jtreg/runtime/Metaspace/elastic/TestMetaspaceAllocation.java line 56:
> 
>> 54:         MetaspaceTestContext context = new MetaspaceTestContext();
>> 55:         MetaspaceTestArena arena1 = context.createArena(false, 32L * Unit.valueOf("M").size());
>> 56:         MetaspaceTestArena arena2 = context.createArena(true, 32L * Unit.valueOf("M").size());
> 
> Why not just `Unit.M.size()` ?

Good point - direct access is less error prone anyway. I can update it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20039#discussion_r1679543890

From jvernee at openjdk.org  Tue Jul 16 14:46:13 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Tue, 16 Jul 2024 14:46:13 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v4]
In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
Message-ID: <mJv-yN_MA6-w1fqnIbYfoTUJmR_p2myDO2-0BA-Op7I=.ce80bd5d-da24-4150-bed9-5c614c02b3b8@github.com>

> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
> 
> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
> 
> In this PR:
> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, when we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
> 
> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
> 
> 
> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess        1   avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.conf...

Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:

  JVMCI support

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20158/files
  - new: https://git.openjdk.org/jdk/pull/20158/files/6d0b9b57..62849aa8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=02-03

  Stats: 31 lines in 9 files changed: 29 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20158.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158

PR: https://git.openjdk.org/jdk/pull/20158

From szaldana at openjdk.org  Tue Jul 16 14:49:27 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Tue, 16 Jul 2024 14:49:27 GMT
Subject: RFR: 8300732: Whitebox functions for Metaspace test should use
 byte size [v3]
In-Reply-To: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>
References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>
Message-ID: <D7Jfl3uzXBUJgWKE_iu88rhdWpkge5IK4SPs3lt4xUM=.b112d42f-c718-4922-ad69-da4714cf5ecb@github.com>

> Hi all, 
> 
> This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. 
> 
> Testing: 
> - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. 
> 
> Thanks, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Feedback - updating Unit.valueOf to direct access

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20039/files
  - new: https://git.openjdk.org/jdk/pull/20039/files/7c0138ca..d6a1155d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20039&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20039&range=01-02

  Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/20039.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20039/head:pull/20039

PR: https://git.openjdk.org/jdk/pull/20039

From duke at openjdk.org  Tue Jul 16 14:57:53 2024
From: duke at openjdk.org (Robert Toyonaga)
Date: Tue, 16 Jul 2024 14:57:53 GMT
Subject: RFR: 8330144: Revise os::free_memory() [v2]
In-Reply-To: <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
 <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>
Message-ID: <ra9L9o5yJ32V6_UDWlfpVtQC_3trCI_o5xr_k4jTHGo=.85b235fd-54cb-4366-994a-c9f71c57a72a@github.com>

On Wed, 10 Jul 2024 20:09:45 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> ### Summary
>> On Linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting. The benefits of this change are that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter.
>> 
>> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory.  
>> 
>> **Transparent huge pages:**
>> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.
>> 
>> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases, indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well.
>> 
>> **Explicit huge pages:**
>> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter.
>> 
>> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo. After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use. Explicit huge pages are not reflected in RSS, so I used the `os::committed_in_range` function instead. After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an imp...
>
> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Minor cleanup and comments.
>  - rename to disclaim_memory and update test
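
The small-page behaviour described in the quoted summary (memory is dropped by `madvise(MADV_DONTNEED)`, yet the range can be re-touched without recommitting) can be sketched roughly as follows. This is a Linux-only illustration, not the HotSpot change itself: `disclaim_range` and `disclaim_demo` are hypothetical names, and the check relies on the Linux behaviour that a private anonymous mapping reads back as zero-filled pages after `MADV_DONTNEED`.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (NOT the HotSpot function): ask the kernel to drop the
// physical pages backing [addr, addr + bytes) while keeping the mapping.
bool disclaim_range(char* addr, size_t bytes) {
  return ::madvise(addr, bytes, MADV_DONTNEED) == 0;
}

// Maps `pages` anonymous small pages, dirties them, disclaims them, and then
// verifies that the range is still accessible and reads back as zeros.
bool disclaim_demo(size_t pages) {
  assert(pages > 0);
  const size_t page  = static_cast<size_t>(::sysconf(_SC_PAGESIZE));
  const size_t bytes = pages * page;

  void* p = ::mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) {
    return false;
  }
  char* base = static_cast<char*>(p);

  // Touch every byte so the pages are actually backed by memory.
  std::memset(base, 0xAB, bytes);

  // Drop the backing pages; the virtual range stays mapped.
  bool ok = disclaim_range(base, bytes);

  // Re-touching needs no explicit recommit; the old contents are gone and
  // each page is zero-filled on demand.
  for (size_t i = 0; ok && i < bytes; i += page) {
    ok = (base[i] == 0);
  }

  // Writing to the range again works as before.
  if (ok) {
    base[0] = 1;
    ok = (base[0] == 1);
  }

  ::munmap(base, bytes);
  return ok;
}
```

Calling `disclaim_demo(16)` from a test or a small `main` should return `true` on Linux with small pages. The sketch does not cover explicit huge pages, for which the summary reports that the call leaves the pages in place.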

Thank you @tstuefe for the review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20080#issuecomment-2231122489

From duke at openjdk.org  Tue Jul 16 14:57:53 2024
From: duke at openjdk.org (duke)
Date: Tue, 16 Jul 2024 14:57:53 GMT
Subject: RFR: 8330144: Revise os::free_memory() [v2]
In-Reply-To: <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
 <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>
Message-ID: <YI-psWpho2158NJ2PL0y6TLAI0VxV7GbAniOT4BynbI=.fa804499-0236-4bac-8b9e-a41f3fb5fe0b@github.com>

On Wed, 10 Jul 2024 20:09:45 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> ### Summary
>> On Linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting. The benefits of this change are that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter.
>> 
>> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory.  
>> 
>> **Transparent huge pages:**
>> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.
>> 
>> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases, indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well.
>> 
>> **Explicit huge pages:**
>> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter.
>> 
>> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo. After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use. Explicit huge pages are not reflected in RSS, so I used the `os::committed_in_range` function instead. After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an imp...
>
> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Minor cleanup and comments.
>  - rename to disclaim_memory and update test

@roberttoyonaga 
Your change (at version 6c9e6d5c385740e140b800113ec8d2b4d0a93e82) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20080#issuecomment-2231126734

From dnsimon at openjdk.org  Tue Jul 16 15:02:54 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 16 Jul 2024 15:02:54 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v4]
In-Reply-To: <mJv-yN_MA6-w1fqnIbYfoTUJmR_p2myDO2-0BA-Op7I=.ce80bd5d-da24-4150-bed9-5c614c02b3b8@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <mJv-yN_MA6-w1fqnIbYfoTUJmR_p2myDO2-0BA-Op7I=.ce80bd5d-da24-4150-bed9-5c614c02b3b8@github.com>
Message-ID: <cjBvDhr4WYpomLhE1f2PFYk68zD-maQIeFE0Juhld00=.e20ac51b-98e5-4d4d-bc0a-698623c18f6f@github.com>

On Tue, 16 Jul 2024 14:46:13 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, when we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   JVMCI support

src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethod.java line 62:

> 60: 
> 61:     /**
> 62:      * Returns true if this method has a {@code Scoped} annotation.

Can you please make this a qualified name: `jdk.internal.misc.ScopedMemoryAccess.Scoped`.
That makes it easier for someone not familiar with the code base to find.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1679575238

From jvernee at openjdk.org  Tue Jul 16 15:02:53 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Tue, 16 Jul 2024 15:02:53 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v4]
In-Reply-To: <mJv-yN_MA6-w1fqnIbYfoTUJmR_p2myDO2-0BA-Op7I=.ce80bd5d-da24-4150-bed9-5c614c02b3b8@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <mJv-yN_MA6-w1fqnIbYfoTUJmR_p2myDO2-0BA-Op7I=.ce80bd5d-da24-4150-bed9-5c614c02b3b8@github.com>
Message-ID: <BK3p94TURpN85760p7WQitpA4OmHhbPud0w1woLOYzg=.90c85018-3eda-4db7-9c3c-2da61a7a1f31@github.com>

On Tue, 16 Jul 2024 14:46:13 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, when we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   JVMCI support

Added JVMCI/Graal support, courtesy of @c-refice

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2231133607

From jvernee at openjdk.org  Tue Jul 16 15:12:15 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Tue, 16 Jul 2024 15:12:15 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v5]
In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
Message-ID: <gmVrODnXkmeQEvYVo4lI7ETijuxh0FXdsV6U14keGR0=.428ee145-c159-4efc-a705-b13439e92d97@github.com>

> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
> 
> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
> 
> In this PR:
> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, when we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
> 
> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
> 
> 
> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess        1   avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.conf...

Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:

  clarify javadoc

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20158/files
  - new: https://git.openjdk.org/jdk/pull/20158/files/62849aa8..cd5f290e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20158.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158

PR: https://git.openjdk.org/jdk/pull/20158

From jvernee at openjdk.org  Tue Jul 16 15:12:15 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Tue, 16 Jul 2024 15:12:15 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v4]
In-Reply-To: <cjBvDhr4WYpomLhE1f2PFYk68zD-maQIeFE0Juhld00=.e20ac51b-98e5-4d4d-bc0a-698623c18f6f@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <mJv-yN_MA6-w1fqnIbYfoTUJmR_p2myDO2-0BA-Op7I=.ce80bd5d-da24-4150-bed9-5c614c02b3b8@github.com>
 <cjBvDhr4WYpomLhE1f2PFYk68zD-maQIeFE0Juhld00=.e20ac51b-98e5-4d4d-bc0a-698623c18f6f@github.com>
Message-ID: <OY61zjgZBuXUaCZFohHoJY_Xw5Q9rB2ChH6IVMPWeqo=.083cfde3-3d70-4050-a7c1-cb7a8bf9de9b@github.com>

On Tue, 16 Jul 2024 15:00:04 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   JVMCI support
>
> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethod.java line 62:
> 
>> 60: 
>> 61:     /**
>> 62:      * Returns true if this method has a {@code Scoped} annotation.
> 
> Can you please make this a qualified name: `jdk.internal.misc.ScopedMemoryAccess.Scoped`.
> That makes it easier for someone not familiar with the code base to find.

Done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20158#discussion_r1679589138

From stuefe at openjdk.org  Tue Jul 16 15:22:52 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 16 Jul 2024 15:22:52 GMT
Subject: RFR: 8330144: Revise os::free_memory() [v2]
In-Reply-To: <ra9L9o5yJ32V6_UDWlfpVtQC_3trCI_o5xr_k4jTHGo=.85b235fd-54cb-4366-994a-c9f71c57a72a@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
 <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>
 <ra9L9o5yJ32V6_UDWlfpVtQC_3trCI_o5xr_k4jTHGo=.85b235fd-54cb-4366-994a-c9f71c57a72a@github.com>
Message-ID: <mkuk1TTS0tu1oaF10kVz3QT-565CvFW8uQasZ1_wUUo=.ba47496c-7024-4252-aa38-d80433b59c7d@github.com>

On Tue, 16 Jul 2024 14:53:00 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Minor cleanup and comments.
>>  - rename to disclaim_memory and update test
>
> Thank you @tstuefe for the review!

Hi @roberttoyonaga, unfortunately you'll need a second reviewer (standard rule for hotspot changes).

@MBaesken maybe?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20080#issuecomment-2231212584

From psandoz at openjdk.org  Tue Jul 16 15:43:54 2024
From: psandoz at openjdk.org (Paul Sandoz)
Date: Tue, 16 Jul 2024 15:43:54 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <xi5HafA-_m5iDGFCKkFvkqUtOe9vzbxc1Ix6m9EyPVU=.ac3930b3-7aa0-493e-9331-7706f2224f6a@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
 <YFP94FW91LrpdTMeak-ePVmpwlW788IBynq_qBZVves=.a6acb940-78b0-4fce-826a-fb065d8a41f6@github.com>
 <xi5HafA-_m5iDGFCKkFvkqUtOe9vzbxc1Ix6m9EyPVU=.ac3930b3-7aa0-493e-9331-7706f2224f6a@github.com>
Message-ID: <I0QvjXd3fIHwBOpDG3kRuX0S8D_8oFOoGS42FnOvsVU=.06dfa82c-051e-44fb-a4a2-62d6d7052d63@github.com>

On Tue, 16 Jul 2024 09:53:55 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> > IIUC this means we can remove the explicit fence here:
> > ```
> >     public ConstantCallSite(MethodHandle target) {
> >         super(target);
> >         isFrozen = true;
> >         UNSAFE.storeStoreFence(); // properly publish isFrozen update
> >     }
> > ```
> 
> I think so, but there is more to it: there are other fences around the `CallSite`-s that might be related to this. I would prefer not to make any usage changes in this PR.

Agreed, just wanted to test my understanding.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2231262373

From shade at openjdk.org  Tue Jul 16 17:12:08 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 16 Jul 2024 17:12:08 GMT
Subject: RFR: 8336468: Reflection and MethodHandles should use more precise
 initializer checks [v2]
In-Reply-To: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com>
References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com>
Message-ID: <gkGtD5v4Ll6St0x5ABZpmL3wMVi1SreY-vNtIuaN-90=.a9282c6b-b8c1-46ea-8a89-ff26badcd949@github.com>

> This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103).
>  
> There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior.
> 
> I also inlined the `select_method` definition, which allows for a bit more straight-forward local code, and obviates the need for wrapping things with `methodHandle`.
> 
> @mlchung, you probably want to look at this more closely.
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `tier1`
>  - [ ] Linux x86_64 server fastdebug, `all`

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Merge branch 'master' into JDK-8336468-reflection-init-checks
 - Remove unnecessary handle-izing
 - Fix
 - Fix

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20192/files
  - new: https://git.openjdk.org/jdk/pull/20192/files/ac4fbcbf..6e35634b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20192&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20192&range=00-01

  Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20192.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20192/head:pull/20192

PR: https://git.openjdk.org/jdk/pull/20192

From jvernee at openjdk.org  Tue Jul 16 18:09:20 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Tue, 16 Jul 2024 18:09:20 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v6]
In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
Message-ID: <Kq6Xf3hnyRVLinNi7rm0oPm34BtiW1-qIqvahxWvXv0=.d44f3b37-e903-4274-aad6-820ec269fc8d@github.com>

> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
> 
> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
> 
> In this PR:
> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, when we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
> 
> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
> 
> 
> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess        1   avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.conf...

Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:

  Revert JVMCI changes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20158/files
  - new: https://git.openjdk.org/jdk/pull/20158/files/cd5f290e..138fba42

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=04-05

  Stats: 31 lines in 9 files changed: 0 ins; 29 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20158.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158

PR: https://git.openjdk.org/jdk/pull/20158

From jvernee at openjdk.org  Tue Jul 16 18:09:21 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Tue, 16 Jul 2024 18:09:21 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v5]
In-Reply-To: <gmVrODnXkmeQEvYVo4lI7ETijuxh0FXdsV6U14keGR0=.428ee145-c159-4efc-a705-b13439e92d97@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <gmVrODnXkmeQEvYVo4lI7ETijuxh0FXdsV6U14keGR0=.428ee145-c159-4efc-a705-b13439e92d97@github.com>
Message-ID: <dqB_uPkMWEf9t9WG1Fd-A8yIHDxiGucSjnSqmM2zEAw=.b1898fd7-8630-4a97-a520-d97750a8a2b4@github.com>

On Tue, 16 Jul 2024 15:12:15 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   clarify javadoc

As discussed offline, JVMCI/Graal changes will be handled by a followup PR.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2231508565

From sviswanathan at openjdk.org  Tue Jul 16 18:25:51 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Tue, 16 Jul 2024 18:25:51 GMT
Subject: RFR: 8335860:
 compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard
 AVX/SSE settings
In-Reply-To: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
Message-ID: <jc55k6HOXEz5yz6Pk0mDv4x-kCPfexkW1QNZ4B8vQaw=.bdb3b2d3-553a-4b56-9490-451092439e65@github.com>

On Fri, 12 Jul 2024 18:26:26 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> Enabling test with explicit feature checks for x86 target.
> Removing from test/hotspot/jtreg/ProblemList.txt
> 
> Best Regards,
> Jatin

@jatin-bhateja  There was also a suggestion from @eme64 as part of https://github.com/openjdk/jdk/pull/20062 to remove @requires vm.compiler2.enabled from the test.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20160#issuecomment-2231538140

From sviswanathan at openjdk.org  Tue Jul 16 18:32:55 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Tue, 16 Jul 2024 18:32:55 GMT
Subject: RFR: 8335860:
 compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard
 AVX/SSE settings
In-Reply-To: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
Message-ID: <QLczQ2tsKGcHLl1_3_4X7o2OC2CSNpp9gEdcYC9OD0c=.2e7d7e07-ac30-4be4-803c-c8ac49789eac@github.com>

On Fri, 12 Jul 2024 18:26:26 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> Enabling test with explicit feature checks for x86 target.
> Removing from test/hotspot/jtreg/ProblemList.txt
> 
> Best Regards,
> Jatin

src/hotspot/cpu/x86/vm_version_x86.hpp line 838:

> 836: 
> 837:   // For AVX CPUs only since it needs VEX encoding which is missing on SSE targets,
> 838:   // thus f16c support is disabled if UseAVX == 0.

This comment is rather confusing. The code for supports_float16() is very clear by itself, so I wonder why we need this explanation in the comment at all. Let us remove it altogether.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20160#discussion_r1679872428

From dholmes at openjdk.org  Tue Jul 16 21:57:51 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 16 Jul 2024 21:57:51 GMT
Subject: RFR: 8336275: Move common Method and Constructor fields to
 Executable
In-Reply-To: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com>
References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com>
Message-ID: <8cVBN_0pZKGqYGrjKoXi3Rda7wzJJHFU3uui8PSdUFI=.1d65c77a-db23-4278-9ab3-16608b19f0aa@github.com>

On Tue, 16 Jul 2024 03:45:36 GMT, Chen Liang <liach at openjdk.org> wrote:

> Move fields common to Method and Field to executable

s/Field/Constructor

I was a bit confused about executable fields for a moment. :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20188#issuecomment-2231889229

From liach at openjdk.org  Tue Jul 16 22:16:51 2024
From: liach at openjdk.org (Chen Liang)
Date: Tue, 16 Jul 2024 22:16:51 GMT
Subject: RFR: 8336468: Reflection and MethodHandles should use more precise
 initializer checks [v2]
In-Reply-To: <gkGtD5v4Ll6St0x5ABZpmL3wMVi1SreY-vNtIuaN-90=.a9282c6b-b8c1-46ea-8a89-ff26badcd949@github.com>
References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com>
 <gkGtD5v4Ll6St0x5ABZpmL3wMVi1SreY-vNtIuaN-90=.a9282c6b-b8c1-46ea-8a89-ff26badcd949@github.com>
Message-ID: <KeewYhuYEmnBNOOvt6jYqMyaCyjKa1WQao734RKOXwU=.3962ee19-ff17-45bc-9ae3-67a368165476@github.com>

On Tue, 16 Jul 2024 17:12:08 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103).
>>  
>> There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior.
>> 
>> I also inlined the `select_method` definition, which allows for a bit more straight-forward local code, and obviates the need for wrapping things with `methodHandle`.
>> 
>> @mlchung, you probably want to look at this more closely.
>> 
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `tier1`
>>  - [x] Linux x86_64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8336468-reflection-init-checks
>  - Remove unnecessary handle-izing
>  - Fix
>  - Fix

src/hotspot/share/runtime/reflection.cpp line 769:

> 767: 
> 768: oop Reflection::new_method(const methodHandle& method, bool for_constant_pool_access, TRAPS) {
> 769:   // Allow sun.reflect.ConstantPool to refer to <clinit> methods as java.lang.reflect.Methods.

Not quite related, but it's jdk.internal.reflect.ConstantPool now :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20192#discussion_r1680121069

From dholmes at openjdk.org  Tue Jul 16 22:27:52 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 16 Jul 2024 22:27:52 GMT
Subject: RFR: 8336275: Move common Method and Constructor fields to
 Executable
In-Reply-To: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com>
References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com>
Message-ID: <PQVwUikoHRvHZMA_KJhf05g7YNQXvGghDv5F5KbddPo=.10defabe-2651-4de4-90a1-df071b0a9b7b@github.com>

On Tue, 16 Jul 2024 03:45:36 GMT, Chen Liang <liach at openjdk.org> wrote:

> Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable.

Hotspot changes look good. Core-libs changes do too, but I will leave those for the libs folks to approve.

src/java.base/share/classes/java/lang/reflect/Executable.java line 54:

> 52: public abstract sealed class Executable extends AccessibleObject
> 53:     implements Member, GenericDeclaration permits Constructor, Method {
> 54:     // fields injected by hotspot

If a field is listed here then it is NOT injected by hotspot.

src/java.base/share/classes/java/lang/reflect/Method.java line 73:

> 71:  */
> 72: public final class Method extends Executable {
> 73:     // fields injected by hotspot

Again not injected

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20188#pullrequestreview-2181384669
PR Review Comment: https://git.openjdk.org/jdk/pull/20188#discussion_r1680112370
PR Review Comment: https://git.openjdk.org/jdk/pull/20188#discussion_r1680113161

From liach at openjdk.org  Tue Jul 16 22:43:51 2024
From: liach at openjdk.org (Chen Liang)
Date: Tue, 16 Jul 2024 22:43:51 GMT
Subject: RFR: 8336275: Move common Method and Constructor fields to
 Executable
In-Reply-To: <PQVwUikoHRvHZMA_KJhf05g7YNQXvGghDv5F5KbddPo=.10defabe-2651-4de4-90a1-df071b0a9b7b@github.com>
References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com>
 <PQVwUikoHRvHZMA_KJhf05g7YNQXvGghDv5F5KbddPo=.10defabe-2651-4de4-90a1-df071b0a9b7b@github.com>
Message-ID: <rgKZc3deTibJ4l1BZCk5c2SzfguSjYHbhKJSgO6fEDk=.224cc0e2-7d7f-4fdc-8e3c-ef8277a6aa14@github.com>

On Tue, 16 Jul 2024 22:00:49 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable.
>
> src/java.base/share/classes/java/lang/reflect/Executable.java line 54:
> 
>> 52: public abstract sealed class Executable extends AccessibleObject
>> 53:     implements Member, GenericDeclaration permits Constructor, Method {
>> 54:     // fields injected by hotspot
> 
> If a field is listed here then it is NOT injected by hotspot.

What would be the terminology for a final field that's set by hotspot, against the regular Java constructor rules?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20188#discussion_r1680139439

From cjplummer at openjdk.org  Wed Jul 17 00:04:01 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Wed, 17 Jul 2024 00:04:01 GMT
Subject: RFR: 8336587: failure_handler lldb command times out on macosx-aarch64
 core file
Message-ID: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>

I was looking at the failure_handler output for the lldb command on a macosx-aarch64 core file (it is trying to use lldb to get a back trace of all threads), and noticed it timed out:


----------------------------------------
[2024-07-15 05:15:47] [<snip>/usr/bin/lldb, --core, <snip>/core.92643, <snip>/bin/java, -o, thread backtrace all, -o, quit] timeout=20000 in <snip>
----------------------------------------
(lldb) target create "<snip>/bin/java" --core "<snip>/core.92643"
WARNING: tool timed out: killed process after 20000 ms
----------------------------------------
[2024-07-15 05:16:07] exit code: -2 time: 20163 ms
----------------------------------------


20 seconds is the failure_handler default timeout for all commands. Core files on macosx-aarch64 tend to be very large; this one was over 13 GB. On my MBPro the lldb command took 30 seconds. I bumped the timeout up to 60 seconds and reproduced the same crash in mach5 (more than once); the lldb command usually took about 55 seconds, but it did succeed with the longer timeout. I think we should raise the timeout to even more than 60 seconds, just to make sure we won't see timeouts. 120 seconds is probably a good amount.

-------------

Commit messages:
 - Fix comment.
 - Give lldb a much longer timeout.

Changes: https://git.openjdk.org/jdk/pull/20206/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20206&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8336587
  Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20206.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20206/head:pull/20206

PR: https://git.openjdk.org/jdk/pull/20206

From dlong at openjdk.org  Wed Jul 17 00:49:00 2024
From: dlong at openjdk.org (Dean Long)
Date: Wed, 17 Jul 2024 00:49:00 GMT
Subject: RFR: 8336587: failure_handler lldb command times out on
 macosx-aarch64 core file
In-Reply-To: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>
References: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>
Message-ID: <tu_swVP6kVSj4xAVpfM1-RGBVU0wBxh1_isozGXv12E=.48299cc0-93cf-489b-9b36-a3f91dd08f26@github.com>

On Tue, 16 Jul 2024 23:59:09 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

> I was looking at the failure_handler output for the lldb command on a macosx-aarch64 core file (it is trying to use lldb to get a back trace of all threads), and noticed it timed out:
> 
> 
> ----------------------------------------
> [2024-07-15 05:15:47] [<snip>/usr/bin/lldb, --core, <snip>/core.92643, <snip>/bin/java, -o, thread backtrace all, -o, quit] timeout=20000 in <snip>
> ----------------------------------------
> (lldb) target create "<snip>/bin/java" --core "<snip>/core.92643"
> WARNING: tool timed out: killed process after 20000 ms
> ----------------------------------------
> [2024-07-15 05:16:07] exit code: -2 time: 20163 ms
> ----------------------------------------
> 
> 
> 20 seconds is the failure_handler default timeout for all commands. Core files on macosx-aarch64 tend to be very large; this one was over 13 GB. On my MBPro the lldb command took 30 seconds. I bumped the timeout up to 60 seconds and reproduced the same crash in mach5 (more than once); the lldb command usually took about 55 seconds, but it did succeed with the longer timeout. I think we should raise the timeout to even more than 60 seconds, just to make sure we won't see timeouts. 120 seconds is probably a good amount.

Marked as reviewed by dlong (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20206#pullrequestreview-2181547441

From liach at openjdk.org  Wed Jul 17 03:03:23 2024
From: liach at openjdk.org (Chen Liang)
Date: Wed, 17 Jul 2024 03:03:23 GMT
Subject: RFR: 8336275: Move common Method and Constructor fields to
 Executable [v2]
In-Reply-To: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com>
References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com>
Message-ID: <Z04ux2yyYVR5W1y8prXM4lYPhycn-DE7aM7elm92C3k=.e9eb01d1-cbc3-4da8-be66-03ad947c20ff@github.com>

> Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable.

Chen Liang has updated the pull request incrementally with one additional commit since the last revision:

  Redundant transient; Update the comments to be more accurate

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20188/files
  - new: https://git.openjdk.org/jdk/pull/20188/files/dbe59a5f..184e8a4e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20188&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20188&range=00-01

  Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20188.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20188/head:pull/20188

PR: https://git.openjdk.org/jdk/pull/20188

From liach at openjdk.org  Wed Jul 17 03:03:23 2024
From: liach at openjdk.org (Chen Liang)
Date: Wed, 17 Jul 2024 03:03:23 GMT
Subject: RFR: 8336275: Move common Method and Constructor fields to
 Executable [v2]
In-Reply-To: <rgKZc3deTibJ4l1BZCk5c2SzfguSjYHbhKJSgO6fEDk=.224cc0e2-7d7f-4fdc-8e3c-ef8277a6aa14@github.com>
References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com>
 <PQVwUikoHRvHZMA_KJhf05g7YNQXvGghDv5F5KbddPo=.10defabe-2651-4de4-90a1-df071b0a9b7b@github.com>
 <rgKZc3deTibJ4l1BZCk5c2SzfguSjYHbhKJSgO6fEDk=.224cc0e2-7d7f-4fdc-8e3c-ef8277a6aa14@github.com>
Message-ID: <khPrRqxhRoqdOvCPQIQosNrarHwJCdVujMaghVWNB7U=.e94ade82-3de8-45dc-aa6e-bb6ca18a0602@github.com>

On Tue, 16 Jul 2024 22:41:40 GMT, Chen Liang <liach at openjdk.org> wrote:

>> src/java.base/share/classes/java/lang/reflect/Executable.java line 54:
>> 
>>> 52: public abstract sealed class Executable extends AccessibleObject
>>> 53:     implements Member, GenericDeclaration permits Constructor, Method {
>>> 54:     // fields injected by hotspot
>> 
>> If a field is listed here then it is NOT injected by hotspot.
>
> What would be the terminology for a final field that's set by hotspot, against the regular Java constructor rules?

I have chosen the wording "all final fields are used by the VM"

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20188#discussion_r1680341545

From kvn at openjdk.org  Wed Jul 17 03:42:07 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 17 Jul 2024 03:42:07 GMT
Subject: RFR: 8335921: Fix HotSpot VM build without JVMTI
Message-ID: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com>

Citing David Holmes from bug report:
"We provided the ability to leave out certain VM services (JVMTI, GC's other than serial, ...) as part of the design of the MinimalVM to support Java SE Embedded, along with the Compact Profiles of JDK 8. This manifested in the source code as a set of INCLUDE_XXX ifdef guards. The build system later exposed these as individual --with-jvm-features=xxx,yyy support. However, it was never intended (and certainly not tested) that you could mix-and-match arbitrary subsets of these VM features at will. Consequently if you start trying to do this you will find things that need fixing."

I added `INCLUDE_JVMTI` guards in the two places where they were missing: JVMCI and JFR. The affected code was added recently, within the past year. With the guards in place, I was able to build the VM on all supported platforms.

Note: building the VM without JVMTI is not an officially supported feature. We do not test it, so failures like these (missing guards) are not unexpected.

Many tests fail with a VM built without JVMTI; all are expected failures, and I listed them in the bug report.
I fixed only one test (by adding `@requires vm.jvmti`), which was part of the [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967) changes that introduced JFR code without `INCLUDE_JVMTI` guards.

I ran 2 rounds of testing:

First, only **tier1** with a VM built without JVMTI, to see whether the builds passed and which tests were affected. I commented in the bug report on which tests failed (all are expected to fail without JVMTI).

Second, tier1-4 with JVMTI included in the VM.

-------------

Commit messages:
 - 8335921: Fix HotSpot VM build without JVMTI

Changes: https://git.openjdk.org/jdk/pull/20209/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20209&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335921
  Stats: 20 lines in 8 files changed: 7 ins; 0 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/20209.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20209/head:pull/20209

PR: https://git.openjdk.org/jdk/pull/20209

From dholmes at openjdk.org  Wed Jul 17 04:59:51 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 04:59:51 GMT
Subject: RFR: 8335921: Fix HotSpot VM build without JVMTI
In-Reply-To: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com>
References: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com>
Message-ID: <9pz4Ru-DFK42pLhG6ny7_-bkHzTvDiBq5NfHk_0ron0=.3b8e2d59-7dc2-461c-be8a-00ccc00fe1f8@github.com>

On Wed, 17 Jul 2024 03:37:36 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

> Citing David Holmes from bug report:
> "We provided the ability to leave out certain VM services (JVMTI, GC's other than serial, ...) as part of the design of the MinimalVM to support Java SE Embedded, along with the Compact Profiles of JDK 8. This manifested in the source code as a set of INCLUDE_XXX ifdef guards. The build system later exposed these as individual --with-jvm-features=xxx,yyy support. However, it was never intended (and certainly not tested) that you could mix-and-match arbitrary subsets of these VM features at will. Consequently if you start trying to do this you will find things that need fixing."
> 
> I added `INCLUDE_JVMTI` guards in the two places where they were missing: JVMCI and JFR. The affected code was added recently, within the past year. With the guards in place, I was able to build the VM on all supported platforms.
> 
> Note: building the VM without JVMTI is not an officially supported feature. We do not test it, so failures like these (missing guards) are not unexpected.
> 
> Many tests fail with a VM built without JVMTI; all are expected failures, and I listed them in the bug report.
> I fixed only one test (by adding `@requires vm.jvmti`), which was part of the [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967) changes that introduced JFR code without `INCLUDE_JVMTI` guards.
> 
> I ran 2 rounds of testing:
> 
> First, only **tier1** with a VM built without JVMTI, to see whether the builds passed and which tests were affected. I commented in the bug report on which tests failed (all are expected to fail without JVMTI).
> 
> Second, tier1-4 with JVMTI included in the VM.

This seems reasonable to me.

It highlights the problem we have with optional components in that you either have to work things so that semantically we have a do-nothing implementation of that component, or else you have to put the guards around every piece of code that would normally interact with it.

Thanks.

src/hotspot/share/jfr/instrumentation/jfrJvmtiAgent.hpp line 35:

> 33:   JfrJvmtiAgent();
> 34:   ~JfrJvmtiAgent();
> 35:   static bool create() NOT_JVMTI_RETURN_(true);

It initially seemed odd to return `true` here, but looking through the JFR code that interacts with the Agent it seems the right way to view this is that without JVMTI we have a no-op agent.
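
To illustrate the "no-op agent" reading, here is a minimal standalone sketch of the guard pattern. It is modeled on the `NOT_JVMTI_RETURN_` usage quoted above but is not HotSpot code: `INCLUDE_DEMO_FEATURE`, `NOT_DEMO_FEATURE_RETURN_`, `DemoAgent` and `start_recording` are hypothetical names chosen for the example, and the macro definitions are an assumption about how such a guard can be written.

```cpp
#include <cassert>

// Hypothetical feature switch; defaults to "feature left out of the build".
#ifndef INCLUDE_DEMO_FEATURE
#define INCLUDE_DEMO_FEATURE 0
#endif

#if INCLUDE_DEMO_FEATURE
// Feature present: the declaration stays a plain declaration.
#define NOT_DEMO_FEATURE_RETURN_(code) /* next token must be ; */
#else
// Feature absent: the declaration becomes an inline stub returning 'code'.
#define NOT_DEMO_FEATURE_RETURN_(code) { return code; }
#endif

class DemoAgent {
 public:
  // Without the feature this expands to: static bool create() { return true; };
  static bool create() NOT_DEMO_FEATURE_RETURN_(true);
};

#if INCLUDE_DEMO_FEATURE
// The real implementation is only compiled when the feature is included.
bool DemoAgent::create() {
  return true;  // placeholder for real initialization work
}
#endif

// Caller code needs no guard of its own: a missing feature looks like an
// agent whose creation trivially succeeds.
bool start_recording() {
  if (!DemoAgent::create()) {
    return false;
  }
  return true;
}
```

Returning `true` from the stub keeps callers such as `start_recording` free of `#if` blocks, which is the trade-off described above: either a do-nothing implementation of the component, or guards around every call site.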

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20209#pullrequestreview-2181875380
PR Review Comment: https://git.openjdk.org/jdk/pull/20209#discussion_r1680403451

From thartmann at openjdk.org  Wed Jul 17 05:08:51 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 17 Jul 2024 05:08:51 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed
In-Reply-To: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
Message-ID: <iEpt5qvDzA23twaT_Yib2vYd1Bc2y7Zl_dfUnbpNORE=.58b42c83-64bb-42a9-9c6f-cd0e60780b89@github.com>

On Fri, 12 Jul 2024 02:17:46 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Please review this enhancement that intends to improve the readability of error logs, where very long `java.lang.String`s printed in full can obscure other information.
> 
> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN.
> 
> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.
> 
> If a string's length exceeds `max_length` then we print it as follows:
> 
> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged)
> 
> For example if we print "ABCDE" with a max_length of 4 then the output is literally:
> 
> "AB ... DE" (abridged)
> 
> The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
> 
> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`.
> 
> Testing:
>  - new test added for validation purposes
>  - tiers 1 - 3 as sanity testing
> 
> Thanks
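
The abridging rule described in the quoted text can be sketched as follows. This is a minimal standalone illustration, not the `java_lang_String::print` implementation: the function name `abridge` is hypothetical, it works on bytes of a `std::string` rather than on Java characters, and printing short strings in plain double quotes is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the text that would be printed for 's'
// when at most 'max_length' characters may be shown.
std::string abridge(const std::string& s, std::size_t max_length) {
  // The proposed flag has a lower bound of 2, so each half is non-empty.
  assert(max_length >= 2);
  if (s.size() <= max_length) {
    return "\"" + s + "\"";  // short enough: print in full
  }
  const std::size_t half = max_length / 2;
  // First half, an ellipsis marker, last half, then the "(abridged)" tag.
  return "\"" + s.substr(0, half) + " ... " + s.substr(s.size() - half) +
         "\" (abridged)";
}
```

With `abridge("ABCDE", 4)` this yields `"AB ... DE" (abridged)`, matching the example in the description.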

Looks good to me otherwise. Thanks for fixing this!

src/hotspot/share/runtime/globals.hpp line 1310:

> 1308:           "maximum number of characters to print for a java.lang.String "   \
> 1309:           "in the VM. If exceeded, an abridged version of the string is "   \
> 1310:           "print with the middle of the string is elided.")                 \

Suggestion:

          "printed with the middle of the string is elided.")               \

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20150#pullrequestreview-2181885422
PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1680410176

From thartmann at openjdk.org  Wed Jul 17 05:08:51 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 17 Jul 2024 05:08:51 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed
In-Reply-To: <iEpt5qvDzA23twaT_Yib2vYd1Bc2y7Zl_dfUnbpNORE=.58b42c83-64bb-42a9-9c6f-cd0e60780b89@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <iEpt5qvDzA23twaT_Yib2vYd1Bc2y7Zl_dfUnbpNORE=.58b42c83-64bb-42a9-9c6f-cd0e60780b89@github.com>
Message-ID: <gABpAo4f_ZUdhuir197iOpkbO_uk-T_BjZzUyfw2f-0=.1ac5353a-d43e-432e-9a46-3583c24b8f11@github.com>

On Wed, 17 Jul 2024 05:03:28 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

>> Please review this enhancement that intends to improve the readability of error logs, where very long `java.lang.String`s printed in full can obscure other information.
>> 
>> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN.
>> 
>> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.
>> 
>> If a string's length exceeds `max_length` then we print it as follows:
>> 
>> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged)
>> 
>> For example if we print "ABCDE" with a max_length of 4 then the output is literally:
>> 
>> "AB ... DE" (abridged)
>> 
>> The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
>> 
>> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`.
>> 
>> Testing:
>>  - new test added for validation purposes
>>  - tiers 1 - 3 as sanity testing
>> 
>> Thanks
>
> src/hotspot/share/runtime/globals.hpp line 1310:
> 
>> 1308:           "maximum number of characters to print for a java.lang.String "   \
>> 1309:           "in the VM. If exceeded, an abridged version of the string is "   \
>> 1310:           "print with the middle of the string is elided.")                 \
> 
> Suggestion:
> 
>           "printed with the middle of the string is elided.")               \

I think it should also be "... of the string elided" (without the "is").

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1680410977

From dholmes at openjdk.org  Wed Jul 17 05:18:51 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 05:18:51 GMT
Subject: RFR: 8336275: Move common Method and Constructor fields to
 Executable [v2]
In-Reply-To: <Z04ux2yyYVR5W1y8prXM4lYPhycn-DE7aM7elm92C3k=.e9eb01d1-cbc3-4da8-be66-03ad947c20ff@github.com>
References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com>
 <Z04ux2yyYVR5W1y8prXM4lYPhycn-DE7aM7elm92C3k=.e9eb01d1-cbc3-4da8-be66-03ad947c20ff@github.com>
Message-ID: <s7x0E7F-pTzuxRKXrxsSwAVzE4I5-IxZWhrCWmSM_UQ=.60f9529b-8a94-453f-96ba-5dd32beab06c@github.com>

On Wed, 17 Jul 2024 03:03:23 GMT, Chen Liang <liach at openjdk.org> wrote:

>> Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable.
>
> Chen Liang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Redundant transient; Update the comments to be more accurate

Marked as reviewed by dholmes (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20188#pullrequestreview-2181897055

From dholmes at openjdk.org  Wed Jul 17 05:18:51 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 05:18:51 GMT
Subject: RFR: 8336275: Move common Method and Constructor fields to
 Executable [v2]
In-Reply-To: <khPrRqxhRoqdOvCPQIQosNrarHwJCdVujMaghVWNB7U=.e94ade82-3de8-45dc-aa6e-bb6ca18a0602@github.com>
References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com>
 <PQVwUikoHRvHZMA_KJhf05g7YNQXvGghDv5F5KbddPo=.10defabe-2651-4de4-90a1-df071b0a9b7b@github.com>
 <rgKZc3deTibJ4l1BZCk5c2SzfguSjYHbhKJSgO6fEDk=.224cc0e2-7d7f-4fdc-8e3c-ef8277a6aa14@github.com>
 <khPrRqxhRoqdOvCPQIQosNrarHwJCdVujMaghVWNB7U=.e94ade82-3de8-45dc-aa6e-bb6ca18a0602@github.com>
Message-ID: <zFD7Vl2l5pukfvrtJM0ZrjWBbM9kdORK-S90hig6tc4=.957fdb58-12f2-4750-8634-8572087a7226@github.com>

On Wed, 17 Jul 2024 02:57:51 GMT, Chen Liang <liach at openjdk.org> wrote:

>> What would be the terminology for a final field that's set by hotspot, against the regular Java constructor rules?
>
> I have chosen the wording "all final fields are used by the VM"

I don't know of any specific terminology - we typically just add a comment saying the field is set and/or read by the VM.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20188#discussion_r1680417581

From dholmes at openjdk.org  Wed Jul 17 05:25:52 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 05:25:52 GMT
Subject: RFR: 8336587: failure_handler lldb command times out on
 macosx-aarch64 core file
In-Reply-To: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>
References: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>
Message-ID: <o7Pft4WpyCuNlKDRObyzzyRiAxrNHRnQJO9CLyxsx84=.e24057cf-8b5c-4963-8e7e-365245099bb4@github.com>

On Tue, 16 Jul 2024 23:59:09 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

> I was looking at the failure_handler output for the lldb command on a macosx-aarch64 core file (it is trying to use lldb to get a back trace of all threads), and noticed it timed out:
> 
> 
> ----------------------------------------
> [2024-07-15 05:15:47] [<snip>/usr/bin/lldb, --core, <snip>/core.92643, <snip>/bin/java, -o, thread backtrace all, -o, quit] timeout=20000 in <snip>
> ----------------------------------------
> (lldb) target create "<snip>/bin/java" --core "<snip>/core.92643"
> WARNING: tool timed out: killed process after 20000 ms
> ----------------------------------------
> [2024-07-15 05:16:07] exit code: -2 time: 20163 ms
> ----------------------------------------
> 
> 
> 20 seconds is the failure_handler default timeout for all commands. Core files on macosx-aarch64 tend to be very large; this one was over 13 GB. On my MBPro the lldb command took 30 seconds. I bumped the timeout up to 60 seconds and reproduced the same crash in mach5 (more than once); the lldb command usually took about 55 seconds, but it did succeed with the longer timeout. I think we should raise the timeout well beyond 60 seconds just to make sure we won't see timeouts. 120 seconds is probably a good amount.

Marked as reviewed by dholmes (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20206#pullrequestreview-2181903818

From stuefe at openjdk.org  Wed Jul 17 05:27:53 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 17 Jul 2024 05:27:53 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed
In-Reply-To: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
Message-ID: <0C2xrw7X8gn7dl7LWNZu9lh5XJjvOSNbA0PRqa6ydoM=.29d1d6ee-242f-4ab5-abaa-d2113d030f82@github.com>

On Fri, 12 Jul 2024 02:17:46 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log.
> 
> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN.
> 
> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.
> 
> If a string's length exceeds `max_length` then we print it as follows:
> 
> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged)
> 
> For example if we print "ABCDE" with a max_length of 4 then the output is literally:
> 
> "AB ... DE" (abridged)
> 
> The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
> 
> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`.
> 
> Testing:
>  - new test added for validation purposes
>  - tiers 1 - 3 as sanity testing
> 
> Thanks

src/hotspot/share/classfile/javaClasses.cpp line 785:

> 783:       index = length - (max_length / 2);
> 784:       abridge = false; // only do this once
> 785:     }

Instead of the trailing "abridged", in similar cases I printed out the number of omitted characters. E.g.

"Very long long long ... (53 characters omitted) ... long long string"

Makes it obvious how much has been cut, and there is no danger of confusing the ellipsis with naturally occurring dots.

Additionally, I would only do this if length > max_length + X, with X being at least as long as the middle part (3 characters if you only print an ellipsis). You end up with printed strings that may be slightly longer than max_length, but OTOH the output is clearer. Otherwise you may indicate omission where none happened (if length == max_length).
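
A minimal sketch of the two suggestions above, assuming a free-standing helper rather than the actual `java_lang_String::print` code. The name `abridge_with_count` and the `slack` parameter (the "X" above) are hypothetical and only serve the illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not the HotSpot implementation.
// `slack` plays the role of X: a string is printed in full unless it
// exceeds max_length by more than `slack` characters.
std::string abridge_with_count(const std::string& s,
                               std::size_t max_length,
                               std::size_t slack = 3) {
  assert(max_length >= 2);
  if (s.size() <= max_length + slack) {
    return s;  // nothing omitted, so no omission marker is printed
  }
  const std::size_t half = max_length / 2;
  const std::size_t omitted = s.size() - 2 * half;
  // Keep the first and last `half` characters and report how many were cut.
  return s.substr(0, half) + " ... (" + std::to_string(omitted) +
         " characters omitted) ... " + s.substr(s.size() - half);
}
```

With these assumptions, a string only slightly longer than `max_length` is printed unchanged, and the marker shows up only when a real cut took place.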

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1680421675

From stuefe at openjdk.org  Wed Jul 17 05:29:54 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 17 Jul 2024 05:29:54 GMT
Subject: RFR: 8300732: Whitebox functions for Metaspace test should use
 byte size [v3]
In-Reply-To: <D7Jfl3uzXBUJgWKE_iu88rhdWpkge5IK4SPs3lt4xUM=.b112d42f-c718-4922-ad69-da4714cf5ecb@github.com>
References: <eEn9XGR498GfiVBvO1hTvtfk6Fv1zfTxrAJ-_EP62AQ=.d2fa0e77-8af9-49e5-91f9-50cc8a29d0c6@github.com>
 <D7Jfl3uzXBUJgWKE_iu88rhdWpkge5IK4SPs3lt4xUM=.b112d42f-c718-4922-ad69-da4714cf5ecb@github.com>
Message-ID: <BeeNYBIpm-clapTYjHIWxZVVPvrb7QG0K-Gv0Tl8yaQ=.7f5f8fb6-0b10-48a6-98bb-9dd0b8f6e214@github.com>

On Tue, 16 Jul 2024 14:49:27 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. 
>> 
>> Testing: 
>> - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. 
>> 
>> Thanks, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Feedback - updating Unit.valueOf to direct access

Still good.

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20039#pullrequestreview-2181909030

From dholmes at openjdk.org  Wed Jul 17 05:32:26 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 05:32:26 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v2]
In-Reply-To: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
Message-ID: <SBNE7wMZ0WMp1bQzyK9EASI6EOXgtVPSJw1uWCqRFko=.947c9a5d-ec2e-450a-a7ca-6272470827ae@github.com>

> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log.
> 
> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN.
> 
> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.
> 
> If a string's length exceeds `max_length` then we print it as follows:
> 
> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged)
> 
> For example if we print "ABCDE" with a max_length of 4 then the output is literally:
> 
> "AB ... DE" (abridged)
> 
> The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
> 
> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`.
> 
> Testing:
>  - new test added for validation purposes
>  - tiers 1 - 3 as sanity testing
> 
> Thanks

David Holmes has updated the pull request incrementally with one additional commit since the last revision:

  Fixed grammar

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20150/files
  - new: https://git.openjdk.org/jdk/pull/20150/files/7b155abc..39256bd3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20150&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20150&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20150.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20150/head:pull/20150

PR: https://git.openjdk.org/jdk/pull/20150

From dholmes at openjdk.org  Wed Jul 17 05:32:26 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 05:32:26 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v2]
In-Reply-To: <iEpt5qvDzA23twaT_Yib2vYd1Bc2y7Zl_dfUnbpNORE=.58b42c83-64bb-42a9-9c6f-cd0e60780b89@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <iEpt5qvDzA23twaT_Yib2vYd1Bc2y7Zl_dfUnbpNORE=.58b42c83-64bb-42a9-9c6f-cd0e60780b89@github.com>
Message-ID: <wTjWL_BESpfR0JfYBkUXZCVp2g5KAS0CiPQmS4h_BRw=.6ad07592-cc78-446b-9b57-dff79e4900a9@github.com>

On Wed, 17 Jul 2024 05:05:56 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

>> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fixed grammar
>
> Looks good to me otherwise. Thanks for fixing this!

Thanks for the review @TobiHartmann

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20150#issuecomment-2232449368

From dholmes at openjdk.org  Wed Jul 17 05:32:26 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 05:32:26 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v2]
In-Reply-To: <gABpAo4f_ZUdhuir197iOpkbO_uk-T_BjZzUyfw2f-0=.1ac5353a-d43e-432e-9a46-3583c24b8f11@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <iEpt5qvDzA23twaT_Yib2vYd1Bc2y7Zl_dfUnbpNORE=.58b42c83-64bb-42a9-9c6f-cd0e60780b89@github.com>
 <gABpAo4f_ZUdhuir197iOpkbO_uk-T_BjZzUyfw2f-0=.1ac5353a-d43e-432e-9a46-3583c24b8f11@github.com>
Message-ID: <JlguV-ZjQ8cUjVfzfqKI7f84IhxgHHBoJZE2CG-cRVA=.bbd81084-0818-4589-88e1-96b0b6feb368@github.com>

On Wed, 17 Jul 2024 05:04:45 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

>> src/hotspot/share/runtime/globals.hpp line 1310:
>> 
>>> 1308:           "maximum number of characters to print for a java.lang.String "   \
>>> 1309:           "in the VM. If exceeded, an abridged version of the string is "   \
>>> 1310:           "print with the middle of the string is elided.")                 \
>> 
>> Suggestion:
>> 
>>           "printed with the middle of the string is elided.")               \
>
> I think it should also be "... of the string elided" (without the "is").

Fixed. Don't know how I missed that.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1680426244

From jbhateja at openjdk.org  Wed Jul 17 05:37:03 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 17 Jul 2024 05:37:03 GMT
Subject: RFR: 8335860:
 compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard
 AVX/SSE settings [v2]
In-Reply-To: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
Message-ID: <w77DS-gliOwhxLiidSa1fqG0-aq-7dax2Lcndxk-uLs=.98d75a7d-e0f7-4817-9e5d-1dec22f64c3b@github.com>

> Enabling test with explicit feature checks for x86 target.
> Removing from test/hotspot/jtreg/ProblemList.txt
> 
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Review suggestions incorporated

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20160/files
  - new: https://git.openjdk.org/jdk/pull/20160/files/16ebbbaa..fa68e6ac

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20160&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20160&range=00-01

  Stats: 3 lines in 2 files changed: 0 ins; 3 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20160.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20160/head:pull/20160

PR: https://git.openjdk.org/jdk/pull/20160

From jbhateja at openjdk.org  Wed Jul 17 05:37:04 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 17 Jul 2024 05:37:04 GMT
Subject: RFR: 8335860:
 compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard
 AVX/SSE settings
In-Reply-To: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
Message-ID: <Xzk8WdTohUoGqud1EBY6YeTn-MRheHLXJxT3xhX88a4=.34b86427-e557-46ad-94a4-2966184fe33f@github.com>

On Fri, 12 Jul 2024 18:26:26 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> Enabling test with explicit feature checks for x86 target.
> Removing from test/hotspot/jtreg/ProblemList.txt
> 
> Best Regards,
> Jatin

> @jatin-bhateja There was also a suggestion from @eme64 as part of #20062 to remove requires vm.compiler2.enabled from the test.

The test only validates a specific C2 IR pattern, and the framework makes sure to compile @Test annotated methods with the top tier (C2: default) compiler using the Whitebox mechanism. So the @requires flag looks redundant here. Agreed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20160#issuecomment-2232454739

From jbhateja at openjdk.org  Wed Jul 17 05:37:04 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 17 Jul 2024 05:37:04 GMT
Subject: RFR: 8335860:
 compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard
 AVX/SSE settings [v2]
In-Reply-To: <QLczQ2tsKGcHLl1_3_4X7o2OC2CSNpp9gEdcYC9OD0c=.2e7d7e07-ac30-4be4-803c-c8ac49789eac@github.com>
References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
 <QLczQ2tsKGcHLl1_3_4X7o2OC2CSNpp9gEdcYC9OD0c=.2e7d7e07-ac30-4be4-803c-c8ac49789eac@github.com>
Message-ID: <cHOK-juVOMTDpwPxfKcjn7TCBI8k0hBKyTDQgv4PAb8=.2fd4d184-fb95-4b63-9a51-e8c8adcd3a0f@github.com>

On Tue, 16 Jul 2024 18:30:35 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Review suggestions incorporated
>
> src/hotspot/cpu/x86/vm_version_x86.hpp line 838:
> 
>> 836: 
>> 837:   // For AVX CPUs only since it needs VEX encoding which is missing on SSE targets,
>> 838:   // thus f16c support is disabled if UseAVX == 0.
> 
> This comment is somewhat confusing. The code for supports_float16() by itself is very clear. I am wondering why we need this explanation in the comment at all. Let us remove it altogether.

Sure. FTR, my comment was in relation to the following check:
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L1056

FP16-FP32 conversions are VEX-encoded instructions and do not have an SSE flavor. Thus, they only work with UseAVX > 0.
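
To illustrate the dependency, here is a simplified model, not the code in vm_version_x86.cpp. The `CpuFeatures` struct and `apply_use_avx` function are hypothetical names, and the model tracks only the F16C bit:

```cpp
#include <cassert>

// Hypothetical model of the detected CPU features; illustrative only.
struct CpuFeatures {
  bool f16c;  // CPU reports the F16C extension
};

// Assumption of this sketch: with use_avx == 0 no VEX-encoded instruction
// may be emitted, so the F16C bit is cleared before anyone queries it.
CpuFeatures apply_use_avx(CpuFeatures detected, int use_avx) {
  assert(use_avx >= 0);
  if (use_avx == 0) {
    detected.f16c = false;
  }
  return detected;
}

// Simplified query: float16 conversion is usable only if F16C survived.
bool supports_float16(const CpuFeatures& f) {
  return f.f16c;
}
```

In this model the query itself stays trivial, because the UseAVX dependency is handled once, at feature setup time.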

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20160#discussion_r1680430908

From jpai at openjdk.org  Wed Jul 17 05:37:51 2024
From: jpai at openjdk.org (Jaikiran Pai)
Date: Wed, 17 Jul 2024 05:37:51 GMT
Subject: [jdk23] RFR: Merge d876cacf73ad698eda6668ccebbdfbe7690a0b06
Message-ID: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com>

This brings the CPU24_07 changes into jdk23 branch.

-------------

Commit messages:
 - 8323390: Enhance mask blit functionality
 - 8320097: Improve Image transformations
 - 8327413: Enhance compilation efficiency
 - 8324559: Improve 2D image handling
 - 8325600: Better symbol storage
 - 8319859: Better symbol storage
 - 8314794: Improve UTF8 String supports
 - 8320548: Improved loop handling
 - 8323231: Improve array management

The merge commit only contains trivial merges, so no merge-specific webrevs have been generated.

Changes: https://git.openjdk.org/jdk/pull/20212/files
  Stats: 162 lines in 14 files changed: 98 ins; 4 del; 60 mod
  Patch: https://git.openjdk.org/jdk/pull/20212.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20212/head:pull/20212

PR: https://git.openjdk.org/jdk/pull/20212

From jpai at openjdk.org  Wed Jul 17 05:37:53 2024
From: jpai at openjdk.org (Jaikiran Pai)
Date: Wed, 17 Jul 2024 05:37:53 GMT
Subject: RFR: Merge 13341ca70276c891add2e4652b6e1e8020610988
Message-ID: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com>

This brings in CPU24_07 changes into master branch

-------------

Commit messages:
 - 8323390: Enhance mask blit functionality
 - 8320097: Improve Image transformations
 - 8327413: Enhance compilation efficiency
 - 8324559: Improve 2D image handling
 - 8325600: Better symbol storage
 - 8319859: Better symbol storage
 - 8314794: Improve UTF8 String supports
 - 8320548: Improved loop handling
 - 8323231: Improve array management

The merge commit only contains trivial merges, so no merge-specific webrevs have been generated.

Changes: https://git.openjdk.org/jdk/pull/20211/files
  Stats: 162 lines in 14 files changed: 98 ins; 4 del; 60 mod
  Patch: https://git.openjdk.org/jdk/pull/20211.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20211/head:pull/20211

PR: https://git.openjdk.org/jdk/pull/20211

From dholmes at openjdk.org  Wed Jul 17 05:40:53 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 05:40:53 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v2]
In-Reply-To: <0C2xrw7X8gn7dl7LWNZu9lh5XJjvOSNbA0PRqa6ydoM=.29d1d6ee-242f-4ab5-abaa-d2113d030f82@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <0C2xrw7X8gn7dl7LWNZu9lh5XJjvOSNbA0PRqa6ydoM=.29d1d6ee-242f-4ab5-abaa-d2113d030f82@github.com>
Message-ID: <j1xFGdRG38i_hvtMSBHJeHVlC4-HTghiPnz1aTEKY8Q=.cec14e3f-5274-420a-9683-1a90ce86aefc@github.com>

On Wed, 17 Jul 2024 05:21:57 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fixed grammar
>
> src/hotspot/share/classfile/javaClasses.cpp line 785:
> 
>> 783:       index = length - (max_length / 2);
>> 784:       abridge = false; // only do this once
>> 785:     }
> 
> Instead of the trailing "abridged", in similar cases I printed out the number of omitted characters. E.g.
> 
> "Very long long long ... (53 characters omitted) ... long long string"
> 
> Makes it obvious how much has been cut, and there is no danger of confusing the ellipsis with naturally occurring dots.
> 
> Additionally, I would only do this if length > max_length + X, with X being at least as long as the middle part (3 characters if you only print an ellipsis). You end up with printed strings that may be slightly longer than max_length, but OTOH the output is clearer. Otherwise you may indicate omission where none happened (if length == max_length).

@tstuefe - thanks for looking at this, Thomas. I don't get your second point. First, I only abridge when length > max_length. Second, adding in the X fudge factor just means max_length should have been set differently.

To your first point, there will be similar changes to this coming so it would be good to standardise on how to report them. I like the idea you propose, but I couldn't find the code you mention. ??
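
For reference, the format described in the PR text can be sketched as follows. This is a stand-alone illustration, not the code in javaClasses.cpp; the function name `format_abridged` is hypothetical, and quoting the unabridged string is an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the printing logic described in the PR.
std::string format_abridged(const std::string& s, std::size_t max_length) {
  assert(max_length >= 2);  // lower bound of the flag's stated range
  if (s.size() <= max_length) {
    return "\"" + s + "\"";  // short enough: printed in full
  }
  const std::size_t half = max_length / 2;
  // First and last max_length/2 characters with " ... " in between,
  // followed by the trailing "(abridged)" marker.
  return "\"" + s.substr(0, half) + " ... " + s.substr(s.size() - half) +
         "\" (abridged)";
}
```

Under these assumptions a string of exactly `max_length` characters takes the first branch and is printed without any marker.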

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1680433575

From kvn at openjdk.org  Wed Jul 17 05:41:52 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 17 Jul 2024 05:41:52 GMT
Subject: RFR: 8335921: Fix HotSpot VM build without JVMTI
In-Reply-To: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com>
References: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com>
Message-ID: <7TC1wAE-NN6af0pg5dEJxInkJxhIU0mq0RJ8NDK_c3U=.84572ae7-b0d0-44e4-89dc-df7bd73a58ea@github.com>

On Wed, 17 Jul 2024 03:37:36 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

> Citing David Holmes from bug report:
> "We provided the ability to leave out certain VM services (JVMTI, GC's other than serial, ...) as part of the design of the MinimalVM to support Java SE Embedded, along with the Compact Profiles of JDK 8. This manifested in the source code as a set of INCLUDE_XXX ifdef guards. The build system later exposed these as individual --with-jvm-features=xxx,yyy support. However, it was never intended (and certainly not tested) that you could mix-and-match arbitrary subsets of these VM features at will. Consequently if you start trying to do this you will find things that need fixing."
> 
> I added `INCLUDE_JVMTI` guards in two places where they were missing: JVMCI and JFR. The affected code was added recently, in the past year. After that I was able to build the VM on all supported platforms.
> 
> Note: building the VM without JVMTI is not an officially supported feature. We are not testing it, and such failures (missing guards) are not unexpected.
> 
> A lot of tests failed with a VM built without JVMTI. All are expected failures. I listed the failed tests in the bug report.
> I fixed (added requires `vm.jvmti`) only one, which was part of the [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967) changes that introduced JFR code without `INCLUDE_JVMTI` guards.
> 
> I ran 2 rounds of testing:
> 
> First, only **tier1** with a VM built without JVMTI, to see whether builds passed and which tests were affected. I wrote a comment in the bug report listing which tests failed (all expected to fail without JVMTI).
> 
> Second round of testing with JVMTI in VM: tier1-4

Thank you, David, for review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20209#issuecomment-2232459481

From kvn at openjdk.org  Wed Jul 17 05:41:53 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 17 Jul 2024 05:41:53 GMT
Subject: RFR: 8335921: Fix HotSpot VM build without JVMTI
In-Reply-To: <9pz4Ru-DFK42pLhG6ny7_-bkHzTvDiBq5NfHk_0ron0=.3b8e2d59-7dc2-461c-be8a-00ccc00fe1f8@github.com>
References: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com>
 <9pz4Ru-DFK42pLhG6ny7_-bkHzTvDiBq5NfHk_0ron0=.3b8e2d59-7dc2-461c-be8a-00ccc00fe1f8@github.com>
Message-ID: <ir04Nyodej1X5r5JqWJhK7pAqeLhVFaZD2nE7A0iJBI=.4e0163f0-831d-430c-a5f3-f1e8c3c1c31c@github.com>

On Wed, 17 Jul 2024 04:52:35 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Citing David Holmes from bug report:
>> "We provided the ability to leave out certain VM services (JVMTI, GC's other than serial, ...) as part of the design of the MinimalVM to support Java SE Embedded, along with the Compact Profiles of JDK 8. This manifested in the source code as a set of INCLUDE_XXX ifdef guards. The build system later exposed these as individual --with-jvm-features=xxx,yyy support. However, it was never intended (and certainly not tested) that you could mix-and-match arbitrary subsets of these VM features at will. Consequently if you start trying to do this you will find things that need fixing."
>> 
>> I added `INCLUDE_JVMTI` guards in two places where they were missing: JVMCI and JFR. The affected code was added recently, in the past year. After that I was able to build the VM on all supported platforms.
>> 
>> Note: building the VM without JVMTI is not an officially supported feature. We are not testing it, and such failures (missing guards) are not unexpected.
>> 
>> A lot of tests failed with a VM built without JVMTI. All are expected failures. I listed the failed tests in the bug report.
>> I fixed (added requires `vm.jvmti`) only one, which was part of the [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967) changes that introduced JFR code without `INCLUDE_JVMTI` guards.
>> 
>> I ran 2 rounds of testing:
>> 
>> First, only **tier1** with a VM built without JVMTI, to see whether builds passed and which tests were affected. I wrote a comment in the bug report listing which tests failed (all expected to fail without JVMTI).
>> 
>> Second round of testing with JVMTI in VM: tier1-4
>
> src/hotspot/share/jfr/instrumentation/jfrJvmtiAgent.hpp line 35:
> 
>> 33:   JfrJvmtiAgent();
>> 34:   ~JfrJvmtiAgent();
>> 35:   static bool create() NOT_JVMTI_RETURN_(true);
> 
> It initially seemed odd to return `true` here, but looking through the JFR code that interacts with the Agent it seems the right way to view this is that without JVMTI we have a no-op agent.

Right.
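
For readers unfamiliar with the pattern, a stand-alone sketch of how such a macro turns a declaration into a no-op stub when the feature is compiled out. The `Agent` class is a hypothetical stand-in for `JfrJvmtiAgent`, and `INCLUDE_JVMTI` is forced to 0 here only for the illustration:

```cpp
// Assumption for this sketch: the feature is compiled out.
#define INCLUDE_JVMTI 0

#if INCLUDE_JVMTI
// Feature present: the line stays a plain declaration, defined elsewhere.
#define NOT_JVMTI_RETURN_(code) /* next token must be ; */
#else
// Feature absent: the declaration becomes an inline stub returning `code`.
#define NOT_JVMTI_RETURN_(code) { return code; }
#endif

class Agent {  // hypothetical stand-in for JfrJvmtiAgent
 public:
  // Without JVMTI this expands to: static bool create() { return true; };
  static bool create() NOT_JVMTI_RETURN_(true);
};
```

Callers that check the result of `create()` then see success and carry on, which matches the no-op agent reading above.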

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20209#discussion_r1680433885

From djelinski at openjdk.org  Wed Jul 17 05:43:54 2024
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Wed, 17 Jul 2024 05:43:54 GMT
Subject: RFR: Merge 13341ca70276c891add2e4652b6e1e8020610988
In-Reply-To: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com>
References: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com>
Message-ID: <-RlBW1WFIiQyUpGqtLzAdpPq_uVik1dne-34b9lxOUY=.ff17a5c9-db80-4d41-8e7d-9c81b2d665a4@github.com>

On Wed, 17 Jul 2024 05:33:15 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

> This brings in CPU24_07 changes into master branch

Marked as reviewed by djelinski (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20211#pullrequestreview-2181924404

From djelinski at openjdk.org  Wed Jul 17 05:44:52 2024
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Wed, 17 Jul 2024 05:44:52 GMT
Subject: [jdk23] RFR: Merge d876cacf73ad698eda6668ccebbdfbe7690a0b06
In-Reply-To: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com>
References: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com>
Message-ID: <purSX-2YubBpPXCReHDrjW4IVHXozzq_f7EQX95xwD0=.d0b80e1c-4970-47dd-9582-b79b44f14a99@github.com>

On Wed, 17 Jul 2024 05:33:54 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

> This brings the CPU24_07 changes into jdk23 branch.

Marked as reviewed by djelinski (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20212#pullrequestreview-2181924794

From jbhateja at openjdk.org  Wed Jul 17 05:45:05 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 17 Jul 2024 05:45:05 GMT
Subject: RFR: 8335860:
 compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard
 AVX/SSE settings [v3]
In-Reply-To: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
Message-ID: <89mJFTY-O4WgqC7eYEu125ehHkgVCFtecRPhSQuEisI=.2b86f885-8225-4103-9dcb-6a4be73bad71@github.com>

> Enabling test with explicit feature checks for x86 target.
> Removing from test/hotspot/jtreg/ProblemList.txt
> 
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:

 - Restoring earlier comment
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8335860
 - Review suggestions incorporated
 - 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings

-------------

Changes: https://git.openjdk.org/jdk/pull/20160/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20160&range=02
  Stats: 4 lines in 2 files changed: 0 ins; 3 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20160.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20160/head:pull/20160

PR: https://git.openjdk.org/jdk/pull/20160

From dholmes at openjdk.org  Wed Jul 17 05:56:50 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 05:56:50 GMT
Subject: RFR: Merge 13341ca70276c891add2e4652b6e1e8020610988
In-Reply-To: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com>
References: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com>
Message-ID: <IR7yERVG_xEnc_ZksWMfirmU73e9o1U69H_A_e3tZ9Y=.e446bbeb-a58d-4fd6-9db6-60fd32a17f1e@github.com>

On Wed, 17 Jul 2024 05:33:15 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

> This brings in CPU24_07 changes into master branch

Hotspot looks good.

Thanks

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20211#pullrequestreview-2181940719

From dholmes at openjdk.org  Wed Jul 17 05:57:50 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 05:57:50 GMT
Subject: [jdk23] RFR: Merge d876cacf73ad698eda6668ccebbdfbe7690a0b06
In-Reply-To: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com>
References: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com>
Message-ID: <VPJ1zpP40QbHN3TIVUxtdjgp-Pg9ixM7rHUrcvjhmTY=.0d02a69d-cfe7-425a-add0-0504840a55d8@github.com>

On Wed, 17 Jul 2024 05:33:54 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

> This brings the CPU24_07 changes into jdk23 branch.

Hotspot looks good. Thanks

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20212#pullrequestreview-2181941981

From jpai at openjdk.org  Wed Jul 17 06:09:08 2024
From: jpai at openjdk.org (Jaikiran Pai)
Date: Wed, 17 Jul 2024 06:09:08 GMT
Subject: RFR: Merge 13341ca70276c891add2e4652b6e1e8020610988 [v2]
In-Reply-To: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com>
References: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com>
Message-ID: <7GKxRhSM3KmXxcXQWeiB_FjMI845R58e06v8z3LPYRg=.f1103e22-f12d-42dc-9528-8cab49e7620e@github.com>

> This brings in CPU24_07 changes into master branch

Jaikiran Pai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20211/files
  - new: https://git.openjdk.org/jdk/pull/20211/files/13341ca7..13341ca7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20211&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20211&range=00-01

  Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20211.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20211/head:pull/20211

PR: https://git.openjdk.org/jdk/pull/20211

From jpai at openjdk.org  Wed Jul 17 06:09:09 2024
From: jpai at openjdk.org (Jaikiran Pai)
Date: Wed, 17 Jul 2024 06:09:09 GMT
Subject: RFR: Merge 13341ca70276c891add2e4652b6e1e8020610988
In-Reply-To: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com>
References: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com>
Message-ID: <jIdXM_bLVOKm5AhfkbA5EtUys_ALwbTgNA3-m86eFMg=.7df0a878-dbb9-4ec6-9eb9-feeae3e44dad@github.com>

On Wed, 17 Jul 2024 05:33:15 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

> This brings in CPU24_07 changes into master branch

Thank you David and Daniel for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20211#issuecomment-2232494499

From jpai at openjdk.org  Wed Jul 17 06:09:09 2024
From: jpai at openjdk.org (Jaikiran Pai)
Date: Wed, 17 Jul 2024 06:09:09 GMT
Subject: Integrated: Merge 13341ca70276c891add2e4652b6e1e8020610988
In-Reply-To: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com>
References: <MRvTHoQ77X4EABZzfWIXuTtI1N6lT-CzJjBDNOUIWxQ=.32ebe203-c6f1-42e8-8d55-18c137c4be35@github.com>
Message-ID: <hFXcLRnvb1hk3kN5O6Fx0EsNNC9dR-d-XKFVBTI1zkU=.6ce13b48-d962-46da-ad55-c51fa8d669e9@github.com>

On Wed, 17 Jul 2024 05:33:15 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

> This brings in CPU24_07 changes into master branch

This pull request has now been integrated.

Changeset: d90c20c0
Author:    Jaikiran Pai <jpai at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/d90c20c0c728ced94493e0e58956153f6f61f898
Stats:     162 lines in 14 files changed: 98 ins; 4 del; 60 mod

Merge

Reviewed-by: djelinski, dholmes

-------------

PR: https://git.openjdk.org/jdk/pull/20211

From jpai at openjdk.org  Wed Jul 17 06:09:25 2024
From: jpai at openjdk.org (Jaikiran Pai)
Date: Wed, 17 Jul 2024 06:09:25 GMT
Subject: [jdk23] RFR: Merge d876cacf73ad698eda6668ccebbdfbe7690a0b06 [v2]
In-Reply-To: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com>
References: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com>
Message-ID: <0K-SFe5PeIBtOlrtjWuWi-iNf0kcjadeDHT3Ir_XCrg=.4051dd76-12ff-4ea8-8fe0-a7918d2142b4@github.com>

> This brings the CPU24_07 changes into jdk23 branch.

Jaikiran Pai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20212/files
  - new: https://git.openjdk.org/jdk/pull/20212/files/d876cacf..d876cacf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20212&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20212&range=00-01

  Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20212.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20212/head:pull/20212

PR: https://git.openjdk.org/jdk/pull/20212

From jpai at openjdk.org  Wed Jul 17 06:09:25 2024
From: jpai at openjdk.org (Jaikiran Pai)
Date: Wed, 17 Jul 2024 06:09:25 GMT
Subject: [jdk23] Integrated: Merge d876cacf73ad698eda6668ccebbdfbe7690a0b06
In-Reply-To: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com>
References: <bY_ChE6g_Ce7D_S_6Hk54mceqpE4d7dnHOTnIQ-mgQ4=.8663036b-b285-4a2a-8e08-8b8d4caab76f@github.com>
Message-ID: <1_DJAHbifu2iXhbt0f4md0YF8Ym8i3t_b-qHM6j2BpU=.c70ecfc8-db8b-469a-a07f-1ff437a97024@github.com>

On Wed, 17 Jul 2024 05:33:54 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

> This brings the CPU24_07 changes into jdk23 branch.

This pull request has now been integrated.

Changeset: 7afb958e
Author:    Jaikiran Pai <jpai at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/7afb958e8d30221456a7b4634de0200dfe074950
Stats:     162 lines in 14 files changed: 98 ins; 4 del; 60 mod

Merge

Reviewed-by: djelinski, dholmes

-------------

PR: https://git.openjdk.org/jdk/pull/20212

From stuefe at openjdk.org  Wed Jul 17 06:28:52 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 17 Jul 2024 06:28:52 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v2]
In-Reply-To: <j1xFGdRG38i_hvtMSBHJeHVlC4-HTghiPnz1aTEKY8Q=.cec14e3f-5274-420a-9683-1a90ce86aefc@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <0C2xrw7X8gn7dl7LWNZu9lh5XJjvOSNbA0PRqa6ydoM=.29d1d6ee-242f-4ab5-abaa-d2113d030f82@github.com>
 <j1xFGdRG38i_hvtMSBHJeHVlC4-HTghiPnz1aTEKY8Q=.cec14e3f-5274-420a-9683-1a90ce86aefc@github.com>
Message-ID: <onfNeoQzCfo3lsgAdomCn6xxQx3nsVVk8h8h3gQDJl8=.b8c3a559-fad4-4b60-b22f-f07fd5f0b807@github.com>

On Wed, 17 Jul 2024 05:38:41 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> src/hotspot/share/classfile/javaClasses.cpp line 785:
>> 
>>> 783:       index = length - (max_length / 2);
>>> 784:       abridge = false; // only do this once
>>> 785:     }
>> 
>> Instead of the trailing "abridged", in similar cases I printed out the number of omitted characters. E.g.
>> 
>> "Very long long long ... (53 characters omitted) ... long long string"
>> 
>> Makes it obvious how much has been cut, and no danger of confusing the ellipsis with naturally occurring dots.
>> 
>> Additionally, I would only do this if length > max_length + X, with X being at least as long as the middle part (3 characters if you only print an ellipsis). You end up with printed strings that may be slightly longer than maxlen, but OTOH the output is clearer. Otherwise you may indicate omission where none happened (if length == max_length)
>
> @tstuefe  - thanks for looking at this Thomas. I don't get your second point. First I only abridge when length > max_length. Second adding in the X fudge factor just means max_length should have been set differently.
> 
> To your first point, there will be similar changes to this coming so it would be good to standardise on how to report them. I like the idea you propose, but I couldn't find the code you mention. ??

Hi David,

> I don't get your second point. First I only abridge when length > max_length. Second adding in the X fudge factor just means max_length should have been set differently.

AFAICS you start abridging if length is exactly max_length. So, you could have:

"Hallo David" maxlen=11 => "Hallo ... omitted 0 characters ... David" 

which is just strange. Additionally, it seems odd to replace a small inner portion with a larger "omitted" text, because then the text plus replacement is longer than the original text, e.g.

"Hallo David" maxlen=10 => "Hallo ... omitted 1 characters ... David" 

So my proposal would be to have a stretch zone the size of the omission text (roughly, modulo the variable number of digits in the text) and to print the text in full when it is longer than maxlen but exceeds it by less than the stretch zone length. E.g.

"Hallo David" maxlen=10 => "Hallo David"
"Hallo 0123456789012345678901234567890123456789 David" => "Hallo 01234 ... omitted 30 characters ... 56789 David"

It's mainly for optics.
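
For illustration, the stretch-zone rule described above could be sketched in C++ roughly as follows. This is not HotSpot code: `abridge_middle`, its parameters and the exact marker wording are assumptions made for this example, and the threshold used here (abridge only when more characters would be cut than the marker itself occupies) is just one possible reading of "roughly".

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of HotSpot: keeps the first and last
// max_length/2 characters and replaces the middle with a marker that
// states how many characters were cut.
std::string abridge_middle(const std::string& s, std::size_t max_length) {
  const std::size_t length = s.size();
  if (length <= max_length) {
    return s;  // nothing to cut, including the length == max_length case
  }
  const std::size_t head = max_length / 2;
  const std::size_t tail = max_length - head;
  const std::size_t omitted = length - max_length;
  const std::string marker =
      " ... omitted " + std::to_string(omitted) + " characters ... ";
  // Stretch zone: if the marker is at least as long as the text it would
  // replace, abridging makes the output longer, so print the string whole.
  if (omitted <= marker.size()) {
    return s;
  }
  return s.substr(0, head) + marker + s.substr(length - tail);
}
```

With `max_length` set to 10, "Hallo David" comes back unchanged, since cutting one character would cost more than it saves. Under this strict comparison the 52-character example above (30 characters cut, 31-character marker) would also be printed in full, so a real implementation might prefer a somewhat smaller stretch zone.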

> To your first point, there will be similar changes to this coming so it would be good to standardise on how to report them. I like the idea you propose, but I couldn't find the code you mention. ??

Oh, sorry, this was not in OpenJDK; this was for a different product I worked on at SAP. But that form of omission text served us well.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1680478197

From dholmes at openjdk.org  Wed Jul 17 06:37:57 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 06:37:57 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
Message-ID: <Wj5uaRxDmVYqDnt2V1PgErk7dI10LCro6WSfAm4Q6BU=.6fd91b51-ec40-438f-95a4-d2fbf593a288@github.com>

On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:
> 
>  - Remove try_read
>  - Add explicit to single parameter constructors
>  - Remove superfluous access specifier
>  - Remove unused include
>  - Update assert message OMCache::set_monitor
>  - Fix indentation
>  - Remove outdated comment LightweightSynchronizer::exit
>  - Remove logStream include
>  - Remove strange comment
>  - Fix javaThread include

src/hotspot/share/runtime/basicLock.hpp line 44:

> 42:   // a sentinel zero value indicating a recursive stack-lock.
> 43:   // * For LM_LIGHTWEIGHT
> 44:   // Used as a cache the ObjectMonitor* used when locking. Must either

The first sentence doesn't read correctly.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680492976

From dholmes at openjdk.org  Wed Jul 17 06:42:53 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 06:42:53 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
Message-ID: <0Dwv0GUezG25Soj6iG3Ti4NCm_RQJdF7psmnDoUAdRU=.c38a44c6-f6e6-4e2a-84ef-45c32d145a13@github.com>

On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:
> 
>  - Remove try_read
>  - Add explicit to single parameter constructors
>  - Remove superfluous access specifier
>  - Remove unused include
>  - Update assert message OMCache::set_monitor
>  - Fix indentation
>  - Remove outdated comment LightweightSynchronizer::exit
>  - Remove logStream include
>  - Remove strange comment
>  - Fix javaThread include

src/hotspot/share/runtime/basicLock.hpp line 46:

> 44:   // Used as a cache the ObjectMonitor* used when locking. Must either
> 45:   // be nullptr or the ObjectMonitor* used when locking.
> 46:   volatile uintptr_t _metadata;

The displaced header/markword terminology was very well known to people, whereas "metadata" is really abstract - people will always need to go and find out what it actually refers to. Could we not define a union here to support the legacy and lightweight modes more explicitly and keep the existing terminology for the setters/getters for the code that uses it?
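
A rough sketch of what such a union could look like, purely to make the suggestion concrete. `BasicLockSketch` and its member names are hypothetical and are not taken from the PR; the `volatile` qualifier and any atomic access of the real field are left out, and the sketch assumes each locking mode only ever reads the member it wrote.

```cpp
#include <cstdint>

class ObjectMonitor;  // opaque here; only a pointer to it is stored

// Hypothetical illustration only, not the BasicLock from the PR.
class BasicLockSketch {
  union {
    std::uintptr_t _displaced_header;      // legacy mode: saved mark word bits
    ObjectMonitor* _object_monitor_cache;  // lightweight mode: cached monitor or nullptr
  };

 public:
  BasicLockSketch() : _displaced_header(0) {}

  // Accessors that keep the displaced-header terminology for legacy callers.
  std::uintptr_t displaced_header() const { return _displaced_header; }
  void set_displaced_header(std::uintptr_t header) { _displaced_header = header; }

  // Accessors for the lightweight-mode view of the same word.
  ObjectMonitor* object_monitor_cache() const { return _object_monitor_cache; }
  void set_object_monitor_cache(ObjectMonitor* mon) { _object_monitor_cache = mon; }
};

// Both views are expected to share a single machine word.
static_assert(sizeof(BasicLockSketch) == sizeof(std::uintptr_t),
              "sketch assumes pointer-sized storage");
```

The point of the union is only naming: the storage stays one word, but each mode gets a member whose name says what it holds.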

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680496495

From dholmes at openjdk.org  Wed Jul 17 06:42:54 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 06:42:54 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <0Dwv0GUezG25Soj6iG3Ti4NCm_RQJdF7psmnDoUAdRU=.c38a44c6-f6e6-4e2a-84ef-45c32d145a13@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
 <0Dwv0GUezG25Soj6iG3Ti4NCm_RQJdF7psmnDoUAdRU=.c38a44c6-f6e6-4e2a-84ef-45c32d145a13@github.com>
Message-ID: <U3cg8IdnKu5Eeg-52muJuU0vEGJTRaX4jhKCOB3DVtk=.a1acc8fc-c3b7-4d38-ace8-dd39eff6c139@github.com>

On Wed, 17 Jul 2024 06:39:14 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:
>> 
>>  - Remove try_read
>>  - Add explicit to single parameter constructors
>>  - Remove superfluous access specifier
>>  - Remove unused include
>>  - Update assert message OMCache::set_monitor
>>  - Fix indentation
>>  - Remove outdated comment LightweightSynchronizer::exit
>>  - Remove logStream include
>>  - Remove strange comment
>>  - Fix javaThread include
>
> src/hotspot/share/runtime/basicLock.hpp line 46:
> 
>> 44:   // Used as a cache the ObjectMonitor* used when locking. Must either
>> 45:   // be nullptr or the ObjectMonitor* used when locking.
>> 46:   volatile uintptr_t _metadata;
> 
> The displaced header/markword terminology was very well known to people, whereas "metadata" is really abstract - people will always need to go and find out what it actually refers to. Could we not define a union here to support the legacy and lightweight modes more explicitly and keep the existing terminology for the setters/getters for the code that uses it?

I should have read ahead. I see you do keep the setters/getters.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680497748

From dholmes at openjdk.org  Wed Jul 17 06:46:00 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 06:46:00 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
Message-ID: <LftlPgWcCiaMMys4K2eXSS2HNSS2WNh5sImxQ8QHKFY=.df009912-4c6e-4c09-b6c9-c5d308cf5cf1@github.com>

On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:
> 
>  - Remove try_read
>  - Add explicit to single parameter constructors
>  - Remove superfluous access specifier
>  - Remove unused include
>  - Update assert message OMCache::set_monitor
>  - Fix indentation
>  - Remove outdated comment LightweightSynchronizer::exit
>  - Remove logStream include
>  - Remove strange comment
>  - Fix javaThread include

src/hotspot/share/runtime/deoptimization.cpp line 1641:

> 1639:               assert(fr.is_deoptimized_frame(), "frame must be scheduled for deoptimization");
> 1640:               if (LockingMode == LM_LEGACY) {
> 1641:                 mon_info->lock()->set_displaced_header(markWord::unused_mark());

In the existing code, how is this restricted to the LM_LEGACY case? It appears to be unconditional, which suggests you are changing the non-UOMT LM_LIGHTWEIGHT logic.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680500696

From dholmes at openjdk.org  Wed Jul 17 06:50:55 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 06:50:55 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
Message-ID: <1FImJurji3MUi1rauLpFYqETg45LmnlxLrRijzXBukg=.7125982a-3507-4711-922e-2c7c9706d87c@github.com>

On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:
> 
>  - Remove try_read
>  - Add explicit to single parameter constructors
>  - Remove superfluous access specifier
>  - Remove unused include
>  - Update assert message OMCache::set_monitor
>  - Fix indentation
>  - Remove outdated comment LightweightSynchronizer::exit
>  - Remove logStream include
>  - Remove strange comment
>  - Fix javaThread include

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 60:

> 58: 
> 59: // ConcurrentHashTable storing links from objects to ObjectMonitors
> 60: class ObjectMonitorWorld : public CHeapObj<MEMFLAGS::mtObjectMonitor> {

OMWorld describes the project, not the hashtable; this should be called ObjectMonitorTable or some such.

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 62:

> 60: class ObjectMonitorWorld : public CHeapObj<MEMFLAGS::mtObjectMonitor> {
> 61:   struct Config {
> 62:     using Value = ObjectMonitor*;

Does this alias really help? We don't state the type that many times and it looks odd to end up with a mix of `Value` and `ObjectMonitor*` in the same code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680508685
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680508801

From dholmes at openjdk.org  Wed Jul 17 07:02:55 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 07:02:55 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
Message-ID: <fM04k6Q6c7d_WrQHqLgruy7mpffpZrI6A2o7ZcMAwz0=.5433e04a-1d24-4b20-a126-218f20313cfd@github.com>

On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:
> 
>  - Remove try_read
>  - Add explicit to single parameter constructors
>  - Remove superfluous access specifier
>  - Remove unused include
>  - Update assert message OMCache::set_monitor
>  - Fix indentation
>  - Remove outdated comment LightweightSynchronizer::exit
>  - Remove logStream include
>  - Remove strange comment
>  - Fix javaThread include

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 102:

> 100:       assert(*value != nullptr, "must be");
> 101:       return (*value)->object_is_cleared();
> 102:     }

The `is_dead` functions seem oddly placed given they do not relate to the object stored in the wrapper. Why are they here? And what is the difference between `object_is_cleared` and `object_is_dead` (as used by `LookupMonitor`)?

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 105:

> 103:   };
> 104: 
> 105:   class LookupMonitor : public StackObj {

I'm not understanding why we need this little wrapper class.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680526331
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1680526868

From epeter at openjdk.org  Wed Jul 17 07:36:51 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Jul 2024 07:36:51 GMT
Subject: RFR: 8335860:
 compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard
 AVX/SSE settings
In-Reply-To: <Xzk8WdTohUoGqud1EBY6YeTn-MRheHLXJxT3xhX88a4=.34b86427-e557-46ad-94a4-2966184fe33f@github.com>
References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
 <Xzk8WdTohUoGqud1EBY6YeTn-MRheHLXJxT3xhX88a4=.34b86427-e557-46ad-94a4-2966184fe33f@github.com>
Message-ID: <stZn5gAlPWALwY9tQlWkFg1uZkG7Hqsg6cSPCR_1ZhI=.4b89ed7b-52c0-4cf4-9c75-daf78a052559@github.com>

On Wed, 17 Jul 2024 05:34:46 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Enabling test with explicit feature checks for x86 target.
>> Removing from test/hotspot/jtreg/ProblemList.txt
>> 
>> Best Regards,
>> Jatin
>
>> @jatin-bhateja There was also a suggestion from @eme64 as part of #20062 to remove requires vm.compiler2.enabled from the test.
> 
> The test only validates a specific C2 IR pattern; the framework makes sure to compile @Test annotated methods with the top tier (c2 : default) compiler using the Whitebox mechanism. So the @require flag looks redundant here. Agreed.

Hi @jatin-bhateja 
Are you also going to address the questions for the VM changes from https://github.com/openjdk/jdk/pull/20062? That could be done in a separate RFE, but it would be nice to at least hear what you plan to do or not to do ;)

Or would you like to look into the VM changes @jaskarth ?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20160#issuecomment-2232629036

From jbhateja at openjdk.org  Wed Jul 17 07:40:52 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 17 Jul 2024 07:40:52 GMT
Subject: RFR: 8335860:
 compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard
 AVX/SSE settings
In-Reply-To: <Xzk8WdTohUoGqud1EBY6YeTn-MRheHLXJxT3xhX88a4=.34b86427-e557-46ad-94a4-2966184fe33f@github.com>
References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
 <Xzk8WdTohUoGqud1EBY6YeTn-MRheHLXJxT3xhX88a4=.34b86427-e557-46ad-94a4-2966184fe33f@github.com>
Message-ID: <dcOE47WZPyhKVU_yN3x68rLj0JDpyXGnIFjncthF-Ss=.a6161082-a3e0-42a0-8bd0-82babed1587d@github.com>

On Wed, 17 Jul 2024 05:34:46 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Enabling test with explicit feature checks for x86 target.
>> Removing from test/hotspot/jtreg/ProblemList.txt
>> 
>> Best Regards,
>> Jatin
>
>> @jatin-bhateja There was also a suggestion from @eme64 as part of #20062 to remove requires vm.compiler2.enabled from the test.
> 
> The test only validates a specific C2 IR pattern; the framework makes sure to compile @Test annotated methods with the top-tier (C2, the default) compiler using the Whitebox mechanism. So the @requires flag looks redundant here. Agree.

> Hi @jatin-bhateja Are you also going to address the questions for the VM changes from #20062? That could be done in a separate RFE, but it would be nice to at least hear what you plan to do or not to do ;)
> 
> Or would you like to look into the VM changes @jaskarth ?

Hi @eme64, my apologies: I could not respond in time due to other priority items. Please go ahead and file an RFE and kindly assign it to me. I will handle the questions as part of that RFE, or close it with appropriate justification. This PR is for test enablement only.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20160#issuecomment-2232634295

From mbaesken at openjdk.org  Wed Jul 17 07:51:52 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 17 Jul 2024 07:51:52 GMT
Subject: RFR: 8330144: Revise os::free_memory() [v2]
In-Reply-To: <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
 <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>
Message-ID: <0NDfnOazfhdITpxweiblZT0H-K8El5-xeP40MQ5J4LY=.ab0c4862-84dd-42c4-a103-3e7fa36a808a@github.com>

On Wed, 10 Jul 2024 20:09:45 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> ### Summary
>> On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting.  The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. 
>> 
>> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory.  
>> 
>> **Transparent huge pages:**
>> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.
>> 
>> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (This function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well. 
>> 
>> **Explicit huge pages:**
>> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to explicit huge pages remains the same. We can remove the "alignment_hint" parameter.
>> 
>> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo.  After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use.  Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range` function instead.  After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an imp...
>
> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Minor cleanup and comments.
>  - rename to disclaim_memory and update test

The comments above and the code look okay to me.
I will test it in our CI first.
Are you sure there are no side effects, or 'bad' kernel versions where unwanted things occur, when using madvise instead?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20080#issuecomment-2232657238

From stuefe at openjdk.org  Wed Jul 17 08:44:52 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 17 Jul 2024 08:44:52 GMT
Subject: RFR: 8330144: Revise os::free_memory() [v2]
In-Reply-To: <0NDfnOazfhdITpxweiblZT0H-K8El5-xeP40MQ5J4LY=.ab0c4862-84dd-42c4-a103-3e7fa36a808a@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
 <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>
 <0NDfnOazfhdITpxweiblZT0H-K8El5-xeP40MQ5J4LY=.ab0c4862-84dd-42c4-a103-3e7fa36a808a@github.com>
Message-ID: <CW9S4_d61GhdYN2aJuRYiTTB94xZm6tgpjt0d2Rww6A=.555139c2-241f-4d77-a87f-1fb66a3f8fd7@github.com>

On Wed, 17 Jul 2024 07:49:23 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> The comments above and the code look okay to me. I will test it in our CI first. Are you sure there are no side effects, or 'bad' kernel versions where unwanted things occur, when using madvise instead?

@MBaesken Reasonably sure. The same technique has been used by the glibc for C-heap trimming for a long time. Thanks for putting this into SAP's CI!
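
For readers who want to see the behavior under discussion, here is a minimal sketch, assuming Linux and a private anonymous mapping with small pages. The function name `disclaim_and_retouch` is made up for illustration and is not the `os::disclaim_memory` code from the PR. It touches a range, calls `madvise(MADV_DONTNEED)`, and checks that the range reads back as zero and can be written again without a new `mmap`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>

// Hypothetical demo, not HotSpot code. Returns true if, after
// madvise(MADV_DONTNEED), the range reads back as zero and can be written
// again without being mapped anew. Relies on Linux semantics for private
// anonymous memory.
bool disclaim_and_retouch(std::size_t bytes) {
  assert(bytes > 0);
  void* raw = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (raw == MAP_FAILED) return false;
  unsigned char* p = static_cast<unsigned char*>(raw);

  std::memset(p, 0xAB, bytes);  // touch every page so it becomes resident
  bool ok = (madvise(p, bytes, MADV_DONTNEED) == 0);

  // The old contents are gone: the next access sees zero-filled pages.
  ok = ok && p[0] == 0 && p[bytes - 1] == 0;

  p[0] = 0xCD;                  // re-touch; the mapping itself is still valid
  ok = ok && p[0] == 0xCD;

  munmap(p, bytes);
  return ok;
}
```

The sketch only covers small pages; it says nothing about the explicit huge page case described in the PR summary.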

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20080#issuecomment-2232763075

From shade at openjdk.org  Wed Jul 17 08:55:51 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 17 Jul 2024 08:55:51 GMT
Subject: RFR: 8335921: Fix HotSpot VM build without JVMTI
In-Reply-To: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com>
References: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com>
Message-ID: <gvShf6Szqz7zSWXeUB7SZnzAIN_IFy2KwJJrKQxVuMw=.e83a36f6-b619-4ddc-816e-e3336eb63941@github.com>

On Wed, 17 Jul 2024 03:37:36 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

> Citing David Holmes from bug report:
> "We provided the ability to leave out certain VM services (JVMTI, GC's other than serial, ...) as part of the design of the MinimalVM to support Java SE Embedded, along with the Compact Profiles of JDK 8. This manifested in the source code as a set of INCLUDE_XXX ifdef guards. The build system later exposed these as individual --with-jvm-features=xxx,yyy support. However, it was never intended (and certainly not tested) that you could mix-and-match arbitrary subsets of these VM features at will. Consequently if you start trying to do this you will find things that need fixing."
> 
> I added `INCLUDE_JVMTI` guards in two places where it was missed: JVMCI and JFR.  Affected code was added recently, in the past year. After that I was able to build VM on all supported platforms.
> 
> Note: building the VM without JVMTI is not an officially supported feature. We are not testing it, and such failures (missing guards) are not unexpected.
> 
> A lot of tests failed with VM without JVMTI. All are expected failures. I listed failed tests in bug report.
> I fixed (added requires `vm.jvmti`) only one which was part of [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967) changes which introduced JFR code without `INCLUDE_JVMTI` guards.
> 
> I ran 2 rounds of testing:
> 
> First, only **tier1** with the VM built without JVMTI, to see whether the builds passed and which tests were affected. I wrote a comment in the bug report listing which tests failed (all are expected to fail without JVMTI).
> 
> Second round of testing with JVMTI in VM: tier1-4

Looks okay.

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20209#pullrequestreview-2182317512

From shade at openjdk.org  Wed Jul 17 08:58:52 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 17 Jul 2024 08:58:52 GMT
Subject: RFR: 8335921: Fix HotSpot VM build without JVMTI
In-Reply-To: <9pz4Ru-DFK42pLhG6ny7_-bkHzTvDiBq5NfHk_0ron0=.3b8e2d59-7dc2-461c-be8a-00ccc00fe1f8@github.com>
References: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com>
 <9pz4Ru-DFK42pLhG6ny7_-bkHzTvDiBq5NfHk_0ron0=.3b8e2d59-7dc2-461c-be8a-00ccc00fe1f8@github.com>
Message-ID: <EMXROqwZDbBelzLxDkBNoKLRMwku-GG0bVF7FgJfZU8=.c6d63260-eafa-4e24-aed8-a16e76823001@github.com>

On Wed, 17 Jul 2024 04:57:38 GMT, David Holmes <dholmes at openjdk.org> wrote:

> It highlights the problem we have with optional components in that you either have to work things so that semantically we have a do-nothing implementation of that component, or else you have to put the guards around every piece of code that would normally interact with it.

At some point a few years ago I explored a private testing pipeline that built the VM with different combinations of options. It worked, but so many issues cropped up continuously that I wrote it off as a lost cause. I gave up even on building Minimal. Fixing particular build configurations every once in a while -- like this PR -- seems to be a pragmatic compromise.
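
The two options David describes can be sketched as follows. This is only an illustration: `INCLUDE_FEATURE_X`, `EventSink`, and both `load_class*` functions are hypothetical names, chosen to resemble the `INCLUDE_XXX` guards in spirit and not taken from HotSpot.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

#ifndef INCLUDE_FEATURE_X   // hypothetical build flag, defaults to "left out"
#define INCLUDE_FEATURE_X 0
#endif

// Option 1: the component always exists, but degrades to a do-nothing
// implementation when the feature is compiled out.
struct EventSink {
#if INCLUDE_FEATURE_X
  std::vector<std::string> events;
  void post(const std::string& e) { events.push_back(e); }
  std::size_t count() const { return events.size(); }
#else
  void post(const std::string&) {}
  std::size_t count() const { return 0; }
#endif
};

// Callers of option 1 need no guards at all.
std::size_t load_class(EventSink& sink, const std::string& name) {
  sink.post("load " + name);
  return sink.count();
}

// Option 2: every call site that interacts with the component carries its
// own guard. Forgetting one guard is exactly the kind of build break this
// PR fixes.
std::size_t load_class_guarded(EventSink& sink, const std::string& name) {
#if INCLUDE_FEATURE_X
  sink.post("load " + name);
  return sink.count();
#else
  (void)sink;
  (void)name;
  return 0;
#endif
}
```

With the flag at its default of 0, both functions compile and report zero events; defining `INCLUDE_FEATURE_X=1` switches both to the real behavior.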

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20209#issuecomment-2232790329

From galder at openjdk.org  Wed Jul 17 09:20:51 2024
From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=)
Date: Wed, 17 Jul 2024 09:20:51 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and
 Math.min(long,long)
In-Reply-To: <l3QGajoAAxigBK5cfIYwdGPTKfbJJJLvnSYisn7O7x8=.15bd4030-3af2-4d3a-a013-8f9c392223f1@github.com>
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
 <l3QGajoAAxigBK5cfIYwdGPTKfbJJJLvnSYisn7O7x8=.15bd4030-3af2-4d3a-a013-8f9c392223f1@github.com>
Message-ID: <MWiyM5dWze8wwUA4nKY5V-TiH98NO5qRlG3UcA3QbKw=.3c1c0d0f-de0c-485b-a5d0-f18c77aa32a4@github.com>

On Wed, 10 Jul 2024 14:24:05 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

> The C2 changes look nice! I just added one comment here about style. It would also be good to add some IR tests checking that the intrinsic is creating `MaxL`/`MinL` nodes before macro expansion, and a microbenchmark to compare results.

Thanks for the review. +1 to the IR tests, I'll work on those.

Re: microbenchmark - what do you have exactly in mind? For vectorization performance there is `ReductionPerf` though it's not a microbenchmark per se. Do you want a microbenchmark for the performance of vectorized max/min long? For non-vectorization performance there is `MathBench`.

I would not expect performance differences in `MathBench` because the backend is still the same and this change really benefits vectorization. I've run the min/max long tests on darwin/aarch64 and linux/x64 and indeed I see no difference:

linux/x64

Benchmark          (seed)   Mode  Cnt        Score       Error   Units
MathBench.maxLong       0  thrpt    8  1464197.164 ± 27044.205  ops/ms # base
MathBench.minLong       0  thrpt    8  1469917.328 ± 25397.401  ops/ms # base
MathBench.maxLong       0  thrpt    8  1469615.250 ± 17950.429  ops/ms # patched
MathBench.minLong       0  thrpt    8  1456290.514 ± 44455.727  ops/ms # patched


darwin/aarch64

Benchmark          (seed)   Mode  Cnt        Score        Error   Units
MathBench.maxLong       0  thrpt    8  1739341.447 ± 210983.444  ops/ms # base
MathBench.minLong       0  thrpt    8  1659547.649 ± 260554.159  ops/ms # base
MathBench.maxLong       0  thrpt    8  1660449.074 ± 254534.725  ops/ms # patched
MathBench.minLong       0  thrpt    8  1729728.021 ±  16327.575  ops/ms # patched

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2232836799

From jpai at openjdk.org  Wed Jul 17 10:26:50 2024
From: jpai at openjdk.org (Jaikiran Pai)
Date: Wed, 17 Jul 2024 10:26:50 GMT
Subject: RFR: 8336587: failure_handler lldb command times out on
 macosx-aarch64 core file
In-Reply-To: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>
References: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>
Message-ID: <ytnudkg8F2EyXlgAJrGC5F9oTCnH8T8qMGTP7z5b3-0=.1cc62657-6859-409a-a721-0560d544bfb6@github.com>

On Tue, 16 Jul 2024 23:59:09 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

> I was looking at the failure_handler output for the lldb command on a macosx-aarch64 core file (it is trying to use lldb to get a back trace of all threads), and noticed it timed out:
> 
> 
> ----------------------------------------
> [2024-07-15 05:15:47] [<snip>/usr/bin/lldb, --core, <snip>/core.92643, <snip>/bin/java, -o, thread backtrace all, -o, quit] timeout=20000 in <snip>
> ----------------------------------------
> (lldb) target create "<snip>/bin/java" --core "<snip>/core.92643"
> WARNING: tool timed out: killed process after 20000 ms
> ----------------------------------------
> [2024-07-15 05:16:07] exit code: -2 time: 20163 ms
> ----------------------------------------
> 
> 
> 20 seconds is the failure_handler default timeout for all commands. Core files on macosx-aarch64 tend to be very large. This one was over 13 GB. On my MBPro it took 30 seconds. I bumped up the timeout to 60 seconds and reproduced the same crash in mach5 (more than once); the lldb command usually took about 55 seconds there, but it did succeed with the longer timeout. I think we should change the timeout to even more than 60 seconds just to make sure we won't see timeouts. 120 seconds is probably a good amount.

Marked as reviewed by jpai (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20206#pullrequestreview-2182516025

From chegar at openjdk.org  Wed Jul 17 13:26:57 2024
From: chegar at openjdk.org (Chris Hegarty)
Date: Wed, 17 Jul 2024 13:26:57 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v6]
In-Reply-To: <Kq6Xf3hnyRVLinNi7rm0oPm34BtiW1-qIqvahxWvXv0=.d44f3b37-e903-4274-aad6-820ec269fc8d@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <Kq6Xf3hnyRVLinNi7rm0oPm34BtiW1-qIqvahxWvXv0=.d44f3b37-e903-4274-aad6-820ec269fc8d@github.com>
Message-ID: <q4Hqz-Tv4gYMfJm2HwDRBzVZXJeneqSxkugIzh1u9bI=.32d426a7-3fa7-48df-8210-4430dfa1d430@github.com>

On Tue, 16 Jul 2024 18:09:20 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always. Which is in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly once there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Revert JVMCI changes

Thanks for the discussion and changes in this PR. It's super helpful (for working out what we can do as a workaround), as well as a great improvement for the future.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2233313234

From chegar at openjdk.org  Wed Jul 17 13:26:58 2024
From: chegar at openjdk.org (Chris Hegarty)
Date: Wed, 17 Jul 2024 13:26:58 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <_PDgnriMr5GoRUoTpxJnhZjIqEcjdF2kscNx94ScPlc=.b035d8ac-e218-46ed-86d9-a08368c63dc5@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <pLIRTEBVE6RFNwR0N7hZ3eRrqmAPcFB3v2PkPnDQUg0=.0e7ea973-6c9b-41e0-abb0-5d975108fbc5@github.com>
 <_PDgnriMr5GoRUoTpxJnhZjIqEcjdF2kscNx94ScPlc=.b035d8ac-e218-46ed-86d9-a08368c63dc5@github.com>
Message-ID: <Rbwpfm2sVsTEvbqEw3rOepj7QkXEX66XiOoaT1-RvLg=.e4cbf7c7-3a14-4b75-86c2-b90afab04f6c@github.com>

On Mon, 15 Jul 2024 12:59:27 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>  Effectively, once all the issues surrounding reachability fences will be addressed, we should be able to achieve numbers similar to above even in the case of shared close.

Is there an issue where I can follow this?  [ EDIT: oh! it's [JDK-8290892](https://bugs.openjdk.org/browse/JDK-8290892) ]

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2233317727

From szaldana at openjdk.org  Wed Jul 17 13:52:05 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Wed, 17 Jul 2024 13:52:05 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID
Message-ID: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>

Hi all, 

This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 

This PR addresses the following diagnostic commands: 
- [x] Compiler.perfmap 
- [x] GC.heap_dump
- [x] System.dump_map
- [x] Thread.dump_to_file
- [x] VM.cds

Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 

I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 


filename         (Optional) Name of the file to which the flight recording data is
                   written when the recording is stopped. If no filename is given, a
                   filename is generated from the PID and the current date and is
                   placed in the directory where the process was started. The
                   filename may also be a directory in which case, the filename is
                   generated from the PID and the current date in the specified
                   directory. (STRING, no default value)

                   Note: If a filename is given, '%p' in the filename will be
                   replaced by the PID, and '%t' will be replaced by the time in
                   'yyyy_MM_dd_HH_mm_ss' format.


Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 

Testing: 

- [x] Added test case passes. 
- [x] Modified existing VM.cds tests to also check for `%p` filenames. 

Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 

Cheers, 
Sonia

-------------

Commit messages:
 - 8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID

Changes: https://git.openjdk.org/jdk/pull/20198/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8334492
  Stats: 130 lines in 5 files changed: 116 ins; 0 del; 14 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From szaldana at openjdk.org  Wed Jul 17 14:02:31 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Wed, 17 Jul 2024 14:02:31 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v2]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <vluUCz7LJUc6FInntimxmXcyImSJfrxWkBOUWat-2zs=.7b3ab621-30a8-4e6d-89f2-77c3504dc432@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Updating copyright headers

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/ee46dab5..eea54f6d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=00-01

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From stuefe at openjdk.org  Wed Jul 17 14:24:54 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 17 Jul 2024 14:24:54 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v2]
In-Reply-To: <vluUCz7LJUc6FInntimxmXcyImSJfrxWkBOUWat-2zs=.7b3ab621-30a8-4e6d-89f2-77c3504dc432@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <vluUCz7LJUc6FInntimxmXcyImSJfrxWkBOUWat-2zs=.7b3ab621-30a8-4e6d-89f2-77c3504dc432@github.com>
Message-ID: <0FaB5dyzz0jaa0RETfdT4wcbS3jPg4QzIzj1s-pPWvw=.805a55dc-d141-482f-b6aa-e6c4fdfbb97d@github.com>

On Wed, 17 Jul 2024 14:02:31 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Updating copyright headers

First cursory review. That is a useful feature

- In all cases: please, in case of an error, don't THROW, don't do `warning`. Instead, just print to the `output()` of the DCmd. You want an error to appear to the user of the dcmd - so, to stdout or stderr of the jcmd process issuing the command. Not an exception in the target JVM process, nor a warning in the target JVM stderr stream

- Can you give us a variant of `Arguments::copy_expand_pid` that receives a zero-terminated const char* as input so that we can avoid having to pass in the length of the input each time?

- when passing in output buffers to functions, try to use sizeof(buffer) instead of repeating the buffer size. Then, one can change the size of the buffer array without having to modify the calls that use it (but beware of a pitfall: sizeof(char[]) vs sizeof(char*))
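
The last two points can be sketched together. This is a minimal illustration under stated assumptions, not the real `Arguments::copy_expand_pid`: the `expand_pid` name, its signature, and its return convention are all hypothetical, and it only handles `%p`. The first overload takes a zero-terminated input so that no input length is needed; the second deduces the buffer size from the array type, which sidesteps the `sizeof(char*)` pitfall because it does not compile for a plain pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper. Copies the zero-terminated `src` into `buf`,
// replacing every "%p" with `pid`. Returns false if the result, including
// the terminator, does not fit into `buflen` bytes.
bool expand_pid(const char* src, char* buf, std::size_t buflen, int pid) {
  assert(src != nullptr && buf != nullptr);
  if (buflen == 0) return false;
  std::size_t out = 0;
  for (const char* p = src; *p != '\0'; ++p) {
    char digits[16];              // enough for any 32-bit int plus terminator
    const char* piece = p;        // by default, copy one character
    std::size_t piece_len = 1;
    if (p[0] == '%' && p[1] == 'p') {
      piece_len = static_cast<std::size_t>(
          std::snprintf(digits, sizeof(digits), "%d", pid));
      piece = digits;
      ++p;                        // skip the 'p' of "%p"
    }
    if (out + piece_len >= buflen) return false;  // keep room for '\0'
    std::memcpy(buf + out, piece, piece_len);
    out += piece_len;
  }
  buf[out] = '\0';
  return true;
}

// Array overload: N is deduced from the array type, so callers cannot
// accidentally pass sizeof(char*) as the buffer size.
template <std::size_t N>
bool expand_pid(const char* src, char (&buf)[N], int pid) {
  return expand_pid(src, buf, N, pid);
}
```

A caller with `char fname[64];` can then write `expand_pid("/tmp/perf-%p.map", fname, pid)` and resize `fname` later without touching the call.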

src/hotspot/share/code/codeCache.cpp line 1796:

> 1794:   // Perf expects to find the map file at /tmp/perf-<pid>.map
> 1795:   // if the file name is not specified.
> 1796:   char fname[JVM_MAXPATHLEN];

Good to see this gone, the old code implicitly relied on: max pid len -2147483647 = 11 chars, + length of "/tmp/perf-.map" not overflowing 32, which cuts a bit close to the bone.

src/hotspot/share/code/codeCache.cpp line 1800:

> 1798:     jio_snprintf(fname, sizeof(fname), "/tmp/perf-%d.map",
> 1799:                  os::current_process_id());
> 1800:   }

Arguably one could just do 

constexpr char filename_default[] = "/tmp/perf-%p.map";
Arguments::copy_expand_pid(filename == nullptr ? filename_default : filename, .....);

src/hotspot/share/services/diagnosticCommand.cpp line 525:

> 523:     stringStream msg;
> 524:     msg.print("Invalid file path name specified: %s", _filename.value());
> 525:     THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(), msg.base());

Why throw? Why not just print an error message to the output() stream and return?

src/hotspot/share/services/diagnosticCommand.cpp line 1059:

> 1057:       fileh = java_lang_String::create_from_str(fname, CHECK);
> 1058:     } else {
> 1059:       warning("Invalid file path name specified, fall back to default name");

`warning` prints a warning to the stdout of the JVM process. You don't want that; you want a warning to the issuer of the dcmd, which is another - possibly even remote - process. Write errors to `output()`, instead.

src/hotspot/share/services/diagnosticCommand.cpp line 1138:

> 1136:     stringStream msg;
> 1137:     msg.print("Invalid file path name specified: %s", _filepath.value());
> 1138:     THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(), msg.base());

write to output() and return instead of throwing
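
To illustrate the "print to `output()` and return" style requested in these comments, here is a small sketch. It assumes a `std::ostream` standing in for the DCmd's output stream; `dump_to_file_command` and `kMaxPathLen` are hypothetical names, and the validity check is deliberately simplistic.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical limit, standing in for a platform maximum path length.
constexpr std::size_t kMaxPathLen = 4096;

// Stand-in for a diagnostic command. `out` plays the role of the stream
// that is sent back to the process that issued the command. On bad input
// the error goes to that stream and the command returns; nothing is thrown
// and nothing is written to the target process's own stdout or stderr.
bool dump_to_file_command(const std::string& path, std::ostream& out) {
  if (path.empty() || path.size() >= kMaxPathLen) {
    out << "Invalid file path name specified: " << path << '\n';
    return false;
  }
  out << "Dumping to " << path << '\n';
  return true;
}
```

Passing a `std::ostringstream` as `out` makes it easy to check in a test what the issuer of the command would see.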

-------------

PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2183023385
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681109673
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681115247
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681118969
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681124783
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681125914

From stuefe at openjdk.org  Wed Jul 17 14:24:54 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 17 Jul 2024 14:24:54 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v2]
In-Reply-To: <0FaB5dyzz0jaa0RETfdT4wcbS3jPg4QzIzj1s-pPWvw=.805a55dc-d141-482f-b6aa-e6c4fdfbb97d@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <vluUCz7LJUc6FInntimxmXcyImSJfrxWkBOUWat-2zs=.7b3ab621-30a8-4e6d-89f2-77c3504dc432@github.com>
 <0FaB5dyzz0jaa0RETfdT4wcbS3jPg4QzIzj1s-pPWvw=.805a55dc-d141-482f-b6aa-e6c4fdfbb97d@github.com>
Message-ID: <TwN9BcK3S4jrx7Kxu6GcBdXJULEURmkE3JF3JbG1sF8=.7d0ec11e-86be-4e30-b385-9b2bc6659c74@github.com>

On Wed, 17 Jul 2024 14:02:01 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Updating copyright headers
>
> src/hotspot/share/code/codeCache.cpp line 1800:
> 
>> 1798:     jio_snprintf(fname, sizeof(fname), "/tmp/perf-%d.map",
>> 1799:                  os::current_process_id());
>> 1800:   }
> 
> Arguably one could just do 
> 
> constexpr char filename_default[] = "/tmp/perf-%p.map";
> Arguments::copy_expand_pid(filename == nullptr ? filename_default : filename, .....);

This pattern can be followed in all cases where we have default filenames
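
A minimal sketch of that pattern, assuming a simple `%p` substitution. `expand_pid` and `perf_map_name` are hypothetical names invented for illustration; they are not `Arguments::copy_expand_pid`, whose exact signature is elided above:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not HotSpot code): replaces every "%p" in `pattern`
// with the decimal representation of `pid`.
std::string expand_pid(const std::string& pattern, int pid) {
  assert(pid >= 0);
  const std::string pid_str = std::to_string(pid);
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 'p') {
      out += pid_str;
      ++i;  // skip the 'p' that followed the '%'
    } else {
      out += pattern[i];
    }
  }
  return out;
}

// The default name is itself a "%p" pattern, so the default and a
// user-supplied name go through the same expansion path.
std::string perf_map_name(const char* filename, int pid) {
  static constexpr char filename_default[] = "/tmp/perf-%p.map";
  return expand_pid(filename == nullptr ? filename_default : filename, pid);
}
```

With this shape there is no separate `snprintf` branch for the default case; the only difference between the two cases is which pattern string is passed in.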

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681149193

From fgao at openjdk.org  Wed Jul 17 15:04:53 2024
From: fgao at openjdk.org (Fei Gao)
Date: Wed, 17 Jul 2024 15:04:53 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
Message-ID: <Kvsq4IVjcpSDlBwrMxfnv_p2OWa9sgPkqCe6ZOjrb0Y=.7978edc8-6d5c-4840-86ec-9e728729bb0d@github.com>

On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote:

> In the cases like:
> 
>   UNSAFE.putLong(address + off1 + 1030, lseed);
>   UNSAFE.putLong(address + 1023, lseed);
>   UNSAFE.putLong(address + off2 + 1001, lseed);
> 
> 
> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
> 
>   ldr  R10, [R15, #120]    # int ! Field: address
>   ldr  R11, [R16, #136]    # int ! Field: off1
>   ldr  R12, [R16, #144]    # int ! Field: off2
>   add  R11, R11, R10
>   mov R11, R11    # long -> ptr
>   add  R12, R12, R10
>   mov R10, R10    # long -> ptr
>   add R11, R11, #1030    # ptr
>   str  R17, [R11]    # int
>   add R10, R10, #1023    # ptr
>   str  R17, [R10]    # int
>   mov R10, R12    # long -> ptr
>   add R10, R10, #1001    # ptr
>   str  R17, [R10]    # int
> 
> 
> In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly:
> 
>   ldr    x10, [x15,#120]
>   ldp    x11, x12, [x16,#136]
>   add    x11, x11, x10
>   add    x12, x12, x10
>   add    x11, x11, #0x406
>   str    x17, [x11]
>   add    x10, x10, #0x3ff
>   str    x17, [x10]
>   mov    x10, x12  <--- extra register copy
>   add    x10, x10, #0x3e9
>   str    x17, [x10]
> 
> 
> There is still one extra register copy, which we're trying to remove in this patch.
> 
> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
> 
> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so
> 
> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906

>> #20159 is also to fix the same issue. Please feel free to review the draft PR. Thanks.

> This will need quite a lot of testing, perhaps higher tiers and jcstress. You can test these two PRs together.

Thanks for the approval, @theRealAph. I'll run jcstress locally.

Could you help review and test these two PRs with higher tiers please? @TobiHartmann @vnkozlov Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2233542851

From jvernee at openjdk.org  Wed Jul 17 15:19:18 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Wed, 17 Jul 2024 15:19:18 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v7]
In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
Message-ID: <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com>

> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
> 
> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
> 
> In this PR:
> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
> 
> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
> 
> 
> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess        1   avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.conf...

Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:

  benchmark review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20158/files
  - new: https://git.openjdk.org/jdk/pull/20158/files/138fba42..2019289b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=05-06

  Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20158.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158

PR: https://git.openjdk.org/jdk/pull/20158

From mcimadamore at openjdk.org  Wed Jul 17 15:43:54 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Wed, 17 Jul 2024 15:43:54 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v7]
In-Reply-To: <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com>
Message-ID: <ohN1Vo69B9dR4rAyFJlhXTcvVdmy9lznmdQ-TMZZgKQ=.e9ec24bb-fcb4-4af0-90d1-72bad69488b3@github.com>

On Wed, 17 Jul 2024 15:19:18 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   benchmark review comments

Changes in scopedMemoryAccess and benchmark look good

-------------

Marked as reviewed by mcimadamore (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20158#pullrequestreview-2183305234

From sviswanathan at openjdk.org  Wed Jul 17 15:55:55 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Wed, 17 Jul 2024 15:55:55 GMT
Subject: RFR: 8335860:
 compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard
 AVX/SSE settings [v3]
In-Reply-To: <89mJFTY-O4WgqC7eYEu125ehHkgVCFtecRPhSQuEisI=.2b86f885-8225-4103-9dcb-6a4be73bad71@github.com>
References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
 <89mJFTY-O4WgqC7eYEu125ehHkgVCFtecRPhSQuEisI=.2b86f885-8225-4103-9dcb-6a4be73bad71@github.com>
Message-ID: <lJPvlapZrnp_aOnL4jWaveFV-FdbNEVMFJ67cZ47S6Y=.45d2da93-caf4-47f3-a4dc-aaa8dc1ff7ab@github.com>

On Wed, 17 Jul 2024 05:45:05 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Enabling test with explicit feature checks for x86 target.
>> Removing from test/hotspot/jtreg/ProblemList.txt
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
> 
>  - Restoring earlier comment
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8335860
>  - Review suggestions incorporated
>  - 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings

Looks good to me.

-------------

Marked as reviewed by sviswanathan (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20160#pullrequestreview-2183342654

From liach at openjdk.org  Wed Jul 17 16:17:51 2024
From: liach at openjdk.org (Chen Liang)
Date: Wed, 17 Jul 2024 16:17:51 GMT
Subject: RFR: 8336275: Move common Method and Constructor fields to
 Executable [v2]
In-Reply-To: <Z04ux2yyYVR5W1y8prXM4lYPhycn-DE7aM7elm92C3k=.e9eb01d1-cbc3-4da8-be66-03ad947c20ff@github.com>
References: <nYtWyeRXdAr_zmzpxdugyZNRUzhfHJUKX1K2ilpSs8A=.cb1c31be-a7e0-49b5-ab9b-18a3abd122a9@github.com>
 <Z04ux2yyYVR5W1y8prXM4lYPhycn-DE7aM7elm92C3k=.e9eb01d1-cbc3-4da8-be66-03ad947c20ff@github.com>
Message-ID: <UBLc0-jkwNqLXnWhNf-LCRwJQu2G0x3TrrOLGZzmpwE=.1202e9cb-a3d9-4f93-92ab-a1d65c648e38@github.com>

On Wed, 17 Jul 2024 03:03:23 GMT, Chen Liang <liach at openjdk.org> wrote:

>> Move fields common to Method and Constructor to Executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable.
>> 
>> Note to core-libs reviewers: Please review the associated CSR on trivial removal of `abstract` modifier as well.
>
> Chen Liang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Redundant transient; Update the comments to be more accurate

Just noted that I removed some abstract flags from `Executable` methods, which requires a CSR archive entry.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20188#issuecomment-2233696779

From alanb at openjdk.org  Wed Jul 17 16:30:52 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Wed, 17 Jul 2024 16:30:52 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v2]
In-Reply-To: <0FaB5dyzz0jaa0RETfdT4wcbS3jPg4QzIzj1s-pPWvw=.805a55dc-d141-482f-b6aa-e6c4fdfbb97d@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <vluUCz7LJUc6FInntimxmXcyImSJfrxWkBOUWat-2zs=.7b3ab621-30a8-4e6d-89f2-77c3504dc432@github.com>
 <0FaB5dyzz0jaa0RETfdT4wcbS3jPg4QzIzj1s-pPWvw=.805a55dc-d141-482f-b6aa-e6c4fdfbb97d@github.com>
Message-ID: <70ENucevFdhpARyQni4sy3uUh93-FFQz038hJYswZqE=.76860fb0-94a3-473e-bb3c-9c9733a850b2@github.com>

On Wed, 17 Jul 2024 14:07:55 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Updating copyright headers
>
> src/hotspot/share/services/diagnosticCommand.cpp line 1138:
> 
>> 1136:     stringStream msg;
>> 1137:     msg.print("Invalid file path name specified: %s", _filepath.value());
>> 1138:     THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(), msg.base());
> 
> write to output() and return instead of throwing

Yes, the error needs to be written to the stream so that it is printed by the tool (at the other end of the pipe).
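
A minimal sketch of the suggested shape, with `std::ostream` standing in for the command's `output()` stream. `run_dump_command`, `reply_for`, and `kMaxPathLen` are hypothetical names, and the validity rule is an assumption made only so the sketch has something to reject:

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical limit, used only to have something to reject in this sketch.
constexpr std::size_t kMaxPathLen = 32;

// Stand-in for a diagnostic command body. `output` plays the role of the
// stream whose contents travel back to the process that issued the command.
bool run_dump_command(const std::string& filepath, std::ostream& output) {
  if (filepath.empty() || filepath.size() > kMaxPathLen) {
    // Report to the issuer and return: no exception is thrown and nothing
    // is written to the target process's own stdout.
    output << "Invalid file path name specified: " << filepath << '\n';
    return false;
  }
  output << "Dumping to " << filepath << '\n';
  return true;
}

// Usage: the issuer's view of a request is whatever landed in the stream.
std::string reply_for(const std::string& filepath) {
  std::ostringstream reply;
  const bool ok = run_dump_command(filepath, reply);
  // Success and the "Dumping to" prefix always go together.
  assert(ok == (reply.str().rfind("Dumping to ", 0) == 0));
  return reply.str();
}
```

The point of the design is that the error text follows the same path as normal command output, so a remote issuer sees it without needing access to the target JVM's console.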

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681352690

From kvn at openjdk.org  Wed Jul 17 16:36:52 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 17 Jul 2024 16:36:52 GMT
Subject: RFR: 8335921: Fix HotSpot VM build without JVMTI
In-Reply-To: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com>
References: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com>
Message-ID: <0ta5_zgRm_HBLh0jhqZm807Qe5sYsSb3rNTEF9j2p2Q=.a303a8ac-b957-4727-832a-36d834d165a4@github.com>

On Wed, 17 Jul 2024 03:37:36 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

> Citing David Holmes from bug report:
> "We provided the ability to leave out certain VM services (JVMTI, GC's other than serial, ...) as part of the design of the MinimalVM to support Java SE Embedded, along with the Compact Profiles of JDK 8. This manifested in the source code as a set of INCLUDE_XXX ifdef guards. The build system later exposed these as individual --with-jvm-features=xxx,yyy support. However, it was never intended (and certainly not tested) that you could mix-and-match arbitrary subsets of these VM features at will. Consequently if you start trying to do this you will find things that need fixing."
> 
> I added `INCLUDE_JVMTI` guards in two places where they were missing: JVMCI and JFR. The affected code was added recently, within the past year. After that I was able to build the VM on all supported platforms.
> 
> Note: building the VM without JVMTI is not an officially supported feature. We do not test it, and such failures (missing guards) are not unexpected.
> 
> A lot of tests failed with a VM built without JVMTI. All are expected failures. I listed the failed tests in the bug report.
> I fixed (added requires `vm.jvmti`) only one, which was part of the [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967) changes that introduced JFR code without `INCLUDE_JVMTI` guards.
> 
> I ran 2 rounds of testing:
> 
> First, only **tier1** with a VM built without JVMTI, to see whether the builds passed and which tests were affected. I wrote a comment in the bug report listing which tests failed (all expected to fail without JVMTI).
> 
> Second round of testing with JVMTI in VM: tier1-4

Thank you, Aleksey, for review.
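
For readers unfamiliar with the guard pattern mentioned in the quoted description, here is a minimal sketch. It assumes the build defines `INCLUDE_JVMTI` as 0 or 1; `jvmti_event_name` and `describe_last_event` are hypothetical names, not functions from the patch:

```cpp
#include <cassert>
#include <string>

// Assumption: the build system defines INCLUDE_JVMTI as 0 or 1. Default to 1
// here so the sketch compiles on its own.
#ifndef INCLUDE_JVMTI
#define INCLUDE_JVMTI 1
#endif

#if INCLUDE_JVMTI
// Hypothetical JVMTI-only service; it does not exist when the feature is off.
std::string jvmti_event_name() { return "class-redefined"; }
#endif

// Hypothetical caller in another subsystem. Without the guard below, a build
// with INCLUDE_JVMTI=0 would fail because jvmti_event_name is not declared.
std::string describe_last_event() {
#if INCLUDE_JVMTI
  const std::string name = jvmti_event_name();
  assert(!name.empty());
  return name;
#else
  return "jvmti-disabled";
#endif
}
```

A missing guard of this kind only shows up when the feature is actually compiled out, which fits the description's remark that such builds are not routinely tested.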

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20209#issuecomment-2233729056

From kvn at openjdk.org  Wed Jul 17 16:48:54 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 17 Jul 2024 16:48:54 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v7]
In-Reply-To: <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com>
Message-ID: <lt202f35FQwPU8wzVkgqeaebZINd2UIna8B-XJCcl_M=.0d2e9a2c-72d3-4557-9f54-04334e8f77d7@github.com>

On Wed, 17 Jul 2024 15:19:18 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   benchmark review comments

I am fine with compiler and CI changes - it is just marking nmethod as having scoped access.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20158#pullrequestreview-2183464779

From aph at openjdk.org  Wed Jul 17 17:17:52 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 17 Jul 2024 17:17:52 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter
In-Reply-To: <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com>
Message-ID: <FMWMwnwhdReuAohf_e_EWQN7yFM6WNl8Hv0_0S7goek=.9004d9f0-5755-471e-a120-6b6e83c8ebbd@github.com>

On Thu, 11 Jul 2024 23:47:51 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> src/hotspot/share/oops/klass.cpp line 284:
>> 
>>> 282: // which doesn't zero out the memory before calling the constructor.
>>> 283: Klass::Klass(KlassKind kind) : _kind(kind),
>>> 284:                                _bitmap(SECONDARY_SUPERS_BITMAP_FULL),
>> 
>> I like the idea, but what are the benefits of initializing `_bitmap` separately from `_secondary_supers`?
>
> Another observation while browsing the code: `_secondary_supers_bitmap` would be a better name. (Same considerations apply to `_hash_slot`.)

This is because the C++ runtime does secondary super cache lookups even before the bitmap has been calculated and the hash table sorted. In this case the bitmap is zero, so the search thinks there are no secondary supers. Setting `_bitmap` to `SECONDARY_SUPERS_BITMAP_FULL` forces a linear search.

I guess there might be a better way to do this. Perhaps a comment is needed?

I agree about `_secondary_supers_bitmap` name.
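
To illustrate the point about the initial value, here is a toy model. It is not HotSpot's lookup, which uses hashed slots and probing; `ToyKlass`, `slot_of`, and `is_secondary_super` are hypothetical, and the modulo hash is an assumption:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Every bit set: no slot can be ruled out, so every lookup scans the array.
constexpr std::uint64_t BITMAP_FULL = ~std::uint64_t{0};

struct ToyKlass {
  std::uint64_t bitmap;               // one bit per (hypothetical) hash slot
  std::vector<int> secondary_supers;  // super ids, possibly not yet hashed
};

// Hypothetical hash: maps a super id to one of 64 slots.
inline unsigned slot_of(int super_id) {
  assert(super_id >= 0);
  return static_cast<unsigned>(super_id) % 64u;
}

bool is_secondary_super(const ToyKlass& k, int super_id) {
  // Fast negative: a clear bit is taken to mean "definitely not present".
  if (((k.bitmap >> slot_of(super_id)) & 1u) == 0) {
    return false;
  }
  // Bit set (always the case for BITMAP_FULL): scan the array linearly.
  for (int s : k.secondary_supers) {
    if (s == super_id) return true;
  }
  return false;
}
```

In this model, a klass whose bitmap is still zero reports no secondary supers even when the array is populated, while one initialized to `BITMAP_FULL` always falls through to the scan and gives the right answer.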

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1681410651

From aph at openjdk.org  Wed Jul 17 17:17:53 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 17 Jul 2024 17:17:53 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter
In-Reply-To: <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
Message-ID: <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com>

On Fri, 5 Jul 2024 22:37:34 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> src/hotspot/share/oops/klass.cpp line 469:
> 
>> 467: #endif
>> 468: 
>> 469:   bitmap = hash_secondary_supers(secondary_supers, /*rewrite=*/true); // rewrites freshly allocated array
> 
> I like that hashing is performed unconditionally now. 
> 
> Looks like you can remove `UseSecondarySupersTable`-specific CDS support (in `filemap.cpp`). CDS archive should unconditionally contain hashed tables.

Sure.

> src/hotspot/share/oops/klass.inline.hpp line 122:
> 
>> 120:     return true;
>> 121: 
>> 122:   bool result = lookup_secondary_supers_table(k);
> 
> Should `UseSecondarySupersTable` affect `Klass::search_secondary_supers` as well?

I think not. It'd complicate C++ runtime for no useful reason.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1681411088
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1681412747

From kvn at openjdk.org  Wed Jul 17 17:21:52 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 17 Jul 2024 17:21:52 GMT
Subject: RFR: 8335860:
 compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard
 AVX/SSE settings [v3]
In-Reply-To: <89mJFTY-O4WgqC7eYEu125ehHkgVCFtecRPhSQuEisI=.2b86f885-8225-4103-9dcb-6a4be73bad71@github.com>
References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
 <89mJFTY-O4WgqC7eYEu125ehHkgVCFtecRPhSQuEisI=.2b86f885-8225-4103-9dcb-6a4be73bad71@github.com>
Message-ID: <juaTQskdYFNW5cFDGPY8zr01qgqRwvflxIuw9ZjVVzM=.7f2ebf55-5214-4d11-a2f9-3d3ae755d7ec@github.com>

On Wed, 17 Jul 2024 05:45:05 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Enabling test with explicit feature checks for x86 target.
>> Removing from test/hotspot/jtreg/ProblemList.txt
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
> 
>  - Restoring earlier comment
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8335860
>  - Review suggestions incorporated
>  - 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings

Looks good. I submitted testing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20160#issuecomment-2233813722

From liach at openjdk.org  Wed Jul 17 18:08:51 2024
From: liach at openjdk.org (Chen Liang)
Date: Wed, 17 Jul 2024 18:08:51 GMT
Subject: RFR: 8334772: Change Class::protectionDomain and signers to explicit
 fields
Message-ID: <FWayOxGhFrGGyh33wkJIMHkIO4azia9jFdmKszY9EBs=.9ac3895f-e96e-41aa-8a58-e491aaa68339@github.com>

Please review this change that moves `Class.protectionDomain` and `signers` to explicit fields.

Related native methods in `Class` and `AccessController::getProtectionDomain` are converted to pure Java. These fields are still set and used by hotspot. Also fixes the incorrect `protectiondomain_signature` in `vmSymbols`, which is actually an array descriptor.

Note that these new fields are not filtered: filtering in early bootstrap requires other unrelated adjustments as we can't even use hashCode on String, and filtering is not proper encapsulation either.

-------------

Commit messages:
 - Tests rely on Class ctor
 - Move class protectionDomain and signers fields to be explicit

Changes: https://git.openjdk.org/jdk/pull/20221/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20221&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8334772
  Stats: 145 lines in 15 files changed: 25 ins; 90 del; 30 mod
  Patch: https://git.openjdk.org/jdk/pull/20221.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20221/head:pull/20221

PR: https://git.openjdk.org/jdk/pull/20221

From alanb at openjdk.org  Wed Jul 17 18:38:38 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Wed, 17 Jul 2024 18:38:38 GMT
Subject: RFR: 8334772: Change Class::protectionDomain and signers to
 explicit fields
In-Reply-To: <FWayOxGhFrGGyh33wkJIMHkIO4azia9jFdmKszY9EBs=.9ac3895f-e96e-41aa-8a58-e491aaa68339@github.com>
References: <FWayOxGhFrGGyh33wkJIMHkIO4azia9jFdmKszY9EBs=.9ac3895f-e96e-41aa-8a58-e491aaa68339@github.com>
Message-ID: <Qnq3ypuxY7aa3QpAXP072UDegGz3vEM36DZg0_ql6mQ=.8ea26223-bd99-45f5-9dcf-aa0ee7d1a8c2@github.com>

On Wed, 17 Jul 2024 17:47:11 GMT, Chen Liang <liach at openjdk.org> wrote:

> Please review this change that moves `Class.protectionDomain` and `signers` to explicit fields.
> 
> Related native methods in `Class` and `AccessController::getProtectionDomain` are converted to pure Java. These fields are still set and used by hotspot. Also fixes the incorrect `protectiondomain_signature` in `vmSymbols`, which is actually an array descriptor.
> 
> Note that these new fields are not filtered: filtering in early bootstrap requires other unrelated adjustments as we can't even use hashCode on String, and filtering is not proper encapsulation either.

Offline discussion with Chen and I think the advice is to drop all the changes for ProtectionDomain for now. This area will change significantly as part of the SecurityManager removal work.

src/java.base/share/classes/jdk/internal/access/JavaLangAccess.java line 430:

> 428:      * {@link Class#getProtectionDomain()}
> 429:      */
> 430:     ProtectionDomain protectionDomain(Class<?> c, boolean raw);

I don't think we should expose this outside of java.lang.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20221#issuecomment-2233996684
PR Review Comment: https://git.openjdk.org/jdk/pull/20221#discussion_r1681559624

From liach at openjdk.org  Wed Jul 17 18:45:38 2024
From: liach at openjdk.org (Chen Liang)
Date: Wed, 17 Jul 2024 18:45:38 GMT
Subject: RFR: 8334772: Change Class::protectionDomain and signers to
 explicit fields
In-Reply-To: <FWayOxGhFrGGyh33wkJIMHkIO4azia9jFdmKszY9EBs=.9ac3895f-e96e-41aa-8a58-e491aaa68339@github.com>
References: <FWayOxGhFrGGyh33wkJIMHkIO4azia9jFdmKszY9EBs=.9ac3895f-e96e-41aa-8a58-e491aaa68339@github.com>
Message-ID: <J1wdWdHGNcBaJsnGL43q9iJkvz4pTkx0O0IJXrTClJM=.fe47dd18-67f1-4d84-a2d8-a006518ba37d@github.com>

On Wed, 17 Jul 2024 17:47:11 GMT, Chen Liang <liach at openjdk.org> wrote:

> Please review this change that moves `Class.protectionDomain` and `signers` to explicit fields.
> 
> Related native methods in `Class` and `AccessController::getProtectionDomain` are converted to pure Java. These fields are still set and used by hotspot. Also fixes the incorrect `protectiondomain_signature` in `vmSymbols`, which is actually an array descriptor.
> 
> Note that these new fields are not filtered: filtering in early bootstrap requires other unrelated adjustments as we can't even use hashCode on String, and filtering is not proper encapsulation either.

The migration of signers will be in a new PR. This patch will be kept so people will know the extra test updates related to migration of protectionDomain.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20221#issuecomment-2234008622

From liach at openjdk.org  Wed Jul 17 18:45:38 2024
From: liach at openjdk.org (Chen Liang)
Date: Wed, 17 Jul 2024 18:45:38 GMT
Subject: Withdrawn: 8334772: Change Class::protectionDomain and signers to
 explicit fields
In-Reply-To: <FWayOxGhFrGGyh33wkJIMHkIO4azia9jFdmKszY9EBs=.9ac3895f-e96e-41aa-8a58-e491aaa68339@github.com>
References: <FWayOxGhFrGGyh33wkJIMHkIO4azia9jFdmKszY9EBs=.9ac3895f-e96e-41aa-8a58-e491aaa68339@github.com>
Message-ID: <OgugPiJxEziwY9rRp_GkDdQ3K5Adye7kO-qPeQGzRYs=.f4d29f5e-ef1c-43ca-b946-924eac809a87@github.com>

On Wed, 17 Jul 2024 17:47:11 GMT, Chen Liang <liach at openjdk.org> wrote:

> Please review this change that moves `Class.protectionDomain` and `signers` to explicit fields.
> 
> Related native methods in `Class` and `AccessController::getProtectionDomain` are converted to pure Java. These fields are still set and used by hotspot. Also fixes the incorrect `protectiondomain_signature` in `vmSymbols`, which is actually an array descriptor.
> 
> Note that these new fields are not filtered: filtering in early bootstrap requires other unrelated adjustments as we can't even use hashCode on String, and filtering is not proper encapsulation either.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/20221

From vlivanov at openjdk.org  Wed Jul 17 18:48:33 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Jul 2024 18:48:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter
In-Reply-To: <FMWMwnwhdReuAohf_e_EWQN7yFM6WNl8Hv0_0S7goek=.9004d9f0-5755-471e-a120-6b6e83c8ebbd@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com>
 <FMWMwnwhdReuAohf_e_EWQN7yFM6WNl8Hv0_0S7goek=.9004d9f0-5755-471e-a120-6b6e83c8ebbd@github.com>
Message-ID: <xNV7-nhHtDKME2kWU_k3bKZJId61Ii_CW12KMQvd0IY=.03b01561-0358-4635-9d1c-ee931f14f12f@github.com>

On Wed, 17 Jul 2024 17:13:49 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Another observation while browsing the code: `_secondary_supers_bitmap` would be a better name. (Same considerations apply to `_hash_slot`.)
>
> This is because the C++ runtime does secondary super cache lookups even before the bitmap has been calculated and the hash table sorted. In this case the bitmap is zero, so the search thinks there are no secondary supers. Setting _bitmap to SECONDARY_SUPERS_BITMAP_FULL forces a linear search.
> 
> I guess there might be a better way to do this. Perhaps a comment is needed?
> 
> I agree about `_secondary_supers_bitmap` name.

Now it starts to sound concerning... `Klass::set_secondary_supers()` initializes both `_secondary_supers` and `_bitmap`, which implies that `Klass::is_subtype_of()` may be called on a not-yet-initialized Klass. If that's the case, it does look like a bug on its own. How is it expected to work when `_secondary_supers` hasn't been set yet?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1681571806

From kvn at openjdk.org  Wed Jul 17 18:48:35 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 17 Jul 2024 18:48:35 GMT
Subject: Integrated: 8335921: Fix HotSpot VM build without JVMTI
In-Reply-To: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com>
References: <SFc7wGgnmCR8hHO_6h9j_LC5drW2BLC-sRKuFNtAOjE=.d061ebae-ba38-4d05-9648-e0ff17bb3343@github.com>
Message-ID: <cA66XRsCeAiEtD8Z4E4OJUAeIZRA1AFtG53dK_NpdrY=.351d8e9b-f6a6-4331-83b3-0d00b8675203@github.com>

On Wed, 17 Jul 2024 03:37:36 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

> Citing David Holmes from bug report:
> "We provided the ability to leave out certain VM services (JVMTI, GC's other than serial, ...) as part of the design of the MinimalVM to support Java SE Embedded, along with the Compact Profiles of JDK 8. This manifested in the source code as a set of INCLUDE_XXX ifdef guards. The build system later exposed these as individual --with-jvm-features=xxx,yyy support. However, it was never intended (and certainly not tested) that you could mix-and-match arbitrary subsets of these VM features at will. Consequently if you start trying to do this you will find things that need fixing."
> 
> I added `INCLUDE_JVMTI` guards in two places where it was missed: JVMCI and JFR.  Affected code was added recently, in the past year. After that I was able to build VM on all supported platforms.
> 
> Note: building the VM without JVMTI is not an officially supported feature. We are not testing it, and such failures (missing guards) are not unexpected.
> 
> A lot of tests failed with VM without JVMTI. All are expected failures. I listed failed tests in bug report.
> I fixed (added requires `vm.jvmti`) only one which was part of [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967) changes which introduced JFR code without `INCLUDE_JVMTI` guards.
> 
> I ran 2 rounds of testing:
> 
> First, only **tier1** with a VM built without JVMTI, to see whether the builds passed and which tests were affected. I wrote a comment in the bug report listing which tests failed (all expected to fail without JVMTI).
> 
> Second round of testing with JVMTI in VM: tier1-4

This pull request has now been integrated.

Changeset: bcb5e695
Author:    Vladimir Kozlov <kvn at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/bcb5e69505f6cc8a4f323924cd2c58e630595fc0
Stats:     20 lines in 8 files changed: 7 ins; 0 del; 13 mod

8335921: Fix HotSpot VM build without JVMTI

Reviewed-by: dholmes, shade

-------------

PR: https://git.openjdk.org/jdk/pull/20209

From shade at openjdk.org  Wed Jul 17 18:52:13 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 17 Jul 2024 18:52:13 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v2]
In-Reply-To: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
Message-ID: <TIMHU8ZaFhaLG2GUXBkb0hrvMsUz8Orays0vZlsYO4k=.39157f2c-f720-43d7-b1d8-6900551e5237@github.com>

> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks.
> 
> We need to know the actual oop strongness/weakness before we call into the C2 Access API; this work models this after the existing code for the `refersTo0` intrinsics. C2 Access also needs support for `AS_NO_KEEPALIVE` for stores.
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [ ] Linux AArch64 server fastdebug, `all`

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:

 - More precise barriers
 - Tests work
 - More touchups
 - Fixing the conditions, fixing the tests
 - Crude prototype, still failing the tests

-------------

Changes: https://git.openjdk.org/jdk/pull/20139/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=01
  Stats: 329 lines in 13 files changed: 328 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20139.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139

PR: https://git.openjdk.org/jdk/pull/20139

From kbarrett at openjdk.org  Wed Jul 17 18:52:13 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 17 Jul 2024 18:52:13 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear
In-Reply-To: <WOpJEGXtCPcCZv7YFhUT2ZOHe8j3mnavPrLjbbFD0Ns=.e514c8c3-ee1f-4e0d-a9ae-a83171959a0e@github.com>
References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
 <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com>
 <K2EJ43EXkTgJE0pjwzy50s3BoTAhF1Y2trwHtDzhojQ=.e837c7bc-717c-4826-8cc3-82a2232bc928@github.com>
 <iFxcPJTPGoxZgIaQKYtEbtg06xXYJewHfSA-f7nbofQ=.37070a3a-681b-4ccb-8857-91be898fd3c9@github.com>
 <WOpJEGXtCPcCZv7YFhUT2ZOHe8j3mnavPrLjbbFD0Ns=.e514c8c3-ee1f-4e0d-a9ae-a83171959a0e@github.com>
Message-ID: <1Y2PaVuIsawmIC7NnLuk4I7WLDmHC55dAamEe3M_gOM=.12267ab4-ecaf-4cd7-8f80-b1c6cbf57507@github.com>

On Fri, 12 Jul 2024 13:19:31 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> > The runtime use of the Access API knows how to resolve an unknown oop ref strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. However, we do not have support for that in the C2 backend. In fact, it does not understand non-strong oop stores at all.
> 
> Aw, nice usability landmine. I thought C2 barrier set would assert on me if it cannot deliver. Apparently not, [...]

Reference.refersTo has similar issues.  See refersToImpl and refersTo0 in both Reference and PhantomReference.
I think you should be able to model on those and the intrinsic implementation for refersTo to get what you want.

One additional complication is that Reference.enqueue intentionally calls clear0.  If implementing clear similarly
to refersTo, then enqueue should be changed to call clearImpl.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2228872926

From kbarrett at openjdk.org  Wed Jul 17 18:52:13 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 17 Jul 2024 18:52:13 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear
In-Reply-To: <1Y2PaVuIsawmIC7NnLuk4I7WLDmHC55dAamEe3M_gOM=.12267ab4-ecaf-4cd7-8f80-b1c6cbf57507@github.com>
References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
 <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com>
 <K2EJ43EXkTgJE0pjwzy50s3BoTAhF1Y2trwHtDzhojQ=.e837c7bc-717c-4826-8cc3-82a2232bc928@github.com>
 <iFxcPJTPGoxZgIaQKYtEbtg06xXYJewHfSA-f7nbofQ=.37070a3a-681b-4ccb-8857-91be898fd3c9@github.com>
 <WOpJEGXtCPcCZv7YFhUT2ZOHe8j3mnavPrLjbbFD0Ns=.e514c8c3-ee1f-4e0d-a9ae-a83171959a0e@github.com>
 <1Y2PaVuIsawmIC7NnLuk4I7WLDmHC55dAamEe3M_gOM=.12267ab4-ecaf-4cd7-8f80-b1c6cbf57507@github.com>
Message-ID: <bK0RXMOH98fi_QKz0ueMaiEh0GotFD6fHA9D-RBDXoM=.982af5c2-25fe-47bf-afa3-e08ca5291661@github.com>

On Mon, 15 Jul 2024 16:09:39 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> > Aw, nice usability landmine. I thought C2 barrier set would assert on me if it cannot deliver. Apparently not, [...]
> 
> Reference.refersTo has similar issues. See refersToImpl and refersTo0 in both Reference and PhantomReference. I think you should be able to model on those and the intrinsic implementation for refersTo to get what you want.
> 
> One additional complication is that Reference.enqueue intentionally calls clear0. If implementing clear similarly to refersTo, then enqueue should be changed to call clearImpl.

I should have read what I was replying to more carefully, rather than focusing on what was further up in the thread.
Looks like you (@shipilev) already spotted the refersTo stuff.  But the enqueue => clear0 could have easily been missed,
so perhaps not an entirely unneeded suggestion.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2231464762

From shade at openjdk.org  Wed Jul 17 18:52:13 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 17 Jul 2024 18:52:13 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear
In-Reply-To: <WOpJEGXtCPcCZv7YFhUT2ZOHe8j3mnavPrLjbbFD0Ns=.e514c8c3-ee1f-4e0d-a9ae-a83171959a0e@github.com>
References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
 <o7zszGQ4GxfAx_LutX6S8rCLrZVHro9Ggreo5tICcvw=.825e4096-7b13-4ce5-b5cc-53e8d5603ecf@github.com>
 <K2EJ43EXkTgJE0pjwzy50s3BoTAhF1Y2trwHtDzhojQ=.e837c7bc-717c-4826-8cc3-82a2232bc928@github.com>
 <iFxcPJTPGoxZgIaQKYtEbtg06xXYJewHfSA-f7nbofQ=.37070a3a-681b-4ccb-8857-91be898fd3c9@github.com>
 <WOpJEGXtCPcCZv7YFhUT2ZOHe8j3mnavPrLjbbFD0Ns=.e514c8c3-ee1f-4e0d-a9ae-a83171959a0e@github.com>
Message-ID: <45m_iZZsJLn9OJowCePyhoipmHYfepPVN7GyrTgaaz0=.52fcb527-3318-4eb6-91e6-09b868e9ea32@github.com>

On Fri, 12 Jul 2024 13:19:31 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>>> > The reason we did not do this before is that this is not a strong reference store. Strong reference stores with a SATB collector will keep the referent alive, which is typically the exact opposite of what a user wants when they clear a Reference.
>>> 
>>> You mean not doing this store just on the Java side? Yes, I agree, it would be awkward. In intrinsic, we are storing with the same decorators that `JVM_ReferenceClear` is using, which should be good with SATB collectors. Perhaps I am misunderstanding the comment.
>> 
>> The runtime use of the Access API knows how to resolve an unknown oop ref strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. However, we do not have support for that in the C2 backend. In fact, it does not understand non-strong oop stores at all. Because there hasn't really been a use case for it, other than clearing a Reference. That's the precise reason why we do not have a clear intrinsic; it would have to add that infrastructure.
>
>> The runtime use of the Access API knows how to resolve an unknown oop ref strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. However, we do not have support for that in the C2 backend. In fact, it does not understand non-strong oop stores at all. 
> 
> Aw, nice usability landmine. I thought C2 barrier set would assert on me if it cannot deliver. Apparently not, I see it just does pre-barriers when it is not sure what strongness the store is. Hrmpf. OK, let me see what can be done here. It might be just easier to further specialize `Reference.clear` in subclasses and carry down the actual strongness, like we do with `refersTo0` currently. This would still require C2 backend adjustments to handle `AS_NO_KEEPALIVE` on stores, but at least we would not have to guess about the strongness type in C2 intrinsic.

> I should have read what I was replying to more carefully, rather than focusing on what was further up in the thread. Looks like you (@shipilev) already spotted the refersTo stuff. But the enqueue => clear0 could have easily been missed, so perhaps not an entirely unneeded suggestion.

Yeah, thanks. The `enqueue => clear0` was indeed easy to miss.

Pushed the crude prototype that follows the `refersTo` example and drills some new `AS_NO_KEEPALIVE` holes in the C2 Access API to cover this intrinsic case. Super untested. IR tests are still failing; I'll take a more in-depth look there. (Perhaps it would not be possible to clearly match the absence of the pre-barrier in IR tests, we'll see.)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2231613218

From shade at openjdk.org  Wed Jul 17 18:52:14 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 17 Jul 2024 18:52:14 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear
In-Reply-To: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
Message-ID: <QoMe48MGvE2JlplJD5N6KP-LU5Tbk8XNavLLySPeP-Q=.cbb17fb0-9496-472c-95df-f844452dfc9b@github.com>

On Thu, 11 Jul 2024 15:28:37 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks.
> 
> We need to know the actual oop strongness/weakness before we call into the C2 Access API; this work models this after the existing code for the `refersTo0` intrinsics. C2 Access also needs support for `AS_NO_KEEPALIVE` for stores.
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [ ] Linux AArch64 server fastdebug, `all`

Split out the `refersTo` test to https://github.com/openjdk/jdk/pull/20215.

Yeah, so this version seems to work well on tests.

I am being extra paranoid about only accepting `null` stores, since `AS_NO_KEEPALIVE` means all other barriers like inter-generational post-barriers in G1 should still work. G1 barrier set delegates the stores to `CardTable/ModRefBarrierSet`, which: a) does not know which barriers can be bypassed by `AS_NO_KEEPALIVE`; b) calls back `G1BarrierSet` for prebarrier generation, but already loses the decorators. So the simplest way to deal with this is to handle this special case specially.

I think this is insanely sane, given how sharp-edged `AS_NO_KEEPALIVE` is.
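
The "only accept `null` stores" rule described above can be pictured with a toy decorator check. This is a minimal sketch under stated assumptions, not the C2 barrier code: `ToyDecorators`, `TOY_AS_NO_KEEPALIVE` and `toy_can_skip_pre_barrier` are hypothetical names invented for illustration, and the real decorator values and barrier generation in HotSpot look different.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decorator bitmask; the real HotSpot DecoratorSet uses other values.
using ToyDecorators = std::uint64_t;
constexpr ToyDecorators TOY_NONE            = 0;
constexpr ToyDecorators TOY_AS_NO_KEEPALIVE = ToyDecorators{1} << 0;

// Assumption for this sketch: the keep-alive (SATB) pre-barrier is bypassed
// only for a no-keepalive store whose new value is null. Every other store
// keeps its full set of barriers.
inline bool toy_can_skip_pre_barrier(ToyDecorators decorators, bool new_value_is_null) {
  const bool no_keepalive = (decorators & TOY_AS_NO_KEEPALIVE) != 0;
  return no_keepalive && new_value_is_null;
}
```

Restricting the bypass to `null` keeps the sketch from having to reason about which other barriers a non-null no-keepalive store would still need.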

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2233066721
PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2234005550

From vlivanov at openjdk.org  Wed Jul 17 18:57:32 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Jul 2024 18:57:32 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter
In-Reply-To: <xNV7-nhHtDKME2kWU_k3bKZJId61Ii_CW12KMQvd0IY=.03b01561-0358-4635-9d1c-ee931f14f12f@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com>
 <FMWMwnwhdReuAohf_e_EWQN7yFM6WNl8Hv0_0S7goek=.9004d9f0-5755-471e-a120-6b6e83c8ebbd@github.com>
 <xNV7-nhHtDKME2kWU_k3bKZJId61Ii_CW12KMQvd0IY=.03b01561-0358-4635-9d1c-ee931f14f12f@github.com>
Message-ID: <Eq4u2V3UeGi1VeGyEtA0FPS0sKoqAwqCgw5RmfRww-Y=.7dea6a8d-b59c-49b7-8b31-480b970d3de8@github.com>

On Wed, 17 Jul 2024 18:46:11 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> This is because the C++ runtime does secondary super cache lookups even before the bitmap has been calculated and the hash table sorted. In this case the bitmap is zero, so the search thinks there are no secondary supers. Setting _bitmap to SECONDARY_SUPERS_BITMAP_FULL forces a linear search.
>> 
>> I guess there might be a better way to do this. Perhaps a comment is needed?
>> 
>> I agree about `_secondary_supers_bitmap` name.
>
> Now it starts to sound concerning... `Klass::set_secondary_supers()` initializes both `_secondary_supers` and `_bitmap`, which implies that `Klass::is_subtype_of()` may be called on a not-yet-initialized Klass. If that's the case, it does look like a bug on its own. How is it expected to work when `_secondary_supers` hasn't been set yet?

On second thought, the following setter may be the culprit:

void Klass::set_secondary_supers(Array<Klass*>* secondaries) {
  assert(!UseSecondarySupersTable || secondaries == nullptr, "");
  set_secondary_supers(secondaries, SECONDARY_SUPERS_BITMAP_EMPTY);
}

It should be adjusted to set `SECONDARY_SUPERS_BITMAP_FULL` instead.
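
To make the difference between the two bitmap values concrete, here is a toy model of the lookup behaviour described earlier in this thread (a zero bitmap makes the search conclude there are no secondary supers, a full bitmap forces a linear search). It is only a sketch: `ToyKlass`, `TOY_BITMAP_EMPTY`, `TOY_BITMAP_FULL`, `toy_hash_slot` and the other helpers are hypothetical stand-ins, not the HotSpot definitions, and the real table probes hashed positions rather than scanning.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the constants named in the thread.
constexpr std::uint64_t TOY_BITMAP_EMPTY = 0;
constexpr std::uint64_t TOY_BITMAP_FULL  = ~std::uint64_t{0};

struct ToyKlass {
  std::vector<int> secondary_supers;         // ids of secondary supertypes
  std::uint64_t bitmap = TOY_BITMAP_EMPTY;   // one bit per occupied hash slot
};

// Toy hash: maps a non-negative type id to one of 64 slots.
inline unsigned toy_hash_slot(int id) {
  assert(id >= 0);
  return static_cast<unsigned>(id) & 63u;
}

// Sets one bit for every secondary super, as a fully initialized klass would have.
inline std::uint64_t toy_compute_bitmap(const ToyKlass& k) {
  std::uint64_t bits = 0;
  for (int id : k.secondary_supers) bits |= std::uint64_t{1} << toy_hash_slot(id);
  return bits;
}

inline bool toy_linear_search(const ToyKlass& k, int id) {
  return std::find(k.secondary_supers.begin(), k.secondary_supers.end(), id)
         != k.secondary_supers.end();
}

inline bool toy_is_secondary_super(const ToyKlass& k, int id) {
  // A full bitmap carries no information, so fall back to scanning the array.
  if (k.bitmap == TOY_BITMAP_FULL) return toy_linear_search(k, id);
  // A clear bit means "definitely absent"; with an all-zero bitmap every query ends here.
  if (((k.bitmap >> toy_hash_slot(id)) & 1u) == 0) return false;
  // Toy simplification: scan instead of probing the hashed position.
  return toy_linear_search(k, id);
}
```

In this model, a klass whose array is populated but whose bitmap is still `TOY_BITMAP_EMPTY` answers "not a subtype" for every query, which is the failure mode the suggested change is meant to avoid.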

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1681581587

From uschindler at openjdk.org  Wed Jul 17 19:03:34 2024
From: uschindler at openjdk.org (Uwe Schindler)
Date: Wed, 17 Jul 2024 19:03:34 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v7]
In-Reply-To: <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com>
Message-ID: <gYDpM3O_h6F4SdlH8Yo4kCW47pKh6GxM__8P6FOJI9A=.0d4eebbb-d38e-48f8-94be-7de417701071@github.com>

On Wed, 17 Jul 2024 15:19:18 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: namely, when we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   benchmark review comments

Marked as reviewed by uschindler (Author).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20158#pullrequestreview-2183805940

From liach at openjdk.org  Wed Jul 17 19:54:00 2024
From: liach at openjdk.org (Chen Liang)
Date: Wed, 17 Jul 2024 19:54:00 GMT
Subject: RFR: 8334772: Change Class::signers to an explicit field
Message-ID: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>

`Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.)

Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR.

-------------

Commit messages:
 - 8334772: Change Class::signers to an explicit field

Changes: https://git.openjdk.org/jdk/pull/20223/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20223&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8334772
  Stats: 71 lines in 6 files changed: 7 ins; 53 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/20223.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20223/head:pull/20223

PR: https://git.openjdk.org/jdk/pull/20223

From szaldana at openjdk.org  Wed Jul 17 19:59:10 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Wed, 17 Jul 2024 19:59:10 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v3]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <nV9Wgx-NgJ1u0fkiRpE0g_HB1fSFvXohGzMYAV5b6jY=.ac6dd18b-8e67-41ae-9db7-2c4b6ef029b9@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492), enabling jcmd diagnostic commands that write an output file to accept the `%p` pattern in the file name and replace it with the PID.
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Updates based on feedback
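
For readers unfamiliar with the `%p` convention quoted above, the substitution can be sketched as follows. This is an illustrative helper only, not the code in this PR: `toy_expand_pid` is a hypothetical name, it handles `%p` alone (not the `%t` pattern mentioned in the JFR help text), and it takes the PID as a parameter instead of querying the running process.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every "%p" in `pattern` with the decimal PID.
// Any other text, including a lone trailing '%', is copied through unchanged.
inline std::string toy_expand_pid(const std::string& pattern, long pid) {
  assert(pid >= 0);
  const std::string pid_text = std::to_string(pid);
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 'p') {
      out += pid_text;
      ++i;  // skip the 'p' that was just consumed
    } else {
      out += pattern[i];
    }
  }
  return out;
}
```

With this helper, a file name such as `heap_%p.hprof` given for a process with PID 1234 would become `heap_1234.hprof`.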

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/eea54f6d..3bb774d3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=01-02

  Stats: 35 lines in 4 files changed: 0 ins; 12 del; 23 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From vlivanov at openjdk.org  Wed Jul 17 20:37:35 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Jul 2024 20:37:35 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v7]
In-Reply-To: <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <tTrbqFeAXbr7uyNTiBPBts2WGPtSIqcuoZryE6T1_eY=.42caae9e-b377-457b-8e18-9bf2a3c15cf7@github.com>
Message-ID: <AOvJNfA2r8_pXv_UzMA3z48O91R1lDwXKTkjw2rqCMY=.cf0bcda3-f0da-4d39-b2dc-ae847729c27e@github.com>

On Wed, 17 Jul 2024 15:19:18 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
>> 
>> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
>> 
>> In this PR:
>> - Deoptimizing is now only done in cases where it's needed, instead of always: namely, when we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
>> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
>> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
>> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
>> 
>> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:
>> 
>> 
>> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
>> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
>> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
>> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
>> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
>> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
>> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
>> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
>> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
>> ConcurrentClose.sharedAccess        1   avgt   10  ...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   benchmark review comments

Looks good.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20158#pullrequestreview-2183984622

From kvn at openjdk.org  Wed Jul 17 21:16:33 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 17 Jul 2024 21:16:33 GMT
Subject: RFR: 8335860:
 compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard
 AVX/SSE settings [v3]
In-Reply-To: <89mJFTY-O4WgqC7eYEu125ehHkgVCFtecRPhSQuEisI=.2b86f885-8225-4103-9dcb-6a4be73bad71@github.com>
References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
 <89mJFTY-O4WgqC7eYEu125ehHkgVCFtecRPhSQuEisI=.2b86f885-8225-4103-9dcb-6a4be73bad71@github.com>
Message-ID: <4duGDJ3x3AOCD4JWUpJ8RQQn-gV4DjQlZcDupfgYaH0=.5a57d7d2-7eac-4e66-a2e3-43454abefc00@github.com>

On Wed, 17 Jul 2024 05:45:05 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Enabling test with explicit feature checks for x86 target.
>> Removing from test/hotspot/jtreg/ProblemList.txt
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
> 
>  - Restoring earlier comment
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8335860
>  - Review suggestions incorporated
>  - 8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings

My tier1-3 testing passed.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20160#pullrequestreview-2184074033

From dholmes at openjdk.org  Wed Jul 17 22:01:33 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 17 Jul 2024 22:01:33 GMT
Subject: RFR: 8334772: Change Class::signers to an explicit field
In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
Message-ID: <po0gx6r_TqVL3Dv1kD5KNDaU6xsqfzFfK0h3sQsgB7E=.aee9ed05-0345-4b0e-ae87-566fe0bdeebf@github.com>

On Wed, 17 Jul 2024 19:47:44 GMT, Chen Liang <liach at openjdk.org> wrote:

> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.)
> 
> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR.

The relocation of the field to Java looks good. But I am concerned that the field is now exposed to reflection.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20223#pullrequestreview-2184157888

From jkarthikeyan at openjdk.org  Wed Jul 17 22:50:31 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Wed, 17 Jul 2024 22:50:31 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and
 Math.min(long,long)
In-Reply-To: <MWiyM5dWze8wwUA4nKY5V-TiH98NO5qRlG3UcA3QbKw=.3c1c0d0f-de0c-485b-a5d0-f18c77aa32a4@github.com>
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
 <l3QGajoAAxigBK5cfIYwdGPTKfbJJJLvnSYisn7O7x8=.15bd4030-3af2-4d3a-a013-8f9c392223f1@github.com>
 <MWiyM5dWze8wwUA4nKY5V-TiH98NO5qRlG3UcA3QbKw=.3c1c0d0f-de0c-485b-a5d0-f18c77aa32a4@github.com>
Message-ID: <idYp67bpPuveqjhqTLQOL2T1DotalBWz_m8iRayFsts=.f8334e0e-2061-458e-86f9-c009444748e6@github.com>

On Wed, 17 Jul 2024 09:18:31 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

> Do you want a microbenchmark for the performance of vectorized max/min long?

Yeah, I think a simple benchmark that tests for long min/max vectorization and reduction would be good. I worry that checking performance manually like in `ReductionPerf` can lead to harder-to-interpret results than with a microbenchmark, especially with VM warmup. Thanks for looking into this!
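
As a rough illustration of the two loop shapes such a microbenchmark might measure, here is a minimal plain-Java sketch. The class and method names (`LongMinMaxKernels`, `maxVector`, `minReduction`) are hypothetical and not taken from the PR, and the JMH harness (annotations, warmup and measurement settings) is deliberately left out.

```java
// Hypothetical kernels for illustration only; not code from the PR.
final class LongMinMaxKernels {

    // Element-wise kernel: each output slot is the max of the two inputs,
    // the loop shape a vectorized Math.max(long,long) would target.
    static void maxVector(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(a[i], b[i]);
        }
    }

    // Reduction kernel: folds the whole array into a single minimum.
    static long minReduction(long[] a) {
        long result = Long.MAX_VALUE;
        for (int i = 0; i < a.length; i++) {
            result = Math.min(result, a[i]);
        }
        return result;
    }
}
```

In a real benchmark each kernel would be wrapped in its own benchmark method so that warmup is handled by the harness rather than by hand.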

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2234515775

From cjplummer at openjdk.org  Thu Jul 18 00:23:36 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Thu, 18 Jul 2024 00:23:36 GMT
Subject: RFR: 8336587: failure_handler lldb command times out on
 macosx-aarch64 core file
In-Reply-To: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>
References: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>
Message-ID: <gbd7tPdXO0cC1Xj4UGVIhoviY8yd53OtsE1GL4YKcfs=.9d3a4d34-17c1-4b6c-84a1-4c3fab1c545c@github.com>

On Tue, 16 Jul 2024 23:59:09 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

> I was looking at the failure_handler output for the lldb command on a macosx-aarch64 core file (it is trying to use lldb to get a back trace of all threads), and noticed it timed out:
> 
> 
> ----------------------------------------
> [2024-07-15 05:15:47] [<snip>/usr/bin/lldb, --core, <snip>/core.92643, <snip>/bin/java, -o, thread backtrace all, -o, quit] timeout=20000 in <snip>
> ----------------------------------------
> (lldb) target create "<snip>/bin/java" --core "<snip>/core.92643"
> WARNING: tool timed out: killed process after 20000 ms
> ----------------------------------------
> [2024-07-15 05:16:07] exit code: -2 time: 20163 ms
> ----------------------------------------
> 
> 
> 20 seconds is the failure_handler default timeout for all commands. Core files on macosx-aarch64 tend to be very large. This one was over 13 GB. On my MBPro it took 30 seconds. I bumped up the timeout to 60 seconds and reproduced the same crash in mach5 (more than once), and it usually took about 55 seconds for the lldb command, but it did succeed with the longer timeout. I think we should change the timeout to even more than 60 seconds just to make sure we won't see timeouts. 120 seconds is probably a good amount.

Thanks for the reviews Jai, Dean, and David!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20206#issuecomment-2234871521

From cjplummer at openjdk.org  Thu Jul 18 00:23:36 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Thu, 18 Jul 2024 00:23:36 GMT
Subject: Integrated: 8336587: failure_handler lldb command times out on
 macosx-aarch64 core file
In-Reply-To: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>
References: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>
Message-ID: <4uOQrbD6VqzPOFKV6XZmmVO65XfBiNIlC10HFLpIhzk=.5d9c634e-d546-458c-8e8c-8457ea825fac@github.com>

On Tue, 16 Jul 2024 23:59:09 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

> I was looking at the failure_handler output for the lldb command on a macosx-aarch64 core file (it is trying to use lldb to get a back trace of all threads), and noticed it timed out:
> 
> 
> ----------------------------------------
> [2024-07-15 05:15:47] [<snip>/usr/bin/lldb, --core, <snip>/core.92643, <snip>/bin/java, -o, thread backtrace all, -o, quit] timeout=20000 in <snip>
> ----------------------------------------
> (lldb) target create "<snip>/bin/java" --core "<snip>/core.92643"
> WARNING: tool timed out: killed process after 20000 ms
> ----------------------------------------
> [2024-07-15 05:16:07] exit code: -2 time: 20163 ms
> ----------------------------------------
> 
> 
> 20 seconds is the failure_handler default timeout for all commands. Core files on macosx-aarch64 tend to be very large. This one was over 13 GB. On my MBPro it took 30 seconds. I bumped up the timeout to 60 seconds and reproduced the same crash in mach5 (more than once), and it usually took about 55 seconds for the lldb command, but it did succeed with the longer timeout. I think we should change the timeout to even more than 60 seconds just to make sure we won't see timeouts. 120 seconds is probably a good amount.

This pull request has now been integrated.

Changeset: 21a6cf84
Author:    Chris Plummer <cjplummer at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/21a6cf848da00c795d833f926f831c7aea05dfa3
Stats:     4 lines in 1 file changed: 3 ins; 0 del; 1 mod

8336587: failure_handler lldb command times out on macosx-aarch64 core file

Reviewed-by: dlong, dholmes, jpai

-------------

PR: https://git.openjdk.org/jdk/pull/20206

From lmesnik at openjdk.org  Thu Jul 18 00:49:34 2024
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Thu, 18 Jul 2024 00:49:34 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v3]
In-Reply-To: <nV9Wgx-NgJ1u0fkiRpE0g_HB1fSFvXohGzMYAV5b6jY=.ac6dd18b-8e67-41ae-9db7-2c4b6ef029b9@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <nV9Wgx-NgJ1u0fkiRpE0g_HB1fSFvXohGzMYAV5b6jY=.ac6dd18b-8e67-41ae-9db7-2c4b6ef029b9@github.com>
Message-ID: <-1Ejx0t2f_Q0Hl9nKsil_C2hweWnB9pTdvbID9_OMtQ=.fa91c0d0-fb81-4e12-b946-a9cdc1125c6c@github.com>

On Wed, 17 Jul 2024 19:59:10 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) by enabling jcmd diagnostic commands that write an output file to accept the `%p` pattern in the file name and replace it with the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Updates based on feedback

Changes requested by lmesnik (Reviewer).

src/hotspot/share/code/codeCache.cpp line 1791:

> 1789: 
> 1790: #ifdef LINUX
> 1791: void CodeCache::write_perf_map(const char* filename, outputStream* out) {

I don't think this is the right place to expand arguments. I think it is more consistent to do it in diagnosticCommand.cpp for any jcmd command, so that the logic is the same for any _filename processing. Moreover, it would be better to add a 'FileArgument', like 'MemorySizeArgument', that correctly parses patterns like %p for all file arguments, is used by all commands, and is extensible for new macros if needed.
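
To make the idea of a single expansion point concrete, here is a minimal sketch in Java. It is not HotSpot code: the real change would live in the C++ argument parsing, and the names `FileArgumentSketch` and `expand` are hypothetical. It only shows `%p` handled in one place that every file-taking command could share.

```java
// Hypothetical illustration of one shared expansion point for file arguments.
final class FileArgumentSketch {

    // Replaces every "%p" in the pattern with the given PID;
    // all other text is returned unchanged.
    static String expand(String pattern, long pid) {
        return pattern.replace("%p", Long.toString(pid));
    }

    // Convenience overload that uses the PID of the current JVM.
    static String expand(String pattern) {
        return expand(pattern, ProcessHandle.current().pid());
    }
}
```

With this shape, supporting a further macro would mean extending `expand` once rather than touching each command.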

test/jdk/sun/tools/jcmd/TestJcmdPIDSubstitution.java line 32:

> 30:  * @modules java.base/jdk.internal.misc
> 31:  *          java.management
> 32:  * @run main/othervm TestJcmdPIDSubstitution

Why is othervm needed here? I suggest using driver mode instead.

test/jdk/sun/tools/jcmd/TestJcmdPIDSubstitution.java line 59:

> 57:         do {
> 58:             path = Paths.get("myfile%d".formatted(pid));
> 59:         } while(Files.exists(path));

Why is this do/while loop needed? Each test should have its own empty scratch directory.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2184333180
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681931084
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681921063
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1681920781

From dholmes at openjdk.org  Thu Jul 18 03:44:31 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 18 Jul 2024 03:44:31 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v2]
In-Reply-To: <onfNeoQzCfo3lsgAdomCn6xxQx3nsVVk8h8h3gQDJl8=.b8c3a559-fad4-4b60-b22f-f07fd5f0b807@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <0C2xrw7X8gn7dl7LWNZu9lh5XJjvOSNbA0PRqa6ydoM=.29d1d6ee-242f-4ab5-abaa-d2113d030f82@github.com>
 <j1xFGdRG38i_hvtMSBHJeHVlC4-HTghiPnz1aTEKY8Q=.cec14e3f-5274-420a-9683-1a90ce86aefc@github.com>
 <onfNeoQzCfo3lsgAdomCn6xxQx3nsVVk8h8h3gQDJl8=.b8c3a559-fad4-4b60-b22f-f07fd5f0b807@github.com>
Message-ID: <0dEcF8Fq29rrS4qhGPuhmWoU8Cpw9vN7FO8yWLtXlTo=.08a889de-2632-4b9c-b814-1414bec74f4d@github.com>

On Wed, 17 Jul 2024 06:25:55 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> AFAICS you start abridging if length is exactly max_length

Where do you see that? I have:

 bool abridge = length > max_length;


>  it may seem just strange to replace a small inner portion with a larger "omitted" text, because then the text plus replacement is larger than the original text,

That is not a practical concern as max_length is expected to be much larger than the replacement text. We don't need the added complexity of the "stretch" zone.

I'm open to changing the omitted text portion though, to include the number of characters omitted.
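
For reference, the abridging rule from the PR description (first `max_length/2` characters, then " ... ", then the last `max_length/2` characters, followed by " (abridged)") can be sketched in Java as below. This is only an illustration, not the HotSpot implementation; the names `AbridgeSketch` and `print` are hypothetical, and printing short strings in plain double quotes is an assumption.

```java
// Hypothetical illustration of the abridging rule; not the HotSpot code.
final class AbridgeSketch {

    static String print(String s, int maxLength) {
        // Strictly greater: a string of exactly maxLength is printed in full.
        boolean abridge = s.length() > maxLength;
        if (!abridge) {
            return "\"" + s + "\"";
        }
        int half = maxLength / 2;
        String head = s.substring(0, half);               // first maxLength/2 characters
        String tail = s.substring(s.length() - half);     // last maxLength/2 characters
        return "\"" + head + " ... " + tail + "\" (abridged)";
    }
}
```

With "ABCDE" and a maximum of 4 this yields the `"AB ... DE" (abridged)` form from the description, while "ABCD" with the same maximum is printed unchanged.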

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1682067434

From stuefe at openjdk.org  Thu Jul 18 05:03:31 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 18 Jul 2024 05:03:31 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v3]
In-Reply-To: <-1Ejx0t2f_Q0Hl9nKsil_C2hweWnB9pTdvbID9_OMtQ=.fa91c0d0-fb81-4e12-b946-a9cdc1125c6c@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <nV9Wgx-NgJ1u0fkiRpE0g_HB1fSFvXohGzMYAV5b6jY=.ac6dd18b-8e67-41ae-9db7-2c4b6ef029b9@github.com>
 <-1Ejx0t2f_Q0Hl9nKsil_C2hweWnB9pTdvbID9_OMtQ=.fa91c0d0-fb81-4e12-b946-a9cdc1125c6c@github.com>
Message-ID: <doTMEY5xsducmkODj09ef6yA6eG_GzsEyPpvjKo_Yzo=.b8e03eec-59cb-466e-bb14-d9f881fce861@github.com>

On Thu, 18 Jul 2024 00:45:24 GMT, Leonid Mesnik <lmesnik at openjdk.org> wrote:

>> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Updates based on feedback
>
> src/hotspot/share/code/codeCache.cpp line 1791:
> 
>> 1789: 
>> 1790: #ifdef LINUX
>> 1791: void CodeCache::write_perf_map(const char* filename, outputStream* out) {
> 
> I don't think this is the right place to expand arguments. I think it is more consistent to do it in diagnosticCommand.cpp for any jcmd command, so that the logic is the same for any _filename processing. Moreover, it would be better to add a 'FileArgument', like 'MemorySizeArgument', that correctly parses patterns like %p for all file arguments, is used by all commands, and is extensible for new macros if needed.

That's a good idea.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1682118709

From dholmes at openjdk.org  Thu Jul 18 06:46:03 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 18 Jul 2024 06:46:03 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v3]
In-Reply-To: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
Message-ID: <A9iMU-hV7SpgO7s8zSdIi-FeoH21jRYIQvcDDqmY860=.20798082-5958-4338-aebe-245b9153f269@github.com>

> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log.
> 
> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN.
> 
> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.
> 
> If a string's length exceeds `max_length` then we print it as follows:
> 
> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged)
> 
> For example if we print "ABCDE" with a max_length of 4 then the output is literally:
> 
> "AB ... DE" (abridged)
> 
> The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
> 
> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`.
> 
> Testing:
>  - new test added for validation purposes
>  - tiers 1 - 3 as sanity testing
> 
> Thanks

David Holmes has updated the pull request incrementally with one additional commit since the last revision:

  Change output format to show number of characters omitted, as suggested by @tstuefe

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20150/files
  - new: https://git.openjdk.org/jdk/pull/20150/files/39256bd3..6d445dbf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20150&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20150&range=01-02

  Stats: 19 lines in 2 files changed: 11 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/20150.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20150/head:pull/20150

PR: https://git.openjdk.org/jdk/pull/20150

From dholmes at openjdk.org  Thu Jul 18 06:52:44 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 18 Jul 2024 06:52:44 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v4]
In-Reply-To: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
Message-ID: <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com>

> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log.
> 
> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN.
> 
> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.
> 
> If a string's length exceeds `max_length` then we print it as follows:
> 
> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged)
> 
> For example if we print "ABCDE" with a max_length of 4 then the output is literally:
> 
> "AB ... DE" (abridged)
> 
> The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
> 
> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`.
> 
> Testing:
>  - new test added for validation purposes
>  - tiers 1 - 3 as sanity testing
> 
> Thanks

David Holmes has updated the pull request incrementally with one additional commit since the last revision:

  Update comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20150/files
  - new: https://git.openjdk.org/jdk/pull/20150/files/6d445dbf..e45fb904

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20150&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20150&range=02-03

  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20150.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20150/head:pull/20150

PR: https://git.openjdk.org/jdk/pull/20150

From dholmes at openjdk.org  Thu Jul 18 06:52:44 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 18 Jul 2024 06:52:44 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v2]
In-Reply-To: <SBNE7wMZ0WMp1bQzyK9EASI6EOXgtVPSJw1uWCqRFko=.947c9a5d-ec2e-450a-a7ca-6272470827ae@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <SBNE7wMZ0WMp1bQzyK9EASI6EOXgtVPSJw1uWCqRFko=.947c9a5d-ec2e-450a-a7ca-6272470827ae@github.com>
Message-ID: <e-kCgxAGG_3lxzWCjgLpyPjsbyzPKgenDF0MrBgfUZg=.a02583e4-e7ce-4d38-bf7c-e1d9b96f0822@github.com>

On Wed, 17 Jul 2024 05:32:26 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log.
>> 
>> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN.
>> 
>> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.
>> 
>> If a string's length exceeds `max_length` then we print it as follows:
>> 
>> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged)
>> 
>> For example if we print "ABCDE" with a max_length of 4 then the output is literally:
>> 
>> "AB ... DE" (abridged)
>> 
>> The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
>> 
>> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`.
>> 
>> Testing:
>>  - new test added for validation purposes
>>  - tiers 1 - 3 as sanity testing
>> 
>> Thanks
>
> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fixed grammar

I've updated per Thomas's suggestion to show the number of characters omitted. However, I kept the " (abridged)" part as well, because with very long strings you are more likely to spot that at the end than the "... (N characters ommitted) ..." in the middle.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20150#issuecomment-2235749703

From stuefe at openjdk.org  Thu Jul 18 07:28:33 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 18 Jul 2024 07:28:33 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v4]
In-Reply-To: <0dEcF8Fq29rrS4qhGPuhmWoU8Cpw9vN7FO8yWLtXlTo=.08a889de-2632-4b9c-b814-1414bec74f4d@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <0C2xrw7X8gn7dl7LWNZu9lh5XJjvOSNbA0PRqa6ydoM=.29d1d6ee-242f-4ab5-abaa-d2113d030f82@github.com>
 <j1xFGdRG38i_hvtMSBHJeHVlC4-HTghiPnz1aTEKY8Q=.cec14e3f-5274-420a-9683-1a90ce86aefc@github.com>
 <onfNeoQzCfo3lsgAdomCn6xxQx3nsVVk8h8h3gQDJl8=.b8c3a559-fad4-4b60-b22f-f07fd5f0b807@github.com>
 <0dEcF8Fq29rrS4qhGPuhmWoU8Cpw9vN7FO8yWLtXlTo=.08a889de-2632-4b9c-b814-1414bec74f4d@github.com>
Message-ID: <DRT4whSKMivdfabSUnQmrmPKcWXpDjouRmZ3J-DnymY=.16e8cff6-2cac-4728-a215-c83565da1da2@github.com>

On Thu, 18 Jul 2024 03:42:13 GMT, David Holmes <dholmes at openjdk.org> wrote:

> That is not a practical concern as max_length is expected to be much larger than the replacement text. We don't need the added complexity of the "stretch" zone.

> I'm open to changing the omitted text portion though, to include the number of characters omitted.

Okay, that is fine. I tend to overengineer :)

Thanks for taking up my proposal about the form of the omitted text.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1682336874

From stuefe at openjdk.org  Thu Jul 18 07:28:35 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 18 Jul 2024 07:28:35 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v4]
In-Reply-To: <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com>
Message-ID: <Iew9krH7gjDdTlpQ2MdEx7ujmdH9wjvn_ZcvrzT7hZw=.52e8d21d-a46f-45c5-a902-43e33602eb23@github.com>

On Thu, 18 Jul 2024 06:52:44 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log.
>> 
>> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN.
>> 
>> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.
>> 
>> If a string's length exceeds `max_length` then we print it as follows:
>> 
>> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged)
>> 
>> For example if we print "ABCDE" with a max_length of 4 then the output is literally:
>> 
>> "AB ... DE" (abridged)
>> 
>> The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
>> 
>> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`.
>> 
>> Testing:
>>  - new test added for validation purposes
>>  - tiers 1 - 3 as sanity testing
>> 
>> Thanks
>
> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update comment

src/hotspot/share/classfile/javaClasses.cpp line 799:

> 797:   if (length > max_length) {
> 798:     st->print(" (abridged) ");
> 799:   }

Do we still need this marker?

src/hotspot/share/runtime/globals.hpp line 1312:

> 1310:           "printed with the middle of the string elided.")                  \
> 1311:           range(2, O_BUFLEN)                                                \
> 1312:                                                                             \

This would make sense as a product diagnostic switch. You want to be able to increase this at a customer site if needed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1682332121
PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1682334543

From dholmes at openjdk.org  Thu Jul 18 07:38:38 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 18 Jul 2024 07:38:38 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v4]
In-Reply-To: <Iew9krH7gjDdTlpQ2MdEx7ujmdH9wjvn_ZcvrzT7hZw=.52e8d21d-a46f-45c5-a902-43e33602eb23@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com>
 <Iew9krH7gjDdTlpQ2MdEx7ujmdH9wjvn_ZcvrzT7hZw=.52e8d21d-a46f-45c5-a902-43e33602eb23@github.com>
Message-ID: <L5vy6_40BG_rUhPcrgefCcMpv4SwmLa02iPC3emzAhk=.ad0e7c0e-ab6a-433b-996b-6230d36a4586@github.com>

On Thu, 18 Jul 2024 07:23:16 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update comment
>
> src/hotspot/share/classfile/javaClasses.cpp line 799:
> 
>> 797:   if (length > max_length) {
>> 798:     st->print(" (abridged) ");
>> 799:   }
> 
> Do we still need this marker?

See my comment above:
https://github.com/openjdk/jdk/pull/20150#issuecomment-2235749703

> src/hotspot/share/runtime/globals.hpp line 1312:
> 
>> 1310:           "printed with the middle of the string elided.")                  \
>> 1311:           range(2, O_BUFLEN)                                                \
>> 1312:                                                                             \
> 
> This would make sense as a product diagnostic switch. You want to be able to increase this at a customer site if needed.

This is modelled after the `MaxElementPrintSize` that precedes it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1682353932
PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1682354725

From stuefe at openjdk.org  Thu Jul 18 07:45:31 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 18 Jul 2024 07:45:31 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v4]
In-Reply-To: <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com>
Message-ID: <5XGcYDWXJwui2ftrbZRG26kv_VqiFX79_rdrMP0sMdU=.4da6620f-b3f1-47f7-80ee-a0af0a154030@github.com>

On Thu, 18 Jul 2024 06:52:44 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log.
>> 
>> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN.
>> 
>> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.
>> 
>> If a string's length exceeds `max_length` then we print it as follows:
>> 
>> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged)
>> 
>> For example if we print "ABCDE" with a max_length of 4 then the output is literally:
>> 
>> "AB ... DE" (abridged)
>> 
>> The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
>> 
>> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`.
>> 
>> Testing:
>>  - new test added for validation purposes
>>  - tiers 1 - 3 as sanity testing
>> 
>> Thanks
>
> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update comment
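
For anyone skimming the thread, the abridging rule quoted above can be sketched roughly as follows. This is a standalone illustration only, not the code in javaClasses.cpp: `abridge` is a hypothetical helper name, and it builds a `std::string` instead of printing to an `outputStream`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not the HotSpot implementation): returns the text that
// would be printed for `s` when at most `max_length` characters are allowed.
std::string abridge(const std::string& s, std::size_t max_length) {
  if (s.size() <= max_length) {
    // Short enough: print the whole string in quotes.
    return "\"" + s + "\"";
  }
  // Too long: keep the first and last max_length/2 characters and elide
  // the middle, then flag the result as abridged.
  const std::size_t half = max_length / 2;
  return "\"" + s.substr(0, half) + " ... " + s.substr(s.size() - half) +
         "\" (abridged)";
}
```

With "ABCDE" and a limit of 4 this yields the same text as the example in the description.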

Looks good.

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20150#pullrequestreview-2184984870

From stuefe at openjdk.org  Thu Jul 18 07:45:33 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 18 Jul 2024 07:45:33 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v4]
In-Reply-To: <L5vy6_40BG_rUhPcrgefCcMpv4SwmLa02iPC3emzAhk=.ad0e7c0e-ab6a-433b-996b-6230d36a4586@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com>
 <Iew9krH7gjDdTlpQ2MdEx7ujmdH9wjvn_ZcvrzT7hZw=.52e8d21d-a46f-45c5-a902-43e33602eb23@github.com>
 <L5vy6_40BG_rUhPcrgefCcMpv4SwmLa02iPC3emzAhk=.ad0e7c0e-ab6a-433b-996b-6230d36a4586@github.com>
Message-ID: <vgYER8TAjO_r2qrS16kO2CG1yoLg54AbJuxRbmI8Vas=.cef75cd4-1575-4cde-aef3-714d19f4ab0e@github.com>

On Thu, 18 Jul 2024 07:35:03 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> src/hotspot/share/classfile/javaClasses.cpp line 799:
>> 
>>> 797:   if (length > max_length) {
>>> 798:     st->print(" (abridged) ");
>>> 799:   }
>> 
>> Do we still need this marker?
>
> See my comment above:
> https://github.com/openjdk/jdk/pull/20150#issuecomment-2235749703

Oh that makes sense. Okay!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20150#discussion_r1682363411

From dnsimon at openjdk.org  Thu Jul 18 07:54:37 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Thu, 18 Jul 2024 07:54:37 GMT
Subject: RFR: 8336587: failure_handler lldb command times out on
 macosx-aarch64 core file
In-Reply-To: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>
References: <L1fxCYdEJTI5I2mfuEWOkkTihGnPgioh2A2Q5f-qXwg=.4ba1fe74-0395-4a87-bf39-56af4080b55b@github.com>
Message-ID: <Z78N6x85yK7RtN89rqajPHdYGclCuLPcXEEpp7FSCuo=.9524b644-b456-4789-91a6-a39602447852@github.com>

On Tue, 16 Jul 2024 23:59:09 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

> I was looking at the failure_handler output for the lldb command on a macosx-aarch64 core file (it is trying to use lldb to get a back trace of all threads), and noticed it timed out:
> 
> 
> ----------------------------------------
> [2024-07-15 05:15:47] [<snip>/usr/bin/lldb, --core, <snip>/core.92643, <snip>/bin/java, -o, thread backtrace all, -o, quit] timeout=20000 in <snip>
> ----------------------------------------
> (lldb) target create "<snip>/bin/java" --core "<snip>/core.92643"
> WARNING: tool timed out: killed process after 20000 ms
> ----------------------------------------
> [2024-07-15 05:16:07] exit code: -2 time: 20163 ms
> ----------------------------------------
> 
> 
> 20 seconds is the failure_handler default timeout for all commands. Core files on macosx-aarch64 tend to be very large. This one was over 13 GB. On my MBPro it took 30 seconds. I bumped up the timeout to 60 seconds and reproduced the same crash in mach5 (more than once); the lldb command usually took about 55 seconds, but it did succeed with the longer timeout. I think we should change the timeout to even more than 60 seconds just to make sure we won't see timeouts. 120 seconds is probably a good amount.

Thanks for this change - thread dumps are often crucial for investigating timeouts.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20206#issuecomment-2235856819

From fyang at openjdk.org  Thu Jul 18 08:28:36 2024
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 18 Jul 2024 08:28:36 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block
 encryption/decryption intrinsics [v3]
In-Reply-To: <IATUuy7OYBIasXTq1KFmVEjeg2eQ9qFM2UP5B0UhoHw=.7a112155-e875-4752-b6f4-fbeb56248759@github.com>
References: <iltry713BDlJr1GffgMQl5nYUL6mAhTXp9t-nAnrdu8=.631de5af-05b9-42d3-a7df-b593ef81128f@github.com>
 <F1yms2X9VVITjLPANuQqABre5E199ILHQ4ywpS4cicY=.3e2c0af1-8070-497a-bfa0-5732eb199974@github.com>
 <IATUuy7OYBIasXTq1KFmVEjeg2eQ9qFM2UP5B0UhoHw=.7a112155-e875-4752-b6f4-fbeb56248759@github.com>
Message-ID: <DJY55PklmzAYqbYNmhh4j-F6BJPVRd8O0aiqLDPdqEE=.30c393c8-c801-4625-b903-1db0a2e509ff@github.com>

On Tue, 9 Jul 2024 05:28:13 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Left a note on a side effect of generate_vle32_pack2
>
> Changes requested by fyang (Reviewer).

> As for comparison with the openssl version: first of all, thanks for the sources, @RealFYang! The main difference that I see is that they introduced three different versions of encryption depending on the key sizes, which allows them to skip a couple of instructions, like when I did `vaesem_vv(res, vzero)` followed by `vxor_vv(res, res, vtemp1)`. So I thought it'll be more efficient to replace the current version by something openssl-lookalike. The only problem I see is increasing code size a bit. Please let me know if we are not interested in this change for some reason.

Does `vaesz_vs` help in any way? And what about `generate_aescrypt_decryptBlock`? [1]

[1] https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aes-riscv64-zvkned.pl#L451

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19960#issuecomment-2235925757

From alanb at openjdk.org  Thu Jul 18 09:43:33 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Thu, 18 Jul 2024 09:43:33 GMT
Subject: RFR: 8334772: Change Class::signers to an explicit field
In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
Message-ID: <O9uAoVHWx7skQYM26-2L-6lKyUj1qxOnfIHvddTaHcI=.d0934aab-1113-4eb7-ab4e-7ff0cee26019@github.com>

On Wed, 17 Jul 2024 19:47:44 GMT, Chen Liang <liach at openjdk.org> wrote:

> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.)
> 
> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR.

Signers dates back to JDK 1.1, touching it now will shine light on long standing technical debt and other issues.

I think the main thing that is jumping out is that ClassLoader.setSigners (a protected final method, so it may be called in subclasses) isn't fully specified and is also missing a number of important checks. The method doesn't specify that it ignores array classes or class objects for primitives. It doesn't say that elements that aren't a Certificate are ignored. It doesn't specify null behavior either, and it doesn't say that the signers can change at any time. What's worse is that in the hands of a cowboy builder, it can be used to set or clear the signers for any Class, or keep a reference to the signers array and muck with them mid-flight. So lots of issues there.

Technical debt aside, I think the transformation looks okay, just a bit confusing to have signers declared under a comment "Set by VM"; it's not clear that the comment applies only to the classData before it. At some point I think we should put a question mark on the JVM TI JVMTI_HEAP_REFERENCE_SIGNERS heap ref and the HPROF heap dump CLASS record where there is a slot for the signers array. I don't see any need for these in 2024 and they could potentially be null'ed in the future (would require a JVM TI spec change of course).

On David's comment about exposing the field to code using Class.getDeclaredField or Class.getDeclaredFields: nosy code can use these methods to get a reference to the Field but it's not accessible by default. For now, code can use sun.misc.Unsafe but that is temporary and will go away. I don't have a strong opinion and no objection to adding it to the filter (which I think is what David is wondering about).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20223#issuecomment-2236065493

From jvernee at openjdk.org  Thu Jul 18 11:03:40 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Thu, 18 Jul 2024 11:03:40 GMT
Subject: Integrated: 8335480: Only deoptimize threads if needed when closing
 shared arena
In-Reply-To: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
Message-ID: <GQmh0WOGyJZqSLpgnp5PdHpifM2VG1Q0Mz_Hlm6qKzo=.1a25625e-3cf4-4aa4-8041-69429fa3b803@github.com>

On Fri, 12 Jul 2024 13:57:23 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

> This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
> 
> Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
> 
> In this PR:
> - Deoptimizing is now only done in cases where it's needed, instead of always: that is, in cases where we are not inside an `@Scoped` method, but are inside a compiled frame that has a scoped access somewhere inside of it.
> - I've separated the stack walking code (`for_scope_method`) from the code that checks for a reference to the arena being closed (`is_accessing_session`), and added logging code to the former. That also required changing vframe code to accept an `outputStream*` rather than always printing to `tty`.
> - Added a new test (`TestConcurrentClose`), that tries to close many shared arenas at the same time, in order to stress that use case.
> - Added a new benchmark (`ConcurrentClose`), that stresses the cases where many threads are accessing and closing shared arenas.
> 
> I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the time spent in the shared case balloons quickly when there are more than 4 threads:
> 
> 
> Benchmark                     Threads   Mode  Cnt     Score     Error  Units
> ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
> ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
> ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
> ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
> ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
> ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
> ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
> ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
> ConcurrentClose.sharedAccess        1   avgt   10    52.042 ±   0.630  us/op
> ConcurrentClose.conf...

This pull request has now been integrated.

Changeset: 7bf53132
Author:    Jorn Vernee <jvernee at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/7bf531324404419e7de3e83e245d351e1a4e4499
Stats:     482 lines in 20 files changed: 393 ins; 19 del; 70 mod

8335480: Only deoptimize threads if needed when closing shared arena

Reviewed-by: mcimadamore, kvn, uschindler, vlivanov, eosterlund

-------------

PR: https://git.openjdk.org/jdk/pull/20158

From jbhateja at openjdk.org  Thu Jul 18 11:25:42 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 18 Jul 2024 11:25:42 GMT
Subject: Integrated: 8335860:
 compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard
 AVX/SSE settings
In-Reply-To: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
References: <B1g5tLUcLIObnRz2TRvraHnj25qo9XBkqgOebAUqbGo=.c11e415c-3e77-48a1-baab-93856093cde6@github.com>
Message-ID: <RBZe07a04W-62A_HLu0MiEFv69JDrjujQbvHQ1EecMk=.397a3105-dc57-4028-b4d3-0285d458b6b2@github.com>

On Fri, 12 Jul 2024 18:26:26 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> Enabling test with explicit feature checks for x86 target.
> Removing from test/hotspot/jtreg/ProblemList.txt
> 
> Best Regards,
> Jatin

This pull request has now been integrated.

Changeset: 35df48e1
Author:    Jatin Bhateja <jbhateja at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/35df48e1b321d16f44ba924065143af67143cf95
Stats:     4 lines in 2 files changed: 0 ins; 3 del; 1 mod

8335860: compiler/vectorization/TestFloat16VectorConvChain.java fails with non-standard AVX/SSE settings

Reviewed-by: sviswanathan, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/20160

From rkennke at openjdk.org  Thu Jul 18 11:33:40 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 18 Jul 2024 11:33:40 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
Message-ID: <OCv6QKq_A8dUaKUbnzSdEnlEqrMIcb6pUyLfObBFq-o=.1d78e62f-151c-403d-a291-fbab38c5f4d6@github.com>

On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:
> 
>  - Remove try_read
>  - Add explicit to single parameter constructors
>  - Remove superfluous access specifier
>  - Remove unused include
>  - Update assert message OMCache::set_monitor
>  - Fix indentation
>  - Remove outdated comment LightweightSynchronizer::exit
>  - Remove logStream include
>  - Remove strange comment
>  - Fix javaThread include

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 77:

> 75:   using ConcurrentTable = ConcurrentHashTable<Config, MEMFLAGS::mtObjectMonitor>;
> 76: 
> 77:   ConcurrentTable* _table;

So you have a class ObjectMonitorWorld, which references the ConcurrentTable, which, internally also has its actual table. This is 3 dereferences to get to the actual table, if I counted correctly. I'd try to eliminate the outermost ObjectMonitorWorld class, or at least make it a global flat structure instead of a reference to a heap-allocated object. I think, because this is a structure that is global and would exist throughout the lifetime of the Java program anyway, it might be worth figuring out how to do the actual ConcurrentHashTable flat in the global structure, too.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1682682065

From mbaesken at openjdk.org  Thu Jul 18 11:37:35 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Thu, 18 Jul 2024 11:37:35 GMT
Subject: RFR: 8330144: Revise os::free_memory() [v2]
In-Reply-To: <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
 <3tmcwY9jO3oa_xQevkj-VdwIt-VRvz-w2EWeoHAqpNw=.bcc48ae4-4dc8-4b67-8f1d-8f1d5350b8b4@github.com>
Message-ID: <hsyl5GR2ddiGPaY1gNEbkRT0zbLsALHg1ILn2bGzwAg=.6c3c54d2-f080-4cf3-8148-f9c69724149f@github.com>

On Wed, 10 Jul 2024 20:09:45 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> ### Summary
>> On linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting.  The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. 
>> 
>> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory.  
>> 
>> **Transparent huge pages:**
>> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.
>> 
>> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well. 
>> 
>> **Explicit huge pages:**
>> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter.
>> 
>> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo.  After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use.  Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range` function instead.  After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an imp...
>
> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Minor cleanup and comments.
>  - rename to disclaim_memory and update test
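
The small-page behaviour described in the summary above can be tried outside the VM with a few lines of Linux-only code. This is a hedged sketch, not the `os::disclaim_memory` implementation: `disclaim_and_check` is a hypothetical helper, and it assumes a private anonymous mapping backed by ordinary (non-explicit-huge) pages.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper: maps `bytes` of anonymous memory, touches it, discards
// the pages with madvise(MADV_DONTNEED), and checks that the range can be
// touched again without any explicit recommit.
bool disclaim_and_check(std::size_t bytes) {
  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) {
    return false;
  }
  char* addr = static_cast<char*>(p);

  // Touch every page so the whole range is resident.
  std::memset(addr, 0x5A, bytes);

  // Discard the backing pages; the mapping itself stays in place.
  bool ok = (madvise(addr, bytes, MADV_DONTNEED) == 0);

  // For private anonymous memory the next access sees zero-filled pages.
  for (std::size_t i = 0; ok && i < bytes; i++) {
    if (addr[i] != 0) {
      ok = false;
    }
  }

  // Writing again also works with no further mmap/mprotect call.
  if (ok) {
    addr[0] = 1;
    ok = (addr[0] == 1);
  }

  munmap(addr, bytes);
  return ok;
}
```

Watching RSS in /proc/PID/smaps around the `madvise` call, as done in the experiments quoted above, is the way to see the pages actually being released; the sketch only checks the contents.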

Marked as reviewed by mbaesken (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20080#pullrequestreview-2185565822

From thartmann at openjdk.org  Thu Jul 18 11:54:32 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 18 Jul 2024 11:54:32 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v4]
In-Reply-To: <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com>
Message-ID: <gEMfMw0RxHxCxbpXk8zf59ELEBGuY6T4j5xrk8iaq7I=.b0434363-0f67-40d4-9724-864a4cdbdaae@github.com>

On Thu, 18 Jul 2024 06:52:44 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log.
>> 
>> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN.
>> 
>> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.
>> 
>> If a string's length exceeds `max_length` then we print it as follows:
>> 
>> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged)
>> 
>> For example if we print "ABCDE" with a max_length of 4 then the output is literally:
>> 
>> "AB ... DE" (abridged)
>> 
>> The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
>> 
>> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`.
>> 
>> Testing:
>>  - new test added for validation purposes
>>  - tiers 1 - 3 as sanity testing
>> 
>> Thanks
>
> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update comment

Still looks good to me.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20150#pullrequestreview-2185596550

From thartmann at openjdk.org  Thu Jul 18 11:54:38 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 18 Jul 2024 11:54:38 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
Message-ID: <VjZBnADJJJzZtRFXXdS90tBzYVzRuS_3W9q1iBNex9k=.151572e1-8a4d-47a7-a195-e5bf31a2a8ac@github.com>

On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote:

> In the cases like:
> 
>   UNSAFE.putLong(address + off1 + 1030, lseed);
>   UNSAFE.putLong(address + 1023, lseed);
>   UNSAFE.putLong(address + off2 + 1001, lseed);
> 
> 
> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
> 
>   ldr  R10, [R15, #120]    # int ! Field: address
>   ldr  R11, [R16, #136]    # int ! Field: off1
>   ldr  R12, [R16, #144]    # int ! Field: off2
>   add  R11, R11, R10
>   mov R11, R11    # long -> ptr
>   add  R12, R12, R10
>   mov R10, R10    # long -> ptr
>   add R11, R11, #1030    # ptr
>   str  R17, [R11]    # int
>   add R10, R10, #1023    # ptr
>   str  R17, [R10]    # int
>   mov R10, R12    # long -> ptr
>   add R10, R10, #1001    # ptr
>   str  R17, [R10]    # int
> 
> 
> On aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. In the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], so we get this assembly:
> 
>   ldr    x10, [x15,#120]
>   ldp    x11, x12, [x16,#136]
>   add    x11, x11, x10
>   add    x12, x12, x10
>   add    x11, x11, #0x406
>   str    x17, [x11]
>   add    x10, x10, #0x3ff
>   str    x17, [x10]
>   mov    x10, x12  <--- extra register copy
>   add    x10, x10, #0x3e9
>   str    x17, [x10]
> 
> 
> There is still one extra register copy, which we're trying to remove in this patch.
> 
> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
> 
> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so
> 
> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906

Sure, I'll run this through our testing and report back once it passed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2236307911

From coleenp at openjdk.org  Thu Jul 18 12:32:31 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 18 Jul 2024 12:32:31 GMT
Subject: RFR: 8334772: Change Class::signers to an explicit field
In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
Message-ID: <E6_2baf-dUa28dZyZdQlfDmCqJ7sPoIGdgsfJLxPYaU=.2c9e818f-ebfb-4ae8-8d42-0ecd860089a9@github.com>

On Wed, 17 Jul 2024 19:47:44 GMT, Chen Liang <liach at openjdk.org> wrote:

> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.)
> 
> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR.

I thought we moved this already.  There's a change in the heapDumper.cpp that probably has to change also, because I think we're now dumping signers twice (two lines).  The one in jvmtiTagMap.cpp reports the SIGNERS tag so that has to stay.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20223#pullrequestreview-2185676651

From liach at openjdk.org  Thu Jul 18 12:40:32 2024
From: liach at openjdk.org (Chen Liang)
Date: Thu, 18 Jul 2024 12:40:32 GMT
Subject: RFR: 8334772: Change Class::signers to an explicit field
In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
Message-ID: <5gvvLeGQfl_OU3uY4P2QTYTXDxZTt5-INzv-Yt4mpRM=.a49d4d77-793a-46f0-90cb-d219af508f37@github.com>

On Wed, 17 Jul 2024 19:47:44 GMT, Chen Liang <liach at openjdk.org> wrote:

> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.)
> 
> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR.

The `native` flag is not rendered in the API spec, so indeed we can drop it without a CSR.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20223#issuecomment-2236359362

From alanb at openjdk.org  Thu Jul 18 12:46:31 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Thu, 18 Jul 2024 12:46:31 GMT
Subject: RFR: 8334772: Change Class::signers to an explicit field
In-Reply-To: <E6_2baf-dUa28dZyZdQlfDmCqJ7sPoIGdgsfJLxPYaU=.2c9e818f-ebfb-4ae8-8d42-0ecd860089a9@github.com>
References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
 <E6_2baf-dUa28dZyZdQlfDmCqJ7sPoIGdgsfJLxPYaU=.2c9e818f-ebfb-4ae8-8d42-0ecd860089a9@github.com>
Message-ID: <MbTytLxlPVZ_kNtGqlQ9nciUEmNcyZfd_QM1eDRARRE=.cff0b65e-8339-4dc0-b3dc-52944299002c@github.com>

On Thu, 18 Jul 2024 12:30:24 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> There's a change in the heapDumper.cpp that probably has to change also, because I think we're now dumping signers twice (two lines).

The HPROF heap dump has a slot for signers, so we have to keep that to avoid breakage. So yes, it means two refs, as the signers will be in the instance fields list too. The HPROF format could be rev'ed but it may not be worth it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20223#issuecomment-2236404312

From coleenp at openjdk.org  Thu Jul 18 13:09:34 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 18 Jul 2024 13:09:34 GMT
Subject: RFR: 8334772: Change Class::signers to an explicit field
In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
Message-ID: <z6428CN4RgF6LTYx8R-MuHuPu_jerimhS5-XTjVvE6A=.b3b927f7-e7ca-4a90-afdb-2cee94d105f6@github.com>

On Wed, 17 Jul 2024 19:47:44 GMT, Chen Liang <liach at openjdk.org> wrote:

> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.)
> 
> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR.

Ok.  It's not obvious from the code but I don't think it's worth commenting.

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20223#pullrequestreview-2185765767

From liach at openjdk.org  Thu Jul 18 13:31:02 2024
From: liach at openjdk.org (Chen Liang)
Date: Thu, 18 Jul 2024 13:31:02 GMT
Subject: RFR: 8334772: Change Class::signers to an explicit field [v2]
In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
Message-ID: <Btsn_5ZHvYuNbW8Pjyyy43sSSz4TjVlW4tfyU1tUza4=.00cd3fb4-6074-484e-bead-3cfb7a3569b6@github.com>

> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.)
> 
> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR.

Chen Liang has updated the pull request incrementally with one additional commit since the last revision:

  Reorder comment of classData to avoid misunderstanding

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20223/files
  - new: https://git.openjdk.org/jdk/pull/20223/files/86b3a248..dd62b9d2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20223&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20223&range=00-01

  Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20223.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20223/head:pull/20223

PR: https://git.openjdk.org/jdk/pull/20223

From alanb at openjdk.org  Thu Jul 18 13:36:33 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Thu, 18 Jul 2024 13:36:33 GMT
Subject: RFR: 8334772: Change Class::signers to an explicit field [v2]
In-Reply-To: <Btsn_5ZHvYuNbW8Pjyyy43sSSz4TjVlW4tfyU1tUza4=.00cd3fb4-6074-484e-bead-3cfb7a3569b6@github.com>
References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
 <Btsn_5ZHvYuNbW8Pjyyy43sSSz4TjVlW4tfyU1tUza4=.00cd3fb4-6074-484e-bead-3cfb7a3569b6@github.com>
Message-ID: <2ZVM5wAKjhLfFx4CFBSyQ7yND6VIsMcNTxRubvcuXps=.c511f719-746a-4c01-b744-82f1f2b3619f@github.com>

On Thu, 18 Jul 2024 13:31:02 GMT, Chen Liang <liach at openjdk.org> wrote:

>> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.)
>> 
>> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR.
>
> Chen Liang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Reorder comment of classData to avoid misunderstanding

Marked as reviewed by alanb (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20223#pullrequestreview-2185842171

From duke at openjdk.org  Thu Jul 18 13:38:38 2024
From: duke at openjdk.org (Robert Toyonaga)
Date: Thu, 18 Jul 2024 13:38:38 GMT
Subject: Integrated: 8330144: Revise os::free_memory()
In-Reply-To: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
References: <KxIdDPlzKri2D4Tdwu4wU4SKclh8PFY7-KGX76O2RQY=.051d1485-4686-4153-88bd-6fe33564966b@github.com>
Message-ID: <AOtzhjQz_eSZz92AbEtMHyuvEkyUat-9Mjp1yDZa7A4=.480e5df1-5c17-4aa7-bab5-1daa028dff02@github.com>

On Mon, 8 Jul 2024 17:33:41 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

> ### Summary
> On Linux, change `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` so that it uses `madvise(MADV_DONTNEED)` (similar to the BSD implementation) instead of recommitting over the existing committed memory to discard the existing pages. This function should free the underlying memory without uncommitting.  The benefit of this change is that we can get rid of conditional logic dependent on whether we're dealing with huge pages, `madvise` can't fail, and we can also get rid of the "alignment_hint" parameter. 
> 
> `os::free_memory(char *addr, size_t bytes, size_t alignment_hint)` has also been renamed to `os::disclaim_memory(char *addr, size_t bytes)` to differentiate it from `os::free_memory()` which reports the size of free memory instead of actually releasing memory.  
> 
> **Transparent huge pages:**
> `madvise(MADV_DONTNEED)` works with THP. As with small pages, `madvise(MADV_DONTNEED)` results in the memory being freed, RSS decreasing, and the addresses can be re-touched without being explicitly recommitted.
> 
> To determine this, I set /sys/kernel/mm/transparent_hugepage/enabled to "always" and allocated a large amount of memory. Then /proc/PID/smaps shows that THP are being used to back that memory. After calling `disclaim_memory`, RSS decreases, indicating the memory is no longer live. The `os::committed_in_range` function also reports that the memory has been freed (this function should probably be renamed to `live_in_range`). Touching the addresses again afterward is fine as well. 
> 
> **Explicit huge pages:**
> `madvise(MADV_DONTNEED)` does not result in memory being freed when used on explicit huge pages. However, the pages are not lost either. Additionally, after `madvise(MADV_DONTNEED)`, we can retouch the addresses without any problems. In conclusion, `madvise(MADV_DONTNEED)` has no effect on explicit huge pages. This means the behavior of this function with respect to huge pages remains the same. We can remove the "alignment_hint" parameter.
> 
> To determine this, I allocated some huge pages via /proc/sys/vm/nr_hugepages. Successful allocation was confirmed with /proc/meminfo.  After calling `disclaim_memory`, /proc/meminfo shows no change in the number of huge pages in use.  Explicit huge pages are not reflected in RSS so I used the `os::committed_in_range` function instead.  After calling `disclaim_memory`, the `os::committed_in_range` function reports that the memory is still live. Unfortunately that's not an improvement upon existing behav...
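
For readers who want to try the small-page behaviour described in the summary above, here is a minimal Linux-only sketch. It is not the HotSpot code: `disclaim_range` and `disclaim_demo` are hypothetical names made up for this illustration, and the sketch assumes a private anonymous mapping whose size is a multiple of the page size. Unlike the summary, the helper still checks the return value of `madvise`, since the call can be rejected for arguments that are not page-aligned.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not the HotSpot function): discard the contents of an
// already-mapped range while leaving the mapping itself in place.
// Assumes addr and bytes are page-aligned.
inline bool disclaim_range(char* addr, std::size_t bytes) {
  return ::madvise(addr, bytes, MADV_DONTNEED) == 0;
}

// Map private anonymous memory, dirty it, disclaim it, then touch it again.
// Returns true if the range reads back as zeros and is still writable.
inline bool disclaim_demo(std::size_t bytes) {
  void* p = ::mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) {
    return false;
  }
  char* base = static_cast<char*>(p);
  std::memset(base, 0x5a, bytes);            // make the pages resident and dirty

  bool ok = disclaim_range(base, bytes);

  // For private anonymous mappings, Linux hands back zero-filled pages on the
  // next access, so the old contents must be gone.
  for (std::size_t i = 0; ok && i < bytes; ++i) {
    ok = (base[i] == 0);
  }

  // Re-touch without any explicit recommit: the mapping is still valid.
  if (ok) {
    base[0] = 1;
    ok = (base[0] == 1);
  }

  ::munmap(p, bytes);
  return ok;
}
```

Calling `disclaim_demo(1 << 16)` exercises the path on a 64 KiB range. The sketch says nothing about explicit huge pages, which the summary reports as unaffected by `MADV_DONTNEED`.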

This pull request has now been integrated.

Changeset: 4a73ed44
Author:    Robert Toyonaga <rtoyonag at redhat.com>
Committer: Thomas Stuefe <stuefe at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/4a73ed44f1af4ea3e53b1e1a6acfca1ba6b636c3
Stats:     44 lines in 10 files changed: 24 ins; 6 del; 14 mod

8330144: Revise os::free_memory()

Reviewed-by: stuefe, mbaesken

-------------

PR: https://git.openjdk.org/jdk/pull/20080

From liach at openjdk.org  Thu Jul 18 13:48:06 2024
From: liach at openjdk.org (Chen Liang)
Date: Thu, 18 Jul 2024 13:48:06 GMT
Subject: RFR: 8334772: Change Class::signers to an explicit field [v3]
In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
Message-ID: <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com>

> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.)
> 
> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR.

Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-signers
 - Reorder comment of classData to avoid misunderstanding
 - 8334772: Change Class::signers to an explicit field

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20223/files
  - new: https://git.openjdk.org/jdk/pull/20223/files/dd62b9d2..5d742e34

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20223&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20223&range=01-02

  Stats: 779 lines in 28 files changed: 676 ins; 29 del; 74 mod
  Patch: https://git.openjdk.org/jdk/pull/20223.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20223/head:pull/20223

PR: https://git.openjdk.org/jdk/pull/20223

From rriggs at openjdk.org  Thu Jul 18 15:11:33 2024
From: rriggs at openjdk.org (Roger Riggs)
Date: Thu, 18 Jul 2024 15:11:33 GMT
Subject: RFR: 8334772: Change Class::signers to an explicit field [v3]
In-Reply-To: <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com>
References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
 <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com>
Message-ID: <Fkuk4y6oSyWvTF0jU0i8HV8W2f0bEzzU9ebCGYWsW7M=.843ca90f-1247-4772-9c9c-36d983b44203@github.com>

On Thu, 18 Jul 2024 13:48:06 GMT, Chen Liang <liach at openjdk.org> wrote:

>> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.)
>> 
>> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR.
>
> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-signers
>  - Reorder comment of classData to avoid misunderstanding
>  - 8334772: Change Class::signers to an explicit field

lgtm

-------------

Marked as reviewed by rriggs (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20223#pullrequestreview-2186132979

From mcimadamore at openjdk.org  Thu Jul 18 15:27:45 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Thu, 18 Jul 2024 15:27:45 GMT
Subject: RFR: 8335480: Only deoptimize threads if needed when closing
 shared arena [v3]
In-Reply-To: <N0Q0GTZ0BpZzjFdQ5_-tR0hV1ba_uPXjiEWYVE7SerE=.ce7681b0-a3f8-4f38-8ad7-717d42773aab@github.com>
References: <dqtLXEzL_BsALoslg04Wz7E7UNYMIYKdvsA6u83IDws=.9f8d97cb-beed-430d-a07e-34ba4b12e473@github.com>
 <cU4Xrxc35k0srIqSdeEiFGtRsyfQC2aZEsCxHX6kshg=.0654c19d-d56a-45ed-bdc9-54a7adf60974@github.com>
 <ccVC9sxlN3Fns4165dO3IVYWNr4Z3jEouwU-pcMuhc4=.21858569-5e0a-4e3e-9556-316fbb556ff5@github.com>
 <vHxv_SHVjB-fNJRe9tkXADLPoVr1NdVjb90ZgSdrxW4=.1e7c4e79-d62b-4a48-a9a1-cd3627b9bd8d@github.com>
 <vkis_Q4wJQqAp1yD68PRq0cMZrUx40OCYWRSIInivPE=.d2b50a6e-1cfc-46ea-b315-43abfd46ea63@github.com>
 <AHMA1a2t2LmvOsuoTJXkA-g4vY1MMiUmoC7QcUath14=.b68d4910-9a33-4af0-87e0-0da3e356bfd0@github.com>
 <SLRLJ0POQPOmE_s6A7xdWVS3OA8Nsk3cz11OGpUMgDw=.0ca111a0-efaf-4551-9802-9b52dbaa83df@github.com>
 <xjy3mm5IYGAjpUPi3pW6PzKlyGjm4MDBByQdZoKwP-U=.0ee7421a-a825-46da-900e-1120ce20bbac@github.com>
 <05NPlQ4U6cgxul3_rm6V-5PhPdRYSWO6oKIn67lfTxo=.e36064f0-274e-422a-aeed-4672159aaf7e@github.com>
 <6LWfBFLTU5Umn6EoF6qNsNjOi-uzedphDp661DUr2Q4=.7cc12bce-2283-4038-b3a5-28e6750dacfa@github.com>
 <SDZzJPMEmQsSOaDtkC7g10HN4XPM_Q1CmdlCsAZYcKM=.465d6eca-3af2-4d39-8d33-f4f8a026834e@github.com>
 <N0Q0GTZ0BpZzjFdQ5_-tR0hV1ba_uPXjiEWYVE7SerE=.ce7681b0-a3f8-4f38-8ad7-717d42773aab@github.com>
Message-ID: <AXRsHfMntYQvmuTs1Uw8ZsVl6BSuY0300CtoLcvRKXw=.2a15f486-829d-48ee-8972-a2e0fed76c13@github.com>

On Mon, 15 Jul 2024 16:30:11 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> * there is some issue involving segment access with `int` induction variable which we should investigate separately

This issue is tracked here:
https://bugs.openjdk.org/browse/JDK-8336759

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2236855896

From aph at openjdk.org  Thu Jul 18 16:39:33 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 18 Jul 2024 16:39:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter
In-Reply-To: <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com>
Message-ID: <K2b9CDiX4T2gCntyr6LF2q04W4ztFfyhMddDmpZoJqI=.bbce5e35-fb80-408c-adbf-920c6ed9ee72@github.com>

On Thu, 11 Jul 2024 23:22:19 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1433:
>> 
>>> 1431: 
>>> 1432:   // Don't check secondary_super_cache
>>> 1433:   if (super_check_offset.is_register()
>> 
>> Do you see any effects from this particular change?
>> 
>> It adds a runtime check on the fast path for all subtype checks (irrespective of whether it checks primary or secondary super). Moreover, the very same check is performed after primary super slot is checked.
>> 
>> Unless `_secondary_super_cache` field is removed, unconditionally checking the slot at `super_check_offset` is benign.
>
> BTW `MacroAssembler::check_klass_subtype_fast_path` deserves a cleanup: `super_check_offset` can be safely turned into `Register` thus eliminating the code guarded by `super_check_offset.is_register() == false`.

> Do you see any effects from this particular change?
> 
> It adds a runtime check on the fast path for all subtype checks (irrespective of whether it checks primary or secondary super). Moreover, the very same check is performed after primary super slot is checked.

OK. I think this was more for testing, but your point makes sense.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683178664

From aph at openjdk.org  Thu Jul 18 16:39:33 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 18 Jul 2024 16:39:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter
In-Reply-To: <Eq4u2V3UeGi1VeGyEtA0FPS0sKoqAwqCgw5RmfRww-Y=.7dea6a8d-b59c-49b7-8b31-480b970d3de8@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com>
 <FMWMwnwhdReuAohf_e_EWQN7yFM6WNl8Hv0_0S7goek=.9004d9f0-5755-471e-a120-6b6e83c8ebbd@github.com>
 <xNV7-nhHtDKME2kWU_k3bKZJId61Ii_CW12KMQvd0IY=.03b01561-0358-4635-9d1c-ee931f14f12f@github.com>
 <Eq4u2V3UeGi1VeGyEtA0FPS0sKoqAwqCgw5RmfRww-Y=.7dea6a8d-b59c-49b7-8b31-480b970d3de8@github.com>
Message-ID: <I-RsEHCmrbxEX17LborYwpe-VJr3VbpHeCUyLooPsoo=.31ddede4-8af9-439f-bfe9-7a5273363b85@github.com>

On Wed, 17 Jul 2024 18:54:32 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Now it starts to sound concerning... `Klass::set_secondary_supers()` initializes both `_secondary_supers` and `_bitmap` which implies that `Klass::is_subtype_of()` may be called on a not yet initialized Klass. If that's the case, it does look like a bug on its own. How is it expected to work when `_secondary_supers` hasn't been set yet?
>
> On a second thought the following setter may be the culprit:
> 
> void Klass::set_secondary_supers(Array<Klass*>* secondaries) {
>   assert(!UseSecondarySupersTable || secondaries == nullptr, "");
>   set_secondary_supers(secondaries, SECONDARY_SUPERS_BITMAP_EMPTY);
> }
> 
> It should be adjusted to set `SECONDARY_SUPERS_BITMAP_FULL` instead.

I've spent a while trying to reproduce the problem but I can't. 

I was seeing a problem where `Klass::is_subtype_of(vmClasses::Cloneable_klass())` was being called before the bitmap had been set. I'm not sure what to think, really. Maybe I should just back out this change to see what happens.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683174591

From aph at openjdk.org  Thu Jul 18 16:44:32 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 18 Jul 2024 16:44:32 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter
In-Reply-To: <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
Message-ID: <P-SlUxusbtJqhV2MGXwLS2u9P4Yq3aQFJW664g1fOug=.e610cb90-b511-43e3-9caa-e9293a25fa5c@github.com>

On Thu, 11 Jul 2024 23:39:11 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1040:
> 
>> 1038: 
>> 1039:   // Secondary subtype checking
>> 1040:   void lookup_secondary_supers_table(Register sub_klass,
> 
> While browsing the code, I noticed that it's far from evident at call sites which overload is used (especially with so many arguments). Does it make sense to avoid method overloads here and use distinct method names instead?

So I confess: this is surely true, but I failed to think of a name for the known- and unknown-at-compile-time versions. Maybe `check_const_klass_subtype_slow_path_table` and `check_var_klass_subtype_slow_path_table`?

> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 1981:
> 
>> 1979:       __ load_klass(r19_klass, copied_oop);// query the object klass
>> 1980: 
>> 1981:       BLOCK_COMMENT("type_check:");
> 
> Why don't you move it inside `generate_type_check`?

Sorry, what? Do you mean move just this block comment?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683182967
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683184664

From aph at openjdk.org  Thu Jul 18 17:43:29 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 18 Jul 2024 17:43:29 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v2]
In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
Message-ID: <M4ZQME975c-3u4MqZN8p8Mhg5g2NbSpNAdiFqgA4OSk=.13a84093-bff4-4ab3-9812-83e309c45328@github.com>

> This patch expands the use of a hash table for secondary superclasses
> to the interpreter, C1, and runtime. It also adds a C2 implementation
> of hashed lookup in cases where the superclass isn't known at compile
> time.
> 
> HotSpot shared runtime
> ----------------------
> 
> Building hashed secondary tables is now unconditional. It takes very
> little time, and now that the shared runtime always has the tables, it
> might as well take advantage of them. The shared code is easier to
> follow now, I think.
> 
> There might be a performance issue with x86-64 in that we build
> HotSpot for a default x86-64 target that does not support popcount.
> This means that HotSpot C++ runtime on x86 always uses a software
> emulation for popcount, even though the vast majority of machines made
> for the past 20 years can do popcount in a single instruction. It
> wouldn't be terribly hard to do something about that.
> 
> Having said that, the software popcount is really not bad.
> 
> x86
> ---
> 
> x86 is rather tricky, because we still support
> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
> well as 32- and 64-bit ports. There's some further complication in
> that only `RCX` can be used as a shift count, so there's some register
> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
> 
> AArch64
> -------
> 
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
> 
> Generally
> ---------
> 
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
> 
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
> 
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
> 
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.
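
To make the role of popcount in a hashed secondary-supers lookup more concrete, here is a simplified, self-contained sketch. It is loosely modelled on the idea in the description above and is not the HotSpot implementation: `PackedTable`, `home_slot`, `build_table` and `table_contains` are hypothetical names, the hash function is an arbitrary choice, and plain `int` ids stand in for `Klass*`. A 64-bit bitmap records which hash slots are occupied, the entries are stored packed in slot order, and counting the set bits below a slot gives that slot's index in the packed array. The software `popcount64` is the kind of fallback a runtime needs when it is built for a baseline target without a popcount instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Software population count (Kernighan's method): clears one set bit per step.
inline int popcount64(std::uint64_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

// Hypothetical table layout for this sketch only.
struct PackedTable {
  std::uint64_t bitmap = 0;   // bit s set <=> hash slot s is occupied
  std::vector<int> packed;    // occupied entries, ordered by slot number
};

// Arbitrary multiplicative hash into 64 slots (an assumption of this sketch).
inline unsigned home_slot(int id) {
  std::uint32_t h = static_cast<std::uint32_t>(id) * 0x9E3779B1u;
  return h >> 26;             // top 6 bits: 0..63
}

// Build a table from at most 64 ids, resolving collisions by linear probing.
inline PackedTable build_table(const std::vector<int>& ids) {
  assert(ids.size() <= 64);
  int slots[64] = {0};
  PackedTable t;
  for (int id : ids) {
    unsigned s = home_slot(id);
    while ((t.bitmap & (std::uint64_t{1} << s)) != 0) {
      s = (s + 1) & 63;       // probe the next slot, wrapping around
    }
    t.bitmap |= std::uint64_t{1} << s;
    slots[s] = id;
  }
  for (unsigned s = 0; s < 64; ++s) {
    if ((t.bitmap & (std::uint64_t{1} << s)) != 0) {
      t.packed.push_back(slots[s]);
    }
  }
  return t;
}

// Lookup: walk occupied slots from the home slot; popcount of the bits below
// a slot is that slot's position in the packed array.
inline bool table_contains(const PackedTable& t, int id) {
  unsigned s = home_slot(id);
  for (int probes = 0; probes < 64; ++probes) {
    std::uint64_t bit = std::uint64_t{1} << s;
    if ((t.bitmap & bit) == 0) {
      return false;           // an empty slot ends the probe sequence
    }
    int index = popcount64(t.bitmap & (bit - 1));
    if (t.packed[index] == id) {
      return true;
    }
    s = (s + 1) & 63;
  }
  return false;               // table completely full and id not present
}
```

A miss whose home slot is empty costs one bit test, which is the property that makes the bitmap attractive compared with a linear scan; hits and collisions cost one popcount per probed slot.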

Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:

  Negated tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19989/files
  - new: https://git.openjdk.org/jdk/pull/19989/files/7d7694cc..bfe9ceed

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=00-01

  Stats: 23 lines in 3 files changed: 10 ins; 10 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/19989.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989

PR: https://git.openjdk.org/jdk/pull/19989

From vlivanov at openjdk.org  Thu Jul 18 19:07:33 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 18 Jul 2024 19:07:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v2]
In-Reply-To: <P-SlUxusbtJqhV2MGXwLS2u9P4Yq3aQFJW664g1fOug=.e610cb90-b511-43e3-9caa-e9293a25fa5c@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <P-SlUxusbtJqhV2MGXwLS2u9P4Yq3aQFJW664g1fOug=.e610cb90-b511-43e3-9caa-e9293a25fa5c@github.com>
Message-ID: <yRd8QN05KfE7K_D63gauu473mUmIY5PoybKfeg0yzdA=.ad06dd5e-d1ce-4cc5-a2bf-179e0602322f@github.com>

On Thu, 18 Jul 2024 16:40:47 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1040:
>> 
>>> 1038: 
>>> 1039:   // Secondary subtype checking
>>> 1040:   void lookup_secondary_supers_table(Register sub_klass,
>> 
>> While browsing the code, I noticed that it's far from evident at call sites which overload is used (especially with so many arguments). Does it make sense to avoid method overloads here and use distinct method names instead?
>
> So I confess: this is surely true, but I failed to think of a name for the known- and unknown-at-compile-time versions. Maybe `check_const_klass_subtype_slow_path_table` and `check_var_klass_subtype_slow_path_table`?

Another idea: `lookup_secondary_supers_table_var` vs `lookup_secondary_supers_table_const`.
Or `lookup_secondary_supers_table_super_var` vs `lookup_secondary_supers_table_super_const`.

>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 1981:
>> 
>>> 1979:       __ load_klass(r19_klass, copied_oop);// query the object klass
>>> 1980: 
>>> 1981:       BLOCK_COMMENT("type_check:");
>> 
>> Why don't you move it inside `generate_type_check`?
>
> Sorry, what? Do you mean move just this block comment?

No, the whole piece with `if (UseSecondarySupersTable) { ... } else { ... }` included.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683349665
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683348239

From vlivanov at openjdk.org  Thu Jul 18 19:59:35 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 18 Jul 2024 19:59:35 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v2]
In-Reply-To: <I-RsEHCmrbxEX17LborYwpe-VJr3VbpHeCUyLooPsoo=.31ddede4-8af9-439f-bfe9-7a5273363b85@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com>
 <FMWMwnwhdReuAohf_e_EWQN7yFM6WNl8Hv0_0S7goek=.9004d9f0-5755-471e-a120-6b6e83c8ebbd@github.com>
 <xNV7-nhHtDKME2kWU_k3bKZJId61Ii_CW12KMQvd0IY=.03b01561-0358-4635-9d1c-ee931f14f12f@github.com>
 <Eq4u2V3UeGi1VeGyEtA0FPS0sKoqAwqCgw5RmfRww-Y=.7dea6a8d-b59c-49b7-8b31-480b970d3de8@github.com>
 <I-RsEHCmrbxEX17LborYwpe-VJr3VbpHeCUyLooPsoo=.31ddede4-8af9-439f-bfe9-7a5273363b85@github.com>
Message-ID: <zJHx1UKVSPhz1zoL3CMSYuiI3MPN23AfMraemiDG-8k=.30ff1b8d-37f9-4af0-bf9f-5005f3021596@github.com>

On Thu, 18 Jul 2024 16:35:16 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> On a second thought the following setter may be the culprit:
>> 
>> void Klass::set_secondary_supers(Array<Klass*>* secondaries) {
>>   assert(!UseSecondarySupersTable || secondaries == nullptr, "");
>>   set_secondary_supers(secondaries, SECONDARY_SUPERS_BITMAP_EMPTY);
>> }
>> 
>> It should be adjusted to set `SECONDARY_SUPERS_BITMAP_FULL` instead.
>
> I've spent a while trying to reproduce the problem but I can't. 
> 
> I was seeing a problem where `Klass::is_subtype_of(vmClasses::Cloneable_klass())` was being called before the bitmap had been set. I'm not sure what to think, really. Maybe I should just back out this change to see what happens.

I'm in favor of backing out this change and adding an assert/guarantee (on `_secondary_supers != nullptr`) in `Klass::is_subtype_of()` to ensure no subtype checks happen on uninitialized Klasses. Then we should be able to spot and fix all the places where problematic checks happen.
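
A minimal sketch of the kind of guard proposed above, using a toy type rather than the real `Klass`. `MiniKlass` and its members are hypothetical, and the toy check uses a plain linear scan where HotSpot would do a table lookup. The only point is that a subtype check on an object whose secondary supers have not been set trips an assertion instead of quietly returning a wrong answer.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a class-metadata object; not the HotSpot Klass.
struct MiniKlass {
  // Stays null until the secondary supers have been installed.
  const std::vector<const MiniKlass*>* secondary_supers = nullptr;

  bool is_initialized() const { return secondary_supers != nullptr; }

  bool is_subtype_of(const MiniKlass* k) const {
    // Guard: catch subtype checks that run before initialization.
    assert(is_initialized() && "subtype check on uninitialized klass");
    if (k == this) {
      return true;
    }
    // Linear search over the secondary supers array.
    for (const MiniKlass* s : *secondary_supers) {
      if (s == k) {
        return true;
      }
    }
    return false;
  }
};
```

In a debug build, every call site that reaches `is_subtype_of` too early would then fail at the guard, which is what makes the problematic checks possible to find one by one.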

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683409343

From vlivanov at openjdk.org  Thu Jul 18 20:09:33 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 18 Jul 2024 20:09:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v2]
In-Reply-To: <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com>
Message-ID: <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com>

On Wed, 17 Jul 2024 17:15:32 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> src/hotspot/share/oops/klass.inline.hpp line 122:
>> 
>>> 120:     return true;
>>> 121: 
>>> 122:   bool result = lookup_secondary_supers_table(k);
>> 
>> Should `UseSecondarySupersTable` affect `Klass::search_secondary_supers` as well?
>
> I think not. It'd complicate C++ runtime for no useful reason.

On the other hand, if `-XX:-UseSecondarySupersTable` is intended solely for diagnostic purposes, then handling all possible execution modes uniformly is preferable, since it gives more confidence when troubleshooting seemingly related failures.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683419259

From vlivanov at openjdk.org  Thu Jul 18 20:13:32 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 18 Jul 2024 20:13:32 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v2]
In-Reply-To: <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com>
 <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com>
Message-ID: <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com>

On Thu, 18 Jul 2024 20:07:14 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> I think not. It'd complicate C++ runtime for no useful reason.
>
> On the other hand, if `-XX:-UseSecondarySupersTable` is intended solely for diagnostic purposes, then handling all possible execution modes uniformly is preferable, since it gives more confidence when troubleshooting seemingly related failures.

Alternatively, `Klass::is_subtype_of()` can unconditionally perform linear search over secondary_supers array. 

Even though I would very much like to see the table lookup written in C++ (accompanying the heavily optimized platform-specific MacroAssembler variants), an unconditional linear search would make the C++ runtime even simpler.
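
To make the linear-search alternative concrete, here is a minimal sketch written as a toy Java model rather than actual HotSpot C++. The `Klass` class, its fields and `isSubtypeOf` are hypothetical stand-ins, and the primary-super walk is a simplification (HotSpot uses a super-check offset for that part); only the scan over the secondary supers corresponds to what is suggested above.

```java
import java.util.List;

final class SubtypeSketch {
    // Hypothetical stand-in for a HotSpot Klass; not the real data layout.
    static final class Klass {
        final String name;
        final Klass superKlass;            // primary super chain (simplified)
        final List<Klass> secondarySupers; // assumed to hold all transitive interfaces

        Klass(String name, Klass superKlass, List<Klass> secondarySupers) {
            this.name = name;
            this.superKlass = superKlass;
            this.secondarySupers = secondarySupers;
        }

        boolean isSubtypeOf(Klass k) {
            // Primary supers: walk the superclass chain.
            for (Klass c = this; c != null; c = c.superKlass) {
                if (c == k) {
                    return true;
                }
            }
            // Secondary supers: unconditional linear search, no table or bitmap.
            for (Klass s : secondarySupers) {
                if (s == k) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Klass object = new Klass("Object", null, List.of());
        Klass cloneable = new Klass("Cloneable", object, List.of());
        Klass serializable = new Klass("Serializable", object, List.of());
        Klass arrayList = new Klass("ArrayList", object, List.of(cloneable, serializable));

        System.out.println(arrayList.isSubtypeOf(cloneable)); // true
        System.out.println(object.isSubtypeOf(cloneable));    // false
    }
}
```

The scan is linear in the number of secondary supers, which is the trade-off against the hashed table lookup discussed in this thread.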

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1683423052

From duke at openjdk.org  Thu Jul 18 20:52:38 2024
From: duke at openjdk.org (fitzsim)
Date: Thu, 18 Jul 2024 20:52:38 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
Message-ID: <YdH1sbYiXMYAeQfEUigdlRCH1rycWckinWAPMt7wmCE=.a79dd884-1ddd-40a1-9f36-0a3af2de9d86@github.com>

On Tue, 9 Jul 2024 12:08:50 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
>> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers.
>> 
>> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external sleef things (header or lib) at build or run time.
>> Besides this change, the previous changes are also modified accordingly, e.g. some unnecessary files or changes are removed, especially in the make dir of jdk.
>> 
>> Besides the code changes, one important task is to handle the legal process.
>> 
>> Thanks!
>> 
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>> 
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>> 
>> ## Performance
>> 
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>> 
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   skip TANH

It is possible to regenerate `sleefinline_advsimd.h` and `sleefinline_sve.h` with some new OpenJDK build logic and only the following fifteen SLEEF source files:


32K	./src/jdk.incubator.vector/linux/native/sleef/src/arch/helperadvsimd.h
40K	./src/jdk.incubator.vector/linux/native/sleef/src/arch/helpersve.h
8.0K	./src/jdk.incubator.vector/linux/native/sleef/src/common/addSuffix.c
20K	./src/jdk.incubator.vector/linux/native/sleef/src/common/commonfuncs.h
16K	./src/jdk.incubator.vector/linux/native/sleef/src/common/dd.h
20K	./src/jdk.incubator.vector/linux/native/sleef/src/common/df.h
4.0K	./src/jdk.incubator.vector/linux/native/sleef/src/common/estrin.h
12K	./src/jdk.incubator.vector/linux/native/sleef/src/common/keywords.txt
12K	./src/jdk.incubator.vector/linux/native/sleef/src/common/misc.h
4.0K	./src/jdk.incubator.vector/linux/native/sleef/src/common/quaddef.h
4.0K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/funcproto.h
20K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/mkrename.c
116K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefinline_header.h.org
164K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefsimddp.c
152K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefsimdsp.c
624K	total


I was able to extract the shell and C preprocessing steps from the upstream CMake-based build system (by adding `--verbose` to `cmake --build` in `createSleef.sh`) and convert them into an OpenJDK `.gmk` file.

[This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) shows various approaches; ideas include:

- the fifteen source files are checked directly into the OpenJDK repository
- a `--regenerate-sleef-headers` configure option that will cause the headers to be rebuilt as their dependencies change
- a `make regenerate-sleef-headers` phony target that unconditionally rebuilds the headers
- cross-compilation support when `--openjdk-target=aarch64-linux-gnu` is specified on an `x86-64` build machine
- a README section with hints on how to maintain the OpenJDK build rules

Whenever the OpenJDK SLEEF source code copies were updated, one would also check for changes in the upstream CMake steps.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2237551700

From dholmes at openjdk.org  Thu Jul 18 21:40:35 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 18 Jul 2024 21:40:35 GMT
Subject: RFR: 8334772: Change Class::signers to an explicit field [v3]
In-Reply-To: <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com>
References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
 <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com>
Message-ID: <hgrXmQamLgNKwuayDeBmFrbpmoJwIblYZjORh-O13tY=.93ca6837-0cab-4ca3-9d5f-7b64255e9bb8@github.com>

On Thu, 18 Jul 2024 13:48:06 GMT, Chen Liang <liach at openjdk.org> wrote:

>> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.)
>> 
>> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR.
>
> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-signers
>  - Reorder comment of classData to avoid misunderstanding
>  - 8334772: Change Class::signers to an explicit field

On the JVMTI side and the heapDumper ... I see that heapDumper explicitly fills in a slot for the classloader, but that is also an explicit field. Does that mean that the classloader appears twice, or does the fact it is filtered by reflection mean that the heapDumper doesn't see it when dumping fields? If the latter then it suggests to me that we should be doing the same for the signers. Otherwise I don't know what the implications might be for having the field listed twice.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20223#pullrequestreview-2186958021

From dholmes at openjdk.org  Thu Jul 18 22:18:32 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 18 Jul 2024 22:18:32 GMT
Subject: RFR: 8334772: Change Class::signers to an explicit field [v3]
In-Reply-To: <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com>
References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
 <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com>
Message-ID: <wppy0f7ZIf9KFcjbtUtVv-J3g8xgT1Rn-7I432UE_5g=.61853901-2b23-4180-9a87-09cbddc74c1d@github.com>

On Thu, 18 Jul 2024 13:48:06 GMT, Chen Liang <liach at openjdk.org> wrote:

>> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.)
>> 
>> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR.
>
> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-signers
>  - Reorder comment of classData to avoid misunderstanding
>  - 8334772: Change Class::signers to an explicit field

I am not an hprof expert but AFAICS the `HPROF_GC_CLASS_DUMP` contains an explicit id for the classloader, signers, and pd of the class, and then later a list of all fields declared in the class. AFAICS there is no real connection between these, so it doesn't matter if the classloader/signers/pd is an injected field, a regular Java field, or not a field at all. So in that regard it seems `signers` will now be handled the same way as `classloader` and so that should be fine.

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20223#pullrequestreview-2186999675

From liach at openjdk.org  Thu Jul 18 22:25:36 2024
From: liach at openjdk.org (Chen Liang)
Date: Thu, 18 Jul 2024 22:25:36 GMT
Subject: RFR: 8334772: Change Class::signers to an explicit field [v3]
In-Reply-To: <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com>
References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
 <nPAU4Lju2n6vy_fxtrTFpWDwt9XAXOmi8NSKnKCCy70=.62735fa0-0e49-4b04-ab6c-f856eb5f58a7@github.com>
Message-ID: <gNyQrDdFcmXh88F6ECesn4vaXjwtyBvhLqp4i4ioB6o=.3e858f5c-34a4-4e65-82e3-93637c6ade73@github.com>

On Thu, 18 Jul 2024 13:48:06 GMT, Chen Liang <liach at openjdk.org> wrote:

>> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.)
>> 
>> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR.
>
> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-signers
>  - Reorder comment of classData to avoid misunderstanding
>  - 8334772: Change Class::signers to an explicit field

Thanks for the reviews! I will go ahead and integrate.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20223#issuecomment-2237720701

From liach at openjdk.org  Thu Jul 18 22:25:36 2024
From: liach at openjdk.org (Chen Liang)
Date: Thu, 18 Jul 2024 22:25:36 GMT
Subject: Integrated: 8334772: Change Class::signers to an explicit field
In-Reply-To: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
References: <yLwpf9Mrl1RTotITm9TqtMjGOvkfIo_XFM7RnrmXLZ4=.c37214cf-317d-4924-8ec7-1f94c688e852@github.com>
Message-ID: <lwMnJtI4qZulSvJespfXJNOsPeNKlA_yrUcludNkqds=.5e05f673-a3d4-433e-ba97-c9da4ab71b24@github.com>

On Wed, 17 Jul 2024 19:47:44 GMT, Chen Liang <liach at openjdk.org> wrote:

> `Class` has 2 VM-injected fields that can be made explicit: `Object[] signers` and `ProtectionDomain protectionDomain`. We make the signers field explicit. (The ProtectionDomain can be revisited when SecurityManager is removed, as SecurityManager is accessing it via JNI as well.)
> 
> Migrate the JNI code to Java. The getter previously had a redundant primitive type check, which is dropped in the migrated Java code. The `Object[] getSigners` is no longer `native`, thus requiring a CSR record. Reviewers please help review the associated CSR.

This pull request has now been integrated.

Changeset: 39f44768
Author:    Chen Liang <liach at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/39f44768131254ee11f723f92e2bac57b0d1ade0
Stats:     72 lines in 6 files changed: 6 ins; 53 del; 13 mod

8334772: Change Class::signers to an explicit field

Reviewed-by: dholmes, alanb, rriggs, coleenp

-------------

PR: https://git.openjdk.org/jdk/pull/20223

From dholmes at openjdk.org  Fri Jul 19 06:25:38 2024
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 19 Jul 2024 06:25:38 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v4]
In-Reply-To: <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com>
Message-ID: <j2qM-EBPh_DROJGmAJrnHA0z6i8hDvDddqRr8wZZMaQ=.df6479d5-2dbd-46bc-baa4-f3933dc75be7@github.com>

On Thu, 18 Jul 2024 06:52:44 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log.
>> 
>> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN.
>> 
>> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.
>> 
>> If a string's length exceeds `max_length` then we print it as follows:
>> 
>> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged)
>> 
>> For example if we print "ABCDE" with a max_length of 4 then the output is literally:
>> 
>> "AB ... DE" (abridged)
>> 
>> The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
>> 
>> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`.
>> 
>> Testing:
>>  - new test added for validation purposes
>>  - tiers 1 - 3 as sanity testing
>> 
>> Thanks
>
> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update comment

Thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20150#issuecomment-2238360300

From dholmes at openjdk.org  Fri Jul 19 06:25:40 2024
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 19 Jul 2024 06:25:40 GMT
Subject: Integrated: 8325945: Error reporting should limit the number of String
 characters printed
In-Reply-To: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
Message-ID: <aPbD1pUbidcjt6JTaxo9KShn1xG_-SzcoKEc9rQrk68=.4487bc83-b234-46f2-98b0-7fe20ae0db30@github.com>

On Fri, 12 Jul 2024 02:17:46 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log.
> 
> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN.
> 
> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.
> 
> If a string's length exceeds `max_length` then we print it as follows:
> 
> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged)
> 
> For example if we print "ABCDE" with a max_length of 4 then the output is literally:
> 
> "AB ... DE" (abridged)
> 
> The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
> 
> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`.
> 
> Testing:
>  - new test added for validation purposes
>  - tiers 1 - 3 as sanity testing
> 
> Thanks
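
The abridging rule quoted above can be sketched as follows. This is only an illustrative Java model, assuming the behaviour is exactly as described in the PR text; the real logic lives in the C++ method `java_lang_String::print`, and the `abridge` helper, its class name, and the quoting of strings that fit within the limit are assumptions made for the example.

```java
final class AbridgeSketch {
    // Hypothetical helper modelling the described output format.
    static String abridge(String s, int maxLength) {
        if (s.length() <= maxLength) {
            // Short enough: print in full (the surrounding quotes are an assumption).
            return "\"" + s + "\"";
        }
        int half = maxLength / 2;
        String head = s.substring(0, half);            // first max_length/2 characters
        String tail = s.substring(s.length() - half);  // last max_length/2 characters
        return "\"" + head + " ... " + tail + "\" (abridged)";
    }

    public static void main(String[] args) {
        System.out.println(abridge("ABCDE", 4)); // "AB ... DE" (abridged)
        System.out.println(abridge("ABCD", 4));  // "ABCD"
    }
}
```

With the proposed default of 256, a string longer than that would keep its first and last 128 characters under this model.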

This pull request has now been integrated.

Changeset: 10fcad70
Author:    David Holmes <dholmes at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/10fcad70b3894023d65716b42dc67c1a2bda9c03
Stats:     166 lines in 6 files changed: 163 ins; 0 del; 3 mod

8325945: Error reporting should limit the number of String characters printed

Reviewed-by: thartmann, stuefe

-------------

PR: https://git.openjdk.org/jdk/pull/20150

From mli at openjdk.org  Fri Jul 19 07:20:43 2024
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 19 Jul 2024 07:20:43 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <YdH1sbYiXMYAeQfEUigdlRCH1rycWckinWAPMt7wmCE=.a79dd884-1ddd-40a1-9f36-0a3af2de9d86@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
 <YdH1sbYiXMYAeQfEUigdlRCH1rycWckinWAPMt7wmCE=.a79dd884-1ddd-40a1-9f36-0a3af2de9d86@github.com>
Message-ID: <LCCK0gVv2r5lhKluIDzAF_WV9dsKwwRo7WVfcj8-NxU=.2f6b2a21-716c-4868-8365-e3fd28bfc8eb@github.com>

On Thu, 18 Jul 2024 20:50:14 GMT, fitzsim <duke at openjdk.org> wrote:

> It is possible to regenerate `sleefinline_advsimd.h` and `sleefinline_sve.h` with some new OpenJDK build logic and only the following fifteen SLEEF source files:
> 
> ```
> 32K	./src/jdk.incubator.vector/linux/native/sleef/src/arch/helperadvsimd.h
> 40K	./src/jdk.incubator.vector/linux/native/sleef/src/arch/helpersve.h
> 8.0K	./src/jdk.incubator.vector/linux/native/sleef/src/common/addSuffix.c
> 20K	./src/jdk.incubator.vector/linux/native/sleef/src/common/commonfuncs.h
> 16K	./src/jdk.incubator.vector/linux/native/sleef/src/common/dd.h
> 20K	./src/jdk.incubator.vector/linux/native/sleef/src/common/df.h
> 4.0K	./src/jdk.incubator.vector/linux/native/sleef/src/common/estrin.h
> 12K	./src/jdk.incubator.vector/linux/native/sleef/src/common/keywords.txt
> 12K	./src/jdk.incubator.vector/linux/native/sleef/src/common/misc.h
> 4.0K	./src/jdk.incubator.vector/linux/native/sleef/src/common/quaddef.h
> 4.0K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/funcproto.h
> 20K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/mkrename.c
> 116K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefinline_header.h.org
> 164K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefsimddp.c
> 152K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefsimdsp.c
> 624K	total
> ```
> 
> I was able to extract the shell and C preprocessing steps from the upstream CMake-based build system (by adding `--verbose` to `cmake --build` in `createSleef.sh`) and convert them into an OpenJDK `.gmk` file.
> 
> [This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) shows various approaches; ideas include:
> 
> * the fifteen source files are checked directly into the OpenJDK repository
> * a `--regenerate-sleef-headers` configure option that will cause the headers to be rebuilt as their dependencies change
> * a `make regenerate-sleef-headers` phony target that unconditionally rebuilds the headers
> * cross-compilation support when `--openjdk-target=aarch64-linux-gnu` is specified on an `x86-64` build machine
> * a README section with hints on how to maintain the OpenJDK build rules
> 

Really nice work, Thanks!

> Whenever the OpenJDK SLEEF source code copies were updated, one would also check for changes in the upstream CMake steps.

Compared to the current implementation in https://github.com/openjdk/jdk/pull/19185, my slight concern about [This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) is the future maintenance effort when we need to update the sleef source along with the cmake changes, and also when support for new sleef platforms is added to the jdk.

On the other hand, I'm not sure whether [This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) satisfies the traceability requirement discussed above.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2238531577

From aph at openjdk.org  Fri Jul 19 09:20:36 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 19 Jul 2024 09:20:36 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
Message-ID: <ml4dF0TwSQdlURT8ETSAN9RVnx3iMIVNDCWedq8lc1Y=.6b3da39c-41fc-460e-8632-d5a42be279ab@github.com>

On Tue, 9 Jul 2024 12:08:50 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
>> * NOTE: This pr depends on https://github.com/openjdk/jdk/pull/19185, which includes a README, a script to generate sleef inline headers and generated sleef inline headers.
>> 
>> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external sleef things (header or lib) at build or run time.
>> Besides this change, the previous changes are also modified accordingly, e.g. some unnecessary files or changes are removed, especially in the make dir of jdk.
>> 
>> Besides the code changes, one important task is to handle the legal process.
>> 
>> Thanks!
>> 
>> ## Test
>> tests:
>> * test/jdk/jdk/incubator/vector/
>> * test/hotspot/jtreg/compiler/vectorapi/
>> 
>> options:
>> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
>> * -XX:+EnableVectorSupport -XX:-UseVectorStubs
>> 
>> ## Performance
>> 
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>> 
>> ### Float
>> data
>> Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic | Improvement(UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic | Improvement (UseSVE=0)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408
>> Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876
>> Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936
>> Float128Vector.ATAN...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   skip TANH

> Compared to the current implementation in #19185, my slight concern about [This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) is the future maintenance effort when we need to update the sleef source along with the cmake changes, and also when support for new sleef platforms is added to the jdk.

That's a fair point.  However, it's probably less work than any adequate alternative proposed thus far.

> On the other hand, I'm not sure whether [This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) satisfies the traceability requirement discussed above.

I'm sure it's fine: we have readable source code in the preferred form, along with a script that generates it from the corresponding SLEEF release.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2238734507

From jiefu at openjdk.org  Fri Jul 19 11:56:39 2024
From: jiefu at openjdk.org (Jie Fu)
Date: Fri, 19 Jul 2024 11:56:39 GMT
Subject: RFR: 8325945: Error reporting should limit the number of String
 characters printed [v4]
In-Reply-To: <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com>
References: <YEuTl4iBSHs5CiCfBK_ces4v77mV20I70dqJmO_u6UU=.2514dc99-aa28-4881-8bdb-7ad04d4939c2@github.com>
 <mHlCtFCitj8_YGchzdAHdKC3db_MXGam6Am_z_M1BNM=.1e9e4b5a-3f8c-4946-8254-c425d64da354@github.com>
Message-ID: <yJfZgDUnG3kEsld9PIIiHxaRPfbuDDoY40lYeuHQ_jU=.17c3b559-bc8c-43f0-b716-d7272e096460@github.com>

On Thu, 18 Jul 2024 06:52:44 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Please review this enhancement that intends to improve the readability of error logs when very long `java.lang.String`s exist and when printed in full they obscure things in the log.
>> 
>> The suggestion was to add a `MaxStringPrintSize` flag, similar to the `MaxElementPrintSize` for arrays. I've set the default to 256 (arbitrary selection: not too big, not too small - may need adjusting) with a range from 2 to O_BUFLEN.
>> 
>> The method `java_lang_String::print` now takes a `max_length` parameter that defaults to `MaxStringPrintSize`. This allows more direct control if specific call sites want to print full strings regardless.
>> 
>> If a string's length exceeds `max_length` then we print it as follows:
>> 
>> "< first max_length/2 characters> ... <last max_length/2 characters>" (abridged)
>> 
>> For example if we print "ABCDE" with a max_length of 4 then the output is literally:
>> 
>> "AB ... DE" (abridged)
>> 
>> The message doesn't mention `MaxStringPrintSize` as that may not be involved in limiting the printed length. Developers will need to know to look at that (which is not 100% satisfactory but explaining everything in the output itself seems a bit excessive).
>> 
>> For testing purposes I added a WhiteBox API to print the string to a `stringStream` and then return it as a new `java.lang.String`.
>> 
>> Testing:
>>  - new test added for validation purposes
>>  - tiers 1 - 3 as sanity testing
>> 
>> Thanks
>
> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update comment

runtime/PrintingTests/StringPrinting.java fails with release VMs.
Please see https://github.com/openjdk/jdk/pull/20249
Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20150#issuecomment-2238980728

From aturbanov at openjdk.org  Fri Jul 19 12:19:37 2024
From: aturbanov at openjdk.org (Andrey Turbanov)
Date: Fri, 19 Jul 2024 12:19:37 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
Message-ID: <vVDog-D2CN7FfbKEkwymEfCwaBoYG0qtLcP67v4ddqk=.3903cae2-4571-4292-910e-44733436b607@github.com>

On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which needs extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:
> 
>  - Remove try_read
>  - Add explicit to single parameter constructors
>  - Remove superfluous access specifier
>  - Remove unused include
>  - Update assert message OMCache::set_monitor
>  - Fix indentation
>  - Remove outdated comment LightweightSynchronizer::exit
>  - Remove logStream include
>  - Remove strange comment
>  - Fix javaThread include

test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java line 126:

> 124:                 int count = getCount();
> 125:                 if (count != i * THREADS) {
> 126:                     throw new RuntimeException("WaitNotifyTest: Invalid Count "  + count +

Suggestion:

                    throw new RuntimeException("WaitNotifyTest: Invalid Count " + count +

test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java line 136:

> 134:             int count = getCount();
> 135:             if (count != ITERATIONS * THREADS) {
> 136:                 throw new RuntimeException("WaitNotifyTest: Invalid Count "  + count);

Suggestion:

                throw new RuntimeException("WaitNotifyTest: Invalid Count " + count);

test/hotspot/jtreg/runtime/Monitor/UseObjectMonitorTableTest.java line 217:

> 215:             int count = getCount();
> 216:             if (count != THREADS * ITERATIONS) {
> 217:                 throw new RuntimeException("RandomDepthTest: Invalid Count "  + count);

Suggestion:

                throw new RuntimeException("RandomDepthTest: Invalid Count " + count);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1684293578
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1684293811
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1684293954

From szaldana at openjdk.org  Fri Jul 19 13:51:17 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Fri, 19 Jul 2024 13:51:17 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v4]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <n9g0YwM2xZHvUOcVPAjck8WMEgMol0NXR_XT9fwdk4w=.3afe7768-2832-4c76-8002-671b8e0c72e3@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with three additional commits since the last revision:

 - Adding tests for file dcmd argument
 - Updates to test case
 - Adding FileArgument as a diagnostic argument

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/3bb774d3..c71cb639

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=02-03

  Stats: 146 lines in 11 files changed: 76 ins; 46 del; 24 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From szaldana at openjdk.org  Fri Jul 19 14:07:12 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Fri, 19 Jul 2024 14:07:12 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Missing copyright header update

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/c71cb639..cdf1d457

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From szaldana at openjdk.org  Fri Jul 19 14:07:12 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Fri, 19 Jul 2024 14:07:12 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v4]
In-Reply-To: <n9g0YwM2xZHvUOcVPAjck8WMEgMol0NXR_XT9fwdk4w=.3afe7768-2832-4c76-8002-671b8e0c72e3@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <n9g0YwM2xZHvUOcVPAjck8WMEgMol0NXR_XT9fwdk4w=.3afe7768-2832-4c76-8002-671b8e0c72e3@github.com>
Message-ID: <a_uaHVLjX1O1prL33-UPUq7_T8CVQXk0_opluJj0yEI=.de732778-4a50-46bf-bdff-e80555c333b6@github.com>

On Fri, 19 Jul 2024 13:51:17 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - Adding tests for file dcmd argument
>  - Updates to test case
>  - Adding FileArgument as a diagnostic argument

Hi folks, 

I made some updates. Just wanted to note a few things: 
* I think we can remove `test/jdk/sun/tools/jcmd/TestJcmdPIDSubstitution.java` and the changes to the `test/hotspot/jtreg/runtime/cds/appcds/jcmd` tests. I've added a test case for dcmd file argument parsing which is more general. I've left the old tests in for reference at the moment. 

* Regarding warnings, I noted we wanted to issue any warnings to the issuer of the dcmd and not the JVM process. However, in `diagnosticArgument.cpp`, the existing code issues warnings directly to the JVM process. I tried to stay consistent with how things are done there, but let me know what you think. 

Thanks for the comments!
Sonia

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2239256942

From shade at openjdk.org  Fri Jul 19 15:52:14 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 19 Jul 2024 15:52:14 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
In-Reply-To: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
References: <UUK4x10bUNfUXL5R6t7ljHta6VMbko4xvGIdbTsVkXI=.641dde03-e6fb-4c8f-b6c3-5ad97cf5e9e7@github.com>
Message-ID: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com>

> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks.
> 
> We need to know the actual oop strongness/weakness before we call into the C2 Access API; this work models this after existing code for `refersTo0` intrinsics. C2 Access also needs support for `AS_NO_KEEPALIVE` for stores. 
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Amend the test case to guarantee it works under different compilation regimes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20139/files
  - new: https://git.openjdk.org/jdk/pull/20139/files/79ece901..437f2329

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=01-02

  Stats: 36 lines in 1 file changed: 18 ins; 7 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/20139.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139

PR: https://git.openjdk.org/jdk/pull/20139

From aph at openjdk.org  Fri Jul 19 16:51:01 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 19 Jul 2024 16:51:01 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v3]
In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
Message-ID: <7Kzb5V0WYTDNKGrZ7ugIELsASZhoMMJn3UTU_QFWq7Q=.7da728cd-061f-498d-a3a4-46e62c6020e8@github.com>

> This patch expands the use of a hash table for secondary superclasses
> to the interpreter, C1, and runtime. It also adds a C2 implementation
> of hashed lookup in cases where the superclass isn't known at compile
> time.
> 
> HotSpot shared runtime
> ----------------------
> 
> Building hashed secondary tables is now unconditional. It takes very
> little time, and now that the shared runtime always has the tables, it
> might as well take advantage of them. The shared code is easier to
> follow now, I think.
> 
> There might be a performance issue with x86-64 in that we build
> HotSpot for a default x86-64 target that does not support popcount.
> This means that HotSpot C++ runtime on x86 always uses a software
> emulation for popcount, even though the vast majority of machines made
> for the past 20 years can do popcount in a single instruction. It
> wouldn't be terribly hard to do something about that.
> 
> Having said that, the software popcount is really not bad.
> 
> x86
> ---
> 
> x86 is rather tricky, because we still support
> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
> well as 32- and 64-bit ports. There's some further complication in
> that only `RCX` can be used as a shift count, so there's some register
> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
> 
> AArch64
> -------
> 
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
> 
> Generally
> ---------
> 
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
> 
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
> 
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
> 
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.

Andrew Haley has updated the pull request incrementally with four additional commits since the last revision:

 - Review feedback
 - Review feedback
 - Review feedback
 - Cleanup check_klass_subtype_fast_path for AArch64, deleting dead code

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19989/files
  - new: https://git.openjdk.org/jdk/pull/19989/files/bfe9ceed..98f6b2b7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=01-02

  Stats: 127 lines in 4 files changed: 6 ins; 46 del; 75 mod
  Patch: https://git.openjdk.org/jdk/pull/19989.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989

PR: https://git.openjdk.org/jdk/pull/19989

From szaldana at openjdk.org  Fri Jul 19 18:57:40 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Fri, 19 Jul 2024 18:57:40 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output()
Message-ID: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>

Hi all, 

This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.

Testing: 
- [x] Added test case passes. 

Thanks, 
Sonia

-------------

Commit messages:
 - 8327054: DiagnosticCommand Compiler.perfmap does not log on output()

Changes: https://git.openjdk.org/jdk/pull/20257/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20257&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8327054
  Stats: 16 lines in 5 files changed: 10 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/20257.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20257/head:pull/20257

PR: https://git.openjdk.org/jdk/pull/20257

From cjplummer at openjdk.org  Fri Jul 19 19:23:31 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Fri, 19 Jul 2024 19:23:31 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output()
In-Reply-To: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
Message-ID: <HxQcEMgEFzfrvupDsbiTmOutUnDui2kZf7STpf0xw0U=.6c6c3464-3d04-49a3-9ee3-e12d58d0bf9b@github.com>

On Fri, 19 Jul 2024 15:07:39 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

> Hi all, 
> 
> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.
> 
> Testing: 
> - [x] Added test case passes. 
> 
> Thanks, 
> Sonia

test/hotspot/jtreg/serviceability/dcmd/compiler/PerfMapTest.java line 124:

> 122:         output.shouldContain("Failed to create nonexistent/%s for perf map".formatted(test_dir));
> 123:         output.shouldNotHaveExitValue(0);
> 124:         Files.deleteIfExists(path);

If the file exists, that means the expected error message will not be found, which means an exception will be thrown before you get to the `Files.deleteIfExists(path)` call. If the file doesn't exist, then there is nothing to delete. So as things stand now, this call will never delete anything. Maybe put it in a finally block so that if the file does exist it will get deleted.
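
To illustrate the suggestion, here is a minimal, self-contained sketch of the cleanup pattern. It is not the actual `PerfMapTest` code: the class name `CleanupSketch`, the helper `checkAndCleanUp`, and the `Runnable` standing in for the `output.shouldContain(...)` checks are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CleanupSketch {
    // Hypothetical helper: runs a check that may throw and always
    // removes the file afterwards, whether or not the check failed.
    static void checkAndCleanUp(Path path, Runnable check) throws IOException {
        try {
            check.run();                // stands in for the output assertions
        } finally {
            Files.deleteIfExists(path); // reached even when check.run() throws
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate the "file unexpectedly exists" case with a temp file.
        Path path = Files.createTempFile("perf-", ".map");
        try {
            checkAndCleanUp(path, () -> {
                throw new RuntimeException("expected message not found");
            });
        } catch (RuntimeException e) {
            System.out.println("check failed: " + e.getMessage());
        }
        System.out.println("file still exists: " + Files.exists(path));
    }
}
```

With the delete in the `finally` block, the stray file is removed even though the check threw first, which is the case the original placement could never reach.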

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1684890829

From lmesnik at openjdk.org  Fri Jul 19 20:10:34 2024
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Fri, 19 Jul 2024 20:10:34 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
Message-ID: <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>

On Fri, 19 Jul 2024 14:07:12 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Missing copyright header update

Thanks for updating the fix. The new version looks mostly good. I added a few small comments.

src/hotspot/share/prims/wbtestmethods/parserTests.cpp line 132:

> 130:    } else if (strcmp(type, "FILE") == 0) {
> 131:      DCmdArgument<FileArgument> *argument =
> 132:       new DCmdArgument<FileArgument>(name, desc, "FILE", mandatory);

Please check indentation.

src/hotspot/share/services/diagnosticArgument.cpp line 358:

> 356: template <> void DCmdArgument<MemorySizeArgument>::destroy_value() { }
> 357: 
> 358: template <>

The common style here is to place `template <>` and the rest of the declaration on a single line.

src/hotspot/share/services/diagnosticArgument.cpp line 366:

> 364:     _value._name = NEW_C_HEAP_ARRAY(char, JVM_MAXPATHLEN, mtInternal);
> 365:     if (!Arguments::copy_expand_pid(str, len, _value._name, JVM_MAXPATHLEN)) {
> 366:       fatal("Invalid file path: %s", str);

As I understand it, `copy_expand_pid` might fail if a very long line is used, and this causes a JVM crash.
So there is a possibility that a user might crash the JVM by accidentally invoking a jcmd command.
That doesn't look safe. I believe it would be better to throw an exception as for any other invalid command, see
" THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(),"

The `fatal` would make sense only if a failure of `copy_expand_pid` meant some unrecoverable JVM bug.
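
As a rough illustration of the difference, here is a Java analogue (not the HotSpot C++ code) in which an over-long expanded path is reported to the caller as an `IllegalArgumentException` instead of aborting the process. The class `PidExpandSketch`, the method `expandPid`, and the `MAX_PATH` value are hypothetical stand-ins; the real `copy_expand_pid` may handle more cases than this simplified substitution.

```java
public class PidExpandSketch {
    // Hypothetical limit, standing in for JVM_MAXPATHLEN.
    static final int MAX_PATH = 4096;

    // Simplified substitution: replaces every "%p" with the given PID.
    // Rejects results that would not fit in the buffer by throwing,
    // so the caller can report the error and carry on.
    static String expandPid(String pattern, long pid) {
        String expanded = pattern.replace("%p", Long.toString(pid));
        if (expanded.length() >= MAX_PATH) {
            throw new IllegalArgumentException(
                "File path too long after %p expansion (" + expanded.length() + " chars)");
        }
        return expanded;
    }

    public static void main(String[] args) {
        System.out.println(expandPid("heap-%p.hprof", 1234));

        try {
            expandPid("a".repeat(5000), 1234);
        } catch (IllegalArgumentException e) {
            // The command fails, but the process keeps running.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only the control flow: bad user input produces an error for that one command rather than terminating the whole VM.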

-------------

Changes requested by lmesnik (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2189044201
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1684887604
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1684892964
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1684923626

From lmesnik at openjdk.org  Fri Jul 19 20:10:35 2024
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Fri, 19 Jul 2024 20:10:35 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v4]
In-Reply-To: <a_uaHVLjX1O1prL33-UPUq7_T8CVQXk0_opluJj0yEI=.de732778-4a50-46bf-bdff-e80555c333b6@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <n9g0YwM2xZHvUOcVPAjck8WMEgMol0NXR_XT9fwdk4w=.3afe7768-2832-4c76-8002-671b8e0c72e3@github.com>
 <a_uaHVLjX1O1prL33-UPUq7_T8CVQXk0_opluJj0yEI=.de732778-4a50-46bf-bdff-e80555c333b6@github.com>
Message-ID: <hCo8C9dfEUDHystD6OUU6HebS7e_x5Q8Mo9aHaPneak=.1a72d889-2e66-46e0-9545-1aa05da8680f@github.com>

On Fri, 19 Jul 2024 14:03:43 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

> * Regarding warnings, I noted we wanted to issue any warnings to the issuer of the dcmd and not the JVM process. However,  in `diagnosticArgument.cpp`, they are issuing the warnings directly to the JVM process. I tried to stay consistent with how things are done there, but let me know what you think.
> 

It makes sense to file a separate issue for this and keep the current behavior in the fix.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2240037485

From szaldana at openjdk.org  Fri Jul 19 20:17:46 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Fri, 19 Jul 2024 20:17:46 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v2]
In-Reply-To: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
Message-ID: <1IQ-_rNXXSFB9LAsP0kbK3MAQSOgKKrqZxFC8tZzrkc=.8b2c00dd-bddf-4362-9125-36ad2042e794@github.com>

> Hi all, 
> 
> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.
> 
> Testing: 
> - [x] Added test case passes. 
> 
> Thanks, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Ensuring test case deletes file in case of exception

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20257/files
  - new: https://git.openjdk.org/jdk/pull/20257/files/484f25eb..6129d87c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20257&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20257&range=00-01

  Stats: 7 lines in 1 file changed: 3 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/20257.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20257/head:pull/20257

PR: https://git.openjdk.org/jdk/pull/20257

From szaldana at openjdk.org  Fri Jul 19 20:17:46 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Fri, 19 Jul 2024 20:17:46 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v2]
In-Reply-To: <HxQcEMgEFzfrvupDsbiTmOutUnDui2kZf7STpf0xw0U=.6c6c3464-3d04-49a3-9ee3-e12d58d0bf9b@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
 <HxQcEMgEFzfrvupDsbiTmOutUnDui2kZf7STpf0xw0U=.6c6c3464-3d04-49a3-9ee3-e12d58d0bf9b@github.com>
Message-ID: <xUkrZDkazLNKwKpT73X8XgL282IU-MdYffjc0AER8hU=.be1e4387-8b0c-4f72-bf0d-ef47eca81022@github.com>

On Fri, 19 Jul 2024 19:21:05 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

>> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Ensuring test case deletes file in case of exception
>
> test/hotspot/jtreg/serviceability/dcmd/compiler/PerfMapTest.java line 124:
> 
>> 122:         output.shouldContain("Failed to create nonexistent/%s for perf map".formatted(test_dir));
>> 123:         output.shouldNotHaveExitValue(0);
>> 124:         Files.deleteIfExists(path);
> 
> If the file exists, that means the expected error message will not be found, which means an exception will be thrown before you get to the `Files.deleteIfExists(path)` call. If the file doesn't exist, then there is nothing to delete. So as things stand now, this call will never delete anything. Maybe put it in a finally block so that if the file does exist it will get deleted.

Makes sense, I added the finally block.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1684933809

From duke at openjdk.org  Fri Jul 19 21:30:01 2024
From: duke at openjdk.org (Henry Lin)
Date: Fri, 19 Jul 2024 21:30:01 GMT
Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime
 error: signed integer overflow: -9223372036854775808 - 1 cannot be
 represented in type 'long int'
Message-ID: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com>

Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`.

-------------

Commit messages:
 - 8332697: fix ubsan:shenandoahSimpleBitMap.inline.hpp runtime integer overflow

Changes: https://git.openjdk.org/jdk/pull/20164/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20164&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8332697
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20164.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20164/head:pull/20164

PR: https://git.openjdk.org/jdk/pull/20164

From cjplummer at openjdk.org  Fri Jul 19 21:54:41 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Fri, 19 Jul 2024 21:54:41 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v2]
In-Reply-To: <1IQ-_rNXXSFB9LAsP0kbK3MAQSOgKKrqZxFC8tZzrkc=.8b2c00dd-bddf-4362-9125-36ad2042e794@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
 <1IQ-_rNXXSFB9LAsP0kbK3MAQSOgKKrqZxFC8tZzrkc=.8b2c00dd-bddf-4362-9125-36ad2042e794@github.com>
Message-ID: <luz785NJsF0evZEELvZeNaG7KbdbbXN1RUqxVaVROXY=.8487e8c5-159c-4df3-bfde-e904ca8834b8@github.com>

On Fri, 19 Jul 2024 20:17:46 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.
>> 
>> Testing: 
>> - [x] Added test case passes. 
>> 
>> Thanks, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Ensuring test case deletes file in case of exception

Looks good.

-------------

Marked as reviewed by cjplummer (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20257#pullrequestreview-2189401873

From dlong at openjdk.org  Fri Jul 19 22:43:36 2024
From: dlong at openjdk.org (Dean Long)
Date: Fri, 19 Jul 2024 22:43:36 GMT
Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23:
 runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be
 represented in type 'long int'
In-Reply-To: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com>
References: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com>
Message-ID: <enP03_lP52TO93btMHFUgUECzw7mvDL2AHCnCH8pz00=.b92d9d89-14b1-4a06-9ff7-8b754aac9c1d@github.com>

On Fri, 12 Jul 2024 20:53:04 GMT, Henry Lin <duke at openjdk.org> wrote:

> Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`.

src/hotspot/share/utilities/globalDefinitions.hpp line 1069:

> 1067: // (note: #define used only so that they can be used in enum constant definitions)
> 1068: #define nth_bit(n)        (((n) >= BitsPerWord) ? 0 : (OneBit << (n)))
> 1069: #define right_n_bits(n)   ((uintptr_t) nth_bit(n) - 1)

This changes the return type of right_n_bits, which could break existing code.  If we need an unsigned version, I think it should have a different name.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20164#discussion_r1685072633

From stuefe at openjdk.org  Sat Jul 20 08:07:33 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 20 Jul 2024 08:07:33 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
Message-ID: <znqG3L1DK9JuvU1fIlXfIEu8pp5t_LxIhITS1xvOPBc=.43cc6737-3086-4c2a-bad5-627ab1e91ca6@github.com>

On Fri, 19 Jul 2024 14:07:12 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) by enabling jcmd diagnostic commands that write an output file to accept the `%p` pattern in the file name and replace it with the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Missing copyright header update

Mostly good too. Remarks inline.

src/hotspot/share/services/diagnosticArgument.cpp line 384:

> 382:     _value._name = nullptr;
> 383:   }
> 384: }

Whatever this `DCmdArgument<FileArgument>::destroy_value()` is supposed to do, it clearly isn't working, since we leak the memory.

src/hotspot/share/services/diagnosticArgument.hpp line 66:

> 64:   public:
> 65:     char *_name;
> 66: };

Something is off about this. What is the lifetime of this object?

You don't free it. Running a command in a loop will consume C-heap (you can check this with NMT: `jcmd VM.native_memory baseline`, then run a command 100 times, then `jcmd VM.native_memory summary.diff` will show you the leak in mtInternal).

I would probably just inline the string. E.g.


struct FileArgument {
  char name[MAX_NAME_LEN];  // MAX_NAME_LEN: placeholder for the chosen limit
};


FileArguments sits as member inside DCmdArgument. DCmdArgument or DCmdArgumentWithParser sits as member in the various XXXDCmd classes. 

Those are created in DCmdFactory::create_local_DCmd(). Which is what, a static global list? So we only have one global XXXDCmd object instance per command, but for each command invocation re-parse the argument values? What a weird concept.

Man, this coding is way too convoluted for a little parser engine :( 

But anyway, inlining the filename array into FileArgument should probably be fine from a size standpoint. I would, however, not use JVM_MAXPATHLEN or anything that depends ultimately on PATH_MAX from system headers. We don't want the object to consume e.g. an MB if some crazy platform defines PATH_MAX as 1MB. Therefore I would use e.g. 1024 as the limit for the path name.

(Note that PATH_MAX is an illusion anyway, there is never a guarantee that a path is smaller than that limit... See this good article: https://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html)
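
A minimal sketch of the inlined-buffer idea, combined with a bounded `%p` expansion that reports failure instead of crashing. `FileArgumentSketch`, `expand_pid_sketch` and the explicit `pid` parameter are hypothetical names for illustration only; this is not HotSpot's `Arguments::copy_expand_pid`, and 1024 is simply the limit suggested above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for FileArgument with the name stored inline,
// so there is no separate C-heap allocation to leak.
struct FileArgumentSketch {
  static const size_t max_len = 1024;  // fixed limit, independent of PATH_MAX
  char name[max_len];
};

// Copies src into dst, replacing every "%p" with the decimal pid.
// Returns false (and leaves dst empty) if the result does not fit,
// so the caller can report an error instead of aborting.
bool expand_pid_sketch(const char* src, int pid, char* dst, size_t dst_len) {
  if (dst_len == 0) return false;
  size_t out = 0;  // invariant: out < dst_len
  for (const char* p = src; *p != '\0'; ) {
    if (p[0] == '%' && p[1] == 'p') {
      int n = snprintf(dst + out, dst_len - out, "%d", pid);
      if (n < 0 || (size_t)n >= dst_len - out) {
        dst[0] = '\0';
        return false;
      }
      out += (size_t)n;
      p += 2;
    } else {
      if (out + 1 >= dst_len) {  // no room for this char plus terminator
        dst[0] = '\0';
        return false;
      }
      dst[out++] = *p++;
    }
  }
  dst[out] = '\0';
  return true;
}
```

Because the buffer lives inside the argument object, re-parsing on every command invocation just overwrites it, and an over-long name becomes an ordinary parse error.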

src/hotspot/share/services/diagnosticArgument.hpp line 113:

> 111:   void to_string(MemorySizeArgument f, char* buf, size_t len) const;
> 112:   void to_string(StringArrayArgument* s, char* buf, size_t len) const;
> 113:   void to_string(FileArgument f, char *buf, size_t len) const;

Here, and in all other places: Please use 'char* var', not 'char *var'.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2189782275
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685301655
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685297940
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685290041

From stuefe at openjdk.org  Sat Jul 20 08:07:34 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 20 Jul 2024 08:07:34 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
 <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
Message-ID: <EIIwRleptsdSt_X7tJdQ1mbPYD4RhXcpqA3b4UBb8mU=.ed64e38d-1aed-4304-9d53-1aa3ed434f89@github.com>

On Fri, 19 Jul 2024 20:00:28 GMT, Leonid Mesnik <lmesnik at openjdk.org> wrote:

>> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Missing copyright header update
>
> src/hotspot/share/services/diagnosticArgument.cpp line 366:
> 
>> 364:     _value._name = NEW_C_HEAP_ARRAY(char, JVM_MAXPATHLEN, mtInternal);
>> 365:     if (!Arguments::copy_expand_pid(str, len, _value._name, JVM_MAXPATHLEN)) {
>> 366:       fatal("Invalid file path: %s", str);
> 
> As I understand it, `copy_expand_pid` might fail if a very long line is used. This causes a JVM crash,
> so there is a possibility that a user might crash the JVM accidentally by invoking a jcmd command. 
> It doesn't look safe. I believe it would be better to throw an exception like for any other invalid command, see
> " THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(),"
> 
> The `fatal` would make sense only if a failure of `copy_expand_pid` means some unrecoverable JVM bug.

Yes. In this file, other commands use `fatal` only where reading the hard-coded default values - in the various `init_...` functions. Hard-coded values should be valid, obviously, otherwise the JVM developer messed up. Other values are passed in by the end user via jcmd and should not crash the JVM.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685299871

From stuefe at openjdk.org  Sat Jul 20 08:07:34 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 20 Jul 2024 08:07:34 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <EIIwRleptsdSt_X7tJdQ1mbPYD4RhXcpqA3b4UBb8mU=.ed64e38d-1aed-4304-9d53-1aa3ed434f89@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
 <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
 <EIIwRleptsdSt_X7tJdQ1mbPYD4RhXcpqA3b4UBb8mU=.ed64e38d-1aed-4304-9d53-1aa3ed434f89@github.com>
Message-ID: <SDkUb2GIxTALba_5Dc-t3qK_zov3SEzBIMWzOsb2T0M=.1f420a00-732a-4917-a111-a0124240d4f6@github.com>

On Sat, 20 Jul 2024 07:38:25 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/services/diagnosticArgument.cpp line 366:
>> 
>>> 364:     _value._name = NEW_C_HEAP_ARRAY(char, JVM_MAXPATHLEN, mtInternal);
>>> 365:     if (!Arguments::copy_expand_pid(str, len, _value._name, JVM_MAXPATHLEN)) {
>>> 366:       fatal("Invalid file path: %s", str);
>> 
>> As I understand it, `copy_expand_pid` might fail if a very long line is used. This causes a JVM crash,
>> so there is a possibility that a user might crash the JVM accidentally by invoking a jcmd command. 
>> It doesn't look safe. I believe it would be better to throw an exception like for any other invalid command, see
>> " THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(),"
>> 
>> The `fatal` would make sense only if a failure of `copy_expand_pid` means some unrecoverable JVM bug.
>
> Yes. In this file, other commands use `fatal` only where reading the hard-coded default values - in the various `init_...` functions. Hard-coded values should be valid, obviously, otherwise the JVM developer messed up. Other values are passed in by the end user via jcmd and should not crash the JVM.

I see the prevalent way to deal with runtime parse errors is to throw a Java exception. That exception is later caught in the command processing loop at the entrance of the attach listener thread.

So, @SoniaZaldana, I would do this here too - when in Rome...

But is this not unnecessarily complex? It requires the AttachListener to be a Java thread when in fact it needs no Java - we just misuse Java exception handling as a way to pass error information up the stack, with the simple ultimate goal of writing error information into the outputStream to be sent to the caller. We might just as well pass the outputStream* to the parse_xxx functions as a third argument, write the error directly, and return some error code. This would make the attach listener thread a lot less dependent on Java and more robust - at least for jcmds that don't need Java (which jcmds need Java?). 

After all, the attach listener is supposed to be super robust and always work even if the JVM misbehaves. @dholmes-ora @lmesnik what do you guys think, should we change that? (obviously in a different RFE)
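
The alternative described above could look roughly like the following sketch: the parse function writes its error message to a caller-supplied stream and returns a status, with no exception involved. `std::ostream` stands in for HotSpot's `outputStream`, and `parse_jlong_sketch` and its message text are hypothetical, not existing HotSpot code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <ostream>
#include <sstream>

// Parses a decimal integer argument. On failure, writes a message to the
// caller-supplied stream and returns false; no exception is thrown.
bool parse_jlong_sketch(const char* str, long long* result, std::ostream* out) {
  errno = 0;
  char* end = nullptr;
  long long v = std::strtoll(str, &end, 10);
  if (end == str || *end != '\0' || errno == ERANGE) {
    *out << "Integer parsing error in command argument '" << str << "'\n";
    return false;
  }
  *result = v;
  return true;
}
```

The caller decides what to do with a `false` return, typically stopping the command and sending the already-written message back to the client.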

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685304522

From stuefe at openjdk.org  Sat Jul 20 08:07:35 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 20 Jul 2024 08:07:35 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <znqG3L1DK9JuvU1fIlXfIEu8pp5t_LxIhITS1xvOPBc=.43cc6737-3086-4c2a-bad5-627ab1e91ca6@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
 <znqG3L1DK9JuvU1fIlXfIEu8pp5t_LxIhITS1xvOPBc=.43cc6737-3086-4c2a-bad5-627ab1e91ca6@github.com>
Message-ID: <CxwbysRpQzuTNFd7180Kh5neBkx9mMJOcct-eqCzrIQ=.5393f979-4fe1-445f-a99d-a49515eec5fe@github.com>

On Sat, 20 Jul 2024 07:30:55 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Missing copyright header update
>
> src/hotspot/share/services/diagnosticArgument.hpp line 66:
> 
>> 64:   public:
>> 65:     char *_name;
>> 66: };
> 
> Something is off about this. What is the lifetime of this object?
> 
> You don't free it. Running a command in a loop will consume C-heap (you can check this with NMT: `jcmd VM.native_memory baseline`, then run a command 100 times, then `jcmd VM.native_memory summary.diff` will show you the leak in mtInternal).
> 
> I would probably just inline the string. E.g.
> 
> 
> struct FileArgument {
>   char name[MAX_NAME_LEN];  // MAX_NAME_LEN: placeholder for the chosen limit
> };
> 
> 
> FileArguments sits as member inside DCmdArgument. DCmdArgument or DCmdArgumentWithParser sits as member in the various XXXDCmd classes. 
> 
> Those are created in DCmdFactory::create_local_DCmd(). Which is what, a static global list? So we only have one global XXXDCmd object instance per command, but for each command invocation re-parse the argument values? What a weird concept.
> 
> Man, this coding is way too convoluted for a little parser engine :( 
> 
> But anyway, inlining the filename array into FileArgument should probably be fine from a size standpoint. I would, however, not use JVM_MAXPATHLEN or anything that depends ultimately on PATH_MAX from system headers. We don't want the object to consume e.g. an MB if some crazy platform defines PATH_MAX as 1MB. Therefore I would use e.g. 1024 as the limit for the path name.
> 
> (Note that PATH_MAX is an illusion anyway, there is never a guarantee that a path is smaller than that limit... See this good article: https://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html)

Note that the reason for the leak is probably the fact that you don't clear old values on parse_value. See e.g. how char* does it. However, since you allocate with a constant size anyway, the buffer size never changes, you could just as well either follow my advice above (inlining), or just re-use the existing pointer.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685308132

From stuefe at openjdk.org  Sat Jul 20 10:52:31 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 20 Jul 2024 10:52:31 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v2]
In-Reply-To: <1IQ-_rNXXSFB9LAsP0kbK3MAQSOgKKrqZxFC8tZzrkc=.8b2c00dd-bddf-4362-9125-36ad2042e794@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
 <1IQ-_rNXXSFB9LAsP0kbK3MAQSOgKKrqZxFC8tZzrkc=.8b2c00dd-bddf-4362-9125-36ad2042e794@github.com>
Message-ID: <NvPlnC4JRx-09-a8I0k1n3G917Z8cWMjKQDQ4yKyUTM=.88931077-7c70-4b10-9ef3-70ad66812890@github.com>

On Fri, 19 Jul 2024 20:17:46 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.
>> 
>> Testing: 
>> - [x] Added test case passes. 
>> 
>> Thanks, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Ensuring test case deletes file in case of exception

src/hotspot/share/code/codeCache.hpp line 226:

> 224:   static void print_summary(outputStream* st, bool detailed = true); // Prints a summary of the code cache usage
> 225:   static void log_state(outputStream* st);
> 226:   LINUX_ONLY(static void write_perf_map(const char* filename, outputStream* st);)

Please add a comment about what the stream `st` is supposed to be.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1685374253

From jsjolen at openjdk.org  Sat Jul 20 11:22:32 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Sat, 20 Jul 2024 11:22:32 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
 <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
Message-ID: <lxIweFtdKy3V3X5w6Z0RlVPT0gLUjp1wr0RQQIfcfQw=.7d4c4d60-8bec-404e-8f71-c0357d81984d@github.com>

On Fri, 19 Jul 2024 19:17:54 GMT, Leonid Mesnik <lmesnik at openjdk.org> wrote:

>> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Missing copyright header update
>
> src/hotspot/share/prims/wbtestmethods/parserTests.cpp line 132:
> 
>> 130:    } else if (strcmp(type, "FILE") == 0) {
>> 131:      DCmdArgument<FileArgument> *argument =
>> 132:       new DCmdArgument<FileArgument>(name, desc, "FILE", mandatory);
> 
> Please check indentation.

On top: We hug the `*`s next to the type in Hotspot, not next to the var name. So `DCmdArgument<FileArgument>* argument`. This is something to check for all new code.

Pre-existing: The indentation of the if-block is wrong.

Also, @SoniaZaldana, would you mind changing the code to this (does not include your change)? The repetition just made me cringe.

```c++
  // Declare the pointer once with the common base type so that each branch
  // assigns to it instead of shadowing it with a new local.
  GenDCmdArgument* argument = nullptr;
  if (strcmp(type, "STRING") == 0) {
    argument = new DCmdArgument<char*>(name, desc, "STRING", mandatory, default_value);
  } else if (strcmp(type, "NANOTIME") == 0) {
    argument = new DCmdArgument<NanoTimeArgument>(name, desc, "NANOTIME", mandatory, default_value);
  } else if (strcmp(type, "JLONG") == 0) {
    argument = new DCmdArgument<jlong>(name, desc, "JLONG", mandatory, default_value);
  } else if (strcmp(type, "BOOLEAN") == 0) {
    argument = new DCmdArgument<bool>(name, desc, "BOOLEAN", mandatory, default_value);
  } else if (strcmp(type, "MEMORYSIZE") == 0) {
    argument = new DCmdArgument<MemorySizeArgument>(name, desc, "MEMORY SIZE", mandatory, default_value);
  } else if (strcmp(type, "STRINGARRAY") == 0) {
    argument = new DCmdArgument<StringArrayArgument*>(name, desc, "STRING SET", mandatory);
  }

  if (argument != nullptr) {
    if (isarg) {
      parser->add_dcmd_argument(argument);
    } else {
      parser->add_dcmd_option(argument);
    }
  }
```

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685384131

From stuefe at openjdk.org  Sat Jul 20 12:09:31 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 20 Jul 2024 12:09:31 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <lxIweFtdKy3V3X5w6Z0RlVPT0gLUjp1wr0RQQIfcfQw=.7d4c4d60-8bec-404e-8f71-c0357d81984d@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
 <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
 <lxIweFtdKy3V3X5w6Z0RlVPT0gLUjp1wr0RQQIfcfQw=.7d4c4d60-8bec-404e-8f71-c0357d81984d@github.com>
Message-ID: <ob8IA7dbncuc-wuqsfs0sIFK7bOSXm8qsvPEPSAGqtw=.c5b1f0eb-b706-4cee-bf97-109be01e22af@github.com>

On Sat, 20 Jul 2024 11:18:34 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> src/hotspot/share/prims/wbtestmethods/parserTests.cpp line 132:
>> 
>>> 130:    } else if (strcmp(type, "FILE") == 0) {
>>> 131:      DCmdArgument<FileArgument> *argument =
>>> 132:       new DCmdArgument<FileArgument>(name, desc, "FILE", mandatory);
>> 
>> Please check indentation.
>
> On top: We hug the `*`s next to the type in Hotspot, not next to the var name. So `DCmdArgument<FileArgument>* argument`. This is something to check for all new code.
> 
> Pre-existing: The indentation of the if-block is wrong.
> 
> Also, @SoniaZaldana, would you mind changing the code to this (does not include your change)? The repetition just made me cringe.
> 
> ```c++
>   // Declare the pointer once with the common base type so that each branch
>   // assigns to it instead of shadowing it with a new local.
>   GenDCmdArgument* argument = nullptr;
>   if (strcmp(type, "STRING") == 0) {
>     argument = new DCmdArgument<char*>(name, desc, "STRING", mandatory, default_value);
>   } else if (strcmp(type, "NANOTIME") == 0) {
>     argument = new DCmdArgument<NanoTimeArgument>(name, desc, "NANOTIME", mandatory, default_value);
>   } else if (strcmp(type, "JLONG") == 0) {
>     argument = new DCmdArgument<jlong>(name, desc, "JLONG", mandatory, default_value);
>   } else if (strcmp(type, "BOOLEAN") == 0) {
>     argument = new DCmdArgument<bool>(name, desc, "BOOLEAN", mandatory, default_value);
>   } else if (strcmp(type, "MEMORYSIZE") == 0) {
>     argument = new DCmdArgument<MemorySizeArgument>(name, desc, "MEMORY SIZE", mandatory, default_value);
>   } else if (strcmp(type, "STRINGARRAY") == 0) {
>     argument = new DCmdArgument<StringArrayArgument*>(name, desc, "STRING SET", mandatory);
>   }
> 
>   if (argument != nullptr) {
>     if (isarg) {
>       parser->add_dcmd_argument(argument);
>     } else {
>       parser->add_dcmd_option(argument);
>     }
>   }
> ```

@jdksjolen 

> Also, @SoniaZaldana, would you mind changing the code to this 

Even simpler (did not test, but you get my drift):


#define ALL_TYPES_DO_XX(what) \
  what(char*, "STRING") \
  what(NanoTimeArgument, "NANOTIME") \
  what(jlong, "JLONG") 
... etc

then


#define XX(TYPE, NAME) \
if (strcmp(type, NAME) == 0) { \
    DCmdArgument<TYPE>* argument = new DCmdArgument<TYPE>(name, desc, NAME, mandatory, default_value); \
}
ALL_TYPES_DO_XX(XX)
#undef XX


;-)
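
For readers unfamiliar with the X-macro pattern used above, here is a small self-contained sketch of it. `ArgBase`, `Arg<T>`, `ALL_TYPES_DO` and `make_arg` are hypothetical stand-ins, not the HotSpot `DCmdArgument` classes, and the constructor takes only the type name to keep the example short.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical common base so every concrete argument fits one pointer type.
struct ArgBase {
  virtual ~ArgBase() {}
  virtual const char* type_name() const = 0;
};

template <typename T>
struct Arg : ArgBase {
  const char* _type;
  explicit Arg(const char* type) : _type(type) {}
  const char* type_name() const override { return _type; }
};

// The list of (C++ type, type-name string) pairs is written exactly once.
#define ALL_TYPES_DO(what)     \
  what(const char*, "STRING")  \
  what(long long,   "JLONG")   \
  what(bool,        "BOOLEAN")

// Expands to one strcmp test per entry; returns nullptr for unknown names.
ArgBase* make_arg(const char* type) {
#define XX(TYPE, NAME) \
  if (std::strcmp(type, NAME) == 0) { return new Arg<TYPE>(NAME); }
  ALL_TYPES_DO(XX)
#undef XX
  return nullptr;
}
```

Adding a new argument type then means adding one line to the list rather than another hand-written `else if` branch.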

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685411741

From aph at openjdk.org  Sun Jul 21 08:18:37 2024
From: aph at openjdk.org (Andrew Haley)
Date: Sun, 21 Jul 2024 08:18:37 GMT
Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23:
 runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be
 represented in type 'long int'
In-Reply-To: <enP03_lP52TO93btMHFUgUECzw7mvDL2AHCnCH8pz00=.b92d9d89-14b1-4a06-9ff7-8b754aac9c1d@github.com>
References: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com>
 <enP03_lP52TO93btMHFUgUECzw7mvDL2AHCnCH8pz00=.b92d9d89-14b1-4a06-9ff7-8b754aac9c1d@github.com>
Message-ID: <wI7-7g5DlDaZ0yH29BAWRAvWSARzUWb0S_s_2VzcPkg=.6a1b3f63-d600-4c81-a459-08c00d8373bf@github.com>

On Fri, 19 Jul 2024 22:41:11 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`.
>
> src/hotspot/share/utilities/globalDefinitions.hpp line 1069:
> 
>> 1067: // (note: #define used only so that they can be used in enum constant definitions)
>> 1068: #define nth_bit(n)        (((n) >= BitsPerWord) ? 0 : (OneBit << (n)))
>> 1069: #define right_n_bits(n)   ((uintptr_t) nth_bit(n) - 1)
> 
> This changes the return type of right_n_bits, which could break existing code.  If we need an unsigned version, I think it should have a different name.

And this is at best a partial fix: `OneBit << 63` overflows.
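
To illustrate the two concerns raised in this thread (a separately named unsigned variant, and doing the shift itself in unsigned arithmetic), here is a minimal sketch. The names `nth_bit_unsigned`, `right_n_bits_unsigned` and `kBitsPerWord` are hypothetical and not part of `globalDefinitions.hpp`; unlike the macros, these functions cannot be used where a preprocessor constant is required.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical unsigned counterparts of nth_bit / right_n_bits.
constexpr unsigned kBitsPerWord = sizeof(uintptr_t) * 8;

// Shifts an unsigned 1, so setting the top bit involves no signed overflow.
constexpr uintptr_t nth_bit_unsigned(unsigned n) {
  return n >= kBitsPerWord ? uintptr_t(0) : (uintptr_t(1) << n);
}

// Mask with the low n bits set. For n >= kBitsPerWord the subtraction
// wraps from 0 to all ones, which is well defined for unsigned types.
constexpr uintptr_t right_n_bits_unsigned(unsigned n) {
  return nth_bit_unsigned(n) - 1;
}

static_assert(right_n_bits_unsigned(0) == 0, "empty mask");
static_assert(right_n_bits_unsigned(4) == 0xF, "low four bits");
static_assert(right_n_bits_unsigned(kBitsPerWord) == UINTPTR_MAX, "full mask");
```

Keeping the existing signed macros untouched and adding differently named helpers avoids changing the type seen by current callers.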

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20164#discussion_r1685678905

From mdoerr at openjdk.org  Sun Jul 21 08:39:39 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Sun, 21 Jul 2024 08:39:39 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2]
In-Reply-To: <sEEh7ndeGQznpxAqNtapGJU6dT96EXBNoS3QyVcOn_g=.e0a3525d-690c-4f2c-aca1-48c4975bfb65@github.com>
References: <ClHEkY_xzx37VNyLJr9F9eWSjXfdCRQcbmAhomsY7kU=.f4c3c125-caed-467f-b9fa-213d14f7908a@github.com>
 <sEEh7ndeGQznpxAqNtapGJU6dT96EXBNoS3QyVcOn_g=.e0a3525d-690c-4f2c-aca1-48c4975bfb65@github.com>
Message-ID: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com>

On Thu, 20 Jun 2024 04:17:30 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Build barrier data in G1BarrierSetC2::get_store_barrier() by adding, rather than removing, barrier tags

I have looked at the x86 implementation and I have some performance tuning ideas. Please take a look. I guess at least some of your code is performance critical.

src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 86:

> 84:     // an indirect memory operand) to reduce C2's scheduling and register
> 85:     // allocation pressure (fewer Mach nodes). The same holds for g1StoreN and
> 86:     // g1EncodePAndStoreN.

I'm not convinced that this is beneficial. We're wasting a temp register just for an addition?

src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 123:

> 121:     if ((barrier_data() & G1C2BarrierPost) != 0) {
> 122:       __ movl($tmp2$$Register, $src$$Register);
> 123:       if ((barrier_data() & G1C2BarrierPostNotNull) == 0) {

`decode_heap_oop` contains a null check in some cases which makes some of your code redundant. Optimization idea: In case of `(((barrier_data() & G1C2BarrierPostNotNull) == 0) && CompressedOops::base() != nullptr)` use a null check and bail out because there's nothing left to do if it's null. After that, we can always use `decode_heap_oop_not_null`.

src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 182:

> 180:                          $tmp2$$Register /* pre_val */,
> 181:                          $tmp3$$Register /* tmp */,
> 182:                          RegSet::of($mem$$Register, $newval$$Register, $oldval$$Register) /* preserve */);

The only value which can get overwritten is `oldval`. Optimization idea: Pass `oldval` to the SATB barrier. There is no load of the old value required.

src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 301:

> 299:                          RegSet::of($mem$$Register, $newval$$Register) /* preserve */);
> 300:     __ movq($tmp1$$Register, $newval$$Register);
> 301:     __ xchgq($newval$$Register, Address($mem$$Register, 0));

Optimization idea: Despite its name, `g1_pre_write_barrier` can be moved after the xchg operation because there's no safepoint within this MachNode. This allows avoiding loading the old value twice.
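
A rough illustration of why the second load is avoidable, written as portable C++ and not as the actual MachNode code: an atomic exchange already returns the previous field value, so that value can be handed to the SATB enqueue directly. `Obj`, `SatbQueue` and `swap_with_barrier` are hypothetical names, and the sketch assumes (as argued above) that nothing can observe the window between the exchange and the enqueue.

```c++
#include <atomic>
#include <vector>

struct Obj { int id; };

// Hypothetical SATB queue: records overwritten values while marking is active.
struct SatbQueue {
  bool marking_active = false;
  std::vector<Obj*> buffer;
  void enqueue(Obj* pre_val) {
    if (marking_active && pre_val != nullptr) buffer.push_back(pre_val);
  }
};

// Atomic swap that feeds the value returned by the exchange to the barrier,
// instead of loading the old field value separately beforehand.
inline Obj* swap_with_barrier(std::atomic<Obj*>& field, Obj* newval, SatbQueue& q) {
  Obj* old = field.exchange(newval);
  q.enqueue(old);
  return old;
}
```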

-------------

PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2190271351
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1685680587
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1685682308
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1685683332
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1685683768

From jsjolen at openjdk.org  Sun Jul 21 08:58:35 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Sun, 21 Jul 2024 08:58:35 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <ob8IA7dbncuc-wuqsfs0sIFK7bOSXm8qsvPEPSAGqtw=.c5b1f0eb-b706-4cee-bf97-109be01e22af@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
 <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
 <lxIweFtdKy3V3X5w6Z0RlVPT0gLUjp1wr0RQQIfcfQw=.7d4c4d60-8bec-404e-8f71-c0357d81984d@github.com>
 <ob8IA7dbncuc-wuqsfs0sIFK7bOSXm8qsvPEPSAGqtw=.c5b1f0eb-b706-4cee-bf97-109be01e22af@github.com>
Message-ID: <QuyVjUXmgR2l6pgq-3dvYkKCjLNi5raS7pwt7OagyeY=.783df525-878e-402c-820e-8ac7150dfa97@github.com>

On Sat, 20 Jul 2024 12:06:33 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> On top: We hug the `*`s next to the type in Hotspot, not next to the var name. So `DCmdArgument<FileArgument>* argument`. This is something to check for all new code.
>> 
>> Pre-existing: The indentation of the if-block is wrong.
>> 
>> Also, @SoniaZaldana, would you mind changing the code to this (does not include your change)? The repetition just made me cringe.
>> 
>> ```c++
>>   // GenDCmdArgument is the common base class of all DCmdArgument<T> instantiations.
>>   GenDCmdArgument* argument = nullptr;
>>   if (strcmp(type, "STRING") == 0) {
>>     argument = new DCmdArgument<char*>(name, desc, "STRING", mandatory, default_value);
>>   } else if (strcmp(type, "NANOTIME") == 0) {
>>     argument = new DCmdArgument<NanoTimeArgument>(name, desc, "NANOTIME", mandatory, default_value);
>>   } else if (strcmp(type, "JLONG") == 0) {
>>     argument = new DCmdArgument<jlong>(name, desc, "JLONG", mandatory, default_value);
>>   } else if (strcmp(type, "BOOLEAN") == 0) {
>>     argument = new DCmdArgument<bool>(name, desc, "BOOLEAN", mandatory, default_value);
>>   } else if (strcmp(type, "MEMORYSIZE") == 0) {
>>     argument = new DCmdArgument<MemorySizeArgument>(name, desc, "MEMORY SIZE", mandatory, default_value);
>>   } else if (strcmp(type, "STRINGARRAY") == 0) {
>>     argument = new DCmdArgument<StringArrayArgument*>(name, desc, "STRING SET", mandatory);
>>   }
>> 
>>   if (argument != nullptr) {
>>     if (isarg) {
>>       parser->add_dcmd_argument(argument);
>>     } else {
>>       parser->add_dcmd_option(argument);
>>     }
>>   }
>
> @jdksjolen 
> 
>> Also, @SoniaZaldana, would you mind changing the code to this 
> 
> Even simpler (did not test, but you get my drift):
> 
> 
> #define ALL_TYPES_DO_XX(what) \
>   what(char*, "STRING") \
>   what(NanoTimeArgument, "NANOTIME") \
>   what(jlong, "JLONG") 
> ... etc
> 
> then
> 
> 
> #define XX(TYPE, NAME) \
> if (strcmp(type, NAME) == 0) { \
>     DCmdArgument<TYPE>* argument = new DCmdArgument<TYPE>(name, desc, NAME, mandatory, default_value); \
> }
> ALL_TYPES_DO_XX(XX)
> #undef XX
> 
> 
> ;-)

Sonia, my bad if you already know this stuff, but since it's fairly esoteric knowledge nowadays I'd like to help you out in advance: Thomas is proposing the use of an X macro: https://en.wikipedia.org/wiki/X_macro

These can be found throughout Hotspot, you can find an example definition and usage in `logTag.hpp` and `logTag.cpp`.
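
For readers who have not met the pattern, here is a small self-contained sketch of an X macro. It is not taken from `logTag.hpp`; the list `ARG_TYPES_DO`, the enum `ArgType` and the function `arg_type_from_name` are made-up names for illustration. The point is that one list is expanded twice, so the enum and the name lookup cannot drift apart.

```c++
#include <cstring>

// Single list of (enumerator, display name) pairs; the entries are illustrative only.
#define ARG_TYPES_DO(X) \
  X(String,   "STRING") \
  X(NanoTime, "NANOTIME") \
  X(JLong,    "JLONG")

// Expansion 1: an enum with one enumerator per list entry.
enum class ArgType {
#define X(ID, NAME) ID,
  ARG_TYPES_DO(X)
#undef X
  Unknown
};

// Expansion 2: a name lookup generated from the same list.
inline ArgType arg_type_from_name(const char* name) {
#define X(ID, NAME) if (std::strcmp(name, NAME) == 0) return ArgType::ID;
  ARG_TYPES_DO(X)
#undef X
  return ArgType::Unknown;
}
```

Adding a new entry to `ARG_TYPES_DO` updates both expansions at once, which is the main appeal over a hand-written if/else chain.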

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685688287

From stuefe at openjdk.org  Sun Jul 21 10:11:31 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sun, 21 Jul 2024 10:11:31 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <QuyVjUXmgR2l6pgq-3dvYkKCjLNi5raS7pwt7OagyeY=.783df525-878e-402c-820e-8ac7150dfa97@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
 <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
 <lxIweFtdKy3V3X5w6Z0RlVPT0gLUjp1wr0RQQIfcfQw=.7d4c4d60-8bec-404e-8f71-c0357d81984d@github.com>
 <ob8IA7dbncuc-wuqsfs0sIFK7bOSXm8qsvPEPSAGqtw=.c5b1f0eb-b706-4cee-bf97-109be01e22af@github.com>
 <QuyVjUXmgR2l6pgq-3dvYkKCjLNi5raS7pwt7OagyeY=.783df525-878e-402c-820e-8ac7150dfa97@github.com>
Message-ID: <A3pXePPLllhBqQUHUSx6sR7iEZm9rB0nOFr90TXKHMQ=.57917ac7-d787-42b3-aaad-2c9e1285725f@github.com>

On Sun, 21 Jul 2024 08:55:35 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> @jdksjolen 
>> 
>>> Also, @SoniaZaldana, would you mind changing the code to this 
>> 
>> Even simpler (did not test, but you get my drift):
>> 
>> 
>> #define ALL_TYPES_DO_XX(what) \
>>   what(char*, "STRING") \
>>   what(NanoTimeArgument, "NANOTIME") \
>>   what(jlong, "JLONG") 
>> ... etc
>> 
>> then
>> 
>> 
>> #define XX(TYPE, NAME) \
>> if (strcmp(type, NAME) == 0) { \
>>     DCmdArgument<TYPE>* argument = new DCmdArgument<TYPE>(name, desc, NAME, mandatory, default_value); \
>> }
>> ALL_TYPES_DO_XX(XX)
>> #undef XX
>> 
>> 
>> ;-)
>
> Sonia, my bad if you already know this stuff, but since it's fairly esoteric knowledge nowadays I'd like to help you out in advance: Thomas is proposing the use of an X macro: https://en.wikipedia.org/wiki/X_macro
> 
> These can be found throughout Hotspot, you can find an example definition and usage in `logTag.hpp` and `logTag.cpp`.

@SoniaZaldana Note that this is very much optional.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685701779

From dholmes at openjdk.org  Mon Jul 22 01:23:39 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 22 Jul 2024 01:23:39 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <SDkUb2GIxTALba_5Dc-t3qK_zov3SEzBIMWzOsb2T0M=.1f420a00-732a-4917-a111-a0124240d4f6@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
 <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
 <EIIwRleptsdSt_X7tJdQ1mbPYD4RhXcpqA3b4UBb8mU=.ed64e38d-1aed-4304-9d53-1aa3ed434f89@github.com>
 <SDkUb2GIxTALba_5Dc-t3qK_zov3SEzBIMWzOsb2T0M=.1f420a00-732a-4917-a111-a0124240d4f6@github.com>
Message-ID: <lciD2DICObBi9SOnTBJY35-VpJUtAJqLzPc_Ciei5ac=.9ac61a07-bb47-4d91-8d93-365a9d3c4f05@github.com>

On Sat, 20 Jul 2024 07:50:46 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Yes. In this file, other commands use `fatal` only where reading the hard-coded default values - in the various `init_...` functions. Hard-coded values should be valid, obviously, otherwise the JVM developer messed up. Other values are passed in by the end user via jcmd and should not crash the JVM.
>
> I see the prevalent way to deal with runtime parse errors is to throw a java exception. That exception later is caught in the command processing loop at the entrance of the attach listener thread.
> 
> So, @SoniaZaldana, I would do this here too - when in Rome...
> 
> But is this not unnecessarily complex? It requires the AttachListener to be a Java thread when in fact it needs no Java at all - we just misuse Java exception handling as a way to pass error information up the stack, with the simple ultimate goal of writing error information into the outputStream to be sent to the caller. We might just as well pass the outputStream* to the parse_xxx functions as a third argument, write to it directly, and return an error code. This would make the attach listener thread a lot less dependent on Java and more robust - at least for jcmds that don't need Java (which jcmds need Java?). 
> 
> After all, the attach listener is supposed to be super robust and always work even if the JVM misbehaves. @dholmes-ora @lmesnik what do you guys think, should we change that? (obviously in a different RFE)

If the attach listener thread doesn't actually need to be a Java thread then you could look into changing that. Not sure it would really buy us that much in terms of added robustness though.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685861533

From dholmes at openjdk.org  Mon Jul 22 01:27:42 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 22 Jul 2024 01:27:42 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
Message-ID: <63Fx1KzzGDy5E1pMf0HOb3P9o6gD6Rtps3YJYu-MLyY=.539c62f1-c36a-4fe4-8c57-ab44d932d4cb@github.com>

On Fri, 19 Jul 2024 14:07:12 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Missing copyright header update
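
For orientation, the substitution described in the PR summary can be sketched as below. This is an illustrative standalone helper, not the code in `diagnosticArgument.cpp`; the name `expand_pid` and the choice to leave a lone `%` untouched are assumptions of this sketch.

```c++
#include <cstddef>
#include <string>

// Replaces every "%p" in `pattern` with the decimal PID. All other text,
// including a '%' that is not followed by 'p', is copied unchanged.
inline std::string expand_pid(const std::string& pattern, long pid) {
  const std::string pid_str = std::to_string(pid);
  std::string result;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 'p') {
      result += pid_str;
      ++i;  // skip the 'p' that was just consumed
    } else {
      result += pattern[i];
    }
  }
  return result;
}
```

So a command given `filename=heap-%p.hprof` in a process with PID 1234 would, under these assumptions, write to `heap-1234.hprof`.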

src/hotspot/share/services/diagnosticArgument.cpp line 361:

> 359: void DCmdArgument<FileArgument>::parse_value(const char *str, size_t len,
> 360:                                                    TRAPS) {
> 361:   if (str == NULL) {

s/NULL/nullptr

src/hotspot/share/services/diagnosticArgument.cpp line 372:

> 370: 
> 371: template <> void DCmdArgument<FileArgument>::init_value(TRAPS) {
> 372:   if (has_default() && _default_string != NULL) {

s/NULL/nullptr

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685862082
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1685862613

From mli at openjdk.org  Mon Jul 22 07:06:38 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 22 Jul 2024 07:06:38 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v4]
In-Reply-To: <FZMjsZWO9NKx4v5svo8qQPE5HKqvoiM-lc0oiDCah80=.2d250429-524a-4e93-a453-bf1db0238626@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
 <FZMjsZWO9NKx4v5svo8qQPE5HKqvoiM-lc0oiDCah80=.2d250429-524a-4e93-a453-bf1db0238626@github.com>
Message-ID: <T_c96_qdjC0dGxSe6JWkdvzfBOy3fiszOcTowErU2TQ=.398cdb88-cf71-4a45-ab22-7a34947db7e0@github.com>

On Tue, 2 Jul 2024 14:16:35 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> 
>> I'm also working on a base64 decode intrinsic, but there is a performance regression in some cases. Decode and encode are totally independent of each other, so I will send out the decode review in another PR once I have fixed the regression.
>> 
>> Thanks.
>> 
>> ## Test
>> Benchmarks were run on CanMV-K230 (vlenb == 16) and banana-pi (vlenb == 32).
>> 
>> I've tried several implementations, respectively with vector group
>> * m2+m1+scalar
>> * m2+scalar
>> * m1+scalar
>> * pure scalar
>> The best one is the combination of m2+m1; it has the best performance across all source sizes.
>> 
>> ### K230
>> 
>> this implementation (m2+m1)
>> 
>> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
>> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
>> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
>> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
>> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
>> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
>> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
>> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
>> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
>> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
>> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>> 
>> vector with only m2
>> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 4...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   move label

Hi, Can I get another review of this pr? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2242234520

From shade at openjdk.org  Mon Jul 22 08:49:12 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 22 Jul 2024 08:49:12 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2]
In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
Message-ID: <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com>

> See bug for more discussion.
> 
> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
> 
> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice:
> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
> 
> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
> 
> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
> 
> The current PR implements Variant 2 from the discussion: it makes sure stable fields are as memory-safe as finals, and that's it. I believe this is an all-around good compromise for both mainline and the backports: the performance is improved on the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
> 
> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
> 
> Additional testing:
>  - [x] New IR tests
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, jcstre...

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:

 - Merge branch 'master' into JDK-8333791-stable-field-barrier
 - Variant 2: Only final-field like semantics for stable inits
 - Variant 3: Handle everything, including reads by compilers

-------------

Changes: https://git.openjdk.org/jdk/pull/19635/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19635&range=01
  Stats: 1063 lines in 16 files changed: 1023 ins; 20 del; 20 mod
  Patch: https://git.openjdk.org/jdk/pull/19635.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19635/head:pull/19635

PR: https://git.openjdk.org/jdk/pull/19635

From shade at openjdk.org  Mon Jul 22 08:49:12 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 22 Jul 2024 08:49:12 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields
In-Reply-To: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
Message-ID: <a7AAzVU7g5n4oMfJNoSjp3xbQsJ9GavCWtaurdpdrWA=.083ab17f-52ad-4141-b672-a4ca5ab960d0@github.com>

On Mon, 10 Jun 2024 18:05:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See bug for more discussion.
> 
> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
> 
> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice:
> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
> 
> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
> 
> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
> 
> The current PR implements Variant 2 from the discussion: it makes sure stable fields are as memory-safe as finals, and that's it. I believe this is an all-around good compromise for both mainline and the backports: the performance is improved on the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
> 
> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
> 
> Additional testing:
>  - [x] New IR tests
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, jcstre...

Still waiting for formal reviews, please :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2242415734

From thartmann at openjdk.org  Mon Jul 22 11:31:32 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 22 Jul 2024 11:31:32 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
Message-ID: <LQaSi1l1PrOsNt4-4DWLguJQH9wxiywwTCh8TkGSo4U=.6b642f32-f804-4aad-8e07-aaea9ed23cc1@github.com>

On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote:

> In the cases like:
> 
>   UNSAFE.putLong(address + off1 + 1030, lseed);
>   UNSAFE.putLong(address + 1023, lseed);
>   UNSAFE.putLong(address + off2 + 1001, lseed);
> 
> 
> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
> 
>   ldr  R10, [R15, #120]    # int ! Field: address
>   ldr  R11, [R16, #136]    # int ! Field: off1
>   ldr  R12, [R16, #144]    # int ! Field: off2
>   add  R11, R11, R10
>   mov R11, R11    # long -> ptr
>   add  R12, R12, R10
>   mov R10, R10    # long -> ptr
>   add R11, R11, #1030    # ptr
>   str  R17, [R11]    # int
>   add R10, R10, #1023    # ptr
>   str  R17, [R10]    # int
>   mov R10, R12    # long -> ptr
>   add R10, R10, #1001    # ptr
>   str  R17, [R10]    # int
> 
> 
> In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly:
> 
>   ldr    x10, [x15,#120]
>   ldp    x11, x12, [x16,#136]
>   add    x11, x11, x10
>   add    x12, x12, x10
>   add    x11, x11, #0x406
>   str    x17, [x11]
>   add    x10, x10, #0x3ff
>   str    x17, [x10]
>   mov    x10, x12  <--- extra register copy
>   add    x10, x10, #0x3e9
>   str    x17, [x10]
> 
> 
> There is still one extra register copy, which we're trying to remove in this patch.
> 
> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
> 
> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so
> 
> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906

All tests passed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2242730776

From szaldana at openjdk.org  Mon Jul 22 13:45:49 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 22 Jul 2024 13:45:49 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v3]
In-Reply-To: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
Message-ID: <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com>

> Hi all, 
> 
> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.
> 
> Testing: 
> - [x] Added test case passes. 
> 
> Thanks, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Adding comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20257/files
  - new: https://git.openjdk.org/jdk/pull/20257/files/6129d87c..237f751a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20257&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20257&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20257.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20257/head:pull/20257

PR: https://git.openjdk.org/jdk/pull/20257

From aph at openjdk.org  Mon Jul 22 14:03:34 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 22 Jul 2024 14:03:34 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v3]
In-Reply-To: <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com>
Message-ID: <cfKy-VTUht4Fbtb5-paKJZvCVAar1mq6Y0d0pDbkFQE=.1aa56b36-3067-4357-89ed-d1d8c3f64426@github.com>

On Thu, 11 Jul 2024 22:53:42 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> src/hotspot/share/oops/instanceKlass.cpp line 1410:
>> 
>>> 1408:     return nullptr;
>>> 1409:   } else if (num_extra_slots == 0) {
>>> 1410:     if (num_extra_slots == 0 && interfaces->length() <= 1) {
>> 
>> Since `secondary_supers` are hashed unconditionally now,  is `interfaces->length() <= 1` check still needed?
>
> Also, `num_extra_slots == 0` check is redundant.

> Since `secondary_supers` are hashed unconditionally now, is `interfaces->length() <= 1` check still needed?

I don't think so, no. Our incoming `transitive_interfaces` is formed by concatenating the interface lists of our superclasses and superinterfaces.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1686607068

From aph at openjdk.org  Mon Jul 22 14:18:33 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 22 Jul 2024 14:18:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v3]
In-Reply-To: <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
Message-ID: <WOY02iAdeWi-IgqSfHkkydfPyRxH1TpsYPYvFD8sRv0=.befb015d-0622-492a-87ab-fe52d0b1fa64@github.com>

On Fri, 5 Jul 2024 22:30:09 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Andrew Haley has updated the pull request incrementally with four additional commits since the last revision:
>> 
>>  - Review feedback
>>  - Review feedback
>>  - Review feedback
>>  - Cleanup check_klass_subtype_fast_path for AArch64, deleting dead code
>
> src/hotspot/share/oops/klass.cpp line 175:
> 
>> 173:     if (secondary_supers()->at(i) == k) {
>> 174:       if (UseSecondarySupersCache) {
>> 175:         ((Klass*)this)->set_secondary_super_cache(k);
> 
> Does it make sense to assert `UseSecondarySupersCache` in `Klass::set_secondary_super_cache()`?

I kinda hate this because we're casting away `const`, which is UB. I think I'd just take it out, but once I do that, I don't think anything sets `_secondary_super_cache`.
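
For context on the language rule: writing through a pointer obtained by casting away `const` is undefined behaviour when the underlying object was itself declared `const`. One conventional way to keep a lookup logically `const` while still updating a cache is a `mutable` member. The sketch below is a toy class, not `Klass`; `Node`, `has_super` and `cached` are hypothetical, and it ignores the concurrency concerns a real VM data structure would have.

```c++
// Hypothetical miniature of a class with a one-element lookup cache.
class Node {
 public:
  Node(const Node* const* supers, int count) : _supers(supers), _count(count) {}

  // Logically const: the answer never depends on the cache contents.
  bool has_super(const Node* k) const {
    if (k != nullptr && _cache == k) return true;
    for (int i = 0; i < _count; i++) {
      if (_supers[i] == k) {
        _cache = k;  // permitted in a const method because _cache is mutable
        return true;
      }
    }
    return false;
  }

  const Node* cached() const { return _cache; }

 private:
  const Node* const* _supers;
  int _count;
  mutable const Node* _cache = nullptr;
};
```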

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1686631030

From kevinw at openjdk.org  Mon Jul 22 14:32:33 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Mon, 22 Jul 2024 14:32:33 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v3]
In-Reply-To: <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
 <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com>
Message-ID: <f9HcTrrpOw3OimtNWfyRqIXWKwBm8E8Ft0xFv_8d6jc=.481ec6f2-95a3-4eeb-9744-3cfefdf9cb84@github.com>

On Mon, 22 Jul 2024 13:45:49 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.
>> 
>> Testing: 
>> - [x] Added test case passes. 
>> 
>> Thanks, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adding comment

src/hotspot/share/code/codeCache.cpp line 1804:

> 1802:   fileStream fs(filename, "w");
> 1803:   if (!fs.is_open()) {
> 1804:     st->print_cr("Failed to create %s for perf map", filename);

Hi -- as this used to be "log_warning" and print something like:
  [1129077.636s][warning][codecache] Failed to create /x for perf map
..it should probably say:
  Warning: Failed to...etc...

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1686652944

From kevinw at openjdk.org  Mon Jul 22 14:35:32 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Mon, 22 Jul 2024 14:35:32 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v3]
In-Reply-To: <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
 <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com>
Message-ID: <AVNqzuDJA0GcYQgHi3y8FhuPH30ZD0dKdOFrWj6Iaxc=.bcb850de-a604-4441-9977-fd720118bd7b@github.com>

On Mon, 22 Jul 2024 13:45:49 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.
>> 
>> Testing: 
>> - [x] Added test case passes. 
>> 
>> Thanks, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adding comment

Looks good.  
Yes, the DCmds in diagnosticCommand.cpp tend to use `outputStream* output` (or `out`) as a parameter; this may help make it more obvious what the stream is for.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20257#issuecomment-2243113330

From kevinw at openjdk.org  Mon Jul 22 14:43:33 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Mon, 22 Jul 2024 14:43:33 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v3]
In-Reply-To: <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
 <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com>
Message-ID: <bQkYgB9eODkqc5_zOIerg-TkznUX9-IKQQiVbnpO_b0=.328dc96f-1945-4e17-8e6a-aa9aee15e2fb@github.com>

On Mon, 22 Jul 2024 13:45:49 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.
>> 
>> Testing: 
>> - [x] Added test case passes. 
>> 
>> Thanks, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adding comment

test/hotspot/jtreg/serviceability/dcmd/compiler/PerfMapTest.java line 124:

> 122:             OutputAnalyzer output = new JMXExecutor().execute("Compiler.perfmap %s".formatted(path));
> 123:             output.shouldContain("Failed to create nonexistent/%s for perf map".formatted(test_dir));
> 124:             output.shouldNotHaveExitValue(0);

I'm curious if this exit value check works, as jcmd failures like this show "Command executed successfully" and return 0 for success.
These compiler tests have chosen JMXExecutor and PidJcmdExecutor which might be relevant.  Interested to know if JMXExecutor returns a non-zero exit value for this?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1686674484

From aph at openjdk.org  Mon Jul 22 14:59:33 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 22 Jul 2024 14:59:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v3]
In-Reply-To: <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
Message-ID: <wgY2erz716MCi6K6DcUKEqLyd6E82ArMlba9qHdAA9o=.de21daa5-b078-4469-a6eb-df548f699f65@github.com>

On Thu, 11 Jul 2024 23:07:43 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Andrew Haley has updated the pull request incrementally with four additional commits since the last revision:
>> 
>>  - Review feedback
>>  - Review feedback
>>  - Review feedback
>>  - Cleanup check_klass_subtype_fast_path for AArch64, deleting dead code
>
> src/hotspot/share/oops/klass.inline.hpp line 117:
> 
>> 115: }
>> 116: 
>> 117: inline bool Klass::search_secondary_supers(Klass *k) const {
> 
> I see you moved `Klass::search_secondary_supers` in `klass.inline.hpp`, but I'm not sure how it interacts with `Klass::is_subtype_of` (the sole caller) being declared in `klass.hpp`. 
> 
> Will the inlining still happen if `Klass::is_subtype_of()` callers include `klass.hpp`?

Presumably this question applies to every function in `klass.inline.hpp`?
Practically everything does `#include "oops/klass.inline.hpp"`. It's inlined in about 120 files, as far as I can see everywhere such queries are made.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1686697935

From aph at openjdk.org  Mon Jul 22 15:08:44 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 22 Jul 2024 15:08:44 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v3]
In-Reply-To: <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com>
 <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com>
 <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com>
Message-ID: <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com>

On Thu, 18 Jul 2024 20:11:03 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> Alternatively, `Klass::is_subtype_of()` can unconditionally perform linear search over secondary_supers array.
> 
> Even though I very much like to see table lookup written in C++ (accompanying heavily optimized platform-specific MacroAssembler variants), it would make C++ runtime even simpler.

It would, but there is something to be said for being able to provide a fast "no" answer for interface membership. I'll agree it's probably not a huge difference. I guess `is_cloneable_fast()` exists only because searching the interfaces is slow.
Also, if table lookup is written in C++ but not used, it will rot.
Also also, `Klass::is_subtype_of()` is used for C1 runtime.
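
The fast "no" answer can be illustrated with a deliberately simplified model. This is a sketch under assumptions, not the HotSpot table: `SupersTable`, `slot_of` and the overflow list are hypothetical, the hash is just the low six bits of an id, and collisions go to a linear overflow list instead of being probed. A clear bit in the 64-bit bitmap proves absence without touching the array. A set bit is turned into an array index by counting the set bits below it.

```cpp
#include <cstdint>
#include <vector>

// Portable bit count: clears the lowest set bit once per iteration.
inline unsigned count_bits(uint64_t x) {
  unsigned n = 0;
  for (; x != 0; x &= x - 1) ++n;
  return n;
}

class SupersTable {
 public:
  explicit SupersTable(const std::vector<uint32_t>& ids) {
    uint32_t first[64] = {};
    for (uint32_t id : ids) {
      unsigned s = slot_of(id);
      uint64_t bit = uint64_t(1) << s;
      if (_bitmap & bit) {
        if (first[s] != id) _overflow.push_back(id);  // slot already taken
      } else {
        _bitmap |= bit;
        first[s] = id;
      }
    }
    // Pack the occupied slots densely, in slot order.
    for (unsigned s = 0; s < 64; ++s) {
      if (_bitmap & (uint64_t(1) << s)) _packed.push_back(first[s]);
    }
  }

  bool contains(uint32_t id) const {
    uint64_t bit = uint64_t(1) << slot_of(id);
    if ((_bitmap & bit) == 0) return false;            // fast "no"
    unsigned index = count_bits(_bitmap & (bit - 1));  // occupied slots below
    if (_packed[index] == id) return true;
    for (uint32_t other : _overflow) {                 // simplified slow path
      if (other == id) return true;
    }
    return false;
  }

  static unsigned slot_of(uint32_t id) { return id & 63; }

 private:
  uint64_t _bitmap = 0;
  std::vector<uint32_t> _packed;
  std::vector<uint32_t> _overflow;
};
```

In this model a miss on an empty slot costs one hash and one bit test, which is the property being argued for above.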

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1686707925

From szaldana at openjdk.org  Mon Jul 22 15:36:47 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 22 Jul 2024 15:36:47 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v4]
In-Reply-To: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
Message-ID: <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com>

> Hi all, 
> 
> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.
> 
> Testing: 
> - [x] Added test case passes. 
> 
> Thanks, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Updating warning message

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20257/files
  - new: https://git.openjdk.org/jdk/pull/20257/files/237f751a..a2e46173

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20257&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20257&range=02-03

  Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20257.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20257/head:pull/20257

PR: https://git.openjdk.org/jdk/pull/20257

From szaldana at openjdk.org  Mon Jul 22 15:36:47 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 22 Jul 2024 15:36:47 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v3]
In-Reply-To: <bQkYgB9eODkqc5_zOIerg-TkznUX9-IKQQiVbnpO_b0=.328dc96f-1945-4e17-8e6a-aa9aee15e2fb@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
 <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com>
 <bQkYgB9eODkqc5_zOIerg-TkznUX9-IKQQiVbnpO_b0=.328dc96f-1945-4e17-8e6a-aa9aee15e2fb@github.com>
Message-ID: <uQtz_rHgh64P0E9HIBQQ54DALvBp2HfegG22ZDFimqI=.5ba67d11-37ec-456b-93af-3a8d080dd4c3@github.com>

On Mon, 22 Jul 2024 14:41:16 GMT, Kevin Walls <kevinw at openjdk.org> wrote:

>> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Adding comment
>
> test/hotspot/jtreg/serviceability/dcmd/compiler/PerfMapTest.java line 124:
> 
>> 122:             OutputAnalyzer output = new JMXExecutor().execute("Compiler.perfmap %s".formatted(path));
>> 123:             output.shouldContain("Failed to create nonexistent/%s for perf map".formatted(test_dir));
>> 124:             output.shouldNotHaveExitValue(0);
> 
> I'm curious if this exit value check works, as jcmd failures like this show "Command executed successfully" and return 0 for success.
> These compiler tests have chosen JMXExecutor and PidJcmdExecutor which might be relevant.  Interested to know if JMXExecutor returns a non-zero exit value for this?

Hi Kevin, 

Yes, I noticed the test was exiting with a non-zero value when I was testing. After giving it a bit more thought, that check is not the main purpose of the test and I'm not entirely sure why the JMXExecutor behaves that way. I'll just remove the exit value check to avoid confusion.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1686750734

From fgao at openjdk.org  Mon Jul 22 16:01:38 2024
From: fgao at openjdk.org (Fei Gao)
Date: Mon, 22 Jul 2024 16:01:38 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <LQaSi1l1PrOsNt4-4DWLguJQH9wxiywwTCh8TkGSo4U=.6b642f32-f804-4aad-8e07-aaea9ed23cc1@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
 <LQaSi1l1PrOsNt4-4DWLguJQH9wxiywwTCh8TkGSo4U=.6b642f32-f804-4aad-8e07-aaea9ed23cc1@github.com>
Message-ID: <urZOIbDXPwEJSdpYPYCaDsqbB87cm3sfeSAVMFHAUeU=.6dd2957d-7f07-4880-8864-32249a5e74bc@github.com>

On Mon, 22 Jul 2024 11:29:07 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

> All tests passed.

@TobiHartmann thanks for your testing!

Jcstress also passed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2243302556

From kevinw at openjdk.org  Mon Jul 22 16:34:32 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Mon, 22 Jul 2024 16:34:32 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v3]
In-Reply-To: <uQtz_rHgh64P0E9HIBQQ54DALvBp2HfegG22ZDFimqI=.5ba67d11-37ec-456b-93af-3a8d080dd4c3@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
 <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com>
 <bQkYgB9eODkqc5_zOIerg-TkznUX9-IKQQiVbnpO_b0=.328dc96f-1945-4e17-8e6a-aa9aee15e2fb@github.com>
 <uQtz_rHgh64P0E9HIBQQ54DALvBp2HfegG22ZDFimqI=.5ba67d11-37ec-456b-93af-3a8d080dd4c3@github.com>
Message-ID: <z_BmsLBrn2IUlx6mTNBXLAnwLwRit04MVeLcmBxz3Eg=.d451fe73-e101-439e-98d8-b3c891a37b38@github.com>

On Mon, 22 Jul 2024 15:33:27 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> test/hotspot/jtreg/serviceability/dcmd/compiler/PerfMapTest.java line 124:
>> 
>>> 122:             OutputAnalyzer output = new JMXExecutor().execute("Compiler.perfmap %s".formatted(path));
>>> 123:             output.shouldContain("Failed to create nonexistent/%s for perf map".formatted(test_dir));
>>> 124:             output.shouldNotHaveExitValue(0);
>> 
>> I'm curious if this exit value check works, as jcmd failures like this show "Command executed successfully" and return 0 for success.
>> These compiler tests have chosen JMXExecutor and PidJcmdExecutor which might be relevant.  Interested to know if JMXExecutor returns a non-zero exit value for this?
>
> Hi Kevin, 
> 
> Yes, I noticed the test was exiting with a non-zero value when I was testing. After giving it a bit more thought, that check is not the main purpose of the test and I'm not entirely sure why the JMXExecutor behaves that way. I'll just remove the exit value check to avoid confusion.

I think it was returning an "exit value" of -1, the test/lib/jdk/test/lib/process/OutputBuffer.java default.

JMXExecutor doesn't set one, as there isn't an exit code. That could be clearer; there must be various issues in that area. But yes, just don't check the exit code for a JMXExecutor. jcmd would return zero, but we don't want to embed that as a requirement, since it's really a failure.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1686829542

From aph at openjdk.org  Mon Jul 22 16:39:33 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 22 Jul 2024 16:39:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v3]
In-Reply-To: <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com>
 <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com>
 <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com>
 <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com>
Message-ID: <UAjH__AKdU3UMdJBkg7TlElKSA8mEFFE0MiElVrYexE=.4bc67a26-3383-4e4e-92b0-f1d3d33c5ce2@github.com>

On Mon, 22 Jul 2024 15:03:12 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Alternatively, `Klass::is_subtype_of()` can unconditionally perform linear search over secondary_supers array.
>> 
>> Even though I very much like to see table lookup written in C++ (accompanying heavily optimized platform-specific MacroAssembler variants), it would make C++ runtime even simpler.
> 
> It would, but there is something to be said for being able to provide a fast "no" answer for interface membership. I'll agree it's probably not a huge difference. I guess `is_cloneable_fast()` exists only because searching the interfaces is slow.
> Also, if table lookup is written in C++ but not used, it will rot.
> Also also, `Klass::is_subtype_of()` is used for C1 runtime.

Thinking about it some more, I don't really mind. There may be some virtue to moving lookup_secondary_supers_table() to a comment in the back end(s), and the expansion of population_count() is rather bloaty.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1686835253

From aph at openjdk.org  Mon Jul 22 16:50:47 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 22 Jul 2024 16:50:47 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v4]
In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
Message-ID: <uHjma_H_iMAeOvm_sfZVc0ifNLwfGdgoV5JyIJTl7uA=.68facdd5-8681-4264-975c-d1a4b6a8eef4@github.com>

> This patch expands the use of a hash table for secondary superclasses
> to the interpreter, C1, and runtime. It also adds a C2 implementation
> of hashed lookup in cases where the superclass isn't known at compile
> time.
> 
> HotSpot shared runtime
> ----------------------
> 
> Building hashed secondary tables is now unconditional. It takes very
> little time, and now that the shared runtime always has the tables, it
> might as well take advantage of them. The shared code is easier to
> follow now, I think.
> 
> There might be a performance issue with x86-64 in that we build
> HotSpot for a default x86-64 target that does not support popcount.
> This means that HotSpot C++ runtime on x86 always uses a software
> emulation for popcount, even though the vast majority of machines made
> for the past 20 years can do popcount in a single instruction. It
> wouldn't be terribly hard to do something about that.
> 
> Having said that, the software popcount is really not bad.
> 
> x86
> ---
> 
> x86 is rather tricky, because we still support
> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
> well as 32- and 64-bit ports. There's some further complication in
> that only `RCX` can be used as a shift count, so there's some register
> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
> 
> AArch64
> -------
> 
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
> 
> Generally
> ---------
> 
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
> 
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
> 
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
> 
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.
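
As background to the software popcount mentioned in the description above, the following is the well-known branch-free ("SWAR") bit count for 64-bit values. It is shown as a sketch of what a software fallback can look like. Whether HotSpot's C++ runtime uses exactly this sequence is an assumption, and `popcount64_sw` is a hypothetical name.

```cpp
#include <cstdint>

// Branch-free bit count: sums bits in progressively wider fields.
inline unsigned popcount64_sw(uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // 8-bit sums
  // The multiply adds all eight byte sums into the top byte.
  return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);
}
```

This costs a handful of shifts, masks and one multiply, which is consistent with the remark that the software version is not bad. Compiler builtins such as GCC's `__builtin_popcountll` only become a single instruction when the build target enables it.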

Andrew Haley has updated the pull request incrementally with two additional commits since the last revision:

 - Review comments
 - Review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19989/files
  - new: https://git.openjdk.org/jdk/pull/19989/files/98f6b2b7..c252efcb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=02-03

  Stats: 41 lines in 10 files changed: 9 ins; 17 del; 15 mod
  Patch: https://git.openjdk.org/jdk/pull/19989.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989

PR: https://git.openjdk.org/jdk/pull/19989

From aph at openjdk.org  Mon Jul 22 16:50:48 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 22 Jul 2024 16:50:48 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v4]
In-Reply-To: <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
Message-ID: <QRaDiTIgGhnhvj1km2MOoIDYXKGjnzC04OoEkYgUrxU=.cdd1b266-2380-4c72-884a-163ef267be74@github.com>

On Thu, 11 Jul 2024 23:57:27 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Andrew Haley has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Review comments
>>  - Review comments
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4810:
> 
>> 4808:                                                          Label* L_success,
>> 4809:                                                          Label* L_failure) {
>> 4810:   // NB! Callers may assume that, when temp2_reg is a valid register,
> 
> Oh, that's a subtle point... Can we make it more evident at call sites?

Done. I think the only code that still depends on it is the C2 pattern that uses check_klass_subtype_slow_path_linear in x86_32.ad and x86_64.ad.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1686845837

From aph at openjdk.org  Mon Jul 22 17:10:33 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 22 Jul 2024 17:10:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v4]
In-Reply-To: <uHjma_H_iMAeOvm_sfZVc0ifNLwfGdgoV5JyIJTl7uA=.68facdd5-8681-4264-975c-d1a4b6a8eef4@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <uHjma_H_iMAeOvm_sfZVc0ifNLwfGdgoV5JyIJTl7uA=.68facdd5-8681-4264-975c-d1a4b6a8eef4@github.com>
Message-ID: <LVhS1u3QiLX3D5SyyTrlIIERslY6e1CDagmo0ngb7VE=.bf70305a-383d-4d6d-a4fa-40613767487f@github.com>

On Mon, 22 Jul 2024 16:50:47 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> Andrew Haley has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Review comments
>  - Review comments

All done apart from the questions of:

1. Should `Klass::linear_search_secondary_supers() const` call `set_secondary_super_cache()`? (Strong no from me. It's UB.)

2. Should we use a straight linear search for secondary supers in the C++ runtime, i.e. not changing it for now?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2243430000

From aph at openjdk.org  Mon Jul 22 17:19:46 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 22 Jul 2024 17:19:46 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v5]
In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
Message-ID: <6P17gX_V6nL3hsgbuPrGN4Y8nzyoQMs3fTLaiRaOzwA=.e3eb0ea0-d41c-4222-a1f3-65f9075dbb4d@github.com>

> This patch expands the use of a hash table for secondary superclasses
> to the interpreter, C1, and runtime. It also adds a C2 implementation
> of hashed lookup in cases where the superclass isn't known at compile
> time.
> 
> HotSpot shared runtime
> ----------------------
> 
> Building hashed secondary tables is now unconditional. It takes very
> little time, and now that the shared runtime always has the tables, it
> might as well take advantage of them. The shared code is easier to
> follow now, I think.
> 
> There might be a performance issue with x86-64 in that we build
> HotSpot for a default x86-64 target that does not support popcount.
> This means that HotSpot C++ runtime on x86 always uses a software
> emulation for popcount, even though the vast majority of machines made
> for the past 20 years can do popcount in a single instruction. It
> wouldn't be terribly hard to do something about that.
> 
> Having said that, the software popcount is really not bad.
> 
> x86
> ---
> 
> x86 is rather tricky, because we still support
> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
> well as 32- and 64-bit ports. There's some further complication in
> that only `RCX` can be used as a shift count, so there's some register
> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
> 
> AArch64
> -------
> 
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
> 
> Generally
> ---------
> 
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
> 
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
> 
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
> 
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.

Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:

  Review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19989/files
  - new: https://git.openjdk.org/jdk/pull/19989/files/c252efcb..02cfd130

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=03-04

  Stats: 18 lines in 6 files changed: 1 ins; 0 del; 17 mod
  Patch: https://git.openjdk.org/jdk/pull/19989.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989

PR: https://git.openjdk.org/jdk/pull/19989

From duke at openjdk.org  Mon Jul 22 18:57:00 2024
From: duke at openjdk.org (Henry Lin)
Date: Mon, 22 Jul 2024 18:57:00 GMT
Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23:
 runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be
 represented in type 'long int' [v2]
In-Reply-To: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com>
References: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com>
Message-ID: <XwiF_fbIIjpihvHwT3ZjBry0ZvI-SRwNbY5Mp79yEa4=.77273086-6e03-4cef-bbd2-f3f1517f0430@github.com>

> Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`.
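
A minimal sketch of the issue, assuming a 64-bit word: `low_bits_mask` is a hypothetical helper, not the Shenandoah code. With a signed type, the bit for position 63 is the most negative value, so subtracting 1 from it overflows, which is the error quoted in the subject line. Doing the subtraction in an unsigned type is well-defined because unsigned arithmetic wraps modulo 2^64.

```cpp
#include <cstdint>

// Hypothetical helper: mask with the low n bits set, for 0 <= n < 64.
inline uint64_t low_bits_mask(unsigned n) {
  // Signed version of the same idea: INT64_MIN - 1 for n == 63, which is
  // signed overflow. Here the operands are unsigned, so the result is
  // simply 0x7FFF...FFFF.
  return (uint64_t(1) << n) - 1;
}

// Unsigned subtraction below zero wraps; it is not an error.
inline uint64_t wrap_below_zero() {
  return uint64_t(0) - 1;
}
```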

Henry Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:

 - Merge branch 'openjdk:master' into 833269790-overflow
 - 8332697: fix ubsan:shenandoahSimpleBitMap.inline.hpp runtime integer overflow

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20164/files
  - new: https://git.openjdk.org/jdk/pull/20164/files/ed2797fe..75d921cc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20164&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20164&range=00-01

  Stats: 6383 lines in 291 files changed: 4501 ins; 914 del; 968 mod
  Patch: https://git.openjdk.org/jdk/pull/20164.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20164/head:pull/20164

PR: https://git.openjdk.org/jdk/pull/20164

From szaldana at openjdk.org  Mon Jul 22 19:49:08 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 22 Jul 2024 19:49:08 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v6]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <Q1D2x__4N9ElYI5FHhkgxNT9elpOvYcjSyim00C0EfE=.c241f5bd-840c-40ff-8157-9be769e8ef99@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492), enabling jcmd diagnostic commands that write an output file to accept the `%p` pattern in the file name and replace it with the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia
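
The `%p` substitution described above can be sketched as follows. This is only an illustration under stated assumptions: `expand_pid` is a hypothetical helper, not the function HotSpot uses, and it leaves every `%` sequence other than `%p` untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not the HotSpot implementation: returns a copy of
// `pattern` with every "%p" replaced by the decimal value of `pid`.
// All other characters, including other '%' sequences, are copied unchanged.
std::string expand_pid(const std::string& pattern, long pid) {
  assert(pid >= 0);
  const std::string pid_text = std::to_string(pid);
  std::string result;
  std::size_t pos = 0;
  while (pos < pattern.size()) {
    if (pattern.compare(pos, 2, "%p") == 0) {
      result += pid_text;   // substitute the PID for the two-character token
      pos += 2;
    } else {
      result += pattern[pos];
      ++pos;
    }
  }
  return result;
}
```

With this sketch, `expand_pid("heap_%p.hprof", 1234)` yields `heap_1234.hprof`.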

Sonia Zaldana Calles has updated the pull request incrementally with three additional commits since the last revision:

 - Fixing memory leak
 - Fixing pointer style, s/NULL/nullptr, and exception
 - Cleaning up parserTests.cpp

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/cdf1d457..801fc582

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=04-05

  Stats: 78 lines in 4 files changed: 9 ins; 16 del; 53 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From szaldana at openjdk.org  Mon Jul 22 20:03:08 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 22 Jul 2024 20:03:08 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v7]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <5UHSCkGbA7jXwwEfE8ou0LzvPd5flc7M9ZwbNhZFFvM=.c677a49b-c98c-42c9-81df-b366379aefa9@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492), enabling jcmd diagnostic commands that write an output file to accept the `%p` pattern in the file name and replace it with the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Error messaging format

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/801fc582..517db0cd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=05-06

  Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From szaldana at openjdk.org  Mon Jul 22 20:06:33 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 22 Jul 2024 20:06:33 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <A3pXePPLllhBqQUHUSx6sR7iEZm9rB0nOFr90TXKHMQ=.57917ac7-d787-42b3-aaad-2c9e1285725f@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
 <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
 <lxIweFtdKy3V3X5w6Z0RlVPT0gLUjp1wr0RQQIfcfQw=.7d4c4d60-8bec-404e-8f71-c0357d81984d@github.com>
 <ob8IA7dbncuc-wuqsfs0sIFK7bOSXm8qsvPEPSAGqtw=.c5b1f0eb-b706-4cee-bf97-109be01e22af@github.com>
 <QuyVjUXmgR2l6pgq-3dvYkKCjLNi5raS7pwt7OagyeY=.783df525-878e-402c-820e-8ac7150dfa97@github.com>
 <A3pXePPLllhBqQUHUSx6sR7iEZm9rB0nOFr90TXKHMQ=.57917ac7-d787-42b3-aaad-2c9e1285725f@github.com>
Message-ID: <_frxtxQ56OXre5svEw_F8AOS7t-bOT0wSP1rcIh_hOI=.b8544cfb-66bd-4431-a63c-0599b2a38f08@github.com>

On Sun, 21 Jul 2024 10:08:38 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Sonia, my bad if you already know this stuff but since it's fairly esoteric knowledge nowadays I'd like to help you out in advance: Thomas is proposing the usage of a X macro https://en.wikipedia.org/wiki/X_macro
>> 
>> These can be found throughout Hotspot, you can find an example definition and usage in `logTag.hpp` and `logTag.cpp`.
>
> @SoniaZaldana Note that this is very much optional.

Hi folks, thanks for the pointers! I wasn't familiar with X macros and after some time toying around with them, I'm sad to report that I am not a fan (yet!).

I implemented it and ended up breaking part of the tests. I quickly realized that debugging these is a bit harder for less experienced C++ developers (like myself). 

So, just wanted to note: 
- I cleaned up the indentation in this function as it was all wrong. 
- I didn't get rid of the repetition. Tried to but quickly realized we can't pull the DCmdArgument out of the if statements as they're different types. 

And note to self, to keep reviewing X macros because they did shorten the code a lot when I implemented them. Perhaps I'll give it another go in a different RFE.

Sorry it's not what either of you hoped for!
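
For readers who have not met the X macro technique mentioned in this thread, here is a minimal sketch. The command list and all names in it (`DCMD_SKETCH_LIST`, `DCmdSketchId`, and so on) are hypothetical and are not taken from the PR or from `logTag.hpp`; the point is only that a single list is expanded twice, so the enum and the name table cannot drift apart.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical command list; each entry is ENTRY(identifier, display name).
#define DCMD_SKETCH_LIST(ENTRY)                 \
  ENTRY(Compiler_perfmap, "Compiler.perfmap")   \
  ENTRY(GC_heap_dump,     "GC.heap_dump")       \
  ENTRY(VM_cds,           "VM.cds")

// First expansion: one enumerator per entry.
enum DCmdSketchId {
#define X(id, name) id,
  DCMD_SKETCH_LIST(X)
#undef X
  DCmdSketchCount
};

// Second expansion: the matching display names, in the same order.
static const char* const dcmd_sketch_names[] = {
#define X(id, name) name,
  DCMD_SKETCH_LIST(X)
#undef X
};

inline const char* dcmd_sketch_name(DCmdSketchId id) {
  const int index = static_cast<int>(id);
  assert(index >= 0 && index < DCmdSketchCount);
  return dcmd_sketch_names[index];
}

// Returns the enumerator value for a display name, or -1 if it is unknown.
inline int find_dcmd_sketch(const char* name) {
  for (int i = 0; i < DCmdSketchCount; i++) {
    if (std::strcmp(dcmd_sketch_names[i], name) == 0) return i;
  }
  return -1;
}
```

The trade-off raised above is visible even here: adding a command is a one-line change, but a mistake inside the list shows up as an error in the expanded code rather than at the line that caused it.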

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1687080108

From szaldana at openjdk.org  Mon Jul 22 20:06:34 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 22 Jul 2024 20:06:34 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <CxwbysRpQzuTNFd7180Kh5neBkx9mMJOcct-eqCzrIQ=.5393f979-4fe1-445f-a99d-a49515eec5fe@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
 <znqG3L1DK9JuvU1fIlXfIEu8pp5t_LxIhITS1xvOPBc=.43cc6737-3086-4c2a-bad5-627ab1e91ca6@github.com>
 <CxwbysRpQzuTNFd7180Kh5neBkx9mMJOcct-eqCzrIQ=.5393f979-4fe1-445f-a99d-a49515eec5fe@github.com>
Message-ID: <5-itK2a-qgShmcLpo2Dvj1OUds2U1PF9uqgi8eO5Odk=.fc97e659-13d7-4c5e-800a-16ebdcf3e809@github.com>

On Sat, 20 Jul 2024 08:02:34 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/share/services/diagnosticArgument.hpp line 66:
>> 
>>> 64:   public:
>>> 65:     char *_name;
>>> 66: };
>> 
>> Something is off about this. What is the lifetime of this object?
>> 
>> You don't free it. Running a command in a loop will consume C-heap (you can check this with NMT: `jcmd VM.native_memory baseline`, then run a command 100 times, then `jcmd VM.native_memory summary.diff` will show you the leak in mtInternal).
>> 
>> I would probably just inline the string. E.g.
>> 
>> 
>> struct FileArgument {
>>   char name[max name len]
>> };
>> 
>> 
>> FileArguments sits as member inside DCmdArgument. DCmdArgument or DCmdArgumentWithParser sits as member in the various XXXDCmd classes. 
>> 
>> Those are created in DCmdFactory::create_local_DCmd(). Which is what, a static global list? So we only have one global XXXDCmd object instance per command, but for each command invocation re-parse the argument values? What a weird concept.
>> 
>> Man, this coding is way too convoluted for a little parser engine :( 
>> 
>> But anyway, inlining the filename array into FileArgument should probably be fine from a size standpoint. I would, however, not use JVM_MAXPATHLEN or anything that ultimately depends on PATH_MAX from system headers. We don't want the object to consume e.g. an MB if some crazy platform defines PATH_MAX as 1MB. Therefore I would use e.g. 1024 as the limit for the path name.
>> 
>> (Note that PATH_MAX is an illusion anyway, there is never a guarantee that a path is smaller than that limit... See this good article: https://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html)
>
> Note that the reason for the leak is probably the fact that you don't clear old values on parse_value. See e.g. how char* does it. However, since you allocate with a constant size anyway, the buffer size never changes, you could just as well either follow my advice above (inlining), or just re-use the existing pointer.

Hi Thomas, 

Yes - this was an oversight on my end. I was not directly calling the `destroy_value()` function. I tried to follow more closely what's done for `char*`, as I like the consistency throughout. 

I did a quick check and I don't see any more leaks with NMT. Does the new change make sense to you as well? 

Thank you for the feedback!
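
The inlining alternative suggested in the quoted review can be sketched like this. It is an illustration only, not the code in the PR: `FileArgumentSketch` and its `set` method are hypothetical, and the 1024-byte limit is the example value from the review. Because the buffer lives inside the object, nothing is allocated per parse and there is nothing to free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the FileArgument discussed above.
struct FileArgumentSketch {
  static constexpr std::size_t max_len = 1024;  // fixed limit, independent of PATH_MAX
  char name[max_len] = {0};

  // Copies `path` into the inline buffer. Returns false, leaving the previous
  // value intact, if `path` does not fit including its terminating NUL.
  bool set(const char* path) {
    assert(path != nullptr);
    const std::size_t len = std::strlen(path);
    if (len >= max_len) {
      return false;
    }
    std::memcpy(name, path, len + 1);
    return true;
  }
};
```

Re-parsing simply overwrites the same buffer, which is why this layout cannot leak; the cost is that overlong paths must be rejected explicitly.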

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1687081245

From szaldana at openjdk.org  Mon Jul 22 20:08:32 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 22 Jul 2024 20:08:32 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v3]
In-Reply-To: <z_BmsLBrn2IUlx6mTNBXLAnwLwRit04MVeLcmBxz3Eg=.d451fe73-e101-439e-98d8-b3c891a37b38@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
 <i39AX00lgta2Gqe3KDTcr2ahB1bIkBMvRS_gHq58w4g=.780a2f9f-0479-4546-9f85-5d6bc1e99da1@github.com>
 <bQkYgB9eODkqc5_zOIerg-TkznUX9-IKQQiVbnpO_b0=.328dc96f-1945-4e17-8e6a-aa9aee15e2fb@github.com>
 <uQtz_rHgh64P0E9HIBQQ54DALvBp2HfegG22ZDFimqI=.5ba67d11-37ec-456b-93af-3a8d080dd4c3@github.com>
 <z_BmsLBrn2IUlx6mTNBXLAnwLwRit04MVeLcmBxz3Eg=.d451fe73-e101-439e-98d8-b3c891a37b38@github.com>
Message-ID: <qsOX2Vh_5cBb-iHFz68RS1NbphfEmFcjV_bFcjbSArE=.2c6450df-2198-4523-a852-00d29d9a797c@github.com>

On Mon, 22 Jul 2024 16:31:32 GMT, Kevin Walls <kevinw at openjdk.org> wrote:

> I think it was returning an "exit value" of -1, the test/lib/jdk/test/lib/process/OutputBuffer.java default.

Correct, that was the exit value. 

> But yes just don't check exit code for a JMX Executor, and jcmd would return zero but we don't want to embed that as a requirement, it's really a failure.

Agreed. I removed the exit status check.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20257#discussion_r1687083247

From duke at openjdk.org  Mon Jul 22 20:29:03 2024
From: duke at openjdk.org (Henry Lin)
Date: Mon, 22 Jul 2024 20:29:03 GMT
Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23:
 runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be
 represented in type 'long int' [v3]
In-Reply-To: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com>
References: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com>
Message-ID: <DZ1Cgjr0Osj6By1c7iu2eROuYThWFRwYgnGgnqmxWmI=.9b961b43-4d57-4081-aeda-723968175328@github.com>

> Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`.

Henry Lin has updated the pull request incrementally with one additional commit since the last revision:

  revert right_n_bits and add unsigned right_n_bits to shenandoahSimpleBitMap.hpp

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20164/files
  - new: https://git.openjdk.org/jdk/pull/20164/files/75d921cc..f2011961

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20164&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20164&range=01-02

  Stats: 13 lines in 4 files changed: 6 ins; 1 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/20164.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20164/head:pull/20164

PR: https://git.openjdk.org/jdk/pull/20164

From duke at openjdk.org  Mon Jul 22 21:16:07 2024
From: duke at openjdk.org (Henry Lin)
Date: Mon, 22 Jul 2024 21:16:07 GMT
Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23:
 runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be
 represented in type 'long int' [v4]
In-Reply-To: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com>
References: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com>
Message-ID: <o3jkDOZH-LfQNZkoYyz2t1W1cqYWRdsmJY2L58OLO34=.50d80985-f269-482e-973d-fbc04668bd7a@github.com>

> Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`.

Henry Lin has updated the pull request incrementally with one additional commit since the last revision:

  formatting

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20164/files
  - new: https://git.openjdk.org/jdk/pull/20164/files/f2011961..9b798b37

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20164&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20164&range=02-03

  Stats: 2 lines in 2 files changed: 1 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20164.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20164/head:pull/20164

PR: https://git.openjdk.org/jdk/pull/20164

From lmesnik at openjdk.org  Mon Jul 22 21:49:31 2024
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Mon, 22 Jul 2024 21:49:31 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v4]
In-Reply-To: <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
 <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com>
Message-ID: <_w6qRLOoFJVbJ29HZIxuwJy-ikNOKsLLXfdgUmPQ-6M=.ea3356ed-e677-4aa9-8782-debc015b0084@github.com>

On Mon, 22 Jul 2024 15:36:47 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.
>> 
>> Testing: 
>> - [x] Added test case passes. 
>> 
>> Thanks, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Updating warning message

Marked as reviewed by lmesnik (Reviewer).
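
The idea behind the patch, passing the destination stream into the function so that warnings reach whoever issued the command, can be sketched as below. This is a standard-library illustration under stated assumptions: `write_perf_map_sketch`, its message text, and the placeholder file content are hypothetical and do not reflect the real `CodeCache::write_perf_map` signature or output.

```cpp
#include <cassert>
#include <fstream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sketch: diagnostics go to the caller-supplied stream `out`
// instead of a fixed global stream.
bool write_perf_map_sketch(const std::string& path, std::ostream& out) {
  std::ofstream file(path);
  if (!file.is_open()) {
    out << "Warning: failed to create " << path << " for perf map\n";
    return false;
  }
  file << "# perf map placeholder\n";  // the real content is out of scope here
  return true;
}
```

A caller that owns the command's output can then pass that stream in and see the warning directly, which is what the added test in the PR is checking for.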

-------------

PR Review: https://git.openjdk.org/jdk/pull/20257#pullrequestreview-2192606124

From duke at openjdk.org  Mon Jul 22 22:09:31 2024
From: duke at openjdk.org (Henry Lin)
Date: Mon, 22 Jul 2024 22:09:31 GMT
Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23:
 runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be
 represented in type 'long int' [v4]
In-Reply-To: <o3jkDOZH-LfQNZkoYyz2t1W1cqYWRdsmJY2L58OLO34=.50d80985-f269-482e-973d-fbc04668bd7a@github.com>
References: <rs1CTZ0ODrwbGEqtAapeGhQtMkN1VyPeM-8O8385sPM=.824b60d9-0f40-4a30-8929-a0fcda9d7169@github.com>
 <o3jkDOZH-LfQNZkoYyz2t1W1cqYWRdsmJY2L58OLO34=.50d80985-f269-482e-973d-fbc04668bd7a@github.com>
Message-ID: <pYacwEisUE7HnGgH88lNFIp0zqFph6RsaivyoiOhnLY=.1017984f-a8f6-40e0-a13b-e36fcf82e632@github.com>

On Mon, 22 Jul 2024 21:16:07 GMT, Henry Lin <duke at openjdk.org> wrote:

>> Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`.
>
> Henry Lin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   formatting

Thanks for the feedback. I reverted my changes in `globalDefinitions.hpp` and added an unsigned version of `right_n_bits` in `shenandoahSimpleBitMap.hpp`. This new unsigned version replaces the usages that caused undefined overflow behavior in the `shenandoahSimpleBitMap` files.
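
To illustrate why the unsigned variant avoids the report: in a signed 64-bit type, the value with only bit 63 set is -9223372036854775808, and subtracting 1 from it overflows, which is exactly the message in the subject line. The sketch below is not the code added to `shenandoahSimpleBitMap.hpp`; the name `right_n_bits_unsigned` and the fixed 64-bit word size are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned word_bits = 64;  // assumed word size for this sketch

// Returns a mask with the low `n` bits set, computed entirely in an
// unsigned type so that no step can overflow a signed integer.
inline std::uint64_t right_n_bits_unsigned(unsigned n) {
  if (n >= word_bits) {
    return ~std::uint64_t{0};  // all bits set
  }
  const std::uint64_t bit = std::uint64_t{1} << n;
  assert(bit != 0);            // holds because n < word_bits
  return bit - 1;              // unsigned arithmetic: well defined for every n
}
```

For `n == 63` the intermediate value is 2^63, which fits in `std::uint64_t`, so the subtraction yields `0x7FFFFFFFFFFFFFFF` without any undefined behavior.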

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20164#issuecomment-2243893659

From stuefe at openjdk.org  Tue Jul 23 06:48:31 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 23 Jul 2024 06:48:31 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v4]
In-Reply-To: <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
 <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com>
Message-ID: <8VD-XmtZQBI9KdQVZfZIRjvQSgyAkXKy4TXK7jdLBa0=.07cc90b9-d704-445a-b4a8-f100e611d0aa@github.com>

On Mon, 22 Jul 2024 15:36:47 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.
>> 
>> Testing: 
>> - [x] Added test case passes. 
>> 
>> Thanks, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Updating warning message

Looks good

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20257#pullrequestreview-2193108607

From kevinw at openjdk.org  Tue Jul 23 08:03:38 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Tue, 23 Jul 2024 08:03:38 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v4]
In-Reply-To: <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
 <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com>
Message-ID: <Ixj2Rbbz2GkbF0CUDu1TdPITELi56npMpMOYjM0-FPI=.5b081457-2b8b-4290-9fb9-2c77011dfa33@github.com>

On Mon, 22 Jul 2024 15:36:47 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.
>> 
>> Testing: 
>> - [x] Added test case passes. 
>> 
>> Thanks, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Updating warning message

Marked as reviewed by kevinw (Reviewer).

Yes looks good. I only waited in case you were changing "outputStream* st" to be called out or output, to address Thomas' comment.  (I notice now the added comment.)

-------------

PR Review: https://git.openjdk.org/jdk/pull/20257#pullrequestreview-2193261838
PR Comment: https://git.openjdk.org/jdk/pull/20257#issuecomment-2244523438

From mbaesken at openjdk.org  Tue Jul 23 09:55:05 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 23 Jul 2024 09:55:05 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap'
Message-ID: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>

When running with ubsan-enabled binaries, some tests trigger the following report:

src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
    #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
    #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
    #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
    #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
    #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
    #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
    #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
    #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199

It seems that, at least in the case of class SmallRegisterMap, we fail to handle nullptr.
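
As background for the report above: calling a non-static member function through a null pointer is undefined behavior in C++ even when the function never touches `this`, which is why ubsan flags it. The sketch below shows one common way to avoid it, handing out the address of a real (empty) object instead of a null pointer. It is an assumption-laden illustration, not the change in this PR; `RegMapSketch` and `lookup_sketch` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical stateless map type, loosely modeled on the situation above.
struct RegMapSketch {
  // Does not read any member, but still requires a valid object to be called.
  int location_of(int reg) const { return reg * 8; }

  // Hands out a pointer to a real object rather than nullptr.
  static const RegMapSketch* instance() {
    static const RegMapSketch the_instance{};
    return &the_instance;
  }
};

inline int lookup_sketch(const RegMapSketch* map, int reg) {
  assert(map != nullptr);  // a null `map` would make the call below undefined
  return map->location_of(reg);
}
```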

-------------

Commit messages:
 - JDK-8333354

Changes: https://git.openjdk.org/jdk/pull/20296/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20296&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8333354
  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20296.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20296/head:pull/20296

PR: https://git.openjdk.org/jdk/pull/20296

From mli at openjdk.org  Tue Jul 23 11:27:00 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 23 Jul 2024 11:27:00 GMT
Subject: RFR: 8335191: RISC-V: verify perf of chacha20
Message-ID: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com>

Hi,
Can you help to review this simple patch?

Previously, we implemented this intrinsic for the chacha20 algorithm based on vector instructions. The latest tests on real hardware (k230, bananapi) show that the implementation yields a performance gain rather than a regression only when vlenb == 32 (on bananapi); when vlenb == 16 (on k230) it regresses in all test cases.
So we should adjust when the intrinsic is turned on, i.e. only when vlenb == 32.

Thanks
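
The proposed condition can be written as a small predicate. This is a sketch only: the function name and parameters are hypothetical and do not correspond to the actual flag handling in the RISC-V port.

```cpp
#include <cassert>

// Hypothetical predicate: enable the chacha20 intrinsic only when the vector
// extension is present and the vector register length is 32 bytes.
inline bool enable_chacha20_intrinsic_sketch(bool has_vector_ext, unsigned vlenb) {
  // The vector register length in bytes is a power of two (0 = no vectors).
  assert(vlenb == 0 || (vlenb & (vlenb - 1)) == 0);
  return has_vector_ext && vlenb == 32;
}
```

Under this rule the k230 (vlenb == 16) falls back to the non-intrinsic path, while the bananapi (vlenb == 32) keeps the intrinsic.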


## Performance

### on k230
vlenb == 16

Benchmark - on k230, vlenb == 16 | (dataSize) | (keyLength) | (mode) | (padding) | (permutation) | Cnt | Score -intrinsic | Score +intrinsic | Error | Units | Non-intrinsic/intrinsic
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4642.694 | 5216.699 | 36.039 | ns/op | 0.89
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15719.612 | 17622.616 | 136.609 | ns/op | 0.892
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 59402.742 | 67124.28 | 651.011 | ns/op | 0.885
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 250056.475 | 269184.924 | 8591.727 | ns/op | 0.929
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4752.081 | 5131.094 | 38.917 | ns/op | 0.926
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15554.484 | 16992.339 | 106.583 | ns/op | 0.915
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 61446.365 | 67359.67 | 548.353 | ns/op | 0.912
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 241653.654 | 270189.531 | 3705.045 | ns/op | 0.894
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 17833.825 | 20610.118 | 688.668 | ns/op | 0.865
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 32872.633 | 36427.148 | 4339.823 | ns/op | 0.902
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 87398.821 | 96112.498 | 1028.342 | ns/op | 0.909
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 314533.305 | 342115.144 | 13633.382 | ns/op | 0.919
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 12190.039 | 14844.154 | 111.009 | ns/op | 0.821
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 25734.516 | 30267.139 | 326.158 | ns/op | 0.85
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 81007.764 | 90623.578 | 572.987 | ns/op | 0.894
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 308229.077 | 343146.562 | 18801.368 | ns/op | 0.898
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 321267.148 | 340960.217 | 22253.659 | ns/op | 0.942
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 307476.57 | 341029.841 | 13851.386 | ns/op | 0.902

### on bananapi
vlenb == 32

Benchmark - on bananapi, vlenb == 32 | (dataSize) | (keyLength) | (mode) | (padding) | (permutation) | Cnt | Score +intrinsic | Score -intrinsic | Error | Units | improvement
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4804.517 | 4154.869 | 2.951 | ns/op | 0.865
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 10782.788 | 14604.89 | 19.031 | ns/op | 1.354
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 39502.457 | 57211.53 | 69.436 | ns/op | 1.448
o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 166005.925 | 228615.833 | 22.311 | ns/op | 1.377
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 5040.652 | 4389.007 | 60.197 | ns/op | 0.871
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 11176.787 | 14530.768 | 12.192 | ns/op | 1.3
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 40875.87 | 56149.493 | 111.238 | ns/op | 1.374
o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 166459.572 | 221221.334 | 1078.792 | ns/op | 1.329
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 17781.57 | 14356.974 | 38.96 | ns/op | 0.807
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 26098.932 | 27368.785 | 52.171 | ns/op | 1.049
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 67351.38 | 82535.832 | 111.414 | ns/op | 1.225
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 235767.096 | 295121.502 | 1443.64 | ns/op | 1.252
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 13634.202 | 10476.916 | 21.069 | ns/op | 0.768
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 22209.959 | 24513.545 | 23.072 | ns/op | 1.104
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 62540.238 | 78088.592 | 54.63 | ns/op | 1.249
o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 225358.667 | 293718.246 | 314.449 | ns/op | 1.303
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 237810.351 | 295495.242 | 412.976 | ns/op | 1.243
o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 230771.689 | 290751.264 | 315.883 | ns/op | 1.26

-------------

Commit messages:
 - Initial commit

Changes: https://git.openjdk.org/jdk/pull/20298/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20298&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335191
  Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20298.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20298/head:pull/20298

PR: https://git.openjdk.org/jdk/pull/20298

From jsjolen at openjdk.org  Tue Jul 23 12:11:34 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Tue, 23 Jul 2024 12:11:34 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v5]
In-Reply-To: <_frxtxQ56OXre5svEw_F8AOS7t-bOT0wSP1rcIh_hOI=.b8544cfb-66bd-4431-a63c-0599b2a38f08@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <csUmVTBvwjvNM6UkA9GGKOz07IhWbRzEyAUIJn-JCHk=.43c20c5e-b4ea-4c16-9cc8-4b2ae5df8cf5@github.com>
 <OtnhXPtAh2B02PSOOvQndDLToT517SqsTHcLQq_eeVM=.4111e376-a331-4aef-bce2-375f7dec5531@github.com>
 <lxIweFtdKy3V3X5w6Z0RlVPT0gLUjp1wr0RQQIfcfQw=.7d4c4d60-8bec-404e-8f71-c0357d81984d@github.com>
 <ob8IA7dbncuc-wuqsfs0sIFK7bOSXm8qsvPEPSAGqtw=.c5b1f0eb-b706-4cee-bf97-109be01e22af@github.com>
 <QuyVjUXmgR2l6pgq-3dvYkKCjLNi5raS7pwt7OagyeY=.783df525-878e-402c-820e-8ac7150dfa97@github.com>
 <A3pXePPLllhBqQUHUSx6sR7iEZm9rB0nOFr90TXKHMQ=.57917ac7-d787-42b3-aaad-2c9e1285725f@github.com>
 <_frxtxQ56OXre5svEw_F8AOS7t-bOT0wSP1rcIh_hOI=.b8544cfb-66bd-4431-a63c-0599b2a38f08@github.com>
Message-ID: <RW3IM-8JSTnZHhndCBdY_zqgkf2JUA_GYdZ99DM9ZPw=.7389febf-394a-4ac4-a06a-cc4efe9289bd@github.com>

On Mon, 22 Jul 2024 20:02:40 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> @SoniaZaldana Note that this is very much optional.
>
> Hi folks, thanks for the pointers! I wasn't familiar with X macros and after some time toying around with them, I'm sad to report that I am not a fan (yet!).
> 
> I implemented it and ended up breaking part of the tests. I quickly realized that debugging these is a bit harder for less experienced C++ developers (like myself). 
> 
> So, just wanted to note: 
> - I cleaned up the indentation in this function as it was all wrong. 
> - I didn't get rid of the repetition. Tried to but quickly realized we can't pull the DCmdArgument out of the if statements as they're different types. 
> 
> And note to self, to keep reviewing X macros because they did shorten the code a lot when I implemented them. Perhaps I'll give it another go in a different RFE.
> 
> Sorry it's not what either of you hoped for!

That's fine, thanks for having a go!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1687949210

From coleenp at openjdk.org  Tue Jul 23 12:37:42 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 23 Jul 2024 12:37:42 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6]
In-Reply-To: <u2VLk8hKBH5V6331fMIPCwusNARMd_v-q_wL_7r0AOA=.99b9b9f1-ac37-4cb6-9ad0-4e019fe3c1fe@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
 <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com>
 <u2VLk8hKBH5V6331fMIPCwusNARMd_v-q_wL_7r0AOA=.99b9b9f1-ac37-4cb6-9ad0-4e019fe3c1fe@github.com>
Message-ID: <7MDa4Z7FtvI5TG3rARV50PQckm3MSqOzBefku_lFwyc=.ead08ce2-1850-4803-a2eb-bd22cdcdd221@github.com>

On Mon, 15 Jul 2024 00:44:02 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> src/hotspot/share/oops/instanceKlass.cpp line 1090:
>> 
>>> 1088: 
>>> 1089:     // Step 2
>>> 1090:     // If we were to use wait() instead of waitUninterruptibly() then
>> 
>> This is a nice correction (even though, the actual call below is wait_uninterruptibly() ;-) ), but seems totally unrelated.
>
> I was thinking it was referring to `ObjectSynchronizer::waitUninterruptibly`, added in the same commit as the comment (b3bf31a0a08da679ec2fd21613243fb17b1135a9).

git backout restored the old wrong comment.  We should fix this separately.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1687985648

From duke at openjdk.org  Tue Jul 23 13:28:36 2024
From: duke at openjdk.org (duke)
Date: Tue, 23 Jul 2024 13:28:36 GMT
Subject: RFR: 8327054: DiagnosticCommand Compiler.perfmap does not log on
 output() [v4]
In-Reply-To: <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
 <ZIRiuNc2vsX1131QzFVFV8CzgGZHMKYICw8MSMppCiA=.1c8f4be7-35d7-41c6-9ec1-319d375411ae@github.com>
Message-ID: <GdyeMf2ruwa49XKOtvjjXcA8OJY9133NasLqD2h8D_c=.d510ba33-b25a-4601-8a65-c48727c9fcf5@github.com>

On Mon, 22 Jul 2024 15:36:47 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.
>> 
>> Testing: 
>> - [x] Added test case passes. 
>> 
>> Thanks, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Updating warning message

@SoniaZaldana 
Your change (at version a2e46173e7d260f8fbc1a9372090ea5867a65a29) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20257#issuecomment-2245252290

From duke at openjdk.org  Tue Jul 23 13:57:36 2024
From: duke at openjdk.org (fitzsim)
Date: Tue, 23 Jul 2024 13:57:36 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <ml4dF0TwSQdlURT8ETSAN9RVnx3iMIVNDCWedq8lc1Y=.6b3da39c-41fc-460e-8632-d5a42be279ab@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
 <ml4dF0TwSQdlURT8ETSAN9RVnx3iMIVNDCWedq8lc1Y=.6b3da39c-41fc-460e-8632-d5a42be279ab@github.com>
Message-ID: <XSv3vwmQVv2abJLfDCmKsELqrY9d2Ohe_FemTMIzxXw=.3351b8ca-91d0-4a0b-9154-f758c293dcdc@github.com>

On Fri, 19 Jul 2024 09:18:13 GMT, Andrew Haley <aph at openjdk.org> wrote:

> Compared to the current implementation in #19185, my big concern about [This branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) is the future maintenance effort when we need to update the SLEEF source along with the CMake changes, and also when SLEEF support for new platforms is added to the JDK.

To check this, I [added](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-2/) the `riscv64` `CMake` steps to `SleefCommon.gmk`.

I had intended to factor out `SetupSleefHeader` anyway for `aarch64`, to eliminate copy-n-paste.

After that, there was one build step divergence for `riscv64` for the naming of the helper header.

The two `riscv64` commits are:

- [copy `helperrvv.h`](https://github.com/fitzsim/jdk/commit/bcd3813ca97f6308838ee93bcb5c02d9cd37375a)
- [add `riscv64` support to `SleefCommon.gmk`](https://github.com/fitzsim/jdk/commit/21e0369682095422f45015d817410d07c711b8c0)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2245327294

From ayang at openjdk.org  Tue Jul 23 14:18:58 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Tue, 23 Jul 2024 14:18:58 GMT
Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate
Message-ID: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com>

Simply obsoleting a Parallel GC product flag.

-------------

Commit messages:
 - pgc-obsolete-base-footprint

Changes: https://git.openjdk.org/jdk/pull/20299/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20299&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337027
  Stats: 28 lines in 7 files changed: 0 ins; 25 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20299.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20299/head:pull/20299

PR: https://git.openjdk.org/jdk/pull/20299

From szaldana at openjdk.org  Tue Jul 23 15:52:37 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Tue, 23 Jul 2024 15:52:37 GMT
Subject: Integrated: 8327054: DiagnosticCommand Compiler.perfmap does not log
 on output()
In-Reply-To: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
References: <zNGCDclJKzdxqROxbR1RmrZcehi2o2A0IPLEFFAWcGY=.6d4d3d03-da0e-4b9f-b730-ac75ae68c8fb@github.com>
Message-ID: <AJK-0iLZwFOt9O0VhLmlF5ph1dPY5QdDlEn1arTycDA=.d988bcf8-bea4-4f51-b1bc-a10cf870ce3d@github.com>

On Fri, 19 Jul 2024 15:07:39 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

> Hi all, 
> 
> This is a small patch to address [8327054](https://bugs.openjdk.org/browse/JDK-8327054), making `CodeCache::write_perf_map` aware of which output stream error and warning messages should go to.
> 
> Testing: 
> - [x] Added test case passes. 
> 
> Thanks, 
> Sonia

This pull request has now been integrated.

Changeset: 8e1f17e3
Author:    Sonia Zaldana Calles <szaldana at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/8e1f17e351bc7949b318a0542a4a4cb30ead5a97
Stats:     18 lines in 5 files changed: 12 ins; 0 del; 6 mod

8327054: DiagnosticCommand Compiler.perfmap does not log on output()

Reviewed-by: lmesnik, stuefe, kevinw, cjplummer

-------------

PR: https://git.openjdk.org/jdk/pull/20257

From stuefe at openjdk.org  Tue Jul 23 15:59:35 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 23 Jul 2024 15:59:35 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v7]
In-Reply-To: <5UHSCkGbA7jXwwEfE8ou0LzvPd5flc7M9ZwbNhZFFvM=.c677a49b-c98c-42c9-81df-b366379aefa9@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <5UHSCkGbA7jXwwEfE8ou0LzvPd5flc7M9ZwbNhZFFvM=.c677a49b-c98c-42c9-81df-b366379aefa9@github.com>
Message-ID: <EtjIbzgWfPfmRdkLLbGD6dv_Cs4vxNGch1qG13lgxAM=.5f15b693-6b52-4252-b46a-d79cb980da64@github.com>

On Mon, 22 Jul 2024 20:03:08 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Error messaging format

Slowly getting there... :)

src/hotspot/share/prims/wbtestmethods/parserTests.cpp line 79:

> 77:      DCmdArgument<char*>* argument = new DCmdArgument<char*>(
> 78:      name, desc,
> 79:      "STRING", mandatory, default_value);

I would revert all these style-only changes and just keep the functional one (addition of FileArgument handling). 

Let's keep this for a follow-up.

src/hotspot/share/services/diagnosticArgument.cpp line 376:

> 374:       THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(), error_msg.base());
> 375:     }
> 376:   }

The realloc here is a bit pointless since if `_value._name` is set, it already points to a buffer of size JVM_MAXPATHLEN.

I would do one of these two:

- either inline the buffer into FileArgument as I wrote earlier; no need to allocate or deallocate then.
- or, in this function, allocate only if `_name` is null, and use the existing buffer otherwise

src/hotspot/share/services/diagnosticCommand.cpp line 524:

> 522:   HeapDumper dumper(!_all.value() /* request GC if _all is false*/);
> 523:   dumper.dump(_filename.value()._name, output(), (int)level, _overwrite.value(),
> 524:               (uint)parallel);

Please revert style-only changes; let's keep those for follow-ups.

src/hotspot/share/services/diagnosticCommand.cpp line 1195:

> 1193: 
> 1194: void SystemDumpMapDCmd::execute(DCmdSource source, TRAPS) {
> 1195:   const char* name = _filename.value()._name;

This direct access to the member inside _filename is a bit awkward. I would make the buffer private and give the class some getters and setters, possibly like this:

class FileArgument {
  // private stuff
public:
  const char* get() const { 
    // return internal buffer
  }

  // returns true if parsing succeeded, false if not
  bool parse_value(const char* s, size_t len) {
     // call Arguments::copyexpand, target internal buffer, and return its return value
  }
};
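
A minimal, self-contained sketch of the shape suggested above, for illustration only. It assumes an inlined fixed-size buffer and a hand-written `%p` expansion; `FileArgumentSketch`, `kMaxPathLen`, and the explicit `pid` parameter are hypothetical names introduced here, and the loop is a stand-in for the copy-and-expand step rather than HotSpot's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for JVM_MAXPATHLEN; the real value is platform-specific.
constexpr std::size_t kMaxPathLen = 4096;

class FileArgumentSketch {
  char _name[kMaxPathLen];  // inlined buffer: nothing to allocate or free

public:
  FileArgumentSketch() { _name[0] = '\0'; }

  // Read-only access to the expanded file name.
  const char* get() const { return _name; }

  // Copies s[0..len) into the internal buffer, replacing each "%p" with pid.
  // Returns true if parsing succeeded, false if the result does not fit.
  bool parse_value(const char* s, std::size_t len, int pid) {
    assert(s != nullptr);
    std::size_t out = 0;
    for (std::size_t i = 0; i < len; i++) {
      if (s[i] == '%' && i + 1 < len && s[i + 1] == 'p') {
        int n = std::snprintf(_name + out, kMaxPathLen - out, "%d", pid);
        if (n < 0 || static_cast<std::size_t>(n) >= kMaxPathLen - out) {
          _name[0] = '\0';  // leave the argument empty on failure
          return false;
        }
        out += static_cast<std::size_t>(n);
        i++;  // skip the 'p'
      } else {
        if (out + 1 >= kMaxPathLen) {  // keep room for the terminator
          _name[0] = '\0';
          return false;
        }
        _name[out++] = s[i];
      }
    }
    _name[out] = '\0';
    return true;
  }

  // Convenience overload for NUL-terminated input.
  bool parse_value(const char* s, int pid) {
    return parse_value(s, std::strlen(s), pid);
  }
};
```

With the buffer private, a caller such as `SystemDumpMapDCmd::execute` would read the name through `get()` instead of reaching into the member directly.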

-------------

PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2193132206
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1687527542
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1688310305
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1688264948
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1688316839

From szaldana at openjdk.org  Tue Jul 23 17:43:51 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Tue, 23 Jul 2024 17:43:51 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v8]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <vtqGm4iD9-utSITUJrGmlAo6W8KQvfrKR0GZIaYgyZY=.2780c1f3-1471-44f2-a9b6-2fb6bd1c1d66@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with three additional commits since the last revision:

 - Fixing formatting
 - Inlining buffer and making field private
 - Reverting to functional changes in parserTests.cpp

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/517db0cd..c898b1cf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=06-07

  Stats: 80 lines in 4 files changed: 16 ins; 9 del; 55 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From szaldana at openjdk.org  Tue Jul 23 17:57:04 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Tue, 23 Jul 2024 17:57:04 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v9]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits:

 - Merge master
 - Fixing formatting
 - Inlining buffer and making field private
 - Reverting to functional changes in parserTests.cpp
 - Error messaging format
 - Fixing memory leak
 - Fixing pointer style, s/NULL/nullptr, and exception
 - Cleaning up parserTests.cpp
 - Missing copyright header update
 - Adding tests for file dcmd argument
 - ... and 5 more: https://git.openjdk.org/jdk/compare/2f2223d7...52ca557d

-------------

Changes: https://git.openjdk.org/jdk/pull/20198/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=08
  Stats: 195 lines in 11 files changed: 154 ins; 19 del; 22 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From szaldana at openjdk.org  Tue Jul 23 18:12:32 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Tue, 23 Jul 2024 18:12:32 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v9]
In-Reply-To: <0FaB5dyzz0jaa0RETfdT4wcbS3jPg4QzIzj1s-pPWvw=.805a55dc-d141-482f-b6aa-e6c4fdfbb97d@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <vluUCz7LJUc6FInntimxmXcyImSJfrxWkBOUWat-2zs=.7b3ab621-30a8-4e6d-89f2-77c3504dc432@github.com>
 <0FaB5dyzz0jaa0RETfdT4wcbS3jPg4QzIzj1s-pPWvw=.805a55dc-d141-482f-b6aa-e6c4fdfbb97d@github.com>
Message-ID: <_dK88nL0aH7z6iBgH3_pwwFfCfzom8_I5xGJU5L0swo=.11562964-b22b-47c4-8dde-8c3bc42009b4@github.com>

On Wed, 17 Jul 2024 14:21:05 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> * In all cases: please, in case of an error, don't THROW, don't do `warning`. Instead, just print to the `output()` of the DCmd. You want an error to appear to the user of the dcmd - so, to stdout or stderr of the jcmd process issuing the command. Not an exception in the target JVM process, nor a warning in the target JVM stderr stream

FYI, I filed [JDK-8337047](https://bugs.openjdk.org/browse/JDK-8337047) to track this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2245923827

From vlivanov at openjdk.org  Tue Jul 23 19:03:33 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 23 Jul 2024 19:03:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v5]
In-Reply-To: <wgY2erz716MCi6K6DcUKEqLyd6E82ArMlba9qHdAA9o=.de21daa5-b078-4469-a6eb-df548f699f65@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <wgY2erz716MCi6K6DcUKEqLyd6E82ArMlba9qHdAA9o=.de21daa5-b078-4469-a6eb-df548f699f65@github.com>
Message-ID: <Ct5EunuM4nq5EUa-kDtCzKs-O4Z_wEnMq2_5W7GPaeY=.f475ee5b-3bea-48a7-97d1-7f71287e4fc9@github.com>

On Mon, 22 Jul 2024 14:56:31 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> src/hotspot/share/oops/klass.inline.hpp line 117:
>> 
>>> 115: }
>>> 116: 
>>> 117: inline bool Klass::search_secondary_supers(Klass *k) const {
>> 
>> I see you moved `Klass::search_secondary_supers` in `klass.inline.hpp`, but I'm not sure how it interacts with `Klass::is_subtype_of` (the sole caller) being declared in `klass.hpp`. 
>> 
>> Will the inlining still happen if `Klass::is_subtype_of()` callers include `klass.hpp`?
>
> Presumably this question applies to every function in `klass.inline.hpp`?
> Practically everything does `#include "oops/klass.inline.hpp"`. It's inlined in about 120 files, as far as I can see everywhere such queries are made.

My confusion arises from the following:
* `Klass::is_subtype_of()` is declared in `klass.hpp`
* `Klass::is_subtype_of()` calls `Klass::search_secondary_supers()`
* `Klass::search_secondary_supers()` is declared in `klass.inline.hpp`
* `klass.inline.hpp` includes `klass.hpp`

What happens when users include `klass.hpp`, but not `klass.inline.hpp`? How does it affect generated code? 

I suspect that `Klass::search_secondary_supers()` won't be inlined in such a case.
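
To make the layout in question concrete, here is a single-file sketch in which comments mark what would live in each header. `ToyKlass` and the file names in the comments are hypothetical and much simpler than the real `Klass`. The relevant point, as far as I understand the language rules, is that a translation unit that includes only the first part sees a declaration of `search_secondary_supers()` without a body, so the compiler has nothing to inline there (short of link-time optimization), and some compilers warn that an inline function is used but never defined.

```cpp
#include <cassert>

// --- what a "toy_klass.hpp" would contain ---
class ToyKlass {
  const ToyKlass* _super;
  const ToyKlass* _secondary;  // a single secondary super keeps the sketch small

public:
  ToyKlass(const ToyKlass* super, const ToyKlass* secondary)
    : _super(super), _secondary(secondary) {}

  // Defined in the class body, so every includer of the header sees it.
  bool is_subtype_of(const ToyKlass* k) const {
    if (this == k || _super == k) return true;
    // The header alone provides only a declaration of this callee.
    return search_secondary_supers(k);
  }

  // Declared here; the body lives in the ".inline.hpp" part below.
  inline bool search_secondary_supers(const ToyKlass* k) const;
};

// --- what a "toy_klass.inline.hpp" would contain ---
inline bool ToyKlass::search_secondary_supers(const ToyKlass* k) const {
  return k != nullptr && _secondary == k;
}
```

In this single file both parts are visible, so the call can be inlined; the open question in the thread is about callers that stop after the first part.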

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1688559463

From vlivanov at openjdk.org  Tue Jul 23 19:09:40 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 23 Jul 2024 19:09:40 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v5]
In-Reply-To: <cfKy-VTUht4Fbtb5-paKJZvCVAar1mq6Y0d0pDbkFQE=.1aa56b36-3067-4357-89ed-d1d8c3f64426@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <7JeIjy2PKvI4EZpDain1vd0dBRlWjgjp42xPeY0bHMs=.fee63987-dd85-486d-b7d3-67e52fdbee6f@github.com>
 <cfKy-VTUht4Fbtb5-paKJZvCVAar1mq6Y0d0pDbkFQE=.1aa56b36-3067-4357-89ed-d1d8c3f64426@github.com>
Message-ID: <QlxMOE2D1aUYQodcEmvPd_V4oF2H9OXrQSi0du4gIpg=.9d6539a4-20ed-4bb3-b2f1-a9ee9bb816ab@github.com>

On Mon, 22 Jul 2024 14:00:35 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Also, `num_extra_slots == 0` check is redundant.
>
>> Since `secondary_supers` are hashed unconditionally now, is `interfaces->length() <= 1` check still needed?
> 
> I don't think so, no. Our incoming `transitive_interfaces` is formed by concatenating the interface lists of our superclasses and superinterfaces.

Right, I forgot the details. It requires us to hash the transitive interfaces list first.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1688568321

From vlivanov at openjdk.org  Tue Jul 23 19:09:41 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 23 Jul 2024 19:09:41 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v5]
In-Reply-To: <WOY02iAdeWi-IgqSfHkkydfPyRxH1TpsYPYvFD8sRv0=.befb015d-0622-492a-87ab-fe52d0b1fa64@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <WOY02iAdeWi-IgqSfHkkydfPyRxH1TpsYPYvFD8sRv0=.befb015d-0622-492a-87ab-fe52d0b1fa64@github.com>
Message-ID: <tCRApgUEbhhxlWBKS56kjCZPeUVX2PjbHOFDfu7vyPM=.a6a65521-a4a6-4bdb-b1ca-046ffa8464dd@github.com>

On Mon, 22 Jul 2024 14:16:05 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> src/hotspot/share/oops/klass.cpp line 175:
>> 
>>> 173:     if (secondary_supers()->at(i) == k) {
>>> 174:       if (UseSecondarySupersCache) {
>>> 175:         ((Klass*)this)->set_secondary_super_cache(k);
>> 
>> Does it make sense to assert `UseSecondarySupersCache` in `Klass::set_secondary_super_cache()`?
>
> I kinda hate this because we're casting away `const`, which is UB. I think I'd just take it out, but once I do that, I don't think anything sets `_secondary_super_cache`.

IMO it's OK if C++ runtime omits `_secondary_super_cache` accesses irrespective of whether `UseSecondarySupersCache` is set or not. I'm fine with addressing it separately.
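
For reference, one common C++ way to update a cache from a `const` member function without a cast is a `mutable` field. This is not something proposed in this thread, and whether it would suit `Klass` (whose fields are also accessed from generated code) is an open assumption. `CachedKlass` and its members are hypothetical names for this sketch. As a side note on the language rule, writing through a pointer obtained by casting away `const` is undefined only when the object itself was defined `const`.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

class CachedKlass {
  std::vector<const CachedKlass*> _secondary_supers;
  // 'mutable' lets const member functions update the cache without a cast.
  mutable std::atomic<const CachedKlass*> _secondary_super_cache{nullptr};

public:
  void add_secondary_super(const CachedKlass* k) {
    _secondary_supers.push_back(k);
  }

  // Linear search with a one-element cache of the last hit.
  bool search_secondary_supers(const CachedKlass* k) const {
    if (k == nullptr) return false;
    if (_secondary_super_cache.load(std::memory_order_relaxed) == k) return true;
    for (const CachedKlass* s : _secondary_supers) {
      if (s == k) {
        _secondary_super_cache.store(k, std::memory_order_relaxed);
        return true;
      }
    }
    return false;
  }

  // Exposed only so the sketch can be checked.
  const CachedKlass* cached() const {
    return _secondary_super_cache.load(std::memory_order_relaxed);
  }
};
```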

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1688565818

From coleenp at openjdk.org  Tue Jul 23 19:09:43 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 23 Jul 2024 19:09:43 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
Message-ID: <1ivG4ii0OclIXn9-0Ihh4udD4WUu5Oe64ovWDY1xSJ4=.731721e8-806e-4e34-9ec0-3188b81f9f41@github.com>

On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the `markWord` and any overwritten data is displaced into a displaced `markWord`. This is problematic for concurrent GCs which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass forcing this to be something that the GC has to take into account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not override and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speedup lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only supported in combination with the LM_LIGHTWEIGHT locking mode (the default). 
>> 
>> This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` works.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:
> 
>  - Remove try_read
>  - Add explicit to single parameter constructors
>  - Remove superfluous access specifier
>  - Remove unused include
>  - Update assert message OMCache::set_monitor
>  - Fix indentation
>  - Remove outdated comment LightweightSynchronizer::exit
>  - Remove logStream include
>  - Remove strange comment
>  - Fix javaThread include
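
For readers skimming the quoted description, the fast-locking transition between 0b01 (Unlocked) and 0b00 (Fast Locked) can be sketched with a plain `std::atomic` word. This is an illustration under simplifying assumptions, not the HotSpot code: the function and constant names are hypothetical, there is no per-thread ownership tracking, and spinning and inflation are left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Lock-bit encodings taken from the quoted description.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

// Tries to move the lock bits from Unlocked to Fast Locked, leaving all other
// bits of the word untouched. Returns false if the word is not Unlocked.
bool try_fast_lock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  while ((old_mark & kLockMask) == kUnlocked) {
    std::uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
    if (mark.compare_exchange_weak(old_mark, new_mark,
                                   std::memory_order_acquire,
                                   std::memory_order_relaxed)) {
      return true;
    }
    // On failure old_mark holds the current value; re-check the lock bits.
  }
  return false;
}

// Reverse transition. A real implementation must also check that the caller
// owns the lock; this sketch does not.
bool try_fast_unlock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  while ((old_mark & kLockMask) == kFastLocked) {
    std::uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
    if (mark.compare_exchange_weak(old_mark, new_mark,
                                   std::memory_order_release,
                                   std::memory_order_relaxed)) {
      return true;
    }
  }
  return false;
}
```

Because only the two lock bits change, the rest of the word stays in place, which is the property the description contrasts with overwriting the `markWord` on inflation.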

I have some suggestions that hopefully you can click on if you agree.  Also, some comments.

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 67:

> 65:     }
> 66:     static void* allocate_node(void* context, size_t size, Value const& value) {
> 67:       reinterpret_cast<ObjectMonitorWorld*>(context)->inc_table_count();

Suggestion:

      reinterpret_cast<ObjectMonitorWorld*>(context)->inc_items_count();

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 71:

> 69:     };
> 70:     static void free_node(void* context, void* memory, Value const& value) {
> 71:       reinterpret_cast<ObjectMonitorWorld*>(context)->dec_table_count();

Suggestion:

      reinterpret_cast<ObjectMonitorWorld*>(context)->dec_items_count();

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 125:

> 123:   };
> 124: 
> 125:   void inc_table_count() {

Suggestion:

  void inc_items_count() {

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 126:

> 124: 
> 125:   void inc_table_count() {
> 126:     Atomic::inc(&_table_count);

Suggestion:

    Atomic::inc(&_items_count);

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 129:

> 127:   }
> 128: 
> 129:   void dec_table_count() {

Suggestion:

  void dec_items_count() {

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 130:

> 128: 
> 129:   void dec_table_count() {
> 130:     Atomic::inc(&_table_count);

Suggestion:

    Atomic::inc(&_items_count);

src/hotspot/share/runtime/lightweightSynchronizer.cpp line 134:

> 132: 
> 133:   double get_load_factor() {
> 134:     return (double)_table_count/(double)_table_size;

Suggestion:

    return (double)_items_count/(double)_table_size;
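
To make the renaming concrete, here is a minimal standalone sketch of the counter and load-factor helpers. It is not the HotSpot code: `TableStats` is a hypothetical name, and `std::atomic` stands in for HotSpot's `Atomic` class. In this sketch `dec_items_count()` decrements the counter, which is what the name implies.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the table statistics; not the HotSpot class.
class TableStats {
  std::atomic<std::size_t> _items_count{0};  // number of live nodes
  std::atomic<std::size_t> _table_size;      // number of buckets

 public:
  explicit TableStats(std::size_t buckets) : _table_size(buckets) {}

  // Called when a node is allocated.
  void inc_items_count() { _items_count.fetch_add(1, std::memory_order_relaxed); }

  // Called when a node is freed.
  void dec_items_count() { _items_count.fetch_sub(1, std::memory_order_relaxed); }

  // Items per bucket; a caller could compare this against a threshold
  // to decide when to schedule a resize.
  double get_load_factor() const {
    return static_cast<double>(_items_count.load(std::memory_order_relaxed)) /
           static_cast<double>(_table_size.load(std::memory_order_relaxed));
  }
};
```

With four buckets and one live item, `get_load_factor()` yields 0.25.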

-------------

PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2193868194
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688563846
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688563501
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688565196
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688565561
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688565947
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688566411
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688566752

From coleenp at openjdk.org  Tue Jul 23 19:09:43 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 23 Jul 2024 19:09:43 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6]
In-Reply-To: <7MDa4Z7FtvI5TG3rARV50PQckm3MSqOzBefku_lFwyc=.ead08ce2-1850-4803-a2eb-bd22cdcdd221@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
 <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com>
 <u2VLk8hKBH5V6331fMIPCwusNARMd_v-q_wL_7r0AOA=.99b9b9f1-ac37-4cb6-9ad0-4e019fe3c1fe@github.com>
 <7MDa4Z7FtvI5TG3rARV50PQckm3MSqOzBefku_lFwyc=.ead08ce2-1850-4803-a2eb-bd22cdcdd221@github.com>
Message-ID: <JrmblfUl8jxfWwZHI8MIO0V5OOIn4a0M0A6sWS6J08Y=.cc047802-93a4-49a4-b646-6201dbd4403b@github.com>

On Tue, 23 Jul 2024 12:34:45 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> I was thinking it was referring to `ObjectSynchronizer::waitUninterruptibly`, added in the same commit as the comment b3bf31a0a08da679ec2fd21613243fb17b1135a9
>
> git backout restored the old wrong comment.  We should fix this separately.

Suggestion:

    // If we were to use wait() instead of waitInterruptibly() then

>> I think I was thinking of the names as a prefix to refer to the `Count of the table` and `Size of the table`. And not the `Number of tables`. But I can see the confusion. 
>> 
>> `ConcurrentHashTable` tracks no statistics except for JFR which added some counters directly into the implementation. All statistics are for the users to manage, even if there are helpers for gathering these statistics. 
>> 
>> The current implementation is based on what we do for the StringTable and SymbolTable
>
> In the other tables, it's called _items_count and it determines the load_factor for triggering concurrent work.  We should rename this field to _items_count to match, for consistency.

Suggestion:

  volatile size_t _items_count;

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1687990861
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688564267

From coleenp at openjdk.org  Tue Jul 23 19:09:45 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 23 Jul 2024 19:09:45 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v6]
In-Reply-To: <u2VLk8hKBH5V6331fMIPCwusNARMd_v-q_wL_7r0AOA=.99b9b9f1-ac37-4cb6-9ad0-4e019fe3c1fe@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <wRW8TABXS8LovbQ9qF8fosFD7FxYzpJdrG2LOvR6xDk=.19d62ec7-b2e4-41a1-8443-0480761288bf@github.com>
 <H1xx5Q5Wsuz3cl0FP1fwX4kL-jYdqbQ3skKwYcd54vo=.bd7abee8-0300-4253-a8b4-428ae8da1a0e@github.com>
 <u2VLk8hKBH5V6331fMIPCwusNARMd_v-q_wL_7r0AOA=.99b9b9f1-ac37-4cb6-9ad0-4e019fe3c1fe@github.com>
Message-ID: <C-YWboakmVWLsm8fDywpBlSsKQyiB31SXInOEh2qY5o=.c954ee95-79f4-48cf-8a49-5ebabd1325c7@github.com>

On Mon, 15 Jul 2024 00:44:31 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> src/hotspot/share/runtime/basicLock.cpp line 37:
>> 
>>> 35:     if (mon != nullptr) {
>>> 36:       mon->print_on(st);
>>> 37:     }
>> 
>> I am not sure if we wanted to do this, but we know the owner, therefore we could also look-up the OM from the table, and print it. It wouldn't have all that much to do with the BasicLock, though.
>
> Yeah maybe it is unwanted. Not sure how we should treat these prints of the frames. My thinking was that there is something in the cache, print it. But maybe just treating it as some internal data, maybe print "monitor { <Cached ObjectMonitor* address> }" or similar is better.

It seems generally useful to print the monitor in the cache if it's there.  I don't think we should do a table search here.  I think this looks fine as it is, and might be helpful for debugging if it turns out to be the wrong monitor.

>> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 80:
>> 
>>> 78: 
>>> 79:   ConcurrentTable* _table;
>>> 80:   volatile size_t _table_count;
>> 
>> Looks like a misnomer to me. We only have one table, but we do have N entries/nodes. This is counted when new nodes are allocated or old nodes are freed. Consider renaming this to '_entry_count' or '_node_count'? I'm actually a bit surprised if ConcurrentHashTable doesn't already track this...
>
> I think I was thinking of the names as a prefix to refer to the `Count of the table` and `Size of the table`. And not the `Number of tables`. But I can see the confusion. 
> 
> `ConcurrentHashTable` tracks no statistics except for JFR which added some counters directly into the implementation. All statistics are for the users to manage, even if there are helpers for gathering these statistics. 
> 
> The current implementation is based on what we do for the StringTable and SymbolTable

In the other tables, it's called _items_count and it determines the load_factor for triggering concurrent work.  We should rename this field to _items_count to match, for consistency.

>> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 159:
>> 
>>> 157:   static size_t min_log_size() {
>>> 158:     // ~= log(AvgMonitorsPerThreadEstimate default)
>>> 159:     return 10;
>> 
>> Uh wait - are we assuming that threads hold 1024 monitors *on average* ? Isn't this a bit excessive? I would have thought maybe 8 monitors/thread. Yes there are workloads that are bonkers. Or maybe the comment/flag name does not say what I think it says.
>> 
>> Or why not use AvgMonitorsPerThreadEstimate directly?
>
> Maybe that is reasonable. I believe I had that at some point but it had to deal with how to handle extreme values of `AvgMonitorsPerThreadEstimate` as well as what to do when `AvgMonitorsPerThreadEstimate` was disabled `=0`. One 4 / 8 KB allocation seems harmless.
> 
> But this was very arbitrary. This will probably be changed when/if the resizing of the table becomes more synchronised with deflation, allowing for shrinking the table.

Shrinking the table is NYI.  Maybe we should revisit this initial value then.
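
For reference, the arithmetic behind the numbers in this thread can be checked with a small sketch. The helper names are hypothetical, and the one-pointer-per-bucket layout is an assumption made here to reproduce the "4 / 8 KB" figure; it is not taken from the ConcurrentHashTable source.

```cpp
#include <cassert>
#include <cstddef>

// Number of buckets for a given log2 table size.
constexpr std::size_t bucket_count(std::size_t log2_size) {
  return std::size_t{1} << log2_size;
}

// Bytes for the bucket array, assuming one pointer-sized slot per bucket.
constexpr std::size_t bucket_array_bytes(std::size_t log2_size,
                                         std::size_t pointer_bytes) {
  return bucket_count(log2_size) * pointer_bytes;
}

static_assert(bucket_count(10) == 1024, "log size 10 gives 1024 buckets");
static_assert(bucket_array_bytes(10, 4) == 4096, "4 KB with 4-byte pointers");
static_assert(bucket_array_bytes(10, 8) == 8192, "8 KB with 8-byte pointers");
```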

>> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 563:
>> 
>>> 561:     assert(locking_thread == current || locking_thread->is_obj_deopt_suspend(), "locking_thread may not run concurrently");
>>> 562:     if (_no_safepoint) {
>>> 563:       ::new (&_nsv) NoSafepointVerifier();
>> 
>> I'm thinking that it might be easier and cleaner to just re-do what the NoSafepointVerifier does? It just calls thread->inc/dec_no_safepoint_count().
>
> I wanted to avoid having to add `NoSafepointVerifier` implementation details in the synchroniser code. I guess `ContinuationWrapper` already does this. 
> 
> Simply creating a `NoSafepointVerifier` when you expect no safepoint is more obvious to me, shows the intent better.

This looks strange to me also, but it's better than changing the no_safepoint_count directly, since NSV handles when the current thread isn't a JavaThread, so you'd have to duplicate that in this VerifyThreadState code too.

    NoSafepointVerifier::NoSafepointVerifier() : _thread(Thread::current()) {
      if (_thread->is_Java_thread()) {
        JavaThread::cast(_thread)->inc_no_safepoint_count();
      }
    }
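
The placement-new pattern under discussion can be sketched in isolation. This is a simplified illustration, not the PR's code: `ScopeGuard` is a hypothetical stand-in for `NoSafepointVerifier`, `ConditionalGuard` for the class that embeds it, and a plain global counter replaces the per-thread count.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for the per-thread no-safepoint count.
static int g_no_safepoint_count = 0;

// Hypothetical stand-in for NoSafepointVerifier: an RAII scope marker.
struct ScopeGuard {
  ScopeGuard()  { ++g_no_safepoint_count; }
  ~ScopeGuard() { --g_no_safepoint_count; }
};

// Owns storage for a ScopeGuard but only constructs it when requested.
class ConditionalGuard {
  bool _active;
  union { ScopeGuard _guard; };  // storage only; lifetime managed by hand

 public:
  explicit ConditionalGuard(bool active) : _active(active) {
    if (_active) {
      ::new (&_guard) ScopeGuard();  // placement new: construct in place
    }
  }
  ~ConditionalGuard() {
    if (_active) {
      _guard.~ScopeGuard();  // explicit destructor call pairs with placement new
    }
  }
  ConditionalGuard(const ConditionalGuard&) = delete;
  ConditionalGuard& operator=(const ConditionalGuard&) = delete;
};
```

The point of reusing the guard type is that any special cases it handles in its constructor and destructor come along for free, instead of being duplicated by the embedding class.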

>> src/hotspot/share/runtime/lightweightSynchronizer.hpp line 68:
>> 
>>> 66:   static void exit(oop object, JavaThread* current);
>>> 67: 
>>> 68:   static ObjectMonitor* inflate_into_object_header(Thread* current, JavaThread* inflating_thread, oop object, const ObjectSynchronizer::InflateCause cause);
>> 
>> My IDE flags this with a warning 'Parameter 'cause' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions' *shrugs*
>
> Yeah. The only effect it has is that you cannot reassign the variable. It was the style taken from [synchronizer.hpp](https://github.com/openjdk/jdk/blob/15997bc3dfe9dddf21f20fa189f97291824892de/src/hotspot/share/runtime/synchronizer.hpp) where all `InflateCause` parameters are const.

Do you get this for inflate_fast_locked_object also?
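
As background for the IDE warning, a small sketch of the language rule: top-level `const` on a by-value parameter is not part of the function type, so a declaration with it and a definition without it name the same function. The enum and function names here are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical enum standing in for an inflate-cause type.
enum class Cause { monitor_enter = 0, wait = 1 };

// Declaration with a const-qualified by-value parameter.
int cause_code(const Cause cause);

// Definition without const: this defines the function declared above,
// because top-level const is dropped from the function type.
int cause_code(Cause cause) {
  return static_cast<int>(cause);
}

static_assert(std::is_same<int(const Cause), int(Cause)>::value,
              "top-level const on a by-value parameter does not change the type");
```

The qualifier only matters inside a definition, where it prevents the body from reassigning the parameter.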

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688011833
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688162915
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688378429
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688385921
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688397480

From coleenp at openjdk.org  Tue Jul 23 19:09:47 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 23 Jul 2024 19:09:47 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <Wj5uaRxDmVYqDnt2V1PgErk7dI10LCro6WSfAm4Q6BU=.6fd91b51-ec40-438f-95a4-d2fbf593a288@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
 <Wj5uaRxDmVYqDnt2V1PgErk7dI10LCro6WSfAm4Q6BU=.6fd91b51-ec40-438f-95a4-d2fbf593a288@github.com>
Message-ID: <RmSdsqxnTjwB53zn49RvKRIRkIuz_jiRjOCwwLhEm-g=.6e9238c5-bbbb-4666-82b0-fef3235a12b6@github.com>

On Wed, 17 Jul 2024 06:35:34 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:
>> 
>>  - Remove try_read
>>  - Add explicit to single parameter constructors
>>  - Remove superfluous access specifier
>>  - Remove unused include
>>  - Update assert message OMCache::set_monitor
>>  - Fix indentation
>>  - Remove outdated comment LightweightSynchronizer::exit
>>  - Remove logStream include
>>  - Remove strange comment
>>  - Fix javaThread include
>
> src/hotspot/share/runtime/basicLock.hpp line 44:
> 
>> 42:   // a sentinel zero value indicating a recursive stack-lock.
>> 43:   // * For LM_LIGHTWEIGHT
>> 44:   // Used as a cache the ObjectMonitor* used when locking. Must either
> 
> The first sentence doesn't read correctly.

Suggestion:

  // Used as a cache of the ObjectMonitor* used when locking. Must either

> src/hotspot/share/runtime/deoptimization.cpp line 1641:
> 
>> 1639:               assert(fr.is_deoptimized_frame(), "frame must be scheduled for deoptimization");
>> 1640:               if (LockingMode == LM_LEGACY) {
>> 1641:                 mon_info->lock()->set_displaced_header(markWord::unused_mark());
> 
> In the existing code how is this restricted to the LM_LEGACY case?? It appears to be unconditional which suggests you are changing the non-UOMT LM_LIGHTWEIGHT logic. ??

Only legacy locking uses the displaced header, I believe, which isn't clear in this code at all.  This seems like a fix.  We should probably assert that only legacy locking uses this field as a displaced header.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 62:
> 
>> 60: class ObjectMonitorWorld : public CHeapObj<MEMFLAGS::mtObjectMonitor> {
>> 61:   struct Config {
>> 62:     using Value = ObjectMonitor*;
> 
> Does this alias really help? We don't state the type that many times and it looks odd to end up with a mix of `Value` and `ObjectMonitor*` in the same code.

This alias is present in the other CHT implementations, alas as a typedef in StringTable and SymbolTable so this follows the pattern and allows cut/paste of the allocate_node, get_hash, and other functions.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 102:
> 
>> 100:       assert(*value != nullptr, "must be");
>> 101:       return (*value)->object_is_cleared();
>> 102:     }
> 
> The `is_dead` functions seem oddly placed given they do not relate to the object stored in the wrapper. Why are they here? And what is the difference between `object_is_cleared` and `object_is_dead` (as used by `LookupMonitor`) ?

This is a good question.  When we look up the Monitor, we don't want to find any that the GC has marked dead, so that's why we call object_is_dead.   When we look up with the object to find the Monitor, the object won't be dead (since we're using it to look up).  But we don't want to find one that we've cleared because the Monitor was  deflated?  I don't see where we would clear it though.  We clear the WeakHandle in the destructor after the Monitor has been removed from the table.

> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 105:
> 
>> 103:   };
>> 104: 
>> 105:   class LookupMonitor : public StackObj {
> 
> I'm not understanding why we need this little wrapper class.

It's a two-way lookup.  The plain Lookup class is used to look up the Monitor given the object.  This LookupMonitor class is used to look up the object given the Monitor.  The CHT takes these wrapper classes.  Maybe we should rename Lookup to LookupObject to be clearer?
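
A rough sketch of the idea, with a toy table in place of the CHT: the table's `find` is templated on a lookup functor that supplies a hash and an equality test, so one table can be probed with either an object key or a monitor key. All names below are hypothetical and the functor interface is simplified; it is not the ConcurrentHashTable API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Object  { std::size_t hash; };
struct Monitor { const Object* object; };

// Probe keyed by the object: matches the monitor that refers to it.
class LookupObject {
  const Object* _obj;
 public:
  explicit LookupObject(const Object* obj) : _obj(obj) {}
  std::size_t get_hash() const { return _obj->hash; }
  bool equals(const Monitor* m) const { return m->object == _obj; }
};

// Probe keyed by the monitor: matches that exact monitor entry.
class LookupMonitor {
  const Monitor* _monitor;
 public:
  explicit LookupMonitor(const Monitor* m) : _monitor(m) {}
  std::size_t get_hash() const { return _monitor->object->hash; }
  bool equals(const Monitor* m) const { return m == _monitor; }
};

// Toy single-threaded table; only the functor-based find matters here.
class TinyTable {
  std::vector<std::vector<const Monitor*>> _buckets;
 public:
  explicit TinyTable(std::size_t n) : _buckets(n) {}

  void insert(const Monitor* m) {
    _buckets[m->object->hash % _buckets.size()].push_back(m);
  }

  template <typename LookupF>
  const Monitor* find(const LookupF& lookup) const {
    for (const Monitor* m : _buckets[lookup.get_hash() % _buckets.size()]) {
      if (lookup.equals(m)) return m;
    }
    return nullptr;
  }
};
```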

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688013308
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688041218
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688051557
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688375335
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688168626

From coleenp at openjdk.org  Tue Jul 23 19:09:48 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 23 Jul 2024 19:09:48 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <U3cg8IdnKu5Eeg-52muJuU0vEGJTRaX4jhKCOB3DVtk=.a1acc8fc-c3b7-4d38-ace8-dd39eff6c139@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
 <0Dwv0GUezG25Soj6iG3Ti4NCm_RQJdF7psmnDoUAdRU=.c38a44c6-f6e6-4e2a-84ef-45c32d145a13@github.com>
 <U3cg8IdnKu5Eeg-52muJuU0vEGJTRaX4jhKCOB3DVtk=.a1acc8fc-c3b7-4d38-ace8-dd39eff6c139@github.com>
Message-ID: <dxKBS7LJqVnSvlkQODDU3JXzgevL9LZ6cYGRZPj8Bmk=.38114de3-d511-43b5-b81a-fd686c13c0b8@github.com>

On Wed, 17 Jul 2024 06:40:31 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> src/hotspot/share/runtime/basicLock.hpp line 46:
>> 
>>> 44:   // Used as a cache the ObjectMonitor* used when locking. Must either
>>> 45:   // be nullptr or the ObjectMonitor* used when locking.
>>> 46:   volatile uintptr_t _metadata;
>> 
>> The displaced header/markword terminology was very well known to people, whereas "metadata" is really abstract - people will always need to go and find out what it actually refers to. Could we not define a union here to support the legacy and lightweight modes more explicitly and keep the existing terminology for the setters/getters for the code that uses it?
>
> I should have read ahead. I see you do keep the setters/getters.

When we remove legacy locking in a couple of releases, we could rename this field cached_monitor.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688015247

From coleenp at openjdk.org  Tue Jul 23 19:09:49 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 23 Jul 2024 19:09:49 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <OCv6QKq_A8dUaKUbnzSdEnlEqrMIcb6pUyLfObBFq-o=.1d78e62f-151c-403d-a291-fbab38c5f4d6@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
 <OCv6QKq_A8dUaKUbnzSdEnlEqrMIcb6pUyLfObBFq-o=.1d78e62f-151c-403d-a291-fbab38c5f4d6@github.com>
Message-ID: <3m5N_Fh65MVy7vRvO0wq3qFlzxjbCLHhbTBJe8OJorw=.eb61b3bd-5aca-45cd-8e88-389ae86a599b@github.com>

On Thu, 18 Jul 2024 11:30:27 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with 10 additional commits since the last revision:
>> 
>>  - Remove try_read
>>  - Add explicit to single parameter constructors
>>  - Remove superfluous access specifier
>>  - Remove unused include
>>  - Update assert message OMCache::set_monitor
>>  - Fix indentation
>>  - Remove outdated comment LightweightSynchronizer::exit
>>  - Remove logStream include
>>  - Remove strange comment
>>  - Fix javaThread include
>
> src/hotspot/share/runtime/lightweightSynchronizer.cpp line 77:
> 
>> 75:   using ConcurrentTable = ConcurrentHashTable<Config, MEMFLAGS::mtObjectMonitor>;
>> 76: 
>> 77:   ConcurrentTable* _table;
> 
> So you have a class ObjectMonitorWorld, which references the ConcurrentTable, which, internally also has its actual table. This is 3 dereferences to get to the actual table, if I counted correctly. I'd try to eliminate the outermost ObjectMonitorWorld class, or at least make it a global flat structure instead of a reference to a heap-allocated object. I think, because this is a structure that is global and would exist throughout the lifetime of the Java program anyway, it might be worth figuring out how to do the actual ConcurrentHashTable flat in the global structure, too.

This is a really good suggestion and might help a lot with the performance problems that we see with the table with heavily contended locking.  I think we should change this in a follow-on patch (which I'll work on).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688053792

From vlivanov at openjdk.org  Tue Jul 23 19:14:35 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 23 Jul 2024 19:14:35 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v5]
In-Reply-To: <QRaDiTIgGhnhvj1km2MOoIDYXKGjnzC04OoEkYgUrxU=.cdd1b266-2380-4c72-884a-163ef267be74@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <QRaDiTIgGhnhvj1km2MOoIDYXKGjnzC04OoEkYgUrxU=.cdd1b266-2380-4c72-884a-163ef267be74@github.com>
Message-ID: <UGTJYkZpXOFGtKUPy3EWk2VORIgoblwrEeXaymw_rZ4=.3c014718-7254-40f0-b332-8f5650b0ce9e@github.com>

On Mon, 22 Jul 2024 16:45:06 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4810:
>> 
>>> 4808:                                                          Label* L_success,
>>> 4809:                                                          Label* L_failure) {
>>> 4810:   // NB! Callers may assume that, when temp2_reg is a valid register,
>> 
>> Oh, that's a subtle point... Can we make it more evident at call sites?
>
> Done. I think the only code that still depends on it is the C2 pattern that uses check_klass_subtype_slow_path_linear in x86_32.ad and x86_64.ad.

Thanks. I revisited the code and now it seems like `temp2_reg_was_valid` duplicates `set_cond_codes` parameter in the original implementation. Am I missing something important here? Otherwise, why can't we rely on `set_cond_codes` flag instead?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1688579798

From vlivanov at openjdk.org  Tue Jul 23 19:18:34 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 23 Jul 2024 19:18:34 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v5]
In-Reply-To: <6P17gX_V6nL3hsgbuPrGN4Y8nzyoQMs3fTLaiRaOzwA=.e3eb0ea0-d41c-4222-a1f3-65f9075dbb4d@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <6P17gX_V6nL3hsgbuPrGN4Y8nzyoQMs3fTLaiRaOzwA=.e3eb0ea0-d41c-4222-a1f3-65f9075dbb4d@github.com>
Message-ID: <okwRz9n9XLujkQc_Il_J5otkYuUpAGaYRBg5Ln0tZNk=.bb0d230a-c4d3-46eb-a431-213bfc321b28@github.com>

On Mon, 22 Jul 2024 17:19:46 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review comments

FYI I did a merge with mainline and submitted the patch for testing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2246099889

From vlivanov at openjdk.org  Tue Jul 23 19:18:34 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 23 Jul 2024 19:18:34 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v5]
In-Reply-To: <UAjH__AKdU3UMdJBkg7TlElKSA8mEFFE0MiElVrYexE=.4bc67a26-3383-4e4e-92b0-f1d3d33c5ce2@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com>
 <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com>
 <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com>
 <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com>
 <UAjH__AKdU3UMdJBkg7TlElKSA8mEFFE0MiElVrYexE=.4bc67a26-3383-4e4e-92b0-f1d3d33c5ce2@github.com>
Message-ID: <M5xQ14pzHdBEr7yAdAqIVUsY_o8tXUgN9HpKxjkZznw=.f2262137-2fec-4297-ae1e-89b11874266f@github.com>

On Mon, 22 Jul 2024 16:36:25 GMT, Andrew Haley <aph at openjdk.org> wrote:

>>> Alternatively, `Klass::is_subtype_of()` can unconditionally perform linear search over secondary_supers array.
>>> 
>>> Even though I very much like to see table lookup written in C++ (accompanying heavily optimized platform-specific MacroAssembler variants), it would make C++ runtime even simpler.
>> 
>> It would, but there is something to be said for being able to provide a fast "no" answer for interface membership. I'll agree it's probably not a huge difference. I guess `is_cloneable_fast()` exists only because searching the interfaces is slow.
>> Also, if table lookup is written in C++ but not used, it will rot.
>> Also also, `Klass::is_subtype_of()` is used for C1 runtime.
>
> Thinking about it some more, I don't really mind. There may be some virtue to moving lookup_secondary_supers_table() to a comment in the back end(s), and the expansion of population_count() is rather bloaty.

> Also also, Klass::is_subtype_of() is used for C1 runtime.

Can you elaborate, please? What I'm seeing in `Runtime1::generate_code_for()` for `slow_subtype_check` is a call into `MacroAssembler::check_klass_subtype_slow_path()`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1688586602

From vlivanov at openjdk.org  Tue Jul 23 19:21:33 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 23 Jul 2024 19:21:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v5]
In-Reply-To: <6P17gX_V6nL3hsgbuPrGN4Y8nzyoQMs3fTLaiRaOzwA=.e3eb0ea0-d41c-4222-a1f3-65f9075dbb4d@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <6P17gX_V6nL3hsgbuPrGN4Y8nzyoQMs3fTLaiRaOzwA=.e3eb0ea0-d41c-4222-a1f3-65f9075dbb4d@github.com>
Message-ID: <aKMZwN8ncRaVRHjYsexOaHM5VCdZJsnOpiuCAbuAxw0=.29e4b6de-883e-40ac-b4d1-a11b414aa1a9@github.com>

On Mon, 22 Jul 2024 17:19:46 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review comments

My take on the questions you raised.

> Should Klass::linear_search_secondary_supers() const call set_secondary_super_cache()? (Strong no from me. It's UB.)

Agree. I'm fine with addressing that separately (as I mentioned earlier).

> Should we use a straight linear search for secondary C++ supers in the runtime, i.e. not changing it for now?

Slightly in favor of keeping `Klass::is_subtype_of()` simple, but I'm fine with it either way.
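
For readers comparing the two options, here is a deliberately simplified sketch of a bitmap-plus-popcount membership test next to a linear search. It is not the HotSpot algorithm: the slot hash is a hypothetical modulo, integer ids stand in for Klass pointers, and collisions are simply not handled, whereas a real table has to probe further on a collision. It only illustrates why a hashed table can give a fast "no" and why it depends on popcount.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable software popcount (clears the lowest set bit per iteration).
inline int population_count(std::uint64_t x) {
  int n = 0;
  while (x != 0) { x &= x - 1; ++n; }
  return n;
}

struct SupersTable {
  std::uint64_t bitmap = 0;  // bit i set => slot i is occupied
  std::vector<int> packed;   // occupied slots only, in slot order

  static unsigned slot_of(int id) { return static_cast<unsigned>(id) % 64u; }

  // Assumes no two inserted ids share a slot (no collision handling).
  void insert(int id) {
    const std::uint64_t bit = std::uint64_t{1} << slot_of(id);
    assert((bitmap & bit) == 0 && "sketch does not handle collisions");
    const int index = population_count(bitmap & (bit - 1));
    packed.insert(packed.begin() + index, id);
    bitmap |= bit;
  }

  bool contains(int id) const {
    const std::uint64_t bit = std::uint64_t{1} << slot_of(id);
    if ((bitmap & bit) == 0) return false;  // fast "no": slot is empty
    // Number of occupied slots below ours is our index in the packed array.
    const int index = population_count(bitmap & (bit - 1));
    return packed[static_cast<std::size_t>(index)] == id;
  }
};

// The simple alternative: scan every entry.
inline bool linear_contains(const std::vector<int>& supers, int id) {
  for (int s : supers) {
    if (s == id) return true;
  }
  return false;
}
```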

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2246108523

From coleenp at openjdk.org  Tue Jul 23 20:24:37 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 23 Jul 2024 20:24:37 GMT
Subject: RFR: 8315884: New Object to ObjectMonitor mapping [v9]
In-Reply-To: <RmSdsqxnTjwB53zn49RvKRIRkIuz_jiRjOCwwLhEm-g=.6e9238c5-bbbb-4666-82b0-fef3235a12b6@github.com>
References: <kDoJ_F8U3ie4XyLwRlIbwqaH2jyVUt61fMs8fsFDpA8=.23d22903-a08b-4f7d-a3e5-d65a98a1b6e0@github.com>
 <zu91N4ZznHQPPm9sqN2BI4wu2_xbh5LPYTGPgSwSfB4=.2e309b58-8feb-4d91-8236-275715854e51@github.com>
 <Wj5uaRxDmVYqDnt2V1PgErk7dI10LCro6WSfAm4Q6BU=.6fd91b51-ec40-438f-95a4-d2fbf593a288@github.com>
 <RmSdsqxnTjwB53zn49RvKRIRkIuz_jiRjOCwwLhEm-g=.6e9238c5-bbbb-4666-82b0-fef3235a12b6@github.com>
Message-ID: <zYO70HQfI8-f1CcQ_N7H0FtHecJdJ26uGt3DBJeMaYg=.9477511e-59c1-4eb4-9d7f-40c5e0b7555c@github.com>

On Tue, 23 Jul 2024 13:12:23 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> src/hotspot/share/runtime/deoptimization.cpp line 1641:
>> 
>>> 1639:               assert(fr.is_deoptimized_frame(), "frame must be scheduled for deoptimization");
>>> 1640:               if (LockingMode == LM_LEGACY) {
>>> 1641:                 mon_info->lock()->set_displaced_header(markWord::unused_mark());
>> 
>> In the existing code how is this restricted to the LM_LEGACY case?? It appears to be unconditional which suggests you are changing the non-UOMT LM_LIGHTWEIGHT logic. ??
>
> Only legacy locking uses the displaced header, I believe, which isn't clear in this code at all.  This seems like a fix.  We should probably assert that only legacy locking uses this field as a displaced header.

Update: yes, this code change does assert if you use BasicLock's displaced header for locking modes other than LM_LEGACY.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1688668887

From asmehra at openjdk.org  Tue Jul 23 21:50:58 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Tue, 23 Jul 2024 21:50:58 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic
Message-ID: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>

Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)

Testing:
  test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
  test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java

-------------

Commit messages:
 - Update comments in tests to reflect new output format
 - 8337031: Improvements to CompilationMemoryStatistic

Changes: https://git.openjdk.org/jdk/pull/20304/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20304&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337031
  Stats: 173 lines in 6 files changed: 77 ins; 21 del; 75 mod
  Patch: https://git.openjdk.org/jdk/pull/20304.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20304/head:pull/20304

PR: https://git.openjdk.org/jdk/pull/20304

From asmehra at openjdk.org  Tue Jul 23 21:50:58 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Tue, 23 Jul 2024 21:50:58 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic
In-Reply-To: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
Message-ID: <Sfs3IHOQoH2YVhJBFpV6XWYC2AaZZQPq1foUBtTVlF0=.2f8dddab-a5bb-4a3b-a758-ec5dbb466f63@github.com>

On Tue, 23 Jul 2024 21:46:50 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote:

> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)
> 
> Testing:
>   test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
>   test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java

@tstuefe fyi

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20304#issuecomment-2246372292

From stuefe at openjdk.org  Wed Jul 24 06:32:35 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 24 Jul 2024 06:32:35 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v9]
In-Reply-To: <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com>
Message-ID: <n8iJ1xBSSKiRJDnwa1flyz8itZaVZdPYyT6Dmh0RuQU=.3e4a14b9-a5f2-4d73-8e3b-84c0d6f7012c@github.com>

On Tue, 23 Jul 2024 17:57:04 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example,
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that.
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits:
> 
>  - Merge master
>  - Fixing formatting
>  - Inlining buffer and making field private
>  - Reverting to functional changes in parserTests.cpp
>  - Error messaging format
>  - Fixing memory leak
>  - Fixing pointer style, s/NULL/nullptr, and exception
>  - Cleaning up parserTests.cpp
>  - Missing copyright header update
>  - Adding tests for file dcmd argument
>  - ... and 5 more: https://git.openjdk.org/jdk/compare/2f2223d7...52ca557d

src/hotspot/share/services/diagnosticArgument.cpp line 365:

> 363:     if (!_value.parse_value(str, len)) {
> 364:       stringStream error_msg;
> 365:       error_msg.print("Invalid file path: %s", str);

In all likelihood, the only reason Argument::copy_expand... would ever fail is that the expanded string does not fit the buffer in FileArgument. I'd therefore consider a clearer warning here, e.g. "File path invalid or too long: ".
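
To make the failure mode concrete, here is a minimal sketch of `%p` expansion into a fixed-size buffer. It is not the HotSpot implementation: `expand_pid` and its signature are hypothetical, and it only assumes that `%p` is replaced by the decimal PID and that the call fails when the result does not fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not HotSpot code): copies src into dst, replacing
// every "%p" with the decimal pid. Returns false if the expanded string,
// including the terminating NUL, does not fit into dst_len bytes.
bool expand_pid(const char* src, int pid, char* dst, size_t dst_len) {
  assert(src != nullptr && dst != nullptr);
  char pid_str[16];
  int pid_len = std::snprintf(pid_str, sizeof(pid_str), "%d", pid);
  if (pid_len < 0 || static_cast<size_t>(pid_len) >= sizeof(pid_str)) {
    return false;
  }
  size_t out = 0;
  for (size_t i = 0; src[i] != '\0'; ) {
    if (src[i] == '%' && src[i + 1] == 'p') {
      // Leave room for the PID digits and the final NUL.
      if (out + static_cast<size_t>(pid_len) >= dst_len) return false;
      std::memcpy(dst + out, pid_str, static_cast<size_t>(pid_len));
      out += static_cast<size_t>(pid_len);
      i += 2;
    } else {
      // Leave room for this character and the final NUL.
      if (out + 1 >= dst_len) return false;
      dst[out++] = src[i++];
    }
  }
  if (out >= dst_len) return false;
  dst[out] = '\0';
  return true;
}
```

In this sketch the only way to get `false` is an overlong result, which is why a message mentioning "too long" would be more helpful to the user than a bare "Invalid file path".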

src/hotspot/share/services/diagnosticCommand.cpp line 1018:

> 1016:   // of the default, not the actual default.
> 1017:   FileArgument file_arg = _filename.value();
> 1018:   const char *file = _filename.is_set() ? file_arg.get() : nullptr;

Style nit: const char*, not const char *

src/hotspot/share/services/diagnosticCommand.cpp line 1197:

> 1195: void SystemDumpMapDCmd::execute(DCmdSource source, TRAPS) {
> 1196:   FileArgument file_arg = _filename.value();
> 1197:   const char *name = file_arg.get();

pointer style

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1689204690
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1689206254
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1689208226

From galder at openjdk.org  Wed Jul 24 08:18:32 2024
From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=)
Date: Wed, 24 Jul 2024 08:18:32 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and
 Math.min(long,long)
In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID: <dM6XmjncreYrCU2rzCosr7BNsN96DI1w-qREFtC7h2s=.e73dea1a-0c97-4950-908d-636867153626@github.com>

On Tue, 9 Jul 2024 12:07:37 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
> 
> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
> 
> 
> VLoop::check_preconditions: failed: control flow in loop not allowed
> 
> 
> The control flow is due to the java implementation for these methods, e.g.
> 
> 
> public static long max(long a, long b) {
>     return (a >= b) ? a : b;
> }
> 
> 
> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively.
> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
> E.g.
> 
> 
> SuperWord::transform_loop:
>     Loop: N518/N126  counted [int,int),+4 (1025 iters)  main has_sfpt strip_mined
>  518  CountedLoop  === 518 246 126  [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
> 
> 
> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
> 
> long min   1155
> long max   1173
> 
> 
> After the patch, on darwin/aarch64 (M1):
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
> 
> long min   1042
> long max   1042
> 
> 
> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
> Therefore, it still relies on the macro expansion to transform those into CMoveL.
> 
> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg:tier1                     2500  2500     0     0
>>> jtreg:test/jdk:tier1                     ...

I've been working on some JMH benchmarks and I'm seeing some strange results that I need to investigate further. I will update the PR when I have found the reason(s).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2247189103

From tschatzl at openjdk.org  Wed Jul 24 08:38:31 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Wed, 24 Jul 2024 08:38:31 GMT
Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate
In-Reply-To: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com>
References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com>
Message-ID: <m1m4tg8JMMdfUAzclYXGgtfUI58nn18ZJ8u8XZxi0R8=.2413d07f-bd8c-45df-a7bd-3f0769e93e6b@github.com>

On Tue, 23 Jul 2024 14:11:20 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

> Simple obsoleting a Parallel GC product flag.

The flag needs to be added to the obsolete flags table too, not only removed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20299#issuecomment-2247230042

From mli at openjdk.org  Wed Jul 24 08:44:37 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 24 Jul 2024 08:44:37 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations
 with SLEEF [v11]
In-Reply-To: <XSv3vwmQVv2abJLfDCmKsELqrY9d2Ohe_FemTMIzxXw=.3351b8ca-91d0-4a0b-9154-f758c293dcdc@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
 <6PPEFLvbIhR73kj_1lijO4yThv-Md3I3YbmyNTvbq1s=.5d7b03af-aedc-49a5-848c-1e9bc1e1ed4b@github.com>
 <ml4dF0TwSQdlURT8ETSAN9RVnx3iMIVNDCWedq8lc1Y=.6b3da39c-41fc-460e-8632-d5a42be279ab@github.com>
 <XSv3vwmQVv2abJLfDCmKsELqrY9d2Ohe_FemTMIzxXw=.3351b8ca-91d0-4a0b-9154-f758c293dcdc@github.com>
Message-ID: <xZpBFKhBc-4qqdhKoAxzHltOd-Pk3AuG6dWScvpqj64=.1feef605-1927-482b-a4a0-fcc6334f9a73@github.com>

On Tue, 23 Jul 2024 13:55:06 GMT, fitzsim <duke at openjdk.org> wrote:

> To check this, I [added](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-2/) the `riscv64` `CMake` steps to `SleefCommon.gmk`.
> 
> I had intended to factor out `SetupSleefHeader` anyway for `aarch64`, to eliminate copy-n-paste.
> 
> After that, there was one build step divergence for `riscv64` for the naming of the helper header.
> 
> The two `riscv64` commits are:
> 
> * [copy `helperrvv.h`](https://github.com/fitzsim/jdk/commit/bcd3813ca97f6308838ee93bcb5c02d9cd37375a)
> * [add `riscv64` support to `SleefCommon.gmk`](https://github.com/fitzsim/jdk/commit/21e0369682095422f45015d817410d07c711b8c0)

Thanks for your effort, this is much better.

I have just one question. If there is no major refactoring in sleef in the future, I think we're fine. But if sleef's implementation is refactored, won't the maintenance be more than minor work, given that in [this branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) we need to migrate some processing from inside sleef into the jdk?
I'm not sure, though; maybe others can comment on this question.

I also think we can move the discussion about [this branch](https://github.com/fitzsim/jdk/commits/regenerate-sleef-headers-1/) to https://github.com/openjdk/jdk/pull/19185, since this part of the code will eventually be pushed into the jdk via that PR (for legal process reasons). I hope the people involved in that PR do not miss the discussion and information here.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2247244342

From aph at openjdk.org  Wed Jul 24 09:05:38 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 24 Jul 2024 09:05:38 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v5]
In-Reply-To: <M5xQ14pzHdBEr7yAdAqIVUsY_o8tXUgN9HpKxjkZznw=.f2262137-2fec-4297-ae1e-89b11874266f@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com>
 <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com>
 <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com>
 <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com>
 <UAjH__AKdU3UMdJBkg7TlElKSA8mEFFE0MiElVrYexE=.4bc67a26-3383-4e4e-92b0-f1d3d33c5ce2@github.com>
 <M5xQ14pzHdBEr7yAdAqIVUsY_o8tXUgN9HpKxjkZznw=.f2262137-2fec-4297-ae1e-89b11874266f@github.com>
Message-ID: <YxBy1Mx7Di5EDfJkCTfcaIuTzCv5KdzBzKMcE3iIeak=.2a56f436-8e14-4a22-a85d-cd06209e2c01@github.com>

On Tue, 23 Jul 2024 19:14:57 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> > Also also, Klass::is_subtype_of() is used for C1 runtime.
> 
> Can you elaborate, please?

Sorry, that was rather vague. In C1-compiled code, the Java method `Class::isInstance(Object)` calls `Klass::is_subtype_of()`.

In general, I find it difficult to decide how much work, if any, should be done to improve C1 performance. Clearly, if C1 exists only to help with startup time in a tiered compilation system, the answer is "not much".

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1689414385

From ayang at openjdk.org  Wed Jul 24 09:11:13 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Wed, 24 Jul 2024 09:11:13 GMT
Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate [v2]
In-Reply-To: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com>
References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com>
Message-ID: <VQeOV8bJxKRoDHOg5MkGa8ukguwU0SaiB3SpL3gq3_g=.4b4386f8-fc8e-4bd7-ac15-c089c59fb05c@github.com>

> Simple obsoleting a Parallel GC product flag.

Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision:

  review

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20299/files
  - new: https://git.openjdk.org/jdk/pull/20299/files/59f96d13..10720a6d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20299&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20299&range=00-01

  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20299.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20299/head:pull/20299

PR: https://git.openjdk.org/jdk/pull/20299

From rrich at openjdk.org  Wed Jul 24 09:21:30 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Wed, 24 Jul 2024 09:21:30 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap'
In-Reply-To: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
Message-ID: <8nHc_TnY9HDJBPodhK-8koS35dtVS7H-dBXZQCosz9A=.6e8ceb41-d795-46b7-8b05-c74416e9a313@github.com>

On Tue, 23 Jul 2024 09:49:38 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> When running with ubsan - enabled binaries, some tests trigger the following report :
> 
> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
>     #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
>     #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
>     #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
>     #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
>     #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
>     #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
>     #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
>     #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
> 
> Seems in case of (at least) class SmallRegisterMap we miss handling nullptr .

Hi Matthias,

I think this is intended. No instances of SmallRegisterMap are ever created.

Instead [SmallRegisterMap::instance](https://github.com/openjdk/jdk/blob/5b4824cf9aba297fa6873ebdadc0e9545647e90d/src/hotspot/cpu/x86/smallRegisterMap_x86.inline.hpp#L34C20-L34C36) is used:

```C++
  static constexpr SmallRegisterMap* instance = nullptr;
```

The type is the only information that is actually used.

I guess you could fix the undefined behavior by replacing all uses of SmallRegisterMap::instance with the address of a stack allocated temporary SmallRegisterMap().
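
A rough sketch of that idea, using a toy type rather than the real class: `ToyRegisterMap`, `needs_update`, and `query_with_temporary` are hypothetical names, and the sketch assumes only that the map type is stateless and that the template code takes it by pointer.

```cpp
#include <cassert>

// Toy stand-in for SmallRegisterMap: it has no state, only the type matters.
struct ToyRegisterMap {
  bool update_map() const { return false; }
};

// Template code that receives the map by pointer, as in the ubsan trace.
template <typename RegisterMapT>
bool needs_update(const RegisterMapT* map) {
  // Calling a member function through a null pointer is undefined
  // behavior, even if the function never reads a field.
  assert(map != nullptr);
  return map->update_map();
}

bool query_with_temporary() {
  ToyRegisterMap map;         // empty, stack-allocated object
  return needs_update(&map);  // valid pointer, so no UB and no ubsan report
}
```

Since the object is empty, the temporary should cost essentially nothing, while the member call now has a valid `this`.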

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2247337815

From thartmann at openjdk.org  Wed Jul 24 10:34:59 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 24 Jul 2024 10:34:59 GMT
Subject: RFR: 8336999: Verification for resource area allocated data structures
 in C2
Message-ID: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>

Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption.

This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime.
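
As a toy model of the kind of check described here (not the HotSpot code; `ToyResourceMark`, `ToyReallocMark`, `ToyArray`, and the global counter are all hypothetical simplifications), the mark remembers the nesting level at allocation time and growing is only considered safe at that same level:

```cpp
#include <cassert>

// Toy model: a global counter stands in for the per-thread resource area
// nesting level.
static int g_nesting = 0;

struct ToyResourceMark {
  ToyResourceMark() { ++g_nesting; }
  ~ToyResourceMark() {
    assert(g_nesting > 0);
    --g_nesting;
  }
};

// Records the nesting level at construction time.
class ToyReallocMark {
  int _nesting;
 public:
  ToyReallocMark() : _nesting(g_nesting) {}
  // True only if we are still under the mark that was active at allocation.
  bool is_safe_to_grow() const { return _nesting == g_nesting; }
};

struct ToyArray {
  ToyReallocMark _mark;
  // Real code would assert here and then reallocate; the sketch just
  // reports the result of the check, even if no reallocation is needed.
  bool grow() { return _mark.is_safe_to_grow(); }
};

bool grow_under_same_mark() {
  ToyResourceMark rm;
  ToyArray a;
  return a.grow();
}

bool grow_under_nested_mark() {
  ToyResourceMark outer;
  ToyArray a;
  ToyResourceMark inner;  // growing now would allocate under the inner mark
  return a.grow();
}
```

In this model, growing under the nested mark is flagged because memory obtained there would be released when the inner mark goes out of scope while the array is still alive.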

While testing, I hit the verification code from:

V  [libjvm.so+0x5c1ceb]  ReallocMark::check(Arena*)+0x7b  (allocation.cpp:244)
V  [libjvm.so+0x6df2da]  Block_Array::grow(unsigned int)+0x1a  (block.cpp:43)
V  [libjvm.so+0xb88679]  PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159  (block.hpp:72)
V  [libjvm.so+0xb88b6b]  PhaseCFG::build_dominator_tree()+0xab  (domgraph.cpp:74)
V  [libjvm.so+0xd75791]  PhaseCFG::do_global_code_motion()+0x11  (gcm.cpp:1635)
V  [libjvm.so+0x9f4fd4]  Compile::Code_Gen()+0x2a4  (compile.cpp:2949)
V  [libjvm.so+0x9f5f16]  Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6  (compile.cpp:991)


It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing.

This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`.

We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015).

Thanks,
Tobias

-------------

Commit messages:
 - Reverted fix for 8336095
 - Small fix + refactoring
 - First prototype - tier1-3 pass

Changes: https://git.openjdk.org/jdk/pull/20311/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20311&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8336999
  Stats: 56 lines in 14 files changed: 33 ins; 8 del; 15 mod
  Patch: https://git.openjdk.org/jdk/pull/20311.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20311/head:pull/20311

PR: https://git.openjdk.org/jdk/pull/20311

From kevinw at openjdk.org  Wed Jul 24 10:40:36 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Wed, 24 Jul 2024 10:40:36 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v9]
In-Reply-To: <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com>
Message-ID: <uZeEnxjF6MkDWWOYSdfSUsP9VHParHx7dRweSXjaeM0=.3f3654c8-bf79-4cf3-88de-ba5530276cd9@github.com>

On Tue, 23 Jul 2024 17:57:04 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example,
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that.
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits:
> 
>  - Merge master
>  - Fixing formatting
>  - Inlining buffer and making field private
>  - Reverting to functional changes in parserTests.cpp
>  - Error messaging format
>  - Fixing memory leak
>  - Fixing pointer style, s/NULL/nullptr, and exception
>  - Cleaning up parserTests.cpp
>  - Missing copyright header update
>  - Adding tests for file dcmd argument
>  - ... and 5 more: https://git.openjdk.org/jdk/compare/2f2223d7...52ca557d

Is the help output working OK?
Do these commands' help outputs show the new %p filename?
I think it's good that they would. We should expect users of these commands to implicitly understand `%p`, although we can still explain it, e.g. in a separate update to the man page.

I just think we should be explicit that the help output is changing.

In src/hotspot/share/runtime/java.cpp:    if (DumpPerfMapAtExit) { CodeCache::write_perf_map(....

It may need to pass DEFAULT_PERFMAP_FILENAME (and tty).

Do you have the change from JDK-8327054 merged into this branch?

src/hotspot/share/services/diagnosticArgument.hpp line 65:

> 63: class FileArgument {
> 64:   private:
> 65:     char _name[1024];

Probably JVM_MAXPATHLEN (which might also be 1024).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2247558224
PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2247560119
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1689545224

From stuefe at openjdk.org  Wed Jul 24 10:47:32 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 24 Jul 2024 10:47:32 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic
In-Reply-To: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
Message-ID: <yqXqOJCnOBQToVnGiTvMv9SRVECCZuArbWqfiVEj6VE=.eb63b66f-63a5-4c51-8757-87f2694afd98@github.com>

On Tue, 23 Jul 2024 21:46:50 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote:

> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)
> 
> Testing:
>   test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
>   test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java

I plan to look at this later this week.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20304#issuecomment-2247575688

From thartmann at openjdk.org  Wed Jul 24 12:08:32 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 24 Jul 2024 12:08:32 GMT
Subject: RFR: 8336999: Verification for resource area allocated data
 structures in C2
In-Reply-To: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
Message-ID: <-ivSR7ytOee0BB9h1RzdWBVUZyyW9G2ANy1gLpyFCSE=.598ece24-fcc9-447e-8f0f-bc5df7bfe903@github.com>

On Wed, 24 Jul 2024 10:29:32 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

> Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption.
> 
> This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime.
> 
> While testing, I hit the verification code from:
> 
> V  [libjvm.so+0x5c1ceb]  ReallocMark::check(Arena*)+0x7b  (allocation.cpp:244)
> V  [libjvm.so+0x6df2da]  Block_Array::grow(unsigned int)+0x1a  (block.cpp:43)
> V  [libjvm.so+0xb88679]  PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159  (block.hpp:72)
> V  [libjvm.so+0xb88b6b]  PhaseCFG::build_dominator_tree()+0xab  (domgraph.cpp:74)
> V  [libjvm.so+0xd75791]  PhaseCFG::do_global_code_motion()+0x11  (gcm.cpp:1635)
> V  [libjvm.so+0x9f4fd4]  Compile::Code_Gen()+0x2a4  (compile.cpp:2949)
> V  [libjvm.so+0x9f5f16]  Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6  (compile.cpp:991)
> 
> 
> It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing.
> 
> This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`.
> 
> We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015).
> 
> Thanks,
> Tobias

GitHub testing failed because [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) is not yet integrated. With that fix applied, all testing passed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20311#issuecomment-2247742221

From mbaesken at openjdk.org  Wed Jul 24 12:57:46 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 24 Jul 2024 12:57:46 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v2]
In-Reply-To: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
Message-ID: <OMBPXRc-bFS9BXXLhEKgP7VMhvji7SNPi3j-WPV5zx4=.1b8919be-6c92-4328-b4f7-ea5c367a4731@github.com>

> When running with ubsan-enabled binaries, some tests trigger the following report:
> 
> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
>     #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
>     #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
>     #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
>     #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
>     #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
>     #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
>     #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
>     #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
> 
> It seems that, at least for class SmallRegisterMap, we miss handling nullptr.

Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:

  avoid the nullptr checks, this causes asserts in jtreg tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20296/files
  - new: https://git.openjdk.org/jdk/pull/20296/files/6e063a11..436648c2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20296&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20296&range=00-01

  Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20296.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20296/head:pull/20296

PR: https://git.openjdk.org/jdk/pull/20296

From mbaesken at openjdk.org  Wed Jul 24 13:14:32 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 24 Jul 2024 13:14:32 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap'
In-Reply-To: <8nHc_TnY9HDJBPodhK-8koS35dtVS7H-dBXZQCosz9A=.6e8ceb41-d795-46b7-8b05-c74416e9a313@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
 <8nHc_TnY9HDJBPodhK-8koS35dtVS7H-dBXZQCosz9A=.6e8ceb41-d795-46b7-8b05-c74416e9a313@github.com>
Message-ID: <hm7KZuwHxMUaRLOqwzOzbxuSasnYmlgDkjzODF92bw4=.6e9ab6d2-8529-4cfc-8a5f-f3e92fb1c6a4@github.com>

On Wed, 24 Jul 2024 09:19:22 GMT, Richard Reingruber <rrich at openjdk.org> wrote:

> I think this is intended. No instances of SmallRegisterMap are ever created.

If that is intended, it is probably better just to switch off ubsan for the method.

On the other hand, `in_cont()` always returns false for SmallRegisterMap, so we could at least avoid the `reg_map->in_cont()` call for SmallRegisterMap:
cpu/aarch64/smallRegisterMap_aarch64.inline.hpp:80:  bool in_cont()       const { return false; }
cpu/arm/smallRegisterMap_arm.inline.hpp:73:  bool in_cont()       const { return false; }
cpu/ppc/smallRegisterMap_ppc.inline.hpp:79:  bool in_cont()       const { return false; }
cpu/s390/smallRegisterMap_s390.inline.hpp:73:  bool in_cont()       const { return false; }
cpu/x86/smallRegisterMap_x86.inline.hpp:80:  bool in_cont()       const { return false; }
cpu/zero/smallRegisterMap_zero.inline.hpp:73:  bool in_cont()       const { return false; }
cpu/riscv/smallRegisterMap_riscv.inline.hpp:80:  bool in_cont()       const { return false; }
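
To illustrate the idea, here is a minimal sketch, not HotSpot code. `TinyMap` and `needs_map_lookup` are hypothetical stand-ins. Calling a non-static member function through a null pointer is undefined behavior in C++ even when the body never reads `this`, which is what ubsan reports. If the answer is a compile-time constant for the type, a static member lets the caller skip the call entirely:

```cpp
#include <cassert>

// Hypothetical stand-in for a stateless map type that is never instantiated.
struct TinyMap {
  // Non-static member: calling it through a null pointer is undefined
  // behavior, even though the body never touches `this`.
  bool in_cont() const { return false; }

  // Static alternative: no object is needed, so nothing is dereferenced.
  static constexpr bool always_in_cont() { return false; }
};

// Hypothetical caller; `map` may be null because TinyMap carries no state.
template <typename MapT>
bool needs_map_lookup(const MapT* map) {
  (void)map;                      // deliberately unused: never dereferenced
  return MapT::always_in_cont();  // resolved from the type alone
}
```

Whether something like this fits the real `RegisterMap` / `SmallRegisterMap` interface is an open question; the sketch only shows why the static form avoids the report.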

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2247888008

From mbaesken at openjdk.org  Wed Jul 24 13:59:44 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 24 Jul 2024 13:59:44 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v3]
In-Reply-To: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
Message-ID: <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com>

> When running with ubsan-enabled binaries, some tests trigger the following report:
> 
> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
>     #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
>     #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
>     #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
>     #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
>     #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
>     #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
>     #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
>     #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
> 
> It seems that, at least for class SmallRegisterMap, we miss handling nullptr.

Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:

  ATTRIBUTE_NO_UBSAN must be after template typename ...

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20296/files
  - new: https://git.openjdk.org/jdk/pull/20296/files/436648c2..390a2176

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20296&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20296&range=01-02

  Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20296.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20296/head:pull/20296

PR: https://git.openjdk.org/jdk/pull/20296

From mbaesken at openjdk.org  Wed Jul 24 14:13:31 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 24 Jul 2024 14:13:31 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured
In-Reply-To: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
Message-ID: <-jpIPJPQJV4xnJ0Oz-wme6x6vmyGwYVN7GtLBbp3YDM=.133e895a-d953-43f5-ad55-1b909294ad23@github.com>

On Wed, 26 Jun 2024 13:32:32 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> Currently, when we run with ubsan-enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work.
> 
> We find this in the test output:
> 
> [STDOUT]
> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory
> 
> The container where the test is executed does not contain the ubsan package; we might skip the test in this case.

Hello, any input on this?
Currently the jtreg docker tests do not work when running ubsan-enabled binaries.

So we have to either check for ubsan (see the PR for an example) or install ubsan into the docker (or podman, etc.) container.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2248077273

From aph at openjdk.org  Wed Jul 24 14:34:09 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 24 Jul 2024 14:34:09 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v6]
In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
Message-ID: <ClSaxeO-M405jBL-Y6YELu6xofwt7R8kp6PEdalyw_8=.c23b3958-85f4-4265-9744-125a286d75db@github.com>

> This patch expands the use of a hash table for secondary superclasses
> to the interpreter, C1, and runtime. It also adds a C2 implementation
> of hashed lookup in cases where the superclass isn't known at compile
> time.
> 
> HotSpot shared runtime
> ----------------------
> 
> Building hashed secondary tables is now unconditional. It takes very
> little time, and now that the shared runtime always has the tables, it
> might as well take advantage of them. The shared code is easier to
> follow now, I think.
> 
> There might be a performance issue with x86-64 in that we build
> HotSpot for a default x86-64 target that does not support popcount.
> This means that HotSpot C++ runtime on x86 always uses a software
> emulation for popcount, even though the vast majority of machines made
> for the past 20 years can do popcount in a single instruction. It
> wouldn't be terribly hard to do something about that.
> 
> Having said that, the software popcount is really not bad.
> 
> x86
> ---
> 
> x86 is rather tricky, because we still support
> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
> well as 32- and 64-bit ports. There's some further complication in
> that only `RCX` can be used as a shift count, so there's some register
> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
> 
> AArch64
> -------
> 
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
> 
> Generally
> ---------
> 
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
> 
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
> 
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
> 
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.

Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:

  Review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19989/files
  - new: https://git.openjdk.org/jdk/pull/19989/files/02cfd130..48e80a13

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=04-05

  Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19989.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989

PR: https://git.openjdk.org/jdk/pull/19989

From aph at openjdk.org  Wed Jul 24 14:34:09 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 24 Jul 2024 14:34:09 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v6]
In-Reply-To: <Ct5EunuM4nq5EUa-kDtCzKs-O4Z_wEnMq2_5W7GPaeY=.f475ee5b-3bea-48a7-97d1-7f71287e4fc9@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <wgY2erz716MCi6K6DcUKEqLyd6E82ArMlba9qHdAA9o=.de21daa5-b078-4469-a6eb-df548f699f65@github.com>
 <Ct5EunuM4nq5EUa-kDtCzKs-O4Z_wEnMq2_5W7GPaeY=.f475ee5b-3bea-48a7-97d1-7f71287e4fc9@github.com>
Message-ID: <TxLB7H7lM8c1e-Hc5PvGAiuil1YKOfWqg_EJUwFp4O8=.554e044f-6e2a-4216-96cc-9d55b309280d@github.com>

On Tue, 23 Jul 2024 19:00:02 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> What happens when users include `klass.hpp`, but not `klass.inline.hpp`? How does it affect generated code?
> 
> I suspect that `Klass::search_secondary_supers()` won't be inlined in such case.

That is true. I can't tell from this exchange whether you think it should. The same "won't be inlined" fact is also true of everything else in `klass.inline.hpp`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1689927770

From tschatzl at openjdk.org  Wed Jul 24 14:38:32 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Wed, 24 Jul 2024 14:38:32 GMT
Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate [v2]
In-Reply-To: <VQeOV8bJxKRoDHOg5MkGa8ukguwU0SaiB3SpL3gq3_g=.4b4386f8-fc8e-4bd7-ac15-c089c59fb05c@github.com>
References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com>
 <VQeOV8bJxKRoDHOg5MkGa8ukguwU0SaiB3SpL3gq3_g=.4b4386f8-fc8e-4bd7-ac15-c089c59fb05c@github.com>
Message-ID: <NHXk2evSAldw2eag8lB8erd2dVVoikl6Dm1dJOgCdu8=.cf0a7fea-6ca1-4ddb-a874-8f8860541127@github.com>

On Wed, 24 Jul 2024 09:11:13 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Simply obsoleting a Parallel GC product flag.
>
> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review

Marked as reviewed by tschatzl (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20299#pullrequestreview-2196941028

From mbaesken at openjdk.org  Wed Jul 24 14:41:32 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 24 Jul 2024 14:41:32 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v3]
In-Reply-To: <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
 <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com>
Message-ID: <QSl_zKdSoRYcDRQDiF5XlB9VxJWoL9il_rGqEO5ypbA=.56b1949d-5489-4a20-949d-86c3f10c9645@github.com>

On Wed, 24 Jul 2024 13:59:44 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> When running with ubsan-enabled binaries, some tests trigger the following report:
>> 
>> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
>>     #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
>>     #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
>>     #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
>>     #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
>>     #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
>>     #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
>>     #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
>>     #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
>> 
>> It seems that, at least for class SmallRegisterMap, we miss handling nullptr.
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   ATTRIBUTE_NO_UBSAN must be after template typename ...

When using `ATTRIBUTE_NO_UBSAN` for `frame::oopmapreg_to_location`, we unfortunately run into another, similar-looking issue
(e.g. when running the jtreg test java/net/vthread/HttpALot.java):

/jdk/src/hotspot/share/runtime/stackChunkFrameStream.inline.hpp:286:46: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
    #0 0x7febd955502d in void* StackChunkFrameStream<(ChunkFrames)1>::reg_to_loc<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/stackChunkFrameStream.inline.hpp:286
    #1 0x7febd955502d in void StackChunkFrameStream<(ChunkFrames)1>::iterate_oops<BarrierClosure<(stackChunkOopDesc::BarrierType)1, true>, SmallRegisterMap>(BarrierClosure<(stackChunkOopDesc::BarrierType)1, true>*, SmallRegisterMap const*) const src/hotspot/share/runtime/stackChunkFrameStream.inline.hpp:373
    #2 0x7febd955502d in void stackChunkOopDesc::do_barriers0<(stackChunkOopDesc::BarrierType)1, (ChunkFrames)1, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)1> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:375
    #3 0x7febd75a2121 in void stackChunkOopDesc::do_barriers<(stackChunkOopDesc::BarrierType)1, (ChunkFrames)1, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)1> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.inline.hpp:193
    #4 0x7febd75a2121 in ThawBase::recurse_thaw_compiled_frame(frame const&, frame&, int, bool) src/hotspot/share/runtime/continuationFreezeThaw.cpp:2246
    #5 0x7febd75a1f60 in bool ThawBase::recurse_thaw_java_frame<ContinuationHelper::CompiledFrame>(frame&, int) src/hotspot/share/runtime/continuationFreezeThaw.cpp:2092
    #6 0x7febd75a1f60 in ThawBase::recurse_thaw_compiled_frame(frame const&, frame&, int, bool) src/hotspot/share/runtime/continuationFreezeThaw.cpp:2249
    #7 0x7febd75a6aca in ThawBase::thaw_slow(stackChunkOopDesc*, bool) src/hotspot/share/runtime/continuationFreezeThaw.cpp:2040
    #8 0x7febd75aa156 in Thaw<Config<(oop_kind)0, G1BarrierSet> >::thaw(Continuation::thaw_kind) src/hotspot/share/runtime/continuationFreezeThaw.cpp:1825
    #9 0x7febd75aa156 in thaw_internal<Config<(oop_kind)0, G1BarrierSet> > src/hotspot/share/runtime/continuationFreezeThaw.cpp:2450
    #10 0x7febd75aa156 in Config<(oop_kind)0, G1BarrierSet>::thaw(JavaThread*, Continuation::thaw_kind) src/hotspot/share/runtime/continuationFreezeThaw.cpp:276
    #11 0x7febd75aa156 in thaw<Config<(oop_kind)0, G1BarrierSet> > src/hotspot/share/runtime/continuationFreezeThaw.cpp:253
    #12 0x7febbb89c526  (<unknown module>)


This time it is the `map->location` call through a nullptr:


template <ChunkFrames frame_kind>
template <typename RegisterMapT>
inline void* StackChunkFrameStream<frame_kind>::reg_to_loc(VMReg reg, const RegisterMapT* map) const {
  assert(!is_done(), "");
  return reg->is_reg() ? (void*)map->location(reg, sp()) // see frame::update_map_with_saved_link(&map, link_addr);
                       : (void*)((address)unextended_sp() + (reg->reg2stack() * VMRegImpl::stack_slot_size));
}

But on some platforms SmallRegisterMap::location is not even implemented, so how does this work across platforms?

https://github.com/openjdk/jdk/blob/332df83e7cb1f272c08f8e4955d6abaf3f091ace/src/hotspot/cpu/arm/smallRegisterMap_arm.inline.hpp#L56
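
As an aside on the attribute approach, here is a small sketch of where a no-sanitize attribute goes on a function template. `SKETCH_NO_UBSAN`, `Reg` and `slot_of` are made-up names for illustration; the real `ATTRIBUTE_NO_UBSAN` definition may differ. The attribute belongs to the function declaration, so it follows the `template <...>` header rather than preceding it:

```cpp
#include <cassert>

// Hypothetical macro; expands to nothing where the attribute is assumed
// to be unavailable (older GCC warns that the directive is ignored).
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 8)
#define SKETCH_NO_UBSAN __attribute__((no_sanitize("undefined")))
#else
#define SKETCH_NO_UBSAN
#endif

// Hypothetical register descriptor, only here to give the function a body.
struct Reg { int slot; };

// The attribute comes after the template header and before the
// declaration specifiers of the function itself.
template <typename MapT>
SKETCH_NO_UBSAN
inline int slot_of(const Reg* reg, const MapT* /*map*/) {
  return reg->slot;
}
```

Note that the attribute only silences the sanitizer for that one function; as the second report shows, any other function that calls through the null map still gets flagged.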

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2248182106

From aph at openjdk.org  Wed Jul 24 15:54:35 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 24 Jul 2024 15:54:35 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v6]
In-Reply-To: <TxLB7H7lM8c1e-Hc5PvGAiuil1YKOfWqg_EJUwFp4O8=.554e044f-6e2a-4216-96cc-9d55b309280d@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <wgY2erz716MCi6K6DcUKEqLyd6E82ArMlba9qHdAA9o=.de21daa5-b078-4469-a6eb-df548f699f65@github.com>
 <Ct5EunuM4nq5EUa-kDtCzKs-O4Z_wEnMq2_5W7GPaeY=.f475ee5b-3bea-48a7-97d1-7f71287e4fc9@github.com>
 <TxLB7H7lM8c1e-Hc5PvGAiuil1YKOfWqg_EJUwFp4O8=.554e044f-6e2a-4216-96cc-9d55b309280d@github.com>
Message-ID: <sYvOt6BDxFIPAczwoEop5-nUNHQeOi-IH2hGlSVL0ww=.8f6ed07f-0251-44c9-a3a0-0742dabbc15c@github.com>

On Wed, 24 Jul 2024 14:29:09 GMT, Andrew Haley <aph at openjdk.org> wrote:

> I suspect that Klass::search_secondary_supers() won't be inlined in such case.

That's true, but it's true of every other function in that file. Is it not deliberate?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1690061017

From rrich at openjdk.org  Wed Jul 24 16:13:31 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Wed, 24 Jul 2024 16:13:31 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v3]
In-Reply-To: <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
 <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com>
Message-ID: <KbPe2MYqawbLoL6e-pHuh6OtViYi0Uy6UPbOpi5YHMw=.c587eee4-6e5a-401e-b265-16192ea4f893@github.com>

On Wed, 24 Jul 2024 13:59:44 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> When running with ubsan-enabled binaries, some tests trigger the following report:
>> 
>> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
>>     #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
>>     #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
>>     #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
>>     #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
>>     #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
>>     #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
>>     #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
>>     #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
>> 
>> It seems that, at least for class SmallRegisterMap, we miss handling nullptr.
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   ATTRIBUTE_NO_UBSAN must be after template typename ...

Continuations (-XX:+VMContinuations) haven't been ported to Arm32. `SmallRegisterMap` depends on them.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2248407753

From aph at openjdk.org  Wed Jul 24 16:17:34 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 24 Jul 2024 16:17:34 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v6]
In-Reply-To: <sYvOt6BDxFIPAczwoEop5-nUNHQeOi-IH2hGlSVL0ww=.8f6ed07f-0251-44c9-a3a0-0742dabbc15c@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <wgY2erz716MCi6K6DcUKEqLyd6E82ArMlba9qHdAA9o=.de21daa5-b078-4469-a6eb-df548f699f65@github.com>
 <Ct5EunuM4nq5EUa-kDtCzKs-O4Z_wEnMq2_5W7GPaeY=.f475ee5b-3bea-48a7-97d1-7f71287e4fc9@github.com>
 <TxLB7H7lM8c1e-Hc5PvGAiuil1YKOfWqg_EJUwFp4O8=.554e044f-6e2a-4216-96cc-9d55b309280d@github.com>
 <sYvOt6BDxFIPAczwoEop5-nUNHQeOi-IH2hGlSVL0ww=.8f6ed07f-0251-44c9-a3a0-0742dabbc15c@github.com>
Message-ID: <eMrlgijA4K3kj9F7-cj1RBWRQg0rc9faF13SR9UdEys=.4ca5c8f7-6462-4adf-8160-91c018985822@github.com>

On Wed, 24 Jul 2024 15:51:26 GMT, Andrew Haley <aph at openjdk.org> wrote:

>>> What happens when users include `klass.hpp`, but not `klass.inline.hpp`? How does it affect generated code?
>>> 
>>> I suspect that `Klass::search_secondary_supers()` won't be inlined in such case.
>> 
>> That is true. I can't tell from this exchange whether you think it should. The same "won't be inlined" fact is also true of everything else in `klass.inline.hpp`.
>
>> I suspect that Klass::search_secondary_supers() won't be inlined in such case.
> 
> That's true, but it's true of every other function in that file. Is it not deliberate?

FYI, somewhat related: AArch64 GCC inlines `lookup_secondary_supers_table()` 237 times (it's only a few instructions).
x86-64 GCC, because it doesn't use a popcount intrinsic, decides that `lookup_secondary_supers_table()` is too big to be worth inlining in all but 3 cases. So the right thing happens, I think: where we can profit from fast lookups without bloating the runtime, we do.
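
For a sense of the size difference, here is a sketch of a classic bit-parallel software popcount, plus a hypothetical `packed_index` helper showing how a bitmap-indexed table typically uses it. This is a generic illustration, not necessarily what GCC emits or how the HotSpot table is laid out:

```cpp
#include <cassert>
#include <cstdint>

// Bit-parallel population count for a 64-bit word: roughly a dozen
// arithmetic operations where hardware popcount needs one instruction.
inline unsigned software_popcount(uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // 8-bit sums
  return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);       // add the bytes
}

// Hypothetical helper: rank of `slot` among the set bits of `bitmap`, i.e.
// the index of its entry in a densely packed array. Precondition: slot < 64.
inline unsigned packed_index(uint64_t bitmap, unsigned slot) {
  assert(slot < 64);
  uint64_t below = bitmap & ((uint64_t{1} << slot) - 1);  // bits under `slot`
  return software_popcount(below);
}
```

With the software version expanded at every call site, the lookup grows well beyond "a few instructions", which is consistent with the compiler declining to inline it.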

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1690096760

From szaldana at openjdk.org  Wed Jul 24 17:57:59 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Wed, 24 Jul 2024 17:57:59 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v9]
In-Reply-To: <uZeEnxjF6MkDWWOYSdfSUsP9VHParHx7dRweSXjaeM0=.3f3654c8-bf79-4cf3-88de-ba5530276cd9@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com>
 <uZeEnxjF6MkDWWOYSdfSUsP9VHParHx7dRweSXjaeM0=.3f3654c8-bf79-4cf3-88de-ba5530276cd9@github.com>
Message-ID: <tkO2QN5Nzk3njsiyCgolhdy7fzZ26PfDHe44LK3vUf8=.33b8e759-8725-4333-93ae-9d2a14c523b5@github.com>

On Wed, 24 Jul 2024 10:35:35 GMT, Kevin Walls <kevinw at openjdk.org> wrote:

>> Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits:
>> 
>>  - Merge master
>>  - Fixing formatting
>>  - Inlining buffer and making field private
>>  - Reverting to functional changes in parserTests.cpp
>>  - Error messaging format
>>  - Fixing memory leak
>>  - Fixing pointer style, s/NULL/nullptr, and exception
>>  - Cleaning up parserTests.cpp
>>  - Missing copyright header update
>>  - Adding tests for file dcmd argument
>>  - ... and 5 more: https://git.openjdk.org/jdk/compare/2f2223d7...52ca557d
>
> src/hotspot/share/services/diagnosticArgument.hpp line 65:
> 
>> 63: class FileArgument {
>> 64:   private:
>> 65:     char _name[1024];
> 
> Probably JVM_MAXPATHLEN (which might also be 1024).

Hi, I avoided JVM_MAXPATHLEN because of this comment https://github.com/openjdk/jdk/pull/20198#discussion_r1685297940
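
For reference, here is a sketch of `%p` expansion into a fixed-size buffer such as the `_name[1024]` field quoted above. `expand_pid` is a hypothetical helper written for this illustration, not the code in the PR or an existing HotSpot function. It fails rather than truncates when the result does not fit:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: copies `pattern` into `out`, replacing each "%p"
// with `pid`. Returns false if the result plus terminator does not fit.
inline bool expand_pid(const char* pattern, long pid, char* out, std::size_t out_len) {
  assert(pattern != nullptr && out != nullptr);
  std::size_t used = 0;
  for (const char* p = pattern; *p != '\0'; ) {
    if (p[0] == '%' && p[1] == 'p') {
      // snprintf reports how many characters the number needs.
      int n = std::snprintf(out + used, out_len - used, "%ld", pid);
      if (n < 0 || static_cast<std::size_t>(n) >= out_len - used) return false;
      used += static_cast<std::size_t>(n);
      p += 2;
    } else {
      if (used + 1 >= out_len) return false;  // keep room for '\0'
      out[used++] = *p++;
    }
  }
  if (used >= out_len) return false;
  out[used] = '\0';
  return true;
}
```

Returning an error on overflow is a design choice made for this sketch; a silently truncated output file name would be harder to diagnose.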

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1690215958

From szaldana at openjdk.org  Wed Jul 24 17:57:58 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Wed, 24 Jul 2024 17:57:58 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v10]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <M2iFmQIvcLoIJSBEaHBKzQs8Tku6MDwuJFJtKTYgB9I=.da1c7e91-c22a-47ca-a944-1ad8a489d920@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) by enabling jcmd diagnostic commands that write an output file to accept the `%p` pattern in the file name and replace it with the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure`, `JFR.dump`, `JFR.start` and `JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that.
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia
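
For readers following the thread, the `%p` substitution described above can be sketched as below. This is only an illustration under stated assumptions: `expand_pid_pattern` is a hypothetical helper name, it uses `std::string` rather than HotSpot's own utilities, and it is not the code from the pull request.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not the HotSpot implementation): returns a copy of
// `pattern` in which every "%p" has been replaced by the decimal form of `pid`.
// A '%' that is not followed by 'p' is copied through unchanged.
std::string expand_pid_pattern(const std::string& pattern, long pid) {
  const std::string pid_text = std::to_string(pid);
  std::string out;
  out.reserve(pattern.size() + pid_text.size());
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 'p') {
      out += pid_text;
      ++i;  // skip the 'p' that was just consumed
    } else {
      out += pattern[i];
    }
  }
  return out;
}
```

With this sketch, a pattern such as `heap-%p.hprof` and a PID of 1234 would yield `heap-1234.hprof`; a name without `%p` is returned as-is.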

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  fixing pointer style

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/52ca557d..dc1bfe1d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=08-09

  Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From szaldana at openjdk.org  Wed Jul 24 18:05:49 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Wed, 24 Jul 2024 18:05:49 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v11]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <5KMQ16ZAAx9VI9fpawUyrvQSqll5wzs9lCC1bRL62ow=.34c7def9-ab85-42b7-af4b-a433a5b5cf2c@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example,
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that.
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Updating help text for VM.cds

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/dc1bfe1d..34e3f80a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=09-10

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From wkemper at openjdk.org  Wed Jul 24 18:12:40 2024
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 24 Jul 2024 18:12:40 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode
Message-ID: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>

We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development.

-------------

Commit messages:
 - Remove last vestiges of incremental update mode
 - Missed test, remove actual IU barrier flag
 - Remove missed iu_barrier usages for C1
 - Update test (all barriers can be enabled now for all modes)
 - WIP: Remove incremental update mode

Changes: https://git.openjdk.org/jdk/pull/20316/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8336685
  Stats: 1696 lines in 69 files changed: 4 ins; 1658 del; 34 mod
  Patch: https://git.openjdk.org/jdk/pull/20316.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20316/head:pull/20316

PR: https://git.openjdk.org/jdk/pull/20316

From amenkov at openjdk.org  Wed Jul 24 18:36:41 2024
From: amenkov at openjdk.org (Alex Menkov)
Date: Wed, 24 Jul 2024 18:36:41 GMT
Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations
Message-ID: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com>

Obsolete PreserveAllAnnotations flag which was deprecated in JDK 23.

Testing: tier1,tier2,tier3,tier4,hs-tier5-svc

-------------

Commit messages:
 - jcheck
 - fix

Changes: https://git.openjdk.org/jdk/pull/20315/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20315&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8330427
  Stats: 192 lines in 7 files changed: 3 ins; 153 del; 36 mod
  Patch: https://git.openjdk.org/jdk/pull/20315.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20315/head:pull/20315

PR: https://git.openjdk.org/jdk/pull/20315

From szaldana at openjdk.org  Wed Jul 24 18:40:16 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Wed, 24 Jul 2024 18:40:16 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v12]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <Hj_ISn8I6TozxRCyy2AqQs-F1rnETzYwqMmme-ih87M=.8a7b1a72-ae81-4d00-a6ac-11fa17ec978e@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example,
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that.
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Adding default perfmap filename when invoked outside of diagnostic command

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/34e3f80a..d43d90d1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=10-11

  Stats: 5 lines in 3 files changed: 2 ins; 2 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From stuefe at openjdk.org  Wed Jul 24 18:40:16 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 24 Jul 2024 18:40:16 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v11]
In-Reply-To: <5KMQ16ZAAx9VI9fpawUyrvQSqll5wzs9lCC1bRL62ow=.34c7def9-ab85-42b7-af4b-a433a5b5cf2c@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <5KMQ16ZAAx9VI9fpawUyrvQSqll5wzs9lCC1bRL62ow=.34c7def9-ab85-42b7-af4b-a433a5b5cf2c@github.com>
Message-ID: <yWaSEDOfex_SXtjwbWE8A-0RwJQIoICkmdQXOdOaOp0=.5d84ef94-8e21-4933-b3ab-8216c73bdc90@github.com>

On Wed, 24 Jul 2024 18:05:49 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example,
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that.
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Updating help text for VM.cds

All good.

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2197495698

From szaldana at openjdk.org  Wed Jul 24 18:40:16 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Wed, 24 Jul 2024 18:40:16 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v9]
In-Reply-To: <uZeEnxjF6MkDWWOYSdfSUsP9VHParHx7dRweSXjaeM0=.3f3654c8-bf79-4cf3-88de-ba5530276cd9@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com>
 <uZeEnxjF6MkDWWOYSdfSUsP9VHParHx7dRweSXjaeM0=.3f3654c8-bf79-4cf3-88de-ba5530276cd9@github.com>
Message-ID: <8_dPuH2noHgNOFKzsBke96yBSdGoTwhBl0-pyXaoDhA=.e638cdb0-2ea1-42ca-bd8b-88eaf2b719ac@github.com>

On Wed, 24 Jul 2024 10:38:01 GMT, Kevin Walls <kevinw at openjdk.org> wrote:

> In src/hotspot/share/runtime/java.cpp: if (DumpPerfMapAtExit) { CodeCache::write_perf_map(....
> 
> It may need to pass DEFAULT_PERFMAP_FILENAME (and tty).
> 
> Do you have the change from JDK-8327054 merged into this branch?

Good point - I made an update to cover that invocation.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2248669712

From shade at openjdk.org  Wed Jul 24 18:44:31 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 24 Jul 2024 18:44:31 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update
 mode
In-Reply-To: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
Message-ID: <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com>

On Wed, 24 Jul 2024 18:08:46 GMT, William Kemper <wkemper at openjdk.org> wrote:

> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development.
> 
> ## Testing
> * hotspot_gc_shenandoah
> * dacapo
> * diluvian
> * extremem
> * hyperalloc
> * specjbb2015
> * specjvm2008

Good riddance. I have to comb through this more accurately tomorrow, but first pass comments below.

src/hotspot/cpu/ppc/gc/shenandoah/shenandoahBarrierSetAssembler_ppc.cpp line 571:

> 569:   /* ==== Apply keep-alive barrier, if required (e.g., to inhibit weak reference resurrection) ==== */
> 570:   if (ShenandoahBarrierSet::need_keep_alive_barrier(decorators, type)) {
> 571:     if (ShenandoahSATBBarrier) {

A bit weird to replace IU with SATB barrier here.

src/hotspot/share/opto/classes.hpp line 327:

> 325: shmacro(ShenandoahWeakCompareAndSwapN)
> 326: shmacro(ShenandoahWeakCompareAndSwapP)
> 327: 

I think this newline is unnecessary.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20316#pullrequestreview-2197480695
PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690256658
PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690269614

From kdnilsen at openjdk.org  Wed Jul 24 18:55:32 2024
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Wed, 24 Jul 2024 18:55:32 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update
 mode
In-Reply-To: <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
 <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com>
Message-ID: <sdmP00bGZEd6m6TWUElgXdt1JL-IbSJNUFDAAcbv3bU=.0ce7b1f6-fc3c-49ed-b771-ca1311b208b0@github.com>

On Wed, 24 Jul 2024 18:25:38 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development.
>> 
>> ## Testing
>> * hotspot_gc_shenandoah
>> * dacapo
>> * diluvian
>> * extremem
>> * hyperalloc
>> * specjbb2015
>> * specjvm2008
>
> src/hotspot/cpu/ppc/gc/shenandoah/shenandoahBarrierSetAssembler_ppc.cpp line 571:
> 
>> 569:   /* ==== Apply keep-alive barrier, if required (e.g., to inhibit weak reference resurrection) ==== */
>> 570:   if (ShenandoahBarrierSet::need_keep_alive_barrier(decorators, type)) {
>> 571:     if (ShenandoahSATBBarrier) {
> 
> A bit weird to replace IU with SATB barrier here.

Will need_keep_alive_barrier() always be false in absence of IU mode support?  can we replace this with an assert?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690285365

From coleenp at openjdk.org  Wed Jul 24 18:56:33 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 24 Jul 2024 18:56:33 GMT
Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations
In-Reply-To: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com>
References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com>
Message-ID: <EwkVrk4h27-FwYktUeakBXODWJ43gCXl5s68QYLdh-I=.ab04c479-8c24-47ac-b84a-dccd0041ff31@github.com>

On Wed, 24 Jul 2024 18:01:15 GMT, Alex Menkov <amenkov at openjdk.org> wrote:

> Obsolete PreserveAllAnnotations flag which was deprecated in JDK 23.
> 
> Testing: tier1,tier2,tier3,tier4,hs-tier5-svc

This looks really good. I hadn't expected so much code we had to preserve these.  Nice cleanup!

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20315#pullrequestreview-2197529968

From kvn at openjdk.org  Wed Jul 24 19:00:33 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 24 Jul 2024 19:00:33 GMT
Subject: RFR: 8336999: Verification for resource area allocated data
 structures in C2
In-Reply-To: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
Message-ID: <YpSBc5RXGIWZ-W1Xu3XYqXedrbX7vnZashVrght5v4k=.62192fc8-77a7-49f5-a185-17317753a521@github.com>

On Wed, 24 Jul 2024 10:29:32 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

> Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption.
> 
> This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime.
> 
> While testing, I hit the verification code from:
> 
> V  [libjvm.so+0x5c1ceb]  ReallocMark::check(Arena*)+0x7b  (allocation.cpp:244)
> V  [libjvm.so+0x6df2da]  Block_Array::grow(unsigned int)+0x1a  (block.cpp:43)
> V  [libjvm.so+0xb88679]  PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159  (block.hpp:72)
> V  [libjvm.so+0xb88b6b]  PhaseCFG::build_dominator_tree()+0xab  (domgraph.cpp:74)
> V  [libjvm.so+0xd75791]  PhaseCFG::do_global_code_motion()+0x11  (gcm.cpp:1635)
> V  [libjvm.so+0x9f4fd4]  Compile::Code_Gen()+0x2a4  (compile.cpp:2949)
> V  [libjvm.so+0x9f5f16]  Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6  (compile.cpp:991)
> 
> 
> It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing.
> 
> This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`.
> 
> We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015).
> 
> Thanks,
> Tobias
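
The kind of check described above can be illustrated with a toy model. The sketch below is an assumption-laden simplification and not HotSpot's `ReallocMark`: all names (`ToyResourceMark`, `ToyReallocMark`, `g_mark_depth`) are hypothetical, and a single global nesting depth stands in for the current thread's resource area. Comparing depths is also weaker than comparing mark identity, since two sibling marks at the same depth would not be told apart.

```cpp
#include <cassert>

// Toy stand-in for the resource area's nesting level (hypothetical).
static int g_mark_depth = 0;

// Entering a scope with a mark increases the nesting depth; leaving it
// decreases the depth again, as a real mark would release allocations.
struct ToyResourceMark {
  ToyResourceMark()  { ++g_mark_depth; }
  ~ToyResourceMark() { --g_mark_depth; }
  ToyResourceMark(const ToyResourceMark&) = delete;
  ToyResourceMark& operator=(const ToyResourceMark&) = delete;
};

// Remembers the depth at which a data structure was first allocated.
// same_mark() reports whether a reallocation performed right now would
// happen at that same depth; a growable structure could assert on this
// every time it grows.
class ToyReallocMark {
  int _depth_at_alloc;
 public:
  ToyReallocMark() : _depth_at_alloc(g_mark_depth) {}
  bool same_mark() const { return _depth_at_alloc == g_mark_depth; }
};
```

In this model, growing a structure inside a nested mark fails the check, which corresponds to the situation where the new storage would be released when the inner scope ends while the structure itself lives on.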

Looks good to me.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20311#pullrequestreview-2197536320

From kbarrett at openjdk.org  Wed Jul 24 19:02:32 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 24 Jul 2024 19:02:32 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v3]
In-Reply-To: <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
 <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com>
Message-ID: <t1g-4dP38_LQWzBPFLqZlsHaDKKrLBc_4LzYSuH_Sc8=.8f4f92c1-2e49-40e0-88b1-ca1c37c1ec70@github.com>

On Wed, 24 Jul 2024 13:59:44 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> When running with ubsan - enabled binaries, some tests trigger the following report :
>> 
>> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
>>     #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
>>     #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
>>     #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
>>     #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
>>     #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
>>     #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
>>     #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
>>     #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
>> 
>> Seems in case of (at least) class SmallRegisterMap we miss handling nullptr .
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   ATTRIBUTE_NO_UBSAN must be after template typename ...

Changes requested by kbarrett (Reviewer).

> I think this is intended. No instances of SmallRegisterMap are ever created.
> 
> Instead [SmallRegisterMap::instance](https://github.com/openjdk/jdk/blob/5b4824cf9aba297fa6873ebdadc0e9545647e90d/src/hotspot/cpu/x86/smallRegisterMap_x86.inline.hpp#L34C20-L34C36) is used:
> 
> ```c++
>   static constexpr SmallRegisterMap* instance = nullptr;
> ```
> 
> The type is the only information that is actually used.

Being intentional doesn't make it any less invalid.

Here's an untested change that I think will fix the problem.
https://github.com/openjdk/jdk/compare/master...kimbarrett:openjdk-jdk:smallregmap?expand=1

src/hotspot/share/runtime/frame.inline.hpp line 86:

> 84: 
> 85: template <typename RegisterMapT>
> 86: ATTRIBUTE_NO_UBSAN

That's not good enough.  Turning off the ubsan warning doesn't prevent the compiler from doing
unexpected and potentially bad things with invalid code.
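
To make the objection concrete: calling a member function through a null pointer is undefined behavior in C++ even when the function never touches `this`. One well-defined alternative, sketched below, is to hand out the address of a real empty object. This is only a sketch under stated assumptions; `TinyRegisterMap` and `needs_update` are hypothetical names, and this is not claimed to be the change linked above.

```cpp
#include <cassert>

// Hypothetical stand-in for a stateless register map (not HotSpot's class).
// Only the type carries information, but instance() still returns a pointer
// to an actual object, so member calls through it are well-defined.
class TinyRegisterMap {
 public:
  static const TinyRegisterMap* instance() {
    static constexpr TinyRegisterMap the_instance{};
    return &the_instance;
  }
  bool update_map() const { return false; }
  bool walk_cont() const { return false; }
};

// Generic code in the style of the templates discussed above: it only
// relies on the member functions, never on per-object state.
template <typename RegisterMapT>
bool needs_update(const RegisterMapT* map) {
  return map->update_map();
}
```

Because the class is empty, the object costs essentially nothing, and an optimizing compiler can typically still resolve the calls from the static type alone.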

-------------

PR Review: https://git.openjdk.org/jdk/pull/20296#pullrequestreview-2197514292
PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2248704435
PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2248706749
PR Review Comment: https://git.openjdk.org/jdk/pull/20296#discussion_r1690277354

From vlivanov at openjdk.org  Wed Jul 24 19:12:37 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 24 Jul 2024 19:12:37 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v6]
In-Reply-To: <YxBy1Mx7Di5EDfJkCTfcaIuTzCv5KdzBzKMcE3iIeak=.2a56f436-8e14-4a22-a85d-cd06209e2c01@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com>
 <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com>
 <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com>
 <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com>
 <UAjH__AKdU3UMdJBkg7TlElKSA8mEFFE0MiElVrYexE=.4bc67a26-3383-4e4e-92b0-f1d3d33c5ce2@github.com>
 <M5xQ14pzHdBEr7yAdAqIVUsY_o8tXUgN9HpKxjkZznw=.f2262137-2fec-4297-ae1e-89b11874266f@github.com>
 <YxBy1Mx7Di5EDfJkCTfcaIuTzCv5KdzBzKMcE3iIeak=.2a56f436-8e14-4a22-a85d-cd06209e2c01@github.com>
Message-ID: <vw5vWKYgk45g7I9Yio_NTYLSL9fz3y6ptFHJyGNZJCE=.bb3c7d4e-9a5e-4c53-80b7-853dc74a611c@github.com>

On Wed, 24 Jul 2024 09:03:12 GMT, Andrew Haley <aph at openjdk.org> wrote:

>>> Also also, Klass::is_subtype_of() is used for C1 runtime.
>> 
>> Can you elaborate, please? What I'm seeing in `Runtime1::generate_code_for()` for `slow_subtype_check` is a call into `MacroAssembler::check_klass_subtype_slow_path()`.
>
>> > Also also, Klass::is_subtype_of() is used for C1 runtime.
>> 
>> Can you elaborate, please?
> 
> Sorry, that was rather vague. In C1-compiled code, the Java method `Class::isInstance(Object)` calls `Klass::is_subtype_of()`.
> 
> In general, I find it difficult to decide how much work, if any, should be done to improve C1 performance. Clearly, if C1 exists only to help with startup time in a tiered compilation system, the answer is "not much".

Thanks, now I see that `Class::isInstance(Object)` is backed by `Runtime1::is_instance_of()` which uses `oopDesc::is_a()` to do the job.

If it turns out to be performance critical, the intrinsic implementation should be rewritten to exercise existing subtype checking support in C1. As it is implemented now, it's already quite inefficient.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1690303731

From wkemper at openjdk.org  Wed Jul 24 19:27:40 2024
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 24 Jul 2024 19:27:40 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update
 mode
In-Reply-To: <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
 <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com>
Message-ID: <iExDnHB_1WSKaVnW8g2usSSiQUTMMQuGuc691noaUnA=.792236b9-a939-43c7-aa55-c200bf5e7d86@github.com>

On Wed, 24 Jul 2024 18:25:38 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development.
>> 
>> ## Testing
>> * hotspot_gc_shenandoah
>> * dacapo
>> * diluvian
>> * extremem
>> * hyperalloc
>> * specjbb2015
>> * specjvm2008
>
> src/hotspot/cpu/ppc/gc/shenandoah/shenandoahBarrierSetAssembler_ppc.cpp line 571:
> 
>> 569:   /* ==== Apply keep-alive barrier, if required (e.g., to inhibit weak reference resurrection) ==== */
>> 570:   if (ShenandoahBarrierSet::need_keep_alive_barrier(decorators, type)) {
>> 571:     if (ShenandoahSATBBarrier) {
> 
> A bit weird to replace IU with SATB barrier here.

@shipilev [The original code](https://github.com/openjdk/jdk/pull/20316/files#diff-cb01b36a8c7017c9e21645a0ff9075897e5bfa67ae37d4f0d69ccc582656ec31L71) used the function `iu_barrier` to emit the pre-write barrier for both SATB and IU modes. This only happened in the `ppc` port. Other platforms just invoke the function to emit the `satb_write_barrier` directly here (see [x86](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp#L580), for example).

@kdnilsen No, `need_keep_alive_barrier` may be true in SATB mode. The use of the barrier here is to make sure weak references that get loaded during mark are added to the SATB buffer.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690320078

From wkemper at openjdk.org  Wed Jul 24 19:31:04 2024
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 24 Jul 2024 19:31:04 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update
 mode [v2]
In-Reply-To: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
Message-ID: <03bSRAN8T28AU2-M4IzjsBygTwG4SHrc8HUIJYLM5TE=.e4299a87-b25f-471b-9f6e-2c08e741c6f2@github.com>

> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development.
> 
> ## Testing
> * hotspot_gc_shenandoah
> * dacapo
> * diluvian
> * extremem
> * hyperalloc
> * specjbb2015
> * specjvm2008

William Kemper has updated the pull request incrementally with one additional commit since the last revision:

  Remove unintentional new line

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20316/files
  - new: https://git.openjdk.org/jdk/pull/20316/files/41a2deb8..ec2d6b64

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20316.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20316/head:pull/20316

PR: https://git.openjdk.org/jdk/pull/20316

From wkemper at openjdk.org  Wed Jul 24 19:31:05 2024
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 24 Jul 2024 19:31:05 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update
 mode [v2]
In-Reply-To: <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
 <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com>
Message-ID: <G2bCQdkWIBL7HfMTvz9gM07jZ2qAme4yFMMaXOW6tSc=.71fce9e0-0cfe-4e3a-bdee-6d138b5a4444@github.com>

On Wed, 24 Jul 2024 18:38:02 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Remove unintentional new line
>
> src/hotspot/share/opto/classes.hpp line 327:
> 
>> 325: shmacro(ShenandoahWeakCompareAndSwapN)
>> 326: shmacro(ShenandoahWeakCompareAndSwapP)
>> 327: 
> 
> I think this newline is unnecessary.

I agree.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690322953

From kbarrett at openjdk.org  Wed Jul 24 20:57:37 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 24 Jul 2024 20:57:37 GMT
Subject: RFR: 8316930: HotSpot should use noexcept instead of throw() [v4]
In-Reply-To: <zEG6cI-mqmOGle_U_psjrGOlJysZTex7Xw5fal3un-c=.e1beed32-9efa-4dbd-94c2-7ae6f9a24287@github.com>
References: <kc_cq_sBCqn-iAwHCEaTqgMVYrnT6tKsk3SZnD_qP-s=.1b5d24dd-a925-4f6d-aefb-67b4df6bddac@github.com>
 <VScr4niHiJs_5LG0Npi39l3fdOa47am0zftg3jO-IsQ=.0905fa3e-2b66-41a5-b015-2bbd9f7b3940@github.com>
 <2sW1ZQ333qSkLQRk_e-f4g-sOwvW2bKshy8cszUoDrw=.bca141f1-d832-4f95-ad68-47c5fc3d068b@github.com>
 <X522HC3ke8LtoI1xQspLXQDc2Drkc_gmYZKap_TS7pQ=.f0a5177a-187e-4f4e-88df-fc84dec67eb2@github.com>
 <zEG6cI-mqmOGle_U_psjrGOlJysZTex7Xw5fal3un-c=.e1beed32-9efa-4dbd-94c2-7ae6f9a24287@github.com>
Message-ID: <RTVs-cHpHTkrIGHu-qQQMNk55e2D-0lA4NJh6u5fBMU=.66ee5a8b-982e-4759-a104-db5721fc2d31@github.com>

On Sun, 4 Feb 2024 23:09:33 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> I think EXIT_OOM is an implementation detail. The key point is that these methods do not throw.
>
> Or do I have that backwards ... the key point of `no_except` is that callers of these methods must check for `nullptr`, but with EXIT_OOM that is not the case - and we don't want it to appear that the allocation can actually fail and we continue execution!

It looks like there are a number of `operator new`s that have nothrow exception specs but shouldn't.
CompilationResourceObj in the immediately preceding file, for example.
A precursor cleanup (or maybe several) that removed those first would be nice.
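
To illustrate why the exception specification matters here, a minimal sketch (the class `Tracked` and its `fail_next` switch are hypothetical and not taken from HotSpot): when a class-specific `operator new` is declared `noexcept`, a new-expression has to check the result for null and skip the constructor if the allocation failed.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical class for illustration only; not HotSpot code.
struct Tracked {
  static inline int  ctor_calls = 0;      // counts constructor runs
  static inline bool fail_next  = false;  // simulates an allocation failure

  Tracked() { ++ctor_calls; }

  // Non-throwing allocation function: it may return nullptr, so a
  // new-expression using it checks for null before constructing.
  static void* operator new(std::size_t size) noexcept {
    return fail_next ? nullptr : std::malloc(size);
  }
  static void operator delete(void* p) noexcept { std::free(p); }
};

// Returns true if `new Tracked` yields nullptr without running the
// constructor when the allocator reports failure.
inline bool null_result_skips_constructor() {
  Tracked::ctor_calls = 0;
  Tracked::fail_next = true;
  Tracked* t = new Tracked;
  Tracked::fail_next = false;
  bool ok = (t == nullptr) && (Tracked::ctor_calls == 0);
  delete t;  // deleting a null pointer is harmless
  return ok;
}
```

An allocator that terminates on out-of-memory never returns null, so for such an allocator the null check implied by a non-throwing specification would presumably be dead code.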

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15910#discussion_r1690410887

From kbarrett at openjdk.org  Wed Jul 24 20:57:37 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 24 Jul 2024 20:57:37 GMT
Subject: RFR: 8316930: HotSpot should use noexcept instead of throw() [v4]
In-Reply-To: <RTVs-cHpHTkrIGHu-qQQMNk55e2D-0lA4NJh6u5fBMU=.66ee5a8b-982e-4759-a104-db5721fc2d31@github.com>
References: <kc_cq_sBCqn-iAwHCEaTqgMVYrnT6tKsk3SZnD_qP-s=.1b5d24dd-a925-4f6d-aefb-67b4df6bddac@github.com>
 <VScr4niHiJs_5LG0Npi39l3fdOa47am0zftg3jO-IsQ=.0905fa3e-2b66-41a5-b015-2bbd9f7b3940@github.com>
 <2sW1ZQ333qSkLQRk_e-f4g-sOwvW2bKshy8cszUoDrw=.bca141f1-d832-4f95-ad68-47c5fc3d068b@github.com>
 <X522HC3ke8LtoI1xQspLXQDc2Drkc_gmYZKap_TS7pQ=.f0a5177a-187e-4f4e-88df-fc84dec67eb2@github.com>
 <zEG6cI-mqmOGle_U_psjrGOlJysZTex7Xw5fal3un-c=.e1beed32-9efa-4dbd-94c2-7ae6f9a24287@github.com>
 <RTVs-cHpHTkrIGHu-qQQMNk55e2D-0lA4NJh6u5fBMU=.66ee5a8b-982e-4759-a104-db5721fc2d31@github.com>
Message-ID: <Q7mS5hx_DMDBevrTHtfZV_uVtg8WsrgXkhrd_baO_sg=.0a20afba-7270-4abc-877b-ee275b89b644@github.com>

On Wed, 24 Jul 2024 20:53:04 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> Or do I have that backwards ... the key point of `no_except` is that callers of these methods must check for `nullptr`, but with EXIT_OOM that is not the case - and we don't want it to appear that the allocation can actually fail and we continue execution!
>
> It looks like there are a number of `operator new`s that have nothrow exception specs but shouldn't.
> CompilationResourceObj in the immediately preceding file, for example.
> A precursor cleanup (or maybe several) that removed those first would be nice.

It seems that https://bugs.openjdk.org/browse/JDK-8305590 only removed some of them.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15910#discussion_r1690413889

From kvn at openjdk.org  Wed Jul 24 21:00:32 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 24 Jul 2024 21:00:32 GMT
Subject: RFR: 8334230: Optimize C2 classes layout
In-Reply-To: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
Message-ID: <Cbw6Z7Zoa5krBIgdJ8_MF3IPox99JcSAEckvYStY1vM=.0e7d3fd5-0021-4694-8037-16e1ab91e644@github.com>

On Mon, 24 Jun 2024 15:53:24 GMT, Neethu Prasad <nprasad at openjdk.org> wrote:

> **Notes**
> 
> Rearrange C2 class fields to optimize footprint.
> 
> 
> **Verification**
> 
> 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests.
> 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes.
> 
> | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding |
> | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- |
> | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 ->  0 | 11 -> 0  | 56 bytes -> 48 | 0 -> 3 |
> | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 ->  0 | 5 -> 0  | 24 bytes -> 16 | 7 -> 4 |
> | C2Access | 56 -> 48 | 1-> 1 | 42 -> 0 | 1 ->  0 | 7 -> 0  | 56 bytes -> 48 | 7 -> 6 |
> | VectorSet| 32 -> 24 | 1-> 1 | 24 -> 0 | 1 ->  0 | 8 -> 0  | 32 bytes -> 24 | 1 -> 1 |
> 
> class ArrayPointer {
> 	const class Node  *        _pointer;             /*     0     8 */
> 	const class Node  *        _base;                /*     8     8 */
> 	const jlong                _constant_offset;     /*    16     8 */
> 	const class Node  *        _int_offset;          /*    24     8 */
> 	const class GrowableArray<Node*>  * _other_offsets; /*    32     8 */
> 	const jint                 _int_offset_shift;    /*    40     4 */
> 	const bool                 _is_valid;            /*    44     1 */
> public:
> 
> 
> 	/* size: 48, cachelines: 1, members: 7 */
> 	/* padding: 3 */
> 	/* last cacheline: 48 bytes */
> };
> 
> 
> 
> class CallJavaNode : public CallNode {
> public:
> 
> 	/* class CallNode            <ancestor>; */      /*     0   128 */
> protected:
> 
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class ciMethod *           _method;              /*   128     8 */
> 	bool                       _optimized_virtual;   /*   136     1 */
> 	bool                       _method_handle_invoke; /*   137     1 */
> 	bool                       _override_symbolic_info; /*   138     1 */
> 	bool                       _arg_escape;          /*   139     1 */
> public:
> 
> protected:
> 
> public:
> 
> 
> 	/* size: 144, cachelines: 3, members: 6 */
> 	/* padding: 4 */
> 	/* last cacheline: 16 bytes */
> 
> 	/* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
> };
> 
> 
> 
> class C2Access : public StackObj {
> public:
> 
> 	/* class StackObj            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX last struct has 1 byte of padding */
> 
> 	int ()(void) * *           _vptr.C2Access;       /*     0     8 */
> protected:
> 
> 	DecoratorSet               _decorators;          /*     8  ...

Good.
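
As a rough illustration of the kind of reordering measured in the quoted pahole output (the structs `Before` and `After` are hypothetical, not the actual C2 classes), placing the wider members first lets the narrow ones share space that would otherwise become alignment holes:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout with narrow members interleaved between wide ones.
struct Before {
  bool         flag_a;  // followed by an alignment hole before ptr
  void*        ptr;
  bool         flag_b;  // followed by an alignment hole before count
  std::int32_t count;
};

// Same members, widest first; the two bools now sit next to each other.
struct After {
  void*        ptr;
  std::int32_t count;
  bool         flag_a;
  bool         flag_b;  // only tail padding remains
};

// Holds on common ABIs; on a typical LP64 target this is 24 versus 16 bytes.
static_assert(sizeof(After) < sizeof(Before),
              "reordering should shrink the struct");
```

A tool such as pahole reports the same information (holes, padding, cachelines) for the real classes without requiring hand-written assertions.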

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19861#pullrequestreview-2197755523

From vlivanov at openjdk.org  Wed Jul 24 21:26:39 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 24 Jul 2024 21:26:39 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v6]
In-Reply-To: <eMrlgijA4K3kj9F7-cj1RBWRQg0rc9faF13SR9UdEys=.4ca5c8f7-6462-4adf-8160-91c018985822@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <wgY2erz716MCi6K6DcUKEqLyd6E82ArMlba9qHdAA9o=.de21daa5-b078-4469-a6eb-df548f699f65@github.com>
 <Ct5EunuM4nq5EUa-kDtCzKs-O4Z_wEnMq2_5W7GPaeY=.f475ee5b-3bea-48a7-97d1-7f71287e4fc9@github.com>
 <TxLB7H7lM8c1e-Hc5PvGAiuil1YKOfWqg_EJUwFp4O8=.554e044f-6e2a-4216-96cc-9d55b309280d@github.com>
 <sYvOt6BDxFIPAczwoEop5-nUNHQeOi-IH2hGlSVL0ww=.8f6ed07f-0251-44c9-a3a0-0742dabbc15c@github.com>
 <eMrlgijA4K3kj9F7-cj1RBWRQg0rc9faF13SR9UdEys=.4ca5c8f7-6462-4adf-8160-91c018985822@github.com>
Message-ID: <FmVMUWy97fFvsqi16zkq3xtZzftZqo6oa9YSxWUIr_E=.1a2e7ab6-1b85-47fe-ae4c-3b8705f65fd3@github.com>

On Wed, 24 Jul 2024 16:14:47 GMT, Andrew Haley <aph at openjdk.org> wrote:

>>> I suspect that Klass::search_secondary_supers() won't be inlined in such a case.
>> 
>> That's true, but it's true of every other function in that file. Is it not deliberate?
>
> FYI, somewhat related: AArch64 GCC inlines `lookup_secondary_supers_table()` 237 times (it's only a few instructions.)
> x86-64 GCC, because it doesn't use a popcount intrinsic, decides that `lookup_secondary_supers_table()` is too big to be worth inlining in all but 3 cases. So the right thing happens, I think: where we can profit from fast lookups without bloating the runtime, we do.

> That's true, but it's true of every other function in that file. Is it not deliberate?

IMO the fact that `Klass::search_secondary_supers()` is used in `klass.hpp` makes a difference here. 

After thinking more about it, I did a small experiment [1] and observed a build failure on AArch64 [2]. 
I think we don't see any more failures simply because `klass.inline.hpp` is included pervasively.

What do you think about moving `Klass::is_subtype_of()` to `klass.inline.hpp`? 

[1]

diff --git a/test/hotspot/gtest/oops/test_klass.cpp b/test/hotspot/gtest/oops/test_klass.cpp
new file mode 100644
index 00000000000..326a70f1f54
--- /dev/null
+++ b/test/hotspot/gtest/oops/test_klass.cpp
@@ -0,0 +1,9 @@
+#include "precompiled.hpp"
+#include "oops/klass.hpp"
+#include "unittest.hpp"
+
+TEST_VM(Klass, is_subtype_of) {
+  Klass* k = vmClasses::Object_klass();
+  ASSERT_TRUE(k->is_subtype_of(k));
+}


[2] 

Undefined symbols for architecture arm64:
  "Klass::search_secondary_supers(Klass*) const", referenced from:
      Klass_is_subtype_of_vm_Test::TestBody() in test_klass.o

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1690472489

From dholmes at openjdk.org  Thu Jul 25 01:20:33 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 25 Jul 2024 01:20:33 GMT
Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations
In-Reply-To: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com>
References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com>
Message-ID: <PRFkm94et3LVwe-Rq4m8vZYSzw7sKlvZIRu9ltxu0h4=.91dc77a3-8c50-4a9c-9529-320efcfa78c5@github.com>

On Wed, 24 Jul 2024 18:01:15 GMT, Alex Menkov <amenkov at openjdk.org> wrote:

> Obsolete PreserveAllAnnotations flag which was deprecated in JDK 23.
> 
> Testing: tier1,tier2,tier3,tier4,hs-tier5-svc

Great cleanup - good to see all that complexity go!

I think the test can be removed completely - see below.

Thanks

test/jdk/java/lang/instrument/RetransformRecordAnnotation.java line 32:

> 30:  * @run shell MakeJAR.sh retransformAgent
> 31:  * @run main/othervm -javaagent:retransformAgent.jar -Xlog:redefine+class=trace RetransformRecordAnnotation
> 32:  * @run main/othervm -javaagent:retransformAgent.jar -XX:+PreserveAllAnnotations -Xlog:redefine+class=trace RetransformRecordAnnotation

This test is described as:

 * @summary test that records with invisible annotation can be retransformed

which suggests to me the test can actually be deleted as it serves no purpose now there are no invisible annotations.

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20315#pullrequestreview-2198083244
PR Review Comment: https://git.openjdk.org/jdk/pull/20315#discussion_r1690664957

From amenkov at openjdk.org  Thu Jul 25 01:53:13 2024
From: amenkov at openjdk.org (Alex Menkov)
Date: Thu, 25 Jul 2024 01:53:13 GMT
Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations [v2]
In-Reply-To: <PRFkm94et3LVwe-Rq4m8vZYSzw7sKlvZIRu9ltxu0h4=.91dc77a3-8c50-4a9c-9529-320efcfa78c5@github.com>
References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com>
 <PRFkm94et3LVwe-Rq4m8vZYSzw7sKlvZIRu9ltxu0h4=.91dc77a3-8c50-4a9c-9529-320efcfa78c5@github.com>
Message-ID: <WqY9zi3sI0V3Fr_MsLNqCube0o2-_2YaktYchKaz43U=.570d227b-9f19-4d2a-a9f2-beb519e3856a@github.com>

On Thu, 25 Jul 2024 01:17:14 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   remove test
>
> test/jdk/java/lang/instrument/RetransformRecordAnnotation.java line 32:
> 
>> 30:  * @run shell MakeJAR.sh retransformAgent
>> 31:  * @run main/othervm -javaagent:retransformAgent.jar -Xlog:redefine+class=trace RetransformRecordAnnotation
>> 32:  * @run main/othervm -javaagent:retransformAgent.jar -XX:+PreserveAllAnnotations -Xlog:redefine+class=trace RetransformRecordAnnotation
> 
> This test is described as:
> 
>  * @summary test that records with invisible annotation can be retransformed
> 
> which suggests to me the test can actually be deleted as it serves no purpose now there are no invisible annotations.

Agree.
Removed the test.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20315#discussion_r1690681450

From amenkov at openjdk.org  Thu Jul 25 01:53:13 2024
From: amenkov at openjdk.org (Alex Menkov)
Date: Thu, 25 Jul 2024 01:53:13 GMT
Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations [v2]
In-Reply-To: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com>
References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com>
Message-ID: <yoYPcRiwlovmm5hdLcD8y1d25ABb3r5KUniSzzyfBzI=.be9f5747-fa03-4b13-ba53-4d868ea85989@github.com>

> Obsolete PreserveAllAnnotations flag which was deprecated in JDK 23.
> 
> Testing: tier1,tier2,tier3,tier4,hs-tier5-svc

Alex Menkov has updated the pull request incrementally with one additional commit since the last revision:

  remove test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20315/files
  - new: https://git.openjdk.org/jdk/pull/20315/files/03aa9a76..89c83c60

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20315&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20315&range=00-01

  Stats: 186 lines in 1 file changed: 0 ins; 186 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20315.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20315/head:pull/20315

PR: https://git.openjdk.org/jdk/pull/20315

From dholmes at openjdk.org  Thu Jul 25 01:57:34 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 25 Jul 2024 01:57:34 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured
In-Reply-To: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
Message-ID: <6IlNzo_E9E-FWJrCiQZJRxD6rcTrBZ9pB86OjP0DzMU=.6562dd2f-eed5-4dd8-8fe7-b116e7932a3e@github.com>

On Wed, 26 Jun 2024 13:32:32 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> Currently when we run with ubsan-enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work.
> 
> We find this in the test output
> 
> [STDOUT]
> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory
> 
> The container where the test is executed does not contain the ubsan package; we might skip the test in this case.

On the one hand this seems like a "Dr Dr it hurts when I do this" kind of problem. On the other hand it only affects the docker testing so I'm inclined to let it in, even though it is a bit of a blunt instrument (what if ubsan is installed in the container and someone wants to run with it enabled there?).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19907#pullrequestreview-2198111143

From dholmes at openjdk.org  Thu Jul 25 01:58:31 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 25 Jul 2024 01:58:31 GMT
Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations [v2]
In-Reply-To: <yoYPcRiwlovmm5hdLcD8y1d25ABb3r5KUniSzzyfBzI=.be9f5747-fa03-4b13-ba53-4d868ea85989@github.com>
References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com>
 <yoYPcRiwlovmm5hdLcD8y1d25ABb3r5KUniSzzyfBzI=.be9f5747-fa03-4b13-ba53-4d868ea85989@github.com>
Message-ID: <aIkYNugMcYZSQq6-hQHBvVaojngb68xf8ZJxoMJsy5Y=.3e5ccf25-cde1-4956-8b75-cc80df4a8295@github.com>

On Thu, 25 Jul 2024 01:53:13 GMT, Alex Menkov <amenkov at openjdk.org> wrote:

>> Obsolete PreserveAllAnnotations flag which was deprecated in JDK 23.
>> 
>> Testing: tier1,tier2,tier3,tier4,hs-tier5-svc
>
> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove test

Marked as reviewed by dholmes (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20315#pullrequestreview-2198112296

From fyang at openjdk.org  Thu Jul 25 04:11:32 2024
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 25 Jul 2024 04:11:32 GMT
Subject: RFR: 8335191: RISC-V: verify perf of chacha20
In-Reply-To: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com>
References: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com>
Message-ID: <PBX_MsEj4hzAW23ya5c1lfHuabJAJD-mKQfhqbz9ZzY=.ae162303-7a4a-456f-9e86-8f3852d6b35c@github.com>

On Tue, 23 Jul 2024 11:21:31 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this simple patch?
> 
> Previously, we implemented this intrinsic for the chacha20 algorithm based on vector instructions. The latest tests on real hardware (k230, bananapi) show that the implementation brings a performance gain only when vlenb == 32 (on bananapi); when vlenb == 16 (on k230) it brings a regression in all test cases.
> So, we should adjust when to turn on the intrinsic, i.e. only when vlenb == 32.
> 
> Thanks
> 
> 
> ## Performance
> 
> ### on k230
> vlenb == 16
> Benchmark - on k230, vlenb == 16 | (dataSize) | (keyLength) | (mode) | (padding) | (permutation) | Cnt | Score -no-intrinsic | Score +intrinsic | Error | Units | Non-intrinsic/intrinsic
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4642.694 | 5216.699 | 36.039 | ns/op | 0.89
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15719.612 | 17622.616 | 136.609 | ns/op | 0.892
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 59402.742 | 67124.28 | 651.011 | ns/op | 0.885
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 250056.475 | 269184.924 | 8591.727 | ns/op | 0.929
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4752.081 | 5131.094 | 38.917 | ns/op | 0.926
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15554.484 | 16992.339 | 106.583 | ns/op | 0.915
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 61446.365 | 67359.67 | 548.353 | ns/op | 0.912
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 241653.654 | 270189.531 | 3705.045 | ns/op | 0.894
> o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 17833.825 | 20610.118 | 688.668 | ns/op | 0.865
> o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 256 | None | NoPadding | ChaCha20-Poly1...

Thanks for carrying out the test.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20298#pullrequestreview-2198265238

From kbarrett at openjdk.org  Thu Jul 25 04:29:35 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Thu, 25 Jul 2024 04:29:35 GMT
Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate [v2]
In-Reply-To: <VQeOV8bJxKRoDHOg5MkGa8ukguwU0SaiB3SpL3gq3_g=.4b4386f8-fc8e-4bd7-ac15-c089c59fb05c@github.com>
References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com>
 <VQeOV8bJxKRoDHOg5MkGa8ukguwU0SaiB3SpL3gq3_g=.4b4386f8-fc8e-4bd7-ac15-c089c59fb05c@github.com>
Message-ID: <14S7Ls4AoVfFjxSOeW4N42KZtcvhsJrIe25S9r5FEjg=.03e5738f-bcb6-42d5-831f-a2dc00c01c86@github.com>

On Wed, 24 Jul 2024 09:11:13 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Simply obsoleting a Parallel GC product flag.
>
> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review

Looks good.

I assume you will be updating copyrights before integration?

-------------

Marked as reviewed by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20299#pullrequestreview-2198278577

From thartmann at openjdk.org  Thu Jul 25 05:05:31 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 25 Jul 2024 05:05:31 GMT
Subject: RFR: 8336999: Verification for resource area allocated data
 structures in C2
In-Reply-To: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
Message-ID: <Z38oUi5mwF0KqgcyfFnDZjuvvQyY1UtIyrKeoOJHjk0=.18e09077-c469-4306-8956-7114e15a4dc7@github.com>

On Wed, 24 Jul 2024 10:29:32 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

> Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption.
> 
> This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime.
> 
> While testing, I hit the verification code from:
> 
> V  [libjvm.so+0x5c1ceb]  ReallocMark::check(Arena*)+0x7b  (allocation.cpp:244)
> V  [libjvm.so+0x6df2da]  Block_Array::grow(unsigned int)+0x1a  (block.cpp:43)
> V  [libjvm.so+0xb88679]  PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159  (block.hpp:72)
> V  [libjvm.so+0xb88b6b]  PhaseCFG::build_dominator_tree()+0xab  (domgraph.cpp:74)
> V  [libjvm.so+0xd75791]  PhaseCFG::do_global_code_motion()+0x11  (gcm.cpp:1635)
> V  [libjvm.so+0x9f4fd4]  Compile::Code_Gen()+0x2a4  (compile.cpp:2949)
> V  [libjvm.so+0x9f5f16]  Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6  (compile.cpp:991)
> 
> 
> It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing.
> 
> This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`.
> 
> We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015).
> 
> Thanks,
> Tobias

Thanks for the review, Vladimir!
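
For readers unfamiliar with the check described in the quoted text, here is a much simplified sketch of the idea. All names (`ToyArea`, `ToyResourceMark`, `ToyReallocMark`) are hypothetical stand-ins and not the HotSpot classes: the mark records the resource-scope nesting level at allocation time, and a later reallocation is only considered safe if it happens at the same level.

```cpp
// Hypothetical stand-in for the per-thread resource area state.
struct ToyArea {
  static inline int nesting = 0;  // current depth of nested scopes
};

// Hypothetical scope object: entering a scope raises the nesting level,
// leaving it lowers the level again (and would free memory in a real arena).
struct ToyResourceMark {
  ToyResourceMark()  { ++ToyArea::nesting; }
  ~ToyResourceMark() { --ToyArea::nesting; }
  ToyResourceMark(const ToyResourceMark&) = delete;
  ToyResourceMark& operator=(const ToyResourceMark&) = delete;
};

// Hypothetical checker embedded in a growable data structure.
struct ToyReallocMark {
  int nesting_at_alloc = ToyArea::nesting;  // captured at construction

  // True if growing now would allocate in the same scope as the original
  // data; false means the new storage would be freed by an inner scope.
  bool same_scope() const { return nesting_at_alloc == ToyArea::nesting; }
};
```

In this sketch, a structure created under an outer scope and grown under a nested one fails `same_scope()`, which is the situation the verification is meant to catch before the inner scope releases the new storage.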

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20311#issuecomment-2249434446

From thartmann at openjdk.org  Thu Jul 25 05:06:31 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 25 Jul 2024 05:06:31 GMT
Subject: RFR: 8334230: Optimize C2 classes layout
In-Reply-To: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
Message-ID: <GrbzBV2RTD9A0wnUCt4Y0w-FescrLUfEvp2Lvzr8zxM=.c88db3dc-34ba-4142-847e-b790c92b557c@github.com>

On Mon, 24 Jun 2024 15:53:24 GMT, Neethu Prasad <nprasad at openjdk.org> wrote:

> **Notes**
> 
> Rearrange C2 class fields to optimize footprint.
> 
> 
> **Verification**
> 
> 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests.
> 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes.
> 
> | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding |
> | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- |
> | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 ->  0 | 11 -> 0  | 56 bytes -> 48 | 0 -> 3 |
> | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 ->  0 | 5 -> 0  | 24 bytes -> 16 | 7 -> 4 |
> | C2Access | 56 -> 48 | 1-> 1 | 42 -> 0 | 1 ->  0 | 7 -> 0  | 56 bytes -> 48 | 7 -> 6 |
> | VectorSet| 32 -> 24 | 1-> 1 | 24 -> 0 | 1 ->  0 | 8 -> 0  | 32 bytes -> 24 | 1 -> 1 |
> 
> class ArrayPointer {
> 	const class Node  *        _pointer;             /*     0     8 */
> 	const class Node  *        _base;                /*     8     8 */
> 	const jlong                _constant_offset;     /*    16     8 */
> 	const class Node  *        _int_offset;          /*    24     8 */
> 	const class GrowableArray<Node*>  * _other_offsets; /*    32     8 */
> 	const jint                 _int_offset_shift;    /*    40     4 */
> 	const bool                 _is_valid;            /*    44     1 */
> public:
> 
> 
> 	/* size: 48, cachelines: 1, members: 7 */
> 	/* padding: 3 */
> 	/* last cacheline: 48 bytes */
> };
> 
> 
> 
> class CallJavaNode : public CallNode {
> public:
> 
> 	/* class CallNode            <ancestor>; */      /*     0   128 */
> protected:
> 
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class ciMethod *           _method;              /*   128     8 */
> 	bool                       _optimized_virtual;   /*   136     1 */
> 	bool                       _method_handle_invoke; /*   137     1 */
> 	bool                       _override_symbolic_info; /*   138     1 */
> 	bool                       _arg_escape;          /*   139     1 */
> public:
> 
> protected:
> 
> public:
> 
> 
> 	/* size: 144, cachelines: 3, members: 6 */
> 	/* padding: 4 */
> 	/* last cacheline: 16 bytes */
> 
> 	/* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
> };
> 
> 
> 
> class C2Access : public StackObj {
> public:
> 
> 	/* class StackObj            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX last struct has 1 byte of padding */
> 
> 	int ()(void) * *           _vptr.C2Access;       /*     0     8 */
> protected:
> 
> 	DecoratorSet               _decorators;          /*     8  ...

Looks good to me too.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19861#pullrequestreview-2198317310

From mbaesken at openjdk.org  Thu Jul 25 07:32:32 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Thu, 25 Jul 2024 07:32:32 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured
In-Reply-To: <6IlNzo_E9E-FWJrCiQZJRxD6rcTrBZ9pB86OjP0DzMU=.6562dd2f-eed5-4dd8-8fe7-b116e7932a3e@github.com>
References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
 <6IlNzo_E9E-FWJrCiQZJRxD6rcTrBZ9pB86OjP0DzMU=.6562dd2f-eed5-4dd8-8fe7-b116e7932a3e@github.com>
Message-ID: <hPNrVZ4OLxXqJFoSYb-rgaJI9buQomNkCUhpD6bM2JQ=.59e8a864-aa01-4c47-b01f-781acb275b74@github.com>

On Thu, 25 Jul 2024 01:54:33 GMT, David Holmes <dholmes at openjdk.org> wrote:

> what if ubsan is installed in the container and someone wants to run with it enabled there

We could also try to install the ubsan package into the test container, at least for the default container setup.
Do you prefer that?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2249647308

From ayang at openjdk.org  Thu Jul 25 07:44:45 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Thu, 25 Jul 2024 07:44:45 GMT
Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate [v3]
In-Reply-To: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com>
References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com>
Message-ID: <F7o8T0rJbkysjNnN-pcQuRNXYfvRM1EyHrPSKBVcQ0Q=.79de0f66-f206-4e2a-ba1c-9d0f06bd025e@github.com>

> Simply obsoleting a Parallel GC product flag.

Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision:

  copyright

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20299/files
  - new: https://git.openjdk.org/jdk/pull/20299/files/10720a6d..def4cff1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20299&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20299&range=01-02

  Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/20299.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20299/head:pull/20299

PR: https://git.openjdk.org/jdk/pull/20299

From mli at openjdk.org  Thu Jul 25 07:52:35 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 25 Jul 2024 07:52:35 GMT
Subject: RFR: 8335191: RISC-V: verify perf of chacha20
In-Reply-To: <PBX_MsEj4hzAW23ya5c1lfHuabJAJD-mKQfhqbz9ZzY=.ae162303-7a4a-456f-9e86-8f3852d6b35c@github.com>
References: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com>
 <PBX_MsEj4hzAW23ya5c1lfHuabJAJD-mKQfhqbz9ZzY=.ae162303-7a4a-456f-9e86-8f3852d6b35c@github.com>
Message-ID: <kd3LsB0dmtwoCKHwd_k-UEaEU8Rs0Rhw4e5nEfYtmrk=.4291f5f9-c693-44a2-b4c1-12ad22963dfa@github.com>

On Thu, 25 Jul 2024 04:09:10 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Hi,
>> Can you help to review this simple patch?
>> 
>> Previously, we implemented this intrinsic for the chacha20 algorithm based on vector instructions. The latest tests on real hardware (k230, bananapi) show that the implementation brings a performance gain only when vlenb == 32 (on bananapi); when vlenb == 16 (on k230) it brings a regression in all test cases.
>> So, we should adjust when to turn on the intrinsic, i.e. only when vlenb == 32.
>> 
>> Thanks
>> 
>> 
>> ## Performance
>> 
>> ### on k230
>> vlenb == 16
>> Benchmark - on k230, vlenb == 16 | (dataSize) | (keyLength) | (mode) | (padding) | (permutation) | Cnt | Score -no-intrinsic | Score +intrinsic | Error | Units | Non-intrinsic/intrinsic
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4642.694 | 5216.699 | 36.039 | ns/op | 0.89
>> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15719.612 | 17622.616 | 136.609 | ns/op | 0.892
>> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 59402.742 | 67124.28 | 651.011 | ns/op | 0.885
>> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 250056.475 | 269184.924 | 8591.727 | ns/op | 0.929
>> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4752.081 | 5131.094 | 38.917 | ns/op | 0.926
>> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15554.484 | 16992.339 | 106.583 | ns/op | 0.915
>> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 61446.365 | 67359.67 | 548.353 | ns/op | 0.912
>> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 241653.654 | 270189.531 | 3705.045 | ns/op | 0.894
>> o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 17833.825 | 20610.118 | 688.668 | ns/op | 0.865
>> o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decry...
>
> Thanks for carrying out the test.

Thanks @RealFYang for your review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20298#issuecomment-2249676413

From mli at openjdk.org  Thu Jul 25 07:52:36 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 25 Jul 2024 07:52:36 GMT
Subject: RFR: 8335191: RISC-V: verify perf of chacha20
In-Reply-To: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com>
References: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com>
Message-ID: <vxopaVYaEePjLH3lrz8Ce8qO545mRT8PxyVCVkiLK3k=.039fc662-1b1e-4073-a831-9fc0f34d13fa@github.com>

On Tue, 23 Jul 2024 11:21:31 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this simple patch?
> 
> Previously, we implemented this intrinsic for the chacha20 algorithm based on vector instructions. The latest tests on real hardware (k230, bananapi) show that the implementation brings a performance gain only when vlenb == 32 (on bananapi); when vlenb == 16 (on k230) it regresses in all test cases.
> So we should adjust when the intrinsic is turned on, i.e. only when vlenb == 32.
> 
> Thanks
> 
> 
> ## Performance
> 
> ### on k230
> vlenb == 16
> Benchmark - on k230, vlenb == 16 | (dataSize) | (keyLength) | (mode) | (padding) | (permutation) | Cnt | Score -no-intrinsic | Score +intrinsic | Error | Units | Non-intrinsic/intrinsic
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4642.694 | 5216.699 | 36.039 | ns/op | 0.89
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15719.612 | 17622.616 | 136.609 | ns/op | 0.892
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 59402.742 | 67124.28 | 651.011 | ns/op | 0.885
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 250056.475 | 269184.924 | 8591.727 | ns/op | 0.929
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4752.081 | 5131.094 | 38.917 | ns/op | 0.926
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15554.484 | 16992.339 | 106.583 | ns/op | 0.915
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 61446.365 | 67359.67 | 548.353 | ns/op | 0.912
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 241653.654 | 270189.531 | 3705.045 | ns/op | 0.894
> o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 17833.825 | 20610.118 | 688.668 | ns/op | 0.865
> o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 256 | None | NoPadding | ChaCha20-Poly1...

As the change is minor and straightforward, I'll push it with one review.
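
A minimal sketch of the gating described in the quoted description, reading "only when vlenb == 32" literally. `CpuInfo`, `should_use_chacha20_intrinsic` and `non_intrinsic_over_intrinsic` are hypothetical names used for illustration, not the code in the actual changeset:

```cpp
#include <cassert>

// Hypothetical stand-in for what the VM knows about the CPU; not a HotSpot type.
struct CpuInfo {
  bool has_vector;  // vector extension available
  int  vlenb;       // vector register length in bytes
};

// Enable the ChaCha20 intrinsic only where the quoted measurements showed a
// gain (vlenb == 32). With vlenb == 16 every benchmark row regressed.
inline bool should_use_chacha20_intrinsic(const CpuInfo& cpu) {
  return cpu.has_vector && cpu.vlenb == 32;
}

// Last column of the quoted table. Scores are in ns/op, so a ratio below 1.0
// means the intrinsic version was slower than the non-intrinsic one.
inline double non_intrinsic_over_intrinsic(double no_intrinsic_ns, double intrinsic_ns) {
  return no_intrinsic_ns / intrinsic_ns;
}
```

For the first k230 row (4642.694 vs. 5216.699 ns/op) the ratio comes out just under 0.89, which matches the regression reported in the table.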

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20298#issuecomment-2249677415

From mli at openjdk.org  Thu Jul 25 07:52:36 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 25 Jul 2024 07:52:36 GMT
Subject: Integrated: 8335191: RISC-V: verify perf of chacha20
In-Reply-To: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com>
References: <w9XXvU5jMWne42lO3SFUElmDRhEP28G2xS8qo0oATe8=.f3bd16e6-0ed8-4bd5-aced-d59dfadc571a@github.com>
Message-ID: <kSeuny690ejdLCLZxK2PYy1XLPPAMheb-YO0mLGyseI=.02eb4ea0-0888-499f-ae74-9bf64c1072b6@github.com>

On Tue, 23 Jul 2024 11:21:31 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this simple patch?
> 
> Previously, we implemented this intrinsic for the chacha20 algorithm based on vector instructions. The latest tests on real hardware (k230, bananapi) show that the implementation brings a performance gain only when vlenb == 32 (on bananapi); when vlenb == 16 (on k230) it regresses in all test cases.
> So we should adjust when the intrinsic is turned on, i.e. only when vlenb == 32.
> 
> Thanks
> 
> 
> ## Performance
> 
> ### on k230
> vlenb == 16
> Benchmark - on k230, vlenb == 16 | (dataSize) | (keyLength) | (mode) | (padding) | (permutation) | Cnt | Score -no-intrinsic | Score +intrinsic | Error | Units | Non-intrinsic/intrinsic
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4642.694 | 5216.699 | 36.039 | ns/op | 0.89
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15719.612 | 17622.616 | 136.609 | ns/op | 0.892
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 59402.742 | 67124.28 | 651.011 | ns/op | 0.885
> o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 250056.475 | 269184.924 | 8591.727 | ns/op | 0.929
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 256 | None | NoPadding | ChaCha20 | 10 | 4752.081 | 5131.094 | 38.917 | ns/op | 0.926
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 256 | None | NoPadding | ChaCha20 | 10 | 15554.484 | 16992.339 | 106.583 | ns/op | 0.915
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 256 | None | NoPadding | ChaCha20 | 10 | 61446.365 | 67359.67 | 548.353 | ns/op | 0.912
> o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 256 | None | NoPadding | ChaCha20 | 10 | 241653.654 | 270189.531 | 3705.045 | ns/op | 0.894
> o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 256 | None | NoPadding | ChaCha20-Poly1305 | 10 | 17833.825 | 20610.118 | 688.668 | ns/op | 0.865
> o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 256 | None | NoPadding | ChaCha20-Poly1...

This pull request has now been integrated.

Changeset: 9d879186
Author:    Hamlin Li <mli at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/9d8791864ec48f3321707d7f7805cd3618fc3b51
Stats:     3 lines in 1 file changed: 2 ins; 0 del; 1 mod

8335191: RISC-V: verify perf of chacha20

Reviewed-by: fyang

-------------

PR: https://git.openjdk.org/jdk/pull/20298

From shade at openjdk.org  Thu Jul 25 08:33:31 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 25 Jul 2024 08:33:31 GMT
Subject: RFR: 8334230: Optimize C2 classes layout
In-Reply-To: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
Message-ID: <xE7mDVrT8tLQxaDEmfmQUZT377ixCmnEVteB5paDt_4=.eacd0d55-c65a-4a7c-8d07-482bd8e704a8@github.com>

On Mon, 24 Jun 2024 15:53:24 GMT, Neethu Prasad <nprasad at openjdk.org> wrote:

> **Notes**
> 
> Rearrange C2 class fields to optimize footprint.
> 
> 
> **Verification**
> 
> 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests.
> 2. Ran pahole on a 64-bit machine after the reordering and verified that the holes are gone and the total size in bytes is reduced.
> 
> | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding |
> | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- |
> | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 ->  0 | 11 -> 0  | 56 bytes -> 48 | 0 -> 3 |
> | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 ->  0 | 5 -> 0  | 24 bytes -> 16 | 7 -> 4 |
> | C2Access | 56 -> 48 | 1-> 1 | 42 -> 0 | 1 ->  0 | 7 -> 0  | 56 bytes -> 48 | 7 -> 6 |
> | VectorSet| 32 -> 24 | 1-> 1 | 24 -> 0 | 1 ->  0 | 8 -> 0  | 32 bytes -> 24 | 1 -> 1 |
> 
> class ArrayPointer {
> 	const class Node  *        _pointer;             /*     0     8 */
> 	const class Node  *        _base;                /*     8     8 */
> 	const jlong                _constant_offset;     /*    16     8 */
> 	const class Node  *        _int_offset;          /*    24     8 */
> 	const class GrowableArray<Node*>  * _other_offsets; /*    32     8 */
> 	const jint                 _int_offset_shift;    /*    40     4 */
> 	const bool                 _is_valid;            /*    44     1 */
> public:
> 
> 
> 	/* size: 48, cachelines: 1, members: 7 */
> 	/* padding: 3 */
> 	/* last cacheline: 48 bytes */
> };
> 
> 
> 
> class CallJavaNode : public CallNode {
> public:
> 
> 	/* class CallNode            <ancestor>; */      /*     0   128 */
> protected:
> 
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class ciMethod *           _method;              /*   128     8 */
> 	bool                       _optimized_virtual;   /*   136     1 */
> 	bool                       _method_handle_invoke; /*   137     1 */
> 	bool                       _override_symbolic_info; /*   138     1 */
> 	bool                       _arg_escape;          /*   139     1 */
> public:
> 
> protected:
> 
> public:
> 
> 
> 	/* size: 144, cachelines: 3, members: 6 */
> 	/* padding: 4 */
> 	/* last cacheline: 16 bytes */
> 
> 	/* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
> };
> 
> 
> 
> class C2Access : public StackObj {
> public:
> 
> 	/* class StackObj            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX last struct has 1 byte of padding */
> 
> 	int ()(void) * *           _vptr.C2Access;       /*     0     8 */
> protected:
> 
> 	DecoratorSet               _decorators;          /*     8  ...

Looks fine, but I think we want to keep the argument list in its current order, unless there is a good reason to change it that I just don't see?

src/hotspot/share/gc/shared/c2/barrierSetC2.hpp line 115:

> 113: public:
> 114:   C2Access(DecoratorSet decorators,
> 115:            Node* base, C2AccessValuePtr& addr, BasicType type) :

I think it would be cleaner to leave the argument order alone here, and only change the field order. This would guarantee we do not change anything in APIs, which simplifies future changes and backports.
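
A small sketch of that separation, assuming a typical 64-bit ABI. `AccessBefore` and `AccessAfter` are made-up types, not the real `C2Access`. The fields are reordered to remove holes, while the constructor keeps the argument order callers already use:

```cpp
#include <cassert>
#include <cstdint>

// Original layout: small members between large ones leave padding holes.
struct AccessBefore {
  bool     is_valid;  // 1 byte, then padding up to the pointer's alignment
  void*    base;
  bool     is_raw;    // 1 byte, then padding again
  int64_t  offset;
  int32_t  shift;
};

// Same members, declared from largest to smallest alignment.
class AccessAfter {
  void*    _base;
  int64_t  _offset;
  int32_t  _shift;
  bool     _is_valid;
  bool     _is_raw;
public:
  // The argument order is unchanged, so no call site needs to be touched.
  // Only the initializer list follows the new declaration order.
  AccessAfter(bool is_valid, void* base, bool is_raw, int64_t offset, int32_t shift)
    : _base(base), _offset(offset), _shift(shift),
      _is_valid(is_valid), _is_raw(is_raw) {}

  bool    is_valid() const { return _is_valid; }
  int64_t offset()   const { return _offset; }
};
```

On common LP64 targets this should shrink the object from 40 to 24 bytes, although exact sizes are ABI-dependent. Keeping the signature stable is what makes the layout change invisible to the API.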

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19861#pullrequestreview-2198664777
PR Review Comment: https://git.openjdk.org/jdk/pull/19861#discussion_r1691054895

From shade at openjdk.org  Thu Jul 25 08:34:34 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 25 Jul 2024 08:34:34 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update
 mode [v2]
In-Reply-To: <iExDnHB_1WSKaVnW8g2usSSiQUTMMQuGuc691noaUnA=.792236b9-a939-43c7-aa55-c200bf5e7d86@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
 <kkDjLgSV3zILB-gIvYB-JrYVowr9zOFEQEZoicZrKB0=.5e1d86ce-cef5-4255-a887-3ffdd0f2b7c2@github.com>
 <iExDnHB_1WSKaVnW8g2usSSiQUTMMQuGuc691noaUnA=.792236b9-a939-43c7-aa55-c200bf5e7d86@github.com>
Message-ID: <Vyt0QUR0Atgv1Jud_L1dNXMCYezne6EmBGmfq8OfqSo=.cbc7d46f-aaed-40bd-8890-18f431caa0e1@github.com>

On Wed, 24 Jul 2024 19:25:24 GMT, William Kemper <wkemper at openjdk.org> wrote:

>> src/hotspot/cpu/ppc/gc/shenandoah/shenandoahBarrierSetAssembler_ppc.cpp line 571:
>> 
>>> 569:   /* ==== Apply keep-alive barrier, if required (e.g., to inhibit weak reference resurrection) ==== */
>>> 570:   if (ShenandoahBarrierSet::need_keep_alive_barrier(decorators, type)) {
>>> 571:     if (ShenandoahSATBBarrier) {
>> 
>> A bit weird to replace IU with SATB barrier here.
>
> @shipilev [The original code](https://github.com/openjdk/jdk/pull/20316/files#diff-cb01b36a8c7017c9e21645a0ff9075897e5bfa67ae37d4f0d69ccc582656ec31L71) used the function `iu_barrier` to emit the pre-write barrier for both SATB and IU modes. This only happened in the `ppc` port. Other platforms just invoke the function to emit the `satb_write_barrier` directly here (see [x86](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp#L580), for example).
> 
> @kdnilsen No, `need_keep_alive_barrier` may be true in SATB mode. The use of the barrier here is to make sure weak references that get loaded during mark are added to the SATB buffer.

Oh, okay. So that weirdness is pre-existing, fine.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1691057726

From fgao at openjdk.org  Thu Jul 25 08:46:32 2024
From: fgao at openjdk.org (Fei Gao)
Date: Thu, 25 Jul 2024 08:46:32 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
Message-ID: <ULCrYK98jgJuZnte0wUwt15lPOUS1gdnyiIF2EfpMUo=.a6f337ac-9736-4b81-b37e-7e04940e9040@github.com>

On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote:

> In cases like:
> 
>   UNSAFE.putLong(address + off1 + 1030, lseed);
>   UNSAFE.putLong(address + 1023, lseed);
>   UNSAFE.putLong(address + off2 + 1001, lseed);
> 
> 
> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
> 
>   ldr  R10, [R15, #120]    # int ! Field: address
>   ldr  R11, [R16, #136]    # int ! Field: off1
>   ldr  R12, [R16, #144]    # int ! Field: off2
>   add  R11, R11, R10
>   mov R11, R11    # long -> ptr
>   add  R12, R12, R10
>   mov R10, R10    # long -> ptr
>   add R11, R11, #1030    # ptr
>   str  R17, [R11]    # int
>   add R10, R10, #1023    # ptr
>   str  R17, [R10]    # int
>   mov R10, R12    # long -> ptr
>   add R10, R10, #1001    # ptr
>   str  R17, [R10]    # int
> 
> 
> On aarch64, the conversion from long to pointer can be a nop, but C2 doesn't know that. The existing code emits nothing for `mov dst src` only when `dst` == `src` [1], so we get the assembly:
> 
>   ldr    x10, [x15,#120]
>   ldp    x11, x12, [x16,#136]
>   add    x11, x11, x10
>   add    x12, x12, x10
>   add    x11, x11, #0x406
>   str    x17, [x11]
>   add    x10, x10, #0x3ff
>   str    x17, [x10]
>   mov    x10, x12  <--- extra register copy
>   add    x10, x10, #0x3e9
>   str    x17, [x10]
> 
> 
> There is still one extra register copy, which we're trying to remove in this patch.
> 
> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
> 
> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so
> 
> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906

Can I have a second review please :-)
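
To illustrate why the long-to-pointer conversion in the quoted description needs no instruction of its own, here is a rough C++ analogue of the `UNSAFE.putLong(address + offset, value)` pattern. `put_long` and `get_long` are hypothetical helpers, and the integer-to-pointer cast relies on the flat address model of mainstream 64-bit platforms:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Store a 64-bit value at (address + offset), where `address` is a plain
// integer, mirroring the Unsafe.putLong calls in the description.
inline void put_long(uintptr_t address, int64_t offset, int64_t value) {
  // Integer addition followed by a cast that changes the type, not the bits.
  void* p = reinterpret_cast<void*>(address + static_cast<uintptr_t>(offset));
  std::memcpy(p, &value, sizeof value);  // memcpy avoids alignment assumptions
}

// Load the 64-bit value back from (address + offset).
inline int64_t get_long(uintptr_t address, int64_t offset) {
  int64_t value;
  const void* p = reinterpret_cast<const void*>(address + static_cast<uintptr_t>(offset));
  std::memcpy(&value, p, sizeof value);
  return value;
}
```

The offsets 1030, 1023 and 1001 from the Java snippet are the `#0x406`, `#0x3ff` and `#0x3e9` immediates in the quoted assembly.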

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2249795560

From chagedorn at openjdk.org  Thu Jul 25 08:58:34 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 25 Jul 2024 08:58:34 GMT
Subject: RFR: 8336999: Verification for resource area allocated data
 structures in C2
In-Reply-To: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
Message-ID: <WKPBNw86vLLxMZ2PpLH0zqvzVFf1QIIG4YG42PAKPx4=.7cd8981b-17da-4e1d-8ad9-2b516943dbe5@github.com>

On Wed, 24 Jul 2024 10:29:32 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

> Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption.
> 
> This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime.
> 
> While testing, I hit the verification code from:
> 
> V  [libjvm.so+0x5c1ceb]  ReallocMark::check(Arena*)+0x7b  (allocation.cpp:244)
> V  [libjvm.so+0x6df2da]  Block_Array::grow(unsigned int)+0x1a  (block.cpp:43)
> V  [libjvm.so+0xb88679]  PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159  (block.hpp:72)
> V  [libjvm.so+0xb88b6b]  PhaseCFG::build_dominator_tree()+0xab  (domgraph.cpp:74)
> V  [libjvm.so+0xd75791]  PhaseCFG::do_global_code_motion()+0x11  (gcm.cpp:1635)
> V  [libjvm.so+0x9f4fd4]  Compile::Code_Gen()+0x2a4  (compile.cpp:2949)
> V  [libjvm.so+0x9f5f16]  Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6  (compile.cpp:991)
> 
> 
> It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing.
> 
> This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`.
> 
> We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015).
> 
> Thanks,
> Tobias

Nice verification! Only one minor thing, otherwise, looks good to me, too.

src/hotspot/share/opto/block.cpp line 46:

> 44:     return; // No need to grow
> 45:   }
> 46:   assert(i >= Max(), "must be an overflow");

The assert is no longer necessary. I guess you can remove it.

Suggestion:

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20311#pullrequestreview-2198721155
PR Review Comment: https://git.openjdk.org/jdk/pull/20311#discussion_r1691090588

From shade at openjdk.org  Thu Jul 25 09:07:33 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 25 Jul 2024 09:07:33 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update
 mode [v2]
In-Reply-To: <03bSRAN8T28AU2-M4IzjsBygTwG4SHrc8HUIJYLM5TE=.e4299a87-b25f-471b-9f6e-2c08e741c6f2@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
 <03bSRAN8T28AU2-M4IzjsBygTwG4SHrc8HUIJYLM5TE=.e4299a87-b25f-471b-9f6e-2c08e741c6f2@github.com>
Message-ID: <sGgIbbq6E71rOKNm-riTf5bbae_3igRFJFt3e7JR4oA=.b70ed09c-8029-4e05-82c9-871dc4a82f85@github.com>

On Wed, 24 Jul 2024 19:31:04 GMT, William Kemper <wkemper at openjdk.org> wrote:

>> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development.
>> 
>> ## Testing
>> * hotspot_gc_shenandoah
>> * dacapo
>> * diluvian
>> * extremem
>> * hyperalloc
>> * specjbb2015
>> * specjvm2008
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove unintentional new line

I like this. Consider another thing to clean up:

src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 122:

> 120: 
> 121:       ShenandoahMarkRefsClosure<GENERATION> mark_cl(q, rp);
> 122:       ShenandoahSATBAndRemarkThreadsClosure tc(satb_mq_set, nullptr);

Looks like `ShenandoahSATBAndRemarkThreadsClosure` can be considerably simplified, now that we do not pass any closure to it.

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20316#pullrequestreview-2198733772
PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1691098698

From thartmann at openjdk.org  Thu Jul 25 09:13:08 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 25 Jul 2024 09:13:08 GMT
Subject: RFR: 8336999: Verification for resource area allocated data
 structures in C2
In-Reply-To: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
Message-ID: <S8wy4l_RqxDyolZucDg6cQXaA1jNc9ECij35oVEtl0E=.d4bc4ef0-2ca5-4899-af0c-57812dd76112@github.com>

On Wed, 24 Jul 2024 10:29:32 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

> Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption.
> 
> This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime.
> 
> While testing, I hit the verification code from:
> 
> V  [libjvm.so+0x5c1ceb]  ReallocMark::check(Arena*)+0x7b  (allocation.cpp:244)
> V  [libjvm.so+0x6df2da]  Block_Array::grow(unsigned int)+0x1a  (block.cpp:43)
> V  [libjvm.so+0xb88679]  PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159  (block.hpp:72)
> V  [libjvm.so+0xb88b6b]  PhaseCFG::build_dominator_tree()+0xab  (domgraph.cpp:74)
> V  [libjvm.so+0xd75791]  PhaseCFG::do_global_code_motion()+0x11  (gcm.cpp:1635)
> V  [libjvm.so+0x9f4fd4]  Compile::Code_Gen()+0x2a4  (compile.cpp:2949)
> V  [libjvm.so+0x9f5f16]  Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6  (compile.cpp:991)
> 
> 
> It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing.
> 
> This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`.
> 
> We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015).
> 
> Thanks,
> Tobias

Thanks for the review, Christian! Updated.
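
A minimal sketch of the kind of nesting check the quoted description talks about. This is a toy model and not the HotSpot implementation: all `Toy*` names are hypothetical, and the check returns `false` where the real verification would fail an assert:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy resource area: only tracks how many marks are currently active.
struct ToyResourceArea {
  int nesting = 0;
};

// RAII scope; in the real VM, memory allocated inside is released at scope exit.
class ToyResourceMark {
  ToyResourceArea& _area;
public:
  explicit ToyResourceMark(ToyResourceArea& area) : _area(area) { ++_area.nesting; }
  ~ToyResourceMark() { --_area.nesting; }
  ToyResourceMark(const ToyResourceMark&) = delete;
  ToyResourceMark& operator=(const ToyResourceMark&) = delete;
};

// Remembers the nesting level that was active when the owner was created.
class ToyReallocMark {
  int _nesting;
public:
  explicit ToyReallocMark(const ToyResourceArea& area) : _nesting(area.nesting) {}
  bool same_nesting(const ToyResourceArea& area) const { return area.nesting == _nesting; }
};

class ToyList {
  ToyResourceArea& _area;
  ToyReallocMark   _mark;
  std::vector<int> _data;
public:
  explicit ToyList(ToyResourceArea& area) : _area(area), _mark(area) {}

  // Verify on every call, even if no reallocation turns out to be needed,
  // so that a misuse is caught regardless of the current capacity.
  bool grow(std::size_t n) {
    if (!_mark.same_nesting(_area)) return false;  // would be an assert in a debug VM
    if (n > _data.size()) _data.resize(n);
    return true;
  }
};
```

Growing the list under a nested mark is rejected, because storage reallocated there would be freed when the inner scope ends while the list itself lives on.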

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20311#issuecomment-2249844785

From thartmann at openjdk.org  Thu Jul 25 09:13:07 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 25 Jul 2024 09:13:07 GMT
Subject: RFR: 8336999: Verification for resource area allocated data
 structures in C2 [v2]
In-Reply-To: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
Message-ID: <oZKf2g2aJmibrx_nxm8YCuYH1QafKzeuy1FrSLmtglU=.02b61561-b686-4cc4-860c-e5e7c006913e@github.com>

> Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption.
> 
> This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime.
> 
> While testing, I hit the verification code from:
> 
> V  [libjvm.so+0x5c1ceb]  ReallocMark::check(Arena*)+0x7b  (allocation.cpp:244)
> V  [libjvm.so+0x6df2da]  Block_Array::grow(unsigned int)+0x1a  (block.cpp:43)
> V  [libjvm.so+0xb88679]  PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159  (block.hpp:72)
> V  [libjvm.so+0xb88b6b]  PhaseCFG::build_dominator_tree()+0xab  (domgraph.cpp:74)
> V  [libjvm.so+0xd75791]  PhaseCFG::do_global_code_motion()+0x11  (gcm.cpp:1635)
> V  [libjvm.so+0x9f4fd4]  Compile::Code_Gen()+0x2a4  (compile.cpp:2949)
> V  [libjvm.so+0x9f5f16]  Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6  (compile.cpp:991)
> 
> 
> It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing.
> 
> This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`.
> 
> We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015).
> 
> Thanks,
> Tobias

Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/share/opto/block.cpp
  
  Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20311/files
  - new: https://git.openjdk.org/jdk/pull/20311/files/b0f839b8..391dd920

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20311&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20311&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20311.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20311/head:pull/20311

PR: https://git.openjdk.org/jdk/pull/20311

From adinn at openjdk.org  Thu Jul 25 09:40:32 2024
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 25 Jul 2024 09:40:32 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
Message-ID: <U17GOjlTnzMqNfLCJ3TOFe6qyJbFip49tCMooOR0V94=.ad5ec6c9-6401-4859-8a3c-e9ad917bd54a@github.com>

On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote:

> In cases like:
> 
>   UNSAFE.putLong(address + off1 + 1030, lseed);
>   UNSAFE.putLong(address + 1023, lseed);
>   UNSAFE.putLong(address + off2 + 1001, lseed);
> 
> 
> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
> 
>   ldr  R10, [R15, #120]    # int ! Field: address
>   ldr  R11, [R16, #136]    # int ! Field: off1
>   ldr  R12, [R16, #144]    # int ! Field: off2
>   add  R11, R11, R10
>   mov R11, R11    # long -> ptr
>   add  R12, R12, R10
>   mov R10, R10    # long -> ptr
>   add R11, R11, #1030    # ptr
>   str  R17, [R11]    # int
>   add R10, R10, #1023    # ptr
>   str  R17, [R10]    # int
>   mov R10, R12    # long -> ptr
>   add R10, R10, #1001    # ptr
>   str  R17, [R10]    # int
> 
> 
> On aarch64, the conversion from long to pointer can be a nop, but C2 doesn't know that. The existing code emits nothing for `mov dst src` only when `dst` == `src` [1], so we get the assembly:
> 
>   ldr    x10, [x15,#120]
>   ldp    x11, x12, [x16,#136]
>   add    x11, x11, x10
>   add    x12, x12, x10
>   add    x11, x11, #0x406
>   str    x17, [x11]
>   add    x10, x10, #0x3ff
>   str    x17, [x10]
>   mov    x10, x12  <--- extra register copy
>   add    x10, x10, #0x3e9
>   str    x17, [x10]
> 
> 
> There is still one extra register copy, which we're trying to remove in this patch.
> 
> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
> 
> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so
> 
> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906

src/hotspot/cpu/aarch64/aarch64.ad line 4235:

> 4233: operand immLOffset()
> 4234: %{
> 4235:   predicate(n->get_long() >= -256 && n->get_long() <= 65520);

Why is this using hard wired constants rather than using Address::offset_ok_for_immed?

Also, why is the constant value 65520?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1691150336

From aph at openjdk.org  Thu Jul 25 09:56:32 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 25 Jul 2024 09:56:32 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <U17GOjlTnzMqNfLCJ3TOFe6qyJbFip49tCMooOR0V94=.ad5ec6c9-6401-4859-8a3c-e9ad917bd54a@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
 <U17GOjlTnzMqNfLCJ3TOFe6qyJbFip49tCMooOR0V94=.ad5ec6c9-6401-4859-8a3c-e9ad917bd54a@github.com>
Message-ID: <zEMJ521TvbtgVoiwHOW8dBvMw2_BzkRQ9g6H2rZafUc=.95f660a0-c710-41f2-a692-71368ce11865@github.com>

On Thu, 25 Jul 2024 09:37:42 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

>> In the cases like:
>> 
>>   UNSAFE.putLong(address + off1 + 1030, lseed);
>>   UNSAFE.putLong(address + 1023, lseed);
>>   UNSAFE.putLong(address + off2 + 1001, lseed);
>> 
>> 
>> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
>> 
>>   ldr  R10, [R15, #120]    # int ! Field: address
>>   ldr  R11, [R16, #136]    # int ! Field: off1
>>   ldr  R12, [R16, #144]    # int ! Field: off2
>>   add  R11, R11, R10
>>   mov R11, R11    # long -> ptr
>>   add  R12, R12, R10
>>   mov R10, R10    # long -> ptr
>>   add R11, R11, #1030    # ptr
>>   str  R17, [R11]    # int
>>   add R10, R10, #1023    # ptr
>>   str  R17, [R10]    # int
>>   mov R10, R12    # long -> ptr
>>   add R10, R10, #1001    # ptr
>>   str  R17, [R10]    # int
>> 
>> 
>> On AArch64, the conversion from long to pointer could be a nop, but C2 doesn't know that. The existing code emits nothing for `mov dst src` only when `dst` == `src` [1], so we get this assembly:
>> 
>>   ldr    x10, [x15,#120]
>>   ldp    x11, x12, [x16,#136]
>>   add    x11, x11, x10
>>   add    x12, x12, x10
>>   add    x11, x11, #0x406
>>   str    x17, [x11]
>>   add    x10, x10, #0x3ff
>>   str    x17, [x10]
>>   mov    x10, x12  <--- extra register copy
>>   add    x10, x10, #0x3e9
>>   str    x17, [x10]
>> 
>> 
>> There is still one extra register copy, which we're trying to remove in this patch.
>> 
>> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
>> 
>> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so
>> 
>> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906
>
> src/hotspot/cpu/aarch64/aarch64.ad line 4235:
> 
>> 4233: operand immLOffset()
>> 4234: %{
>> 4235:   predicate(n->get_long() >= -256 && n->get_long() <= 65520);
> 
> Why is this using hard wired constants rather than using Address::offset_ok_for_immed?
> 
> Also, why is the constant value 65520?

I think `Address::offset_ok_for_immed` is too restrictive: we want a predicate that is the superset of all possible address offsets.


jshell> ((1<<12)-1) <<4
$3 ==> 65520
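
For context, the shift by 4 in the jshell line corresponds to a 12-bit unsigned immediate scaled by a 16-byte access size. A small Java sketch of the same arithmetic for each power-of-two access size follows; the class and method names are made up for illustration and are not part of the patch:

```java
// Illustrative only: ScaledOffsetLimits and maxScaledOffset are hypothetical names.
final class ScaledOffsetLimits {
    // Largest value of a 12-bit unsigned field, scaled by the access size (1 << shift).
    static long maxScaledOffset(int shift) {
        return ((1L << 12) - 1) << shift;
    }

    public static void main(String[] args) {
        // shift 0..4 corresponds to 1-, 2-, 4-, 8- and 16-byte accesses.
        for (int shift = 0; shift <= 4; shift++) {
            System.out.println((1 << shift) + "-byte access: max offset "
                    + maxScaledOffset(shift));
        }
    }
}
```

With shift 4 this reproduces the 65520 upper bound used in the predicate.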

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1691171119

From chagedorn at openjdk.org  Thu Jul 25 10:48:33 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 25 Jul 2024 10:48:33 GMT
Subject: RFR: 8336999: Verification for resource area allocated data
 structures in C2 [v2]
In-Reply-To: <oZKf2g2aJmibrx_nxm8YCuYH1QafKzeuy1FrSLmtglU=.02b61561-b686-4cc4-860c-e5e7c006913e@github.com>
References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
 <oZKf2g2aJmibrx_nxm8YCuYH1QafKzeuy1FrSLmtglU=.02b61561-b686-4cc4-860c-e5e7c006913e@github.com>
Message-ID: <3htt-59G3IbpUxCqnYSaj-gkhcoswta47VVQqj-0wzQ=.f46427af-470d-4094-bb36-744e299ded7d@github.com>

On Thu, 25 Jul 2024 09:13:07 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

>> Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption.
>> 
>> This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime.
>> 
>> While testing, I hit the verification code from:
>> 
>> V  [libjvm.so+0x5c1ceb]  ReallocMark::check(Arena*)+0x7b  (allocation.cpp:244)
>> V  [libjvm.so+0x6df2da]  Block_Array::grow(unsigned int)+0x1a  (block.cpp:43)
>> V  [libjvm.so+0xb88679]  PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159  (block.hpp:72)
>> V  [libjvm.so+0xb88b6b]  PhaseCFG::build_dominator_tree()+0xab  (domgraph.cpp:74)
>> V  [libjvm.so+0xd75791]  PhaseCFG::do_global_code_motion()+0x11  (gcm.cpp:1635)
>> V  [libjvm.so+0x9f4fd4]  Compile::Code_Gen()+0x2a4  (compile.cpp:2949)
>> V  [libjvm.so+0x9f5f16]  Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6  (compile.cpp:991)
>> 
>> 
>> It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing.
>> 
>> This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`.
>> 
>> We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015).
>> 
>> Thanks,
>> Tobias
>
> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update src/hotspot/share/opto/block.cpp
>   
>   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>

Still good!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20311#pullrequestreview-2198961385

From adinn at openjdk.org  Thu Jul 25 10:50:31 2024
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 25 Jul 2024 10:50:31 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <zEMJ521TvbtgVoiwHOW8dBvMw2_BzkRQ9g6H2rZafUc=.95f660a0-c710-41f2-a692-71368ce11865@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
 <U17GOjlTnzMqNfLCJ3TOFe6qyJbFip49tCMooOR0V94=.ad5ec6c9-6401-4859-8a3c-e9ad917bd54a@github.com>
 <zEMJ521TvbtgVoiwHOW8dBvMw2_BzkRQ9g6H2rZafUc=.95f660a0-c710-41f2-a692-71368ce11865@github.com>
Message-ID: <1n3zHLxSaMMYy7ViMvIvA0Dpo7LA7rOjY2ZKTNtp3xU=.446b0858-ec07-47df-985b-2cd8956974ff@github.com>

On Thu, 25 Jul 2024 09:53:45 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/aarch64.ad line 4235:
>> 
>>> 4233: operand immLOffset()
>>> 4234: %{
>>> 4235:   predicate(n->get_long() >= -256 && n->get_long() <= 65520);
>> 
>> Why is this using hard wired constants rather than using Address::offset_ok_for_immed?
>> 
>> Also, why is the constant value 65520?
>
> I think `Address::offset_ok_for_immed` is too restrictive: we want a predicate that is the superset of all possible address offsets.
> 
> 
> jshell> ((1<<12)-1) <<4
> $3 ==> 65520

Yes, I realise that this is 16 less than 65536. However, there are two things I don't follow.

In the original code immLOffset was only used to define indOffLN i.e. a long offset used with a narrow pointer. The use of Address::offset_ok_for_immed(n->get_long(), 0) in the predicate limited narrow pointer offsets to -256 <= offset <= (2^12 - 1). With this change the top end of the range is now (2^12 - 1) << 4. I am wondering why that is appropriate?

The change allows immLOffset to be used in the definition of indOffX2P. I am not clear why indOffX2P is not just defined using the existing operand immLoffset16 which has as its predicate Address::offset_ok_for_immed(n->get_long(), 4). The only difference I can see is that the alternative predicate used here will accept a positive offset that is not 16 byte aligned. Is that the intention of the redefinition? Again, why is that appropriate?
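
To make the difference concrete, here is a rough Java model of the two predicates. It assumes offset_ok_for_immed behaves as described above (a signed 9-bit range for negative or misaligned offsets, otherwise an unsigned 12-bit field scaled by the shift); it is a sketch for discussion, not a copy of the HotSpot source, and all names in it are hypothetical:

```java
// Illustrative model only; OffsetPredicates is a hypothetical name.
final class OffsetPredicates {
    // Assumed behaviour: negative or misaligned offsets must fit a signed
    // 9-bit field; aligned non-negative offsets must fit an unsigned 12-bit
    // field after scaling by (1 << shift).
    static boolean offsetOkForImmed(long offset, int shift) {
        long mask = (1L << shift) - 1;
        if (offset < 0 || (offset & mask) != 0) {
            return offset >= -256 && offset <= 255;
        }
        return (offset >> shift) <= 4095;
    }

    // The hard-wired range from the predicate under review.
    static boolean inHardWiredRange(long offset) {
        return offset >= -256 && offset <= 65520;
    }

    public static void main(String[] args) {
        long[] samples = {-256, 255, 1023, 4095, 4096, 65520, 65536};
        for (long off : samples) {
            System.out.println(off
                    + " shift0=" + offsetOkForImmed(off, 0)
                    + " shift4=" + offsetOkForImmed(off, 4)
                    + " hardWired=" + inHardWiredRange(off));
        }
    }
}
```

Under this model an offset such as 1023 is rejected with shift 4 (misaligned and above 255) but accepted by the hard-wired range, which is the case I am asking about.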

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1691242965

From thartmann at openjdk.org  Thu Jul 25 10:56:34 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 25 Jul 2024 10:56:34 GMT
Subject: RFR: 8336999: Verification for resource area allocated data
 structures in C2 [v2]
In-Reply-To: <oZKf2g2aJmibrx_nxm8YCuYH1QafKzeuy1FrSLmtglU=.02b61561-b686-4cc4-860c-e5e7c006913e@github.com>
References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
 <oZKf2g2aJmibrx_nxm8YCuYH1QafKzeuy1FrSLmtglU=.02b61561-b686-4cc4-860c-e5e7c006913e@github.com>
Message-ID: <eeqOET7dIVI01B15yLxmKiRSI7YfZZnf4BmZTTumv5U=.f88064de-b17b-4e51-9fbd-38fd68831f97@github.com>

On Thu, 25 Jul 2024 09:13:07 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

>> Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption.
>> 
>> This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime.
>> 
>> While testing, I hit the verification code from:
>> 
>> V  [libjvm.so+0x5c1ceb]  ReallocMark::check(Arena*)+0x7b  (allocation.cpp:244)
>> V  [libjvm.so+0x6df2da]  Block_Array::grow(unsigned int)+0x1a  (block.cpp:43)
>> V  [libjvm.so+0xb88679]  PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159  (block.hpp:72)
>> V  [libjvm.so+0xb88b6b]  PhaseCFG::build_dominator_tree()+0xab  (domgraph.cpp:74)
>> V  [libjvm.so+0xd75791]  PhaseCFG::do_global_code_motion()+0x11  (gcm.cpp:1635)
>> V  [libjvm.so+0x9f4fd4]  Compile::Code_Gen()+0x2a4  (compile.cpp:2949)
>> V  [libjvm.so+0x9f5f16]  Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6  (compile.cpp:991)
>> 
>> 
>> It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing.
>> 
>> This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`.
>> 
>> We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015).
>> 
>> Thanks,
>> Tobias
>
> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update src/hotspot/share/opto/block.cpp
>   
>   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>

Thank you!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20311#issuecomment-2250043897

From kevinw at openjdk.org  Thu Jul 25 10:57:36 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Thu, 25 Jul 2024 10:57:36 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v9]
In-Reply-To: <8_dPuH2noHgNOFKzsBke96yBSdGoTwhBl0-pyXaoDhA=.e638cdb0-2ea1-42ca-bd8b-88eaf2b719ac@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com>
 <uZeEnxjF6MkDWWOYSdfSUsP9VHParHx7dRweSXjaeM0=.3f3654c8-bf79-4cf3-88de-ba5530276cd9@github.com>
 <8_dPuH2noHgNOFKzsBke96yBSdGoTwhBl0-pyXaoDhA=.e638cdb0-2ea1-42ca-bd8b-88eaf2b719ac@github.com>
Message-ID: <vgVYMZVc-wpKVEGoCZgsYBvA7fwJ-0TUIWT2g5BAYj4=.3d52cdbb-57a6-4dc2-823e-3c829142a646@github.com>

On Wed, 24 Jul 2024 18:36:44 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

> ...made an update to cover that invocation.

Thanks for having only one DEFAULT_PERFMAP_FILENAME definition.  It could be wrapped with #ifdef LINUX like it was before.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2250043656

From kevinw at openjdk.org  Thu Jul 25 10:57:37 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Thu, 25 Jul 2024 10:57:37 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v12]
In-Reply-To: <Hj_ISn8I6TozxRCyy2AqQs-F1rnETzYwqMmme-ih87M=.8a7b1a72-ae81-4d00-a6ac-11fa17ec978e@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <Hj_ISn8I6TozxRCyy2AqQs-F1rnETzYwqMmme-ih87M=.8a7b1a72-ae81-4d00-a6ac-11fa17ec978e@github.com>
Message-ID: <NhUgfbKGUCOEIf8yU0cpWLePjPuxTTNC8klTO2_rX28=.c3815881-6491-4d04-ae79-a3c98ef9158b@github.com>

On Wed, 24 Jul 2024 18:40:16 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adding default perfmap filename when invoked outside of diagnostic command

Do we need to update all the initialisations to set _filename members to type "FILE" ?
e.g. HeapDumpDCmd: there is still _filename("filename","Name of the dump file", "STRING",true),

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2250044890

From kevinw at openjdk.org  Thu Jul 25 11:40:35 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Thu, 25 Jul 2024 11:40:35 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v9]
In-Reply-To: <tkO2QN5Nzk3njsiyCgolhdy7fzZ26PfDHe44LK3vUf8=.33b8e759-8725-4333-93ae-9d2a14c523b5@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <ASx5pXkZUT9ZmH7duwX5AhSsKC6HhUvhauP_qnvYcZE=.8abdc5f3-39b1-4247-b6a3-2d05a68db4f8@github.com>
 <uZeEnxjF6MkDWWOYSdfSUsP9VHParHx7dRweSXjaeM0=.3f3654c8-bf79-4cf3-88de-ba5530276cd9@github.com>
 <tkO2QN5Nzk3njsiyCgolhdy7fzZ26PfDHe44LK3vUf8=.33b8e759-8725-4333-93ae-9d2a14c523b5@github.com>
Message-ID: <m4FyXjoRzxHMR62x8z4u8dylCkZUjnc1GrXU-3KwDcU=.8874c13a-aaea-44f0-81f2-bc6221831bd1@github.com>

On Wed, 24 Jul 2024 17:54:30 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> src/hotspot/share/services/diagnosticArgument.hpp line 65:
>> 
>>> 63: class FileArgument {
>>> 64:   private:
>>> 65:     char _name[1024];
>> 
>> Probably JVM_MAXPATHLEN (which might also be 1024).
>
> Hi, I avoided JVM_MAXPATHLEN because of this comment https://github.com/openjdk/jdk/pull/20198#discussion_r1685297940

It seems strange to me to NOT use MAXPATHLEN (or JVM_MAXPATHLEN) in this one particular place, on the grounds that if somebody rebuilds the JDK on a system where it is defined to be very long, there would be some unnecessarily large allocations.

There are approx 140 other uses.

If JVM_MAXPATHLEN is 4k, we are saying that those other usages reserve the 4k, but this particular path should max out at 1024 bytes?

With common cloud paths, and even on our test systems, paths are commonly nearly 400 bytes long, so 1024 does not leave much spare capacity.

I don't want to contradict @tstuefe too much, and it's not make or break for this change, but I would think just go with the standard max path len used everywhere else.  If there's a problem with memory bloat, then hardcoding one of the usages isn't really going to help much.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1691299887

From kevinw at openjdk.org  Thu Jul 25 13:13:34 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Thu, 25 Jul 2024 13:13:34 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v12]
In-Reply-To: <NhUgfbKGUCOEIf8yU0cpWLePjPuxTTNC8klTO2_rX28=.c3815881-6491-4d04-ae79-a3c98ef9158b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <Hj_ISn8I6TozxRCyy2AqQs-F1rnETzYwqMmme-ih87M=.8a7b1a72-ae81-4d00-a6ac-11fa17ec978e@github.com>
 <NhUgfbKGUCOEIf8yU0cpWLePjPuxTTNC8klTO2_rX28=.c3815881-6491-4d04-ae79-a3c98ef9158b@github.com>
Message-ID: <Ip0rmvmbwcSZztOHUHePXBv4Z7ZEFFBaCTSpKCmCeps=.79dd4028-71a4-4e22-8319-0fd04e6aba68@github.com>

On Thu, 25 Jul 2024 10:54:43 GMT, Kevin Walls <kevinw at openjdk.org> wrote:

> Do we need to update all the initialisations to set _filename members to type "FILE" ?

I checked, no we don't NEED to change them.  We can, it works either way.  It does affect the help output. e.g.

Arguments:
        filepath :  The file path to the output file (FILE, no default value)
		
...which would be good as it's a  way of telling people these are FILEs therefore %p is interpreted, rather than just a STRING.
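
As a reminder of what "interpreted" means for the user, here is a tiny Java sketch of the substitution. It is only an illustration of the visible behaviour; the real handling is in the HotSpot C++ code, and the class and method names below are hypothetical:

```java
// Illustrative only: PidPattern and expand are hypothetical names.
final class PidPattern {
    // Replace every "%p" in the file name with the given PID.
    static String expand(String name, long pid) {
        return name.replace("%p", Long.toString(pid));
    }

    public static void main(String[] args) {
        // ProcessHandle.current().pid() returns the PID of the running JVM.
        long pid = ProcessHandle.current().pid();
        System.out.println(expand("heap-%p.hprof", pid));
    }
}
```

A name without `%p` passes through unchanged, which is what a plain STRING argument would give in every case.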

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2250289811

From mbaesken at openjdk.org  Thu Jul 25 13:39:33 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Thu, 25 Jul 2024 13:39:33 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v3]
In-Reply-To: <t1g-4dP38_LQWzBPFLqZlsHaDKKrLBc_4LzYSuH_Sc8=.8f4f92c1-2e49-40e0-88b1-ca1c37c1ec70@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
 <kjTvU8mJqos12UWZcYqG16iDGRtAWrrpweJNCHmZGf0=.d6f0ff54-bf40-4ff9-a16c-438def9a435f@github.com>
 <t1g-4dP38_LQWzBPFLqZlsHaDKKrLBc_4LzYSuH_Sc8=.8f4f92c1-2e49-40e0-88b1-ca1c37c1ec70@github.com>
Message-ID: <MJy7wKbRFXgZUci6hFRE-3RoJ3z1MXhzcUSRU9rRzH8=.da00e8d6-68a7-4565-bf6a-42d00fe3bb9f@github.com>

On Wed, 24 Jul 2024 18:59:34 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Here's an untested change that I think will fix the problem. https://github.com/openjdk/jdk/compare/master...kimbarrett:openjdk-jdk:smallregmap?expand=1

Seems this works well, at least :tier1 tests on some platforms (x86_64, ppc64le) look okay to me.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2250344053

From aph at openjdk.org  Thu Jul 25 13:41:33 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 25 Jul 2024 13:41:33 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <1n3zHLxSaMMYy7ViMvIvA0Dpo7LA7rOjY2ZKTNtp3xU=.446b0858-ec07-47df-985b-2cd8956974ff@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
 <U17GOjlTnzMqNfLCJ3TOFe6qyJbFip49tCMooOR0V94=.ad5ec6c9-6401-4859-8a3c-e9ad917bd54a@github.com>
 <zEMJ521TvbtgVoiwHOW8dBvMw2_BzkRQ9g6H2rZafUc=.95f660a0-c710-41f2-a692-71368ce11865@github.com>
 <1n3zHLxSaMMYy7ViMvIvA0Dpo7LA7rOjY2ZKTNtp3xU=.446b0858-ec07-47df-985b-2cd8956974ff@github.com>
Message-ID: <LrbAv5XoGQuXuZiReg9oJ6hpqq_Ip0wO1VN5bwNWZSA=.43a0e39d-89c1-4ca5-9ad4-fb3208fb3fb6@github.com>

On Thu, 25 Jul 2024 10:47:41 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

> The change allows immLOffset to be used in the definition of indOffX2P. I am not clear why indOffX2P is not just defined using the existing operand immLoffset16 which has as its predicate Address::offset_ok_for_immed(n->get_long(), 4).

After this change, `immLOffset` is a more general-purpose type than `immLoffset16`.  `immLOffset` matches all possible address offsets, along with some impossible ones. For example, it matches all of the misaligned `Unsafe` accesses at any offset, regardless of operand size. In the (rare) event that an operand size and offset don't fit a single instruction, we'll split the instruction when we emit it.

After this patch there will be a few rare cases where we have a regression in code size, but it's worth it for the simplicity and the size of the matcher logic, which would otherwise explode. I don't expect any significant regression in execution time.

This patch is not the last word on the matter; later patches may well further reduce the number of integer offset types in a similar way. I don't think that many of the offsetL/I/X/P types do anything useful, and we'd probably profit from removing them, but that's another patch for another day.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1691462027

From mbaesken at openjdk.org  Thu Jul 25 13:42:48 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Thu, 25 Jul 2024 13:42:48 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v4]
In-Reply-To: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
Message-ID: <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>

> When running with ubsan - enabled binaries, some tests trigger the following report :
> 
> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
>     #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
>     #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
>     #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
>     #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
>     #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
>     #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
>     #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
>     #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
> 
> Seems in case of (at least) class SmallRegisterMap we miss handling nullptr .

Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:

  add patch of Kim Barrett

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20296/files
  - new: https://git.openjdk.org/jdk/pull/20296/files/390a2176..b6f5dcfa

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20296&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20296&range=02-03

  Stats: 133 lines in 11 files changed: 50 ins; 66 del; 17 mod
  Patch: https://git.openjdk.org/jdk/pull/20296.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20296/head:pull/20296

PR: https://git.openjdk.org/jdk/pull/20296

From aph at openjdk.org  Thu Jul 25 13:59:35 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 25 Jul 2024 13:59:35 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v6]
In-Reply-To: <vw5vWKYgk45g7I9Yio_NTYLSL9fz3y6ptFHJyGNZJCE=.bb3c7d4e-9a5e-4c53-80b7-853dc74a611c@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com>
 <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com>
 <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com>
 <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com>
 <UAjH__AKdU3UMdJBkg7TlElKSA8mEFFE0MiElVrYexE=.4bc67a26-3383-4e4e-92b0-f1d3d33c5ce2@github.com>
 <M5xQ14pzHdBEr7yAdAqIVUsY_o8tXUgN9HpKxjkZznw=.f2262137-2fec-4297-ae1e-89b11874266f@github.com>
 <YxBy1Mx7Di5EDfJkCTfcaIuTzCv5KdzBzKMcE3iIeak=.2a56f436-8e14-4a22-a85d-cd06209e2c01@github.com>
 <vw5vWKYgk45g7I9Yio_NTYLSL9fz3y6ptFHJyGNZJCE=.bb3c7d4e-9a5e-4c53-80b7-853dc74a611c@github.com>
Message-ID: <RDhgVrQ6zzzaCBuvMVR7tKA2Qe1SwF2pktr5xcI_duE=.9ae8f5e8-f888-47b5-b979-e56692b278f6@github.com>

On Wed, 24 Jul 2024 19:09:06 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>>> > Also also, Klass::is_subtype_of() is used for C1 runtime.
>>> 
>>> Can you elaborate, please?
>> 
>> Sorry, that was rather vague. In C1-compiled code, the Java method `Class::isInstance(Object)`calls `Klass::is_subtype_of()`. 
>> 
>> In general, I find it difficult to decide how much work, if any, should be done to improve C1 performance. Clearly, if C1 exists only to help with startup time in a tiered compilation system, the answer is "not much".
>
> Thanks, now I see that `Class::isInstance(Object)` is backed by `Runtime1::is_instance_of()` which uses `oopDesc::is_a()` to do the job.
> 
> If it turns out to be performance critical, the intrinsic implementation should be rewritten to exercise existing subtype checking support in C1. As it is implemented now, it's already quite inefficient.

I did write an intrinsic for that, but it made this patch even larger. I have a small patch for C1, for some other time.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1691491950

From mbaesken at openjdk.org  Thu Jul 25 14:04:33 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Thu, 25 Jul 2024 14:04:33 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v4]
In-Reply-To: <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
 <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>
Message-ID: <znGOpYSNwo0aqgQcaw57RYRCqVixDaVVTBXnt-pIWQ8=.7626c298-e1d7-42d1-8f2e-ac74fcfc5e4a@github.com>

On Thu, 25 Jul 2024 13:42:48 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> When running with ubsan - enabled binaries, some tests trigger the following report :
>> 
>> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
>>     #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
>>     #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
>>     #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
>>     #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
>>     #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
>>     #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
>>     #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
>>     #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
>> 
>> It seems that, at least for class SmallRegisterMap, we miss handling nullptr.
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add patch of Kim Barrett

Good additional news: with this fix, the jdk :tier1 tests on linux x86_64 with ubsan-enabled binaries are almost clean.
Only some Shenandoah-related tests still fail, because of
https://bugs.openjdk.org/browse/JDK-8332697
8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int'



--------------------------------------------------
TEST: java/foreign/stackwalk/TestAsyncStackWalk.java#shenandoah
TEST RESULT: Failed. Unexpected exit from test [exit code: 1]
--------------------------------------------------
TEST: java/foreign/stackwalk/TestStackWalk.java#shenandoah
TEST RESULT: Failed. Unexpected exit from test [exit code: 1]
--------------------------------------------------
Test results: passed: 2,420; failed: 2

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2250400507

From szaldana at openjdk.org  Thu Jul 25 14:48:50 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Thu, 25 Jul 2024 14:48:50 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v13]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <Ic3egsnhXJQ-4k-m99sIQWIoa0ypGVZ5uIw8OTHhYG8=.0f2f8fbf-421d-45d0-bc46-73922dfde987@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492), enabling jcmd diagnostic commands that write an output file to accept the `%p` pattern in the file name and replace it with the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Wrapping in linux

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/d43d90d1..33976d70

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=11-12

  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From szaldana at openjdk.org  Thu Jul 25 14:48:50 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Thu, 25 Jul 2024 14:48:50 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v12]
In-Reply-To: <Ip0rmvmbwcSZztOHUHePXBv4Z7ZEFFBaCTSpKCmCeps=.79dd4028-71a4-4e22-8319-0fd04e6aba68@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <Hj_ISn8I6TozxRCyy2AqQs-F1rnETzYwqMmme-ih87M=.8a7b1a72-ae81-4d00-a6ac-11fa17ec978e@github.com>
 <NhUgfbKGUCOEIf8yU0cpWLePjPuxTTNC8klTO2_rX28=.c3815881-6491-4d04-ae79-a3c98ef9158b@github.com>
 <Ip0rmvmbwcSZztOHUHePXBv4Z7ZEFFBaCTSpKCmCeps=.79dd4028-71a4-4e22-8319-0fd04e6aba68@github.com>
Message-ID: <JujyBGi73a8Bo6IAhyZnsjX_vyQdvGYdpWeDrL35WZA=.a119c16c-72fc-41a9-a590-7a7239546faf@github.com>

On Thu, 25 Jul 2024 13:11:00 GMT, Kevin Walls <kevinw at openjdk.org> wrote:

> good as it's a way of telling people these are FILEs therefore %p is interpreted, rather than just a STRING.

Hi Kevin, 

I feel this could be made more explicit by updating the manpage to explain the %p substitution rather than changing the type to FILE, seeing as users are asked to specify a filename rather than an existing file. What do you think?
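
To make the substitution under discussion concrete, here is a minimal Java sketch of expanding `%p` in an output file name. It is only an illustration of the behaviour described in the PR text, not the HotSpot implementation (which is C++); the class name `PidPattern` and its methods are hypothetical.

```java
// Hypothetical helper for illustration only; not HotSpot code.
final class PidPattern {
    private PidPattern() {}

    /** Replaces every literal "%p" in the pattern with the given PID. */
    static String expand(String pattern, long pid) {
        // String.replace performs a literal (non-regex) replacement of all occurrences.
        return pattern.replace("%p", Long.toString(pid));
    }

    /** Convenience overload that uses the PID of the running JVM. */
    static String expand(String pattern) {
        return expand(pattern, ProcessHandle.current().pid());
    }
}
```

With a PID of 1234, `PidPattern.expand("heap_%p.hprof", 1234)` yields `heap_1234.hprof`, while a name without `%p` is returned unchanged.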

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2250523394

From kevinw at openjdk.org  Thu Jul 25 14:53:35 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Thu, 25 Jul 2024 14:53:35 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v12]
In-Reply-To: <JujyBGi73a8Bo6IAhyZnsjX_vyQdvGYdpWeDrL35WZA=.a119c16c-72fc-41a9-a590-7a7239546faf@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <Hj_ISn8I6TozxRCyy2AqQs-F1rnETzYwqMmme-ih87M=.8a7b1a72-ae81-4d00-a6ac-11fa17ec978e@github.com>
 <NhUgfbKGUCOEIf8yU0cpWLePjPuxTTNC8klTO2_rX28=.c3815881-6491-4d04-ae79-a3c98ef9158b@github.com>
 <Ip0rmvmbwcSZztOHUHePXBv4Z7ZEFFBaCTSpKCmCeps=.79dd4028-71a4-4e22-8319-0fd04e6aba68@github.com>
 <JujyBGi73a8Bo6IAhyZnsjX_vyQdvGYdpWeDrL35WZA=.a119c16c-72fc-41a9-a590-7a7239546faf@github.com>
Message-ID: <TMigVvxThtPHPTrUpUk8RhJnCeK_-NomWVDbOCO7oi4=.7278475c-f9b2-4031-a3f5-69b296f5732a@github.com>

On Thu, 25 Jul 2024 14:46:05 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

> > good as it's a way of telling people these are FILEs therefore %p is interpreted, rather than just a STRING.
> 
> Hi Kevin,
> 
> I feel this could be more explicit by updating the manpage to explain the %p substitution rather than updating the type to FILE seeing how users are asked to specify a filename rather than an existing file. What do you think?

Hi, I was thinking both 8-)  I can do the man page task as that is still closed...

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2250551321

From adinn at openjdk.org  Thu Jul 25 15:15:33 2024
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 25 Jul 2024 15:15:33 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
Message-ID: <-SR_kRJlM8NfFkcmDOQ6txfC4qVYrvi5w48jQ6y9aZg=.f6f6e942-e247-4e63-b823-1d7a3922039a@github.com>

On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote:

> In the cases like:
> 
>   UNSAFE.putLong(address + off1 + 1030, lseed);
>   UNSAFE.putLong(address + 1023, lseed);
>   UNSAFE.putLong(address + off2 + 1001, lseed);
> 
> 
> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
> 
>   ldr  R10, [R15, #120]    # int ! Field: address
>   ldr  R11, [R16, #136]    # int ! Field: off1
>   ldr  R12, [R16, #144]    # int ! Field: off2
>   add  R11, R11, R10
>   mov R11, R11    # long -> ptr
>   add  R12, R12, R10
>   mov R10, R10    # long -> ptr
>   add R11, R11, #1030    # ptr
>   str  R17, [R11]    # int
>   add R10, R10, #1023    # ptr
>   str  R17, [R10]    # int
>   mov R10, R12    # long -> ptr
>   add R10, R10, #1001    # ptr
>   str  R17, [R10]    # int
> 
> 
> On AArch64, the conversion from long to pointer could be a nop, but C2 doesn't know it. In the existing code, we emit nothing for `mov dst src` only when `dst` == `src` [1], so we get this assembly:
> 
>   ldr    x10, [x15,#120]
>   ldp    x11, x12, [x16,#136]
>   add    x11, x11, x10
>   add    x12, x12, x10
>   add    x11, x11, #0x406
>   str    x17, [x11]
>   add    x10, x10, #0x3ff
>   str    x17, [x10]
>   mov    x10, x12  <--- extra register copy
>   add    x10, x10, #0x3e9
>   str    x17, [x10]
> 
> 
> There is still one extra register copy, which we're trying to remove in this patch.
> 
> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
> 
> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so
> 
> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906

Marked as reviewed by adinn (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20157#pullrequestreview-2199613265

From adinn at openjdk.org  Thu Jul 25 15:15:34 2024
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 25 Jul 2024 15:15:34 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <LrbAv5XoGQuXuZiReg9oJ6hpqq_Ip0wO1VN5bwNWZSA=.43a0e39d-89c1-4ca5-9ad4-fb3208fb3fb6@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
 <U17GOjlTnzMqNfLCJ3TOFe6qyJbFip49tCMooOR0V94=.ad5ec6c9-6401-4859-8a3c-e9ad917bd54a@github.com>
 <zEMJ521TvbtgVoiwHOW8dBvMw2_BzkRQ9g6H2rZafUc=.95f660a0-c710-41f2-a692-71368ce11865@github.com>
 <1n3zHLxSaMMYy7ViMvIvA0Dpo7LA7rOjY2ZKTNtp3xU=.446b0858-ec07-47df-985b-2cd8956974ff@github.com>
 <LrbAv5XoGQuXuZiReg9oJ6hpqq_Ip0wO1VN5bwNWZSA=.43a0e39d-89c1-4ca5-9ad4-fb3208fb3fb6@github.com>
Message-ID: <0tiCxWaLEvUkFXN74lGdcytTydBLaSwOrc_Xr4rk-YM=.d9585a6a-42a6-4609-a51c-c257fbfca2b5@github.com>

On Thu, 25 Jul 2024 13:38:47 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Yes, I realise that this is 16 less than 65536. However, there are two things I don't follow.
>> 
>> In the original code immLoffset was only used to define indOffLN i.e. a long offset used with a narrow pointer. The use of Address::offset_ok_for_immed(n->get_long(), 0) in the predicate limited narrow pointer offsets to -256 <= offset <= (2^12 - 1). With this change the top end of the range is now (2^12 - 1) << 4. I am wondering why that is appropriate?
>> 
>> The change allows immLOffset to be used in the definition of indOffX2P. I am not clear why indOffX2P is not just defined using the existing operand immLoffset16 which has as its predicate Address::offset_ok_for_immed(n->get_long(), 4). The only difference I can see is that the alternative predicate used here will accept a positive offset that is not 16 byte aligned. Is that the intention of the redefinition? Again, why is that appropriate?
>
>> The change allows immLOffset to be used in the definition of indOffX2P. I am not clear why indOffX2P is not just defined using the existing operand immLoffset16 which has as its predicate Address::offset_ok_for_immed(n->get_long(), 4).
> 
> After this change, `immLOffset` is a more general-purpose type than `immLoffset16`.  `immLOffset` matches all possible address offsets, along with some impossible ones. For example, it matches all of the misaligned `Unsafe` accesses at any offset, regardless of operand size. In the (rare) event that an operand size and offset don't fit a single instruction, we'll split the instruction when we emit it.
> 
> After this patch there will be a few rare cases where we have a regression in code size, but it's worth it for the simplicity and the size of the matcher logic, which would otherwise explode. I don't expect any significant regression in execution time.
> 
> This patch is not the last word on the matter; later patches may well further reduce the number of integer offset types in a similar way. I don't think that many of the offsetL/I/X/P types do anything useful, and we'd probably profit from removing them, but that's another patch for another day.

Ok, I see. The use of immLoffset as currently defined is actually correct for narrow oops and, indeed, for all other address base types. It allows for all possible offsets that might fit into a load's immediate slot. Whether we can legitimately encode the operand offset as an immediate, or need instead to use an auxiliary add, depends not on the type of the address base but on the size of the datum fetched by the indirect load that consumes the operand. So, an indirect operand with offset 4098 would be too big to encode in an ldrb, fine to encode in an ldrh and invalid for encoding in an ldrw or ldrx because it is not suitably aligned.

That does imply we should get rid of the other (now redundant) immLoffset<n> operands. However, we can do that in a follow-up patch because it is not what this fix is addressing.
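
The 4098 example above can be checked with a small Java model of the immediate-offset rule. This is a sketch under stated assumptions, not the HotSpot source: it assumes the two usual AArch64 load/store immediate forms (an unsigned 12-bit offset scaled by the access size, and an unscaled signed 9-bit offset), and the class and method names are illustrative stand-ins for the C++ predicate `Address::offset_ok_for_immed` mentioned earlier in this thread.

```java
// Illustrative model only; names are hypothetical, not HotSpot's.
final class ImmedOffset {
    private ImmedOffset() {}

    /** True if v fits a signed 9-bit field, i.e. -256..255. */
    static boolean isSimm9(long v) {
        return v >= -256 && v <= 255;
    }

    /** True if v fits an unsigned 12-bit field, i.e. 0..4095. */
    static boolean isUimm12(long v) {
        return v >= 0 && v <= 4095;
    }

    /**
     * @param offset byte offset from the base register
     * @param shift  log2 of the access size (0 = byte, 1 = halfword,
     *               2 = word, 3 = doubleword, 4 = 16 bytes)
     */
    static boolean offsetOkForImmed(long offset, int shift) {
        long mask = (1L << shift) - 1;
        if (offset < 0 || (offset & mask) != 0) {
            // Negative or misaligned: only the unscaled signed 9-bit form can be used.
            return isSimm9(offset);
        }
        // Non-negative and aligned: scaled unsigned 12-bit form.
        return isUimm12(offset >> shift);
    }
}
```

Under this model, offset 4098 is rejected for a byte access (4098 > 4095), accepted for a halfword access (4098 / 2 = 2049), and rejected for word and doubleword accesses because it is misaligned and outside -256..255. With shift 4 the largest accepted offset is 4095 << 4 = 65520, which is the upper bound discussed above.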

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20157#discussion_r1691636646

From szaldana at openjdk.org  Thu Jul 25 15:31:05 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Thu, 25 Jul 2024 15:31:05 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v14]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492), enabling jcmd diagnostic commands that write an output file to accept the `%p` pattern in the file name and replace it with the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Adding FILE descriptor for help output

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/33976d70..71d3d140

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=12-13

  Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From szaldana at openjdk.org  Thu Jul 25 15:31:05 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Thu, 25 Jul 2024 15:31:05 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v12]
In-Reply-To: <TMigVvxThtPHPTrUpUk8RhJnCeK_-NomWVDbOCO7oi4=.7278475c-f9b2-4031-a3f5-69b296f5732a@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <Hj_ISn8I6TozxRCyy2AqQs-F1rnETzYwqMmme-ih87M=.8a7b1a72-ae81-4d00-a6ac-11fa17ec978e@github.com>
 <NhUgfbKGUCOEIf8yU0cpWLePjPuxTTNC8klTO2_rX28=.c3815881-6491-4d04-ae79-a3c98ef9158b@github.com>
 <Ip0rmvmbwcSZztOHUHePXBv4Z7ZEFFBaCTSpKCmCeps=.79dd4028-71a4-4e22-8319-0fd04e6aba68@github.com>
 <JujyBGi73a8Bo6IAhyZnsjX_vyQdvGYdpWeDrL35WZA=.a119c16c-72fc-41a9-a590-7a7239546faf@github.com>
 <TMigVvxThtPHPTrUpUk8RhJnCeK_-NomWVDbOCO7oi4=.7278475c-f9b2-4031-a3f5-69b296f5732a@github.com>
Message-ID: <QnzdzdE6YOb5ErBR9ePCYd0Q5x6xTuNMMgy7vMJ-Uhw=.5ef64e7a-b31c-4c07-b94c-d190cf880f0c@github.com>

On Thu, 25 Jul 2024 14:51:22 GMT, Kevin Walls <kevinw at openjdk.org> wrote:

> > > good as it's a way of telling people these are FILEs therefore %p is interpreted, rather than just a STRING.
> > 
> > 
> > Hi Kevin,
> > I feel this could be more explicit by updating the manpage to explain the %p substitution rather than updating the type to FILE seeing how users are asked to specify a filename rather than an existing file. What do you think?
> 
> Hi, I was thinking both 8-) I can do the man page task as that is still closed...

Makes sense. I updated the relevant arguments to `FILE`. 

> I can do the man page task as that is still closed...

That'd be great and thank you for your patience with this review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2250681706

From aph at openjdk.org  Thu Jul 25 15:42:04 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 25 Jul 2024 15:42:04 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v7]
In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
Message-ID: <KaJs7zZ3iFiDt3n7xWByN9WMex4VndXAPgpy3lMtSRY=.ee768444-5097-4968-8964-99329011ec73@github.com>

> This patch expands the use of a hash table for secondary superclasses
> to the interpreter, C1, and runtime. It also adds a C2 implementation
> of hashed lookup in cases where the superclass isn't known at compile
> time.
> 
> HotSpot shared runtime
> ----------------------
> 
> Building hashed secondary tables is now unconditional. It takes very
> little time, and now that the shared runtime always has the tables, it
> might as well take advantage of them. The shared code is easier to
> follow now, I think.
> 
> There might be a performance issue with x86-64 in that we build
> HotSpot for a default x86-64 target that does not support popcount.
> This means that HotSpot C++ runtime on x86 always uses a software
> emulation for popcount, even though the vast majority of machines made
> for the past 20 years can do popcount in a single instruction. It
> wouldn't be terribly hard to do something about that.
> 
> Having said that, the software popcount is really not bad.
> 
> x86
> ---
> 
> x86 is rather tricky, because we still support
> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
> well as 32- and 64-bit ports. There's some further complication in
> that only `RCX` can be used as a shift count, so there's some register
> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
> 
> AArch64
> -------
> 
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
> 
> Generally
> ---------
> 
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
> 
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
> 
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
> 
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.
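
For readers unfamiliar with the "software popcount" mentioned in the description above, the following Java sketch shows one common branch-free way to count set bits without a hardware instruction. It is an assumption-labelled illustration: the class name `SoftPopcount` is hypothetical, and this is not necessarily the sequence that HotSpot or its C++ compiler emits on a baseline x86-64 target.

```java
// Illustrative bit-counting fallback; not taken from HotSpot.
final class SoftPopcount {
    private SoftPopcount() {}

    /** Counts the set bits in x using parallel partial sums (SWAR). */
    static int popcount(long x) {
        // Each 2-bit field now holds the number of set bits it contained.
        x = x - ((x >>> 1) & 0x5555555555555555L);
        // Sum adjacent 2-bit fields into 4-bit fields.
        x = (x & 0x3333333333333333L) + ((x >>> 2) & 0x3333333333333333L);
        // Sum adjacent 4-bit fields into bytes.
        x = (x + (x >>> 4)) & 0x0f0f0f0f0f0f0f0fL;
        // The multiply accumulates all byte sums into the top byte.
        return (int) ((x * 0x0101010101010101L) >>> 56);
    }
}
```

The result agrees with `Long.bitCount`, which a JIT compiler can typically replace with a single instruction where the hardware provides one; the sketch costs about a dozen simple arithmetic operations, which is consistent with the remark that the software version is not bad.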

Andrew Haley has updated the pull request incrementally with four additional commits since the last revision:

 - Cleanup
 - temp
 - Merge branch 'JDK-8331658-work' of https://github.com/theRealAph/jdk into JDK-8331658-work
 - Minor cleanup

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19989/files
  - new: https://git.openjdk.org/jdk/pull/19989/files/48e80a13..011a3880

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=05-06

  Stats: 51 lines in 7 files changed: 21 ins; 18 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/19989.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989

PR: https://git.openjdk.org/jdk/pull/19989

From aph at openjdk.org  Thu Jul 25 16:05:49 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 25 Jul 2024 16:05:49 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v8]
In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
Message-ID: <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com>

> This patch expands the use of a hash table for secondary superclasses
> to the interpreter, C1, and runtime. It also adds a C2 implementation
> of hashed lookup in cases where the superclass isn't known at compile
> time.
> 
> HotSpot shared runtime
> ----------------------
> 
> Building hashed secondary tables is now unconditional. It takes very
> little time, and now that the shared runtime always has the tables, it
> might as well take advantage of them. The shared code is easier to
> follow now, I think.
> 
> There might be a performance issue with x86-64 in that we build
> HotSpot for a default x86-64 target that does not support popcount.
> This means that HotSpot C++ runtime on x86 always uses a software
> emulation for popcount, even though the vast majority of machines made
> for the past 20 years can do popcount in a single instruction. It
> wouldn't be terribly hard to do something about that.
> 
> Having said that, the software popcount is really not bad.
> 
> x86
> ---
> 
> x86 is rather tricky, because we still support
> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
> well as 32- and 64-bit ports. There's some further complication in
> that only `RCX` can be used as a shift count, so there's some register
> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
> 
> AArch64
> -------
> 
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
> 
> Generally
> ---------
> 
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
> 
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
> 
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
> 
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.

Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 40 commits:

 - Merge branch 'clean' into JDK-8331658-work
 - Cleanup
 - temp
 - Merge branch 'JDK-8331658-work' of https://github.com/theRealAph/jdk into JDK-8331658-work
 - Review comments
 - Review comments
 - Review comments
 - Review comments
 - Review feedback
 - Review feedback
 - ... and 30 more: https://git.openjdk.org/jdk/compare/34ee06f5...248f44dc

-------------

Changes: https://git.openjdk.org/jdk/pull/19989/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=07
  Stats: 991 lines in 19 files changed: 754 ins; 116 del; 121 mod
  Patch: https://git.openjdk.org/jdk/pull/19989.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989

PR: https://git.openjdk.org/jdk/pull/19989

From aph at openjdk.org  Thu Jul 25 16:25:38 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 25 Jul 2024 16:25:38 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v8]
In-Reply-To: <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com>
Message-ID: <xS--usGQMNszO_NP1guEfRkZTSvBwWmJ8A3wadP0fBU=.2bc1982b-8f8a-44e1-a34c-1ffdee026db4@github.com>

On Thu, 25 Jul 2024 16:05:49 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 40 commits:
> 
>  - Merge branch 'clean' into JDK-8331658-work
>  - Cleanup
>  - temp
>  - Merge branch 'JDK-8331658-work' of https://github.com/theRealAph/jdk into JDK-8331658-work
>  - Review comments
>  - Review comments
>  - Review comments
>  - Review comments
>  - Review feedback
>  - Review feedback
>  - ... and 30 more: https://git.openjdk.org/jdk/compare/34ee06f5...248f44dc

OK! I think that's everything.

Are we ready for a second pair of eyes now?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2250880303

From wkemper at openjdk.org  Thu Jul 25 17:10:05 2024
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 25 Jul 2024 17:10:05 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update
 mode [v3]
In-Reply-To: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
Message-ID: <wLt5dTWUk5MaiDzfW0G3Lwrk_mUtffeCUxIGJPIka4c=.7ec28766-8e09-4ee9-93d5-9c06642e6906@github.com>

> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development.
> 
> ## Testing
> * hotspot_gc_shenandoah
> * dacapo
> * diluvian
> * extremem
> * hyperalloc
> * specjbb2015
> * specjvm2008

William Kemper has updated the pull request incrementally with one additional commit since the last revision:

  Simplify final mark now that incremental update mode is removed

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20316/files
  - new: https://git.openjdk.org/jdk/pull/20316/files/ec2d6b64..79ceade3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=01-02

  Stats: 13 lines in 1 file changed: 0 ins; 9 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/20316.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20316/head:pull/20316

PR: https://git.openjdk.org/jdk/pull/20316

From shade at openjdk.org  Thu Jul 25 17:28:31 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 25 Jul 2024 17:28:31 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update
 mode [v3]
In-Reply-To: <wLt5dTWUk5MaiDzfW0G3Lwrk_mUtffeCUxIGJPIka4c=.7ec28766-8e09-4ee9-93d5-9c06642e6906@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
 <wLt5dTWUk5MaiDzfW0G3Lwrk_mUtffeCUxIGJPIka4c=.7ec28766-8e09-4ee9-93d5-9c06642e6906@github.com>
Message-ID: <Xw04WrROfcoqCvKoIFtVmdckxzQIJS95qrJ4Zri9iUk=.b5bb6bb4-f440-4831-a311-b33867239f2a@github.com>

On Thu, 25 Jul 2024 17:10:05 GMT, William Kemper <wkemper at openjdk.org> wrote:

>> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development.
>> 
>> ## Testing
>> * hotspot_gc_shenandoah
>> * dacapo
>> * diluvian
>> * extremem
>> * hyperalloc
>> * specjbb2015
>> * specjvm2008
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Simplify final mark now that incremental update mode is removed

Marked as reviewed by shade (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20316#pullrequestreview-2199950416

From wkemper at openjdk.org  Thu Jul 25 21:39:06 2024
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 25 Jul 2024 21:39:06 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update
 mode [v4]
In-Reply-To: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
Message-ID: <cNGAzck0BWF_wvjEGCwPJ4908wvwaWKVOEfql6P105Q=.e3794703-ee0b-4421-b1da-75fd9d09fc1d@github.com>

> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development.
> 
> ## Testing
> * hotspot_gc_shenandoah
> * dacapo
> * diluvian
> * extremem
> * hyperalloc
> * specjbb2015
> * specjvm2008

William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:

 - Merge remote-tracking branch 'openjdk/master' into remove-incremental-update-mode
 - Simplify final mark now that incremental update mode is removed
 - Remove unintentional new line
 - Remove last vestiges of incremental update mode
 - Missed test, remove actual IU barrier flag
 - Remove missed iu_barrier usages for C1
 - Update test (all barriers can be enabled now for all modes)
 - WIP: Remove incremental update mode

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20316/files
  - new: https://git.openjdk.org/jdk/pull/20316/files/79ceade3..287b6187

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=02-03

  Stats: 10966 lines in 271 files changed: 7634 ins; 1955 del; 1377 mod
  Patch: https://git.openjdk.org/jdk/pull/20316.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20316/head:pull/20316

PR: https://git.openjdk.org/jdk/pull/20316

From lmesnik at openjdk.org  Thu Jul 25 21:49:36 2024
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Thu, 25 Jul 2024 21:49:36 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v14]
In-Reply-To: <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com>
Message-ID: <eVSJPPx1YvOofxUamBOBMAPyc7wugyokec1gQ1erbkQ=.f3dc9d08-3dc8-47d8-845b-a461b705f4c7@github.com>

On Thu, 25 Jul 2024 15:31:05 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adding FILE descriptor for help output

Thank you for improving argument handling in jcmd. Please fix the small indentation problem. I would also recommend getting approval from an svc team reviewer.
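
For readers unfamiliar with the `%p` convention discussed in this PR, the substitution can be sketched roughly as below. This is a minimal illustration and not the code from the PR: `expand_pid` is a hypothetical helper name, and how HotSpot treats escapes or other patterns such as `%t` is not covered here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a HotSpot function): returns a copy of `pattern`
// in which every occurrence of "%p" is replaced by the decimal PID.
static std::string expand_pid(const std::string& pattern, long pid) {
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 'p') {
      out += std::to_string(pid);  // substitute the PID
      ++i;                         // skip the 'p'
    } else {
      out += pattern[i];           // copy everything else unchanged
    }
  }
  return out;
}
```

With this sketch, a file name such as `heap_%p.hprof` and a PID of 1234 would yield `heap_1234.hprof`; a lone trailing `%` is copied through unchanged.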

src/hotspot/share/prims/wbtestmethods/parserTests.cpp line 132:

> 130:    } else if (strcmp(type, "FILE") == 0) {
> 131:       DCmdArgument<FileArgument>* argument =
> 132:           new DCmdArgument<FileArgument>(name, desc, "FILE", mandatory);

Please update the indentation.

-------------

Marked as reviewed by lmesnik (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2200510671
PR Review Comment: https://git.openjdk.org/jdk/pull/20198#discussion_r1692165047

From vlivanov at openjdk.org  Thu Jul 25 23:33:38 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 25 Jul 2024 23:33:38 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v8]
In-Reply-To: <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com>
Message-ID: <2tItgZaRCa5BQrmelOWEsn6FVlHlEvY4is2L1n3HxhE=.eb2519cc-d69e-45e6-8ca3-b5b02565bb76@github.com>

On Thu, 25 Jul 2024 16:05:49 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 40 commits:
> 
>  - Merge branch 'clean' into JDK-8331658-work
>  - Cleanup
>  - temp
>  - Merge branch 'JDK-8331658-work' of https://github.com/theRealAph/jdk into JDK-8331658-work
>  - Review comments
>  - Review comments
>  - Review comments
>  - Review comments
>  - Review feedback
>  - Review feedback
>  - ... and 30 more: https://git.openjdk.org/jdk/compare/34ee06f5...248f44dc

Thanks! The patch looks good, except for one failure observed during testing with the latest patch [1]. It looks related to the latest changes you made in [54050a5](https://github.com/openjdk/jdk/pull/19989/commits/54050a5c2c0aa1d6a9e36d0240c66345765845e3) concerning the `bitmap == SECONDARY_SUPERS_BITMAP_FULL` check. 

[1]

#  Internal Error (.../src/hotspot/share/oops/array.hpp:126), pid=1225147, tid=1273512
#  assert(i >= 0 && i< _length) failed: oob: 0 <= 63 < 63

V  [libjvm.so+0x7f7eab]  oopDesc::is_a(Klass*) const+0x21b  (array.hpp:126)
V  [libjvm.so+0xe9805e]  java_lang_Throwable::fill_in_stack_trace(Handle, methodHandle const&, JavaThread*)+0x158e  (javaClasses.cpp:2677)
V  [libjvm.so+0xe982cb]  java_lang_Throwable::fill_in_stack_trace(Handle, methodHandle const&)+0x6b  (javaClasses.cpp:2719)
V  [libjvm.so+0xfe045c]  JVM_FillInStackTrace+0x9c  (jvm.cpp:515)
....

RBX=0x00000000b446bcf0 is a pointer to class: 
javasoft.sqe.tests.lang.clss029.clss02902.e67 {0x00000000b446bcf0}
...
 - name:              'javasoft/sqe/tests/lang/clss029/clss02902/e67'
 - super:             'javasoft/sqe/tests/lang/clss029/clss02902/e66'
 - sub:               'javasoft/sqe/tests/lang/clss029/clss02902/e68'   
...
 - secondary supers: Array<T>(0x00007faff68b3058)
 - hash_slot:         39
 - secondary bitmap: 0xffffffffffffffff


R12=0x00000000b446ba80 is a pointer to class: 
javasoft.sqe.tests.lang.clss029.clss02902.e66 {0x00000000b446ba80}
...
 - name:              'javasoft/sqe/tests/lang/clss029/clss02902/e66'
 - super:             'javasoft/sqe/tests/lang/clss029/clss02902/e65'
 - sub:               'javasoft/sqe/tests/lang/clss029/clss02902/e67'   
...
 - secondary supers: Array<T>(0x00007faff68b2c90)
 - hash_slot:         63
 - secondary bitmap: 0xfffffffffffefffd
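
To make the numbers in the dump above easier to interpret, here is a small sketch of how a 64-bit bitmap plus a hash slot can be turned into an index into a packed array using popcount. It is an assumption that this mirrors the general shape of the lookup in the patch; `popcount64` and `packed_index` are hypothetical names, and the real code handles collisions and the full-bitmap case in ways not shown here.

```cpp
#include <cstdint>

// Portable software population count (illustrative only; not HotSpot's
// implementation). Clears the lowest set bit on each iteration.
static int popcount64(std::uint64_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

// Hypothetical helper: given a bitmap whose bit `slot` is set, return the
// index in a packed array of the entry belonging to that slot. The index is
// the number of set bits at or below `slot`, minus one.
static int packed_index(std::uint64_t bitmap, int slot) {
  std::uint64_t at_or_below = bitmap << (63 - slot);  // discard bits above slot
  return popcount64(at_or_below) - 1;
}
```

Under these assumptions, `packed_index(0xffffffffffffffff, 63)` evaluates to 63, which would be out of bounds for an array of length 63; that is the same index and length that appear in the assertion message. Whether this is the path actually taken in the failure is not established here.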


Do we miss

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2251567973

From vlivanov at openjdk.org  Thu Jul 25 23:51:38 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 25 Jul 2024 23:51:38 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v8]
In-Reply-To: <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com>
Message-ID: <8X8AZw0dKF8wuWXY1KtHXSY0OItqLX-SiAJG6zRYwfY=.8e4b95c2-95ca-4847-aa12-3d8fd7b10f17@github.com>

On Thu, 25 Jul 2024 16:05:49 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 40 commits:
> 
>  - Merge branch 'clean' into JDK-8331658-work
>  - Cleanup
>  - temp
>  - Merge branch 'JDK-8331658-work' of https://github.com/theRealAph/jdk into JDK-8331658-work
>  - Review comments
>  - Review comments
>  - Review comments
>  - Review comments
>  - Review feedback
>  - Review feedback
>  - ... and 30 more: https://git.openjdk.org/jdk/compare/34ee06f5...248f44dc

src/hotspot/share/oops/klass.inline.hpp line 82:

> 80: // subtype check: true if is_subclass_of, or if k is interface and receiver implements it
> 81: inline bool Klass::is_subtype_of(Klass* k) const {
> 82:   guarantee(secondary_supers() != nullptr, "must be");

Minor point: considering libjvm contains hundreds of copies of this check, does it make sense to turn it into an assert instead? For example, on AArch64 the check costs 36 bytes [1] in a product build.

[1]

libjvm.dylib[0x1306d4] <+28>:  ldr    x9, [x8, #0x28]
libjvm.dylib[0x1306d8] <+32>:  cbz    x9, 0x1307dc              ; <+292> [inlined] Klass::is_subtype_of(Klass*) const at klass.inline.hpp:82:3
...
libjvm.dylib[0x1307dc] <+292>: adrp   x0, 3200
libjvm.dylib[0x1307e0] <+296>: add    x0, x0, #0x663            ; "open/src/hotspot/share/oops/klass.inline.hpp"
libjvm.dylib[0x1307e4] <+300>: adrp   x2, 3200
libjvm.dylib[0x1307e8] <+304>: add    x2, x2, #0x690            ; "guarantee(secondary_supers() != nullptr) failed"
libjvm.dylib[0x1307ec] <+308>: adrp   x3, 3200
libjvm.dylib[0x1307f0] <+312>: add    x3, x3, #0x6c0            ; "must be"
libjvm.dylib[0x1307f4] <+316>: mov    w1, #0x52
libjvm.dylib[0x1307f8] <+320>: bl     0x54f870                  ; report_vm_error at debug.cpp:181
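
The size trade-off can be illustrated with a pair of simplified macros. This is only a sketch: `SKETCH_GUARANTEE`, `SKETCH_ASSERT`, `SKETCH_DEBUG_BUILD`, and `Table` are hypothetical names, not HotSpot's definitions. The point is that an always-on check keeps its failure path (file name, message, call to the reporting function) in every inlined copy, while a debug-only check compiles to nothing in a product build.

```cpp
#include <cstdio>
#include <cstdlib>

// Always-on check: the failure path is emitted in every build.
#define SKETCH_GUARANTEE(cond, msg)                              \
  do {                                                           \
    if (!(cond)) {                                               \
      std::fprintf(stderr, "%s:%d: guarantee(%s) failed: %s\n",  \
                   __FILE__, __LINE__, #cond, msg);              \
      std::abort();                                              \
    }                                                            \
  } while (0)

// Debug-only check: expands to nothing unless SKETCH_DEBUG_BUILD is defined.
#ifdef SKETCH_DEBUG_BUILD
#define SKETCH_ASSERT(cond, msg) SKETCH_GUARANTEE(cond, msg)
#else
#define SKETCH_ASSERT(cond, msg) ((void)0)
#endif

// Hypothetical stand-in for a structure with a pointer that "must be" set.
struct Table {
  const int* entries;
};

// Every inlined copy of this function carries the check and its cold path.
static inline bool first_entry_set_checked(const Table& t) {
  SKETCH_GUARANTEE(t.entries != nullptr, "must be");
  return t.entries[0] != 0;
}

// In a build without SKETCH_DEBUG_BUILD, only the load and compare remain.
static inline bool first_entry_set_debug_only(const Table& t) {
  SKETCH_ASSERT(t.entries != nullptr, "must be");
  return t.entries[0] != 0;
}
```

The cost of the second form is that a null pointer in a product build is no longer reported cleanly; it would simply crash on the dereference.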

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1692288943

From wkemper at openjdk.org  Fri Jul 26 00:02:38 2024
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 26 Jul 2024 00:02:38 GMT
Subject: Integrated: 8336685: Shenandoah: Remove experimental incremental
 update mode
In-Reply-To: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
Message-ID: <dkk2Zl3MgpS5RuSIAhg5A1NkDc6PtROb0X6R0nwRzAA=.606a1b08-b70f-4498-b696-89a8c80c09e7@github.com>

On Wed, 24 Jul 2024 18:08:46 GMT, William Kemper <wkemper at openjdk.org> wrote:

> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development.
> 
> ## Testing
> * hotspot_gc_shenandoah
> * dacapo
> * diluvian
> * extremem
> * hyperalloc
> * specjbb2015
> * specjvm2008

This pull request has now been integrated.

Changeset: 0584af23
Author:    William Kemper <wkemper at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/0584af23255b6b8f49190eaf2618f3bcc299adfe
Stats:     1708 lines in 69 files changed: 4 ins; 1668 del; 36 mod

8336685: Shenandoah: Remove experimental incremental update mode

Reviewed-by: shade

-------------

PR: https://git.openjdk.org/jdk/pull/20316

From ysr at openjdk.org  Fri Jul 26 01:09:39 2024
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Fri, 26 Jul 2024 01:09:39 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update
 mode [v4]
In-Reply-To: <cNGAzck0BWF_wvjEGCwPJ4908wvwaWKVOEfql6P105Q=.e3794703-ee0b-4421-b1da-75fd9d09fc1d@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
 <cNGAzck0BWF_wvjEGCwPJ4908wvwaWKVOEfql6P105Q=.e3794703-ee0b-4421-b1da-75fd9d09fc1d@github.com>
Message-ID: <B9KUTPsaCmQ0ewO73Mdh6lHXUpu_lAmCNQ8FC6_fzkU=.e2c55cef-c263-4b41-a7d7-529d7f7740e3@github.com>

On Thu, 25 Jul 2024 21:39:06 GMT, William Kemper <wkemper at openjdk.org> wrote:

>> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development.
>> 
>> ## Testing
>> * hotspot_gc_shenandoah
>> * dacapo
>> * diluvian
>> * extremem
>> * hyperalloc
>> * specjbb2015
>> * specjvm2008
>
> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'openjdk/master' into remove-incremental-update-mode
>  - Simplify final mark now that incremental update mode is removed
>  - Remove unintentional new line
>  - Remove last vestiges of incremental update mode
>  - Missed test, remove actual IU barrier flag
>  - Remove missed iu_barrier usages for C1
>  - Update test (all barriers can be enabled now for all modes)
>  - WIP: Remove incremental update mode

I'd left my review comments instead of flushing them yesterday. Not sure if they are still relevant, but here they go... fwiw.

src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 339:

> 337:   product(bool, ShenandoahIUBarrier, false, DIAGNOSTIC,                     \
> 338:           "Turn on/off I-U barriers barriers in Shenandoah")                \
> 339:                                                                             \

Not your change, but these doc comments should really say `"Enable blah-blah ..."` rather than `"Turn on/off blah-blah..."`.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20316#pullrequestreview-2198135931
PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690703426

From ysr at openjdk.org  Fri Jul 26 01:09:40 2024
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Fri, 26 Jul 2024 01:09:40 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update
 mode [v2]
In-Reply-To: <03bSRAN8T28AU2-M4IzjsBygTwG4SHrc8HUIJYLM5TE=.e4299a87-b25f-471b-9f6e-2c08e741c6f2@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
 <03bSRAN8T28AU2-M4IzjsBygTwG4SHrc8HUIJYLM5TE=.e4299a87-b25f-471b-9f6e-2c08e741c6f2@github.com>
Message-ID: <Fp-wdKTTyKEIGRH8CpV38B8gThKzmvNmbrLcN9Yc4Rg=.1bd4121b-9a9a-47d8-850e-1e9bb4dbbfdb@github.com>

On Wed, 24 Jul 2024 19:31:04 GMT, William Kemper <wkemper at openjdk.org> wrote:

>> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development.
>> 
>> ## Testing
>> * hotspot_gc_shenandoah
>> * dacapo
>> * diluvian
>> * extremem
>> * hyperalloc
>> * specjbb2015
>> * specjvm2008
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove unintentional new line

src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 122:

> 120: 
> 121:       ShenandoahMarkRefsClosure<GENERATION> mark_cl(q, rp);
> 122:       ShenandoahSATBAndRemarkThreadsClosure tc(satb_mq_set, nullptr);

Because this is the only c'tor usage of this closure, maybe get rid of the second argument altogether, and clean up its `do_thread()` method further above at lines 84-89?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690701759

From dholmes at openjdk.org  Fri Jul 26 04:08:59 2024
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 26 Jul 2024 04:08:59 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
Message-ID: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>

Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.

The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
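
As a rough illustration of the general idea (not the code in this PR), the sketch below trims a buffer that may end in a partial multi-byte sequence. It assumes standard UTF-8 only and input that is well-formed apart from the cut at the end; it does not handle the surrogate-pair encodings of the JVM's modified UTF-8. The names `expected_continuations` and `truncate_to_valid_utf8` are hypothetical.

```cpp
#include <cstddef>

// Number of continuation bytes that should follow `lead`,
// or 4 if `lead` is not a valid lead byte.
static std::size_t expected_continuations(unsigned char lead) {
  if (lead < 0x80) return 0;            // ASCII
  if ((lead & 0xE0) == 0xC0) return 1;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 2;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 3;  // 11110xxx
  return 4;                             // continuation or invalid byte
}

// `buf` holds `len` bytes followed by a NUL terminator. If the last sequence
// is incomplete, cut it off and re-terminate. Returns the new length.
static std::size_t truncate_to_valid_utf8(unsigned char* buf, std::size_t len) {
  std::size_t i = len;
  std::size_t cont = 0;
  // Step back over at most three trailing continuation bytes (10xxxxxx).
  while (i > 0 && cont < 3 && (buf[i - 1] & 0xC0) == 0x80) {
    --i;
    ++cont;
  }
  std::size_t keep = len;
  if (i == 0) {
    keep = 0;  // empty, or nothing but continuation bytes
  } else {
    unsigned char lead = buf[i - 1];
    if (cont != expected_continuations(lead)) {
      // Keep an ASCII byte, but drop an incomplete or invalid lead byte.
      keep = (lead < 0x80) ? i : i - 1;
    }
  }
  buf[keep] = '\0';
  return keep;
}
```

For example, a four-byte buffer holding `ab` followed by the first two bytes of a three-byte sequence would be cut back to `ab`, while a buffer ending in a complete sequence is left alone.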

Testing:
 - new gtest exercises the truncation code with the different possibilities for bad truncation
 - tiers 1-3 sanity testing

Thanks.

-------------

Commit messages:
 - Fixed typo
 - 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string

Changes: https://git.openjdk.org/jdk/pull/20345/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8325002
  Stats: 180 lines in 4 files changed: 177 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20345.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20345/head:pull/20345

PR: https://git.openjdk.org/jdk/pull/20345

From dholmes at openjdk.org  Fri Jul 26 04:29:36 2024
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 26 Jul 2024 04:29:36 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured
In-Reply-To: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
Message-ID: <iMT1EV6Ol7de4iryXsfwqufpXOFMoBckgURpg0XRQa8=.6e31894c-f976-4e1b-8295-157303885927@github.com>

On Wed, 26 Jun 2024 13:32:32 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> Currently when we run with ubsan-enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work.
> 
> We find this in the test output
> 
> [STDOUT]
> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory
> 
> The container where the test is executed does not contain the ubsan package; we might skip the test in this case.

I think others who have more investment in this area need to weigh in. I don't know the implications for our infra folk if we need to ensure ubsan is installed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2251949827

From stuefe at openjdk.org  Fri Jul 26 05:39:33 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 26 Jul 2024 05:39:33 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
Message-ID: <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com>

On Fri, 26 Jul 2024 04:03:10 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
> 
> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
> 
> Testing:
>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>  - tiers 1-3 sanity testing
> 
> Thanks.

This is a neat technique, but it won't work for very short strings (e.g. consisting of just one or two multi-byte sequences, the latter being invalid). The reason is that you need a minimal buffer length to do the check safely.

Alternatively, you could allocate the `msg` buffer with 5 leading bytes that are never used, only zero-initialized, so the string would start at `msg + 5`. That way, you can safely step back before the start of the string, e.g. with index - 5, without corruption.

src/hotspot/share/utilities/exceptions.cpp line 275:

> 273:   // we may also have a truncated UTF-8 sequence. In such cases we need to fix the buffer so the UTF-8
> 274:   // sequence is valid.
> 275:   if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {

You should test for length >= 5 since it is the farthest you could access in `UTF8::truncate_to_legal_utf8` later:


  for (int index = length - 2; index > 0; index--) {
...
    assert(buffer[index - 3] == 0xED, "malformed sequence");

src/hotspot/share/utilities/exceptions.cpp line 276:

> 274:   // sequence is valid.
> 275:   if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {
> 276:     assert(msg[max_msg_size - 1] == '\0', "should be null terminated");

Would this always be true? For a formatting error, too?
Maybe, just to be sure, set the last byte to zero instead of asserting.
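
A minimal sketch of that suggestion, using only the standard C library: format into a fixed buffer, force termination unconditionally, and report whether the result may be incomplete. `format_truncating` is a hypothetical name, and this is not the code in `Exceptions::fthrow`.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: formats into buf[0..size-1] and always leaves the
// buffer NUL-terminated. Returns true if vsnprintf reported an error or the
// output did not fit, i.e. the caller may need to repair the tail.
static bool format_truncating(char* buf, std::size_t size, const char* fmt, ...) {
  if (size == 0) {
    return true;  // nothing can be stored
  }
  va_list ap;
  va_start(ap, fmt);
  int ret = std::vsnprintf(buf, size, fmt, ap);
  va_end(ap);
  buf[size - 1] = '\0';  // set the last byte instead of asserting on it
  return ret < 0 || static_cast<std::size_t>(ret) >= size;
}
```

Note that this sketch does not distinguish a formatting error from plain truncation; both simply return true.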

src/hotspot/share/utilities/utf8.cpp line 407:

> 405: // To avoid that the caller can choose to check for validity first.
> 406: // The incoming buffer is still expected to be NUL-terminated.
> 407: void UTF8::truncate_to_legal_utf8(unsigned char* buffer, int length) {

Let's make the buffer length `size_t` and avoid awkward casting.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20345#pullrequestreview-2200961895
PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692526390
PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692526795
PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692524657

From stuefe at openjdk.org  Fri Jul 26 05:44:31 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 26 Jul 2024 05:44:31 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
Message-ID: <WBv-Q4gapmUUyRSFfJrCY6A24u9ovvovQiCZ7N9ZadU=.9374cf3f-94c6-47bf-aeaa-037132083a6b@github.com>

On Fri, 26 Jul 2024 04:03:10 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
> 
> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
> 
> Testing:
>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>  - tiers 1-3 sanity testing
> 
> Thanks.

src/hotspot/share/utilities/exceptions.cpp line 277:

> 275:   if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {
> 276:     assert(msg[max_msg_size - 1] == '\0', "should be null terminated");
> 277:     UTF8::truncate_to_legal_utf8((unsigned char*)msg, max_msg_size);

Ah, I misread your patch and thought you pass in the strlen of the message to the truncation function, when in fact you pass in the hard coded message buffer size. 

But that begs the question of why you test strlen above, and more importantly, whether all cases where snprintf returns an error are truncation problems. It could have detected an invalid UTF8 sequence and aborted in the middle of it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692538448

From vlivanov at openjdk.org  Fri Jul 26 05:59:33 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 26 Jul 2024 05:59:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v8]
In-Reply-To: <RDhgVrQ6zzzaCBuvMVR7tKA2Qe1SwF2pktr5xcI_duE=.9ae8f5e8-f888-47b5-b979-e56692b278f6@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <5N5AdXvL7EpqKbo5LbxBvjeLsduh3_eEuM9LOPjD-Fc=.e70e1af6-430e-4213-8ce7-88a9cec15960@github.com>
 <A2v60vdAPL9qb22NB6kLVyuCACPDeqHUYoYFRFX6ig0=.9ef6f86b-559d-463a-9061-d0bbb6093aa7@github.com>
 <ukQ_tEZztKeBZnn8TDo3YfJ4GI0mHUrVRZmgM4d1W1g=.1fc9f9f2-c2bf-4237-94d4-dd9aae26411b@github.com>
 <BolXJ-8qekfYskirR9P20jAQZW6s7WPe4A-oija7RA8=.855251f0-4246-403d-a9fe-00b9406f07e3@github.com>
 <eLDcJyPLboqZr-8yk1kxVfV6WTaRYXZq5lZvDoIEFKM=.c87b23c8-d9c5-45ff-a2dd-5f0c4875cb62@github.com>
 <UAjH__AKdU3UMdJBkg7TlElKSA8mEFFE0MiElVrYexE=.4bc67a26-3383-4e4e-92b0-f1d3d33c5ce2@github.com>
 <M5xQ14pzHdBEr7yAdAqIVUsY_o8tXUgN9HpKxjkZznw=.f2262137-2fec-4297-ae1e-89b11874266f@github.com>
 <YxBy1Mx7Di5EDfJkCTfcaIuTzCv5KdzBzKMcE3iIeak=.2a56f436-8e14-4a22-a85d-cd06209e2c01@github.com>
 <vw5vWKYgk45g7I9Yio_NTYLSL9fz3y6ptFHJyGNZJCE=.bb3c7d4e-9a5e-4c53-80b7-853dc74a611c@github.com>
 <RDhgVrQ6zzzaCBuvMVR7tKA2Qe1SwF2pktr5xcI_duE=.9ae8f5e8-f888-47b5-b979-e56692b278f6@github.com>
Message-ID: <oTP5cg-k2QeE1yGQgR3Cueo4EQZYZOr5QofOEulYM4s=.48445657-22aa-4631-b6d1-e3040f3b329f@github.com>

On Thu, 25 Jul 2024 13:56:34 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Thanks, now I see that `Class::isInstance(Object)` is backed by `Runtime1::is_instance_of()` which uses `oopDesc::is_a()` to do the job.
>> 
>> If it turns out to be performance critical, the intrinsic implementation should be rewritten to exercise existing subtype checking support in C1. As it is implemented now, it's already quite inefficient.
>
> I did write an intrinsic for that, but it made this patch even larger. I have a small patch for C1, for some other time.

FYI I filed a low-priority RFE against C1 to track it ([JDK-8337251](https://bugs.openjdk.org/browse/JDK-8337251)).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1692548251

From stuefe at openjdk.org  Fri Jul 26 06:41:36 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 26 Jul 2024 06:41:36 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic
In-Reply-To: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
Message-ID: <ej5ON8iDbUMsORwZNuLzDbXERpzGJde7q_hd50vKPGo=.34c0d39e-7a85-49a7-9d10-363a9800cc4d@github.com>

On Tue, 23 Jul 2024 21:46:50 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote:

> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)
> 
> Testing:
>   test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
>   test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java

Hi Ashu,

generally okay, see remarks.

It seems awkward to have a size_t vector with a defined length and then have to specify the length as an input argument. I'd consider either using the good old style of


void foo(const size_t array[NUM], ...);


(using an array with a defined size, but be careful: inside foo, sizeof(array) is still just the size of a pointer)

or just write a small wrapper class holding a size_t vector and taking care of the copying.
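
A rough sketch of both options, to illustrate the sizeof pitfall and the wrapper idea. `NUM`, `sum_old_style` and `TagSizes` are hypothetical names invented for this example; they are not taken from the patch.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

constexpr int NUM = 4;  // hypothetical element count, standing in for the tag count

// Old style: the parameter looks like an array of NUM elements, but it is
// adjusted to "const size_t*", so sizeof(array) here would be the size of a
// pointer. The loop therefore relies on NUM, never on sizeof(array).
inline size_t sum_old_style(const size_t array[NUM]) {
  assert(array != nullptr);
  size_t total = 0;
  for (int i = 0; i < NUM; i++) {
    total += array[i];
  }
  return total;
}

// Wrapper style: the struct owns the array, so callers no longer pass a
// separate element count.
struct TagSizes {
  size_t v[NUM] = {};

  // A reference to an array keeps its full type, including the length.
  void copy_from(const size_t (&src)[NUM]) {
    static_assert(sizeof(src) == sizeof(v), "reference to array keeps its size");
    memcpy(v, src, sizeof(v));
  }

  size_t total() const { return sum_old_style(v); }
};
```

With the wrapper, plain assignment (`a = b`) also copies all elements, which would make a hand-written memcpy plus length check unnecessary at the call sites.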

src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 58:

> 56: }
> 57: 
> 58: void ArenaStatCounter::init() {

Proposal: `reset()` ?

src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 115:

> 113:   for (int tag = 0; tag < Arena::tag_count(); tag++) {
> 114:     total += _tags_size[tag];
> 115:   }

Do it with x-macro?

src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 118:

> 116:   if (total != _current) {
> 117:     log_info(compilation, alloc)("WARNING!!! Total does not match current");
> 118:   }

Why do we calculate total? Just for this test? I would then put this into an ASSERT section, and make the info log an assert. 

However, I wonder if this is really needed. The logic updating both _current and _tags_size is pretty straightforward, I don't see how there could be a mismatch.

src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 204:

> 202:   size_t _total;
> 203:   // usage per arena tag when total peaked
> 204:   size_t _tags_size_at_peak[Arena::tag_count()];

Can you please make sure Arena::tag_count() evaluates to constexpr? When in doubt, just use the enum value instead.

src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 226:

> 224: 
> 225:   void set_total(size_t n) { _total = n; }
> 226:   void set_tags_size_at_peak(size_t* tags_size_at_peak, int nelements) {

const size_t*

src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 228:

> 226:   void set_tags_size_at_peak(size_t* tags_size_at_peak, int nelements) {
> 227:     assert(nelements*sizeof(size_t) <= sizeof(_tags_size_at_peak), "overflow check");
> 228:     memcpy(_tags_size_at_peak, tags_size_at_peak, nelements*sizeof(size_t));

style, we do blanks between * (n * x, not n*x)

src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 242:

> 240:     for (int tag = 0; tag < Arena::tag_count(); tag++) {
> 241:       st->print_cr("  " LEGEND_KEY_FMT ": %s", Arena::tag_name[tag], Arena::tag_desc[tag]);
> 242:     }

use x macro?

src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 365:

> 363: 
> 364:   void add(const FullMethodName& fmn, CompilerType comptype,
> 365:            size_t total, size_t* tags_size_at_peak, int nelements,

const size_t*

src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 471:

> 469:     _the_table->add(fmn, ct,
> 470:                     arena_stat->peak(), // total
> 471:                     (size_t *)arena_stat->tags_size_at_peak(),

Cast should not be needed

src/hotspot/share/compiler/compilationMemoryStatistic.hpp line 46:

> 44:   size_t _peak;
> 45:   // Current bytes used by arenas per tag
> 46:   size_t _tags_size[Arena::tag_count()];

Proposal: `_current_by_tag`, referring to _current

src/hotspot/share/compiler/compilationMemoryStatistic.hpp line 53:

> 51: 
> 52:   // Peak composition:
> 53:   size_t _tags_size_at_peak[Arena::tag_count()];

`_peak_by_tag` ?

src/hotspot/share/memory/arena.cpp line 48:

> 46: 
> 47: #define ARENA_TAG_STRING_(str) #str
> 48: #define ARENA_TAG_STRING(name, str, desc) ARENA_TAG_STRING_(str),

Can you use STR/XSTR in macros.hpp?

src/hotspot/share/memory/arena.hpp line 86:

> 84: };
> 85: 
> 86: #define DO_ARENA_TAG(template) \

Please don't name this template, confuses my IDE. We usually call it DO or XX or something like that

src/hotspot/share/memory/arena.hpp line 97:

> 95: 
> 96: #define ARENA_TAG_ENUM_(name) tag_##name
> 97: #define ARENA_TAG_ENUM(name, str, desc) ARENA_TAG_ENUM_(name),

Here, and in other places: Please try to cut down the number of temp. macros. You can just as well do a 

enum class Tag {
#define XX(name, whatever, whatever2) tag_##name,
DO_ARENA_TAG(XX)
#undef XX
  num_tags
};


here.
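
For readers less familiar with the x-macro pattern, a self-contained sketch of what is being proposed. The tag list below (`DO_EXAMPLE_TAG` and its three entries) is invented for illustration and is not the list in arena.hpp; only the technique is the point.

```cpp
#include <assert.h>
#include <string.h>

// Hypothetical tag list: one XX(...) entry per tag, expanded several times below.
#define DO_EXAMPLE_TAG(XX)              \
  XX(ra,    "Register allocator data")  \
  XX(node,  "Ideal graph nodes")        \
  XX(other, "Everything else")

// Expansion 1: the enumerators, with num_tags as the trailing count.
enum class ExampleTag {
#define XX(name, desc) tag_##name,
  DO_EXAMPLE_TAG(XX)
#undef XX
  num_tags
};

// Expansion 2: the short names, stringified from the same list.
static const char* const example_tag_name[] = {
#define XX(name, desc) #name,
  DO_EXAMPLE_TAG(XX)
#undef XX
};

// Expansion 3: the descriptions.
static const char* const example_tag_desc[] = {
#define XX(name, desc) desc,
  DO_EXAMPLE_TAG(XX)
#undef XX
};

// constexpr, so it can be used as an array bound.
constexpr int example_tag_count() {
  return static_cast<int>(ExampleTag::num_tags);
}

static_assert(static_cast<int>(sizeof(example_tag_name) / sizeof(example_tag_name[0])) ==
              example_tag_count(), "name table and enum stay in sync");
```

Each temporary `XX` is defined right where it is used and undefined immediately afterwards, so no helper macros leak out of the header.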

-------------

PR Review: https://git.openjdk.org/jdk/pull/20304#pullrequestreview-2201007416
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692556908
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692554736
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692556339
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692574321
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692574925
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692577085
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692577477
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692578328
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692579046
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692557726
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692557957
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692559750
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692561100
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1692564129

From mbaesken at openjdk.org  Fri Jul 26 07:40:31 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Fri, 26 Jul 2024 07:40:31 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured
In-Reply-To: <iMT1EV6Ol7de4iryXsfwqufpXOFMoBckgURpg0XRQa8=.6e31894c-f976-4e1b-8295-157303885927@github.com>
References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
 <iMT1EV6Ol7de4iryXsfwqufpXOFMoBckgURpg0XRQa8=.6e31894c-f976-4e1b-8295-157303885927@github.com>
Message-ID: <LPSf0cysKCYU5hiQqDe7fJ0QT_4gXMJ9FVu-cfKCXNc=.c873813f-fed8-418d-9349-55ff41e282eb@github.com>

On Fri, 26 Jul 2024 04:27:04 GMT, David Holmes <dholmes at openjdk.org> wrote:

> I think others who have more investment in this area need to weigh in. I don't know the implications for our infra folk if we need to ensure ubsan is installed.

I think this would be in the standard container config / BUILDFILE we use for the tests.  So if all works well, no implications.

On the other hand, we could also just run the ubsan-based tests with their own exclude/problem list and exclude the docker test that currently cannot work. That needs a separate list, but no other source changes like this PR or the idea of an adjusted docker container config.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2252155558

From mli at openjdk.org  Fri Jul 26 07:56:08 2024
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 26 Jul 2024 07:56:08 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v5]
In-Reply-To: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
Message-ID: <7Do3NsCKTNc9hpLgInx3V8mAvLEQEmdmP0n5VXy4uck=.e45d67c2-6d72-454e-a836-4cb5886e6066@github.com>

> Hi,
> Can you help to review the patch?
> 
> I'm also working on a base64 decode intrinsic, but it shows a performance regression in some cases. Since decode and encode are completely independent of each other, I will send out the decode review in a separate PR once I have fixed that regression.
> 
> Thanks.
> 
> ## Test
> benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32)
> 
> I've tried several implementations, respectively with vector group
> * m2+m1+scalar
> * m2+scalar
> * m1+scalar
> * pure scalar
> The best one is the combination of m2+m1; it has the best performance for all source sizes.
> 
> ### K230
> 
> this implementation (m2+m1)
> 
> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
> -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
> 
> vector with only m2
> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px...

Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:

 - Merge branch 'master' into baes64-encode-integrated
 - move label
 - refine code
 - use pure scalar version when rvv is not supported
 - clean code
 - Initial commit

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19973/files
  - new: https://git.openjdk.org/jdk/pull/19973/files/8645a6a1..736f5f8b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=03-04

  Stats: 8439 lines in 328 files changed: 5796 ins; 1367 del; 1276 mod
  Patch: https://git.openjdk.org/jdk/pull/19973.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19973/head:pull/19973

PR: https://git.openjdk.org/jdk/pull/19973

From mli at openjdk.org  Fri Jul 26 08:10:01 2024
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 26 Jul 2024 08:10:01 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v6]
In-Reply-To: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
Message-ID: <0NpNq_wNl-qus6kEr_6J7liSQXXYdjybbWQWDJPGPmQ=.8ba0ea43-2bc7-4f01-afee-adb4a43da29c@github.com>

> Hi,
> Can you help to review the patch?
> 
> I'm also working on a base64 decode intrinsic, but it shows a performance regression in some cases. Since decode and encode are completely independent of each other, I will send out the decode review in a separate PR once I have fixed that regression.
> 
> Thanks.
> 
> ## Test
> benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32)
> 
> I've tried several implementations, respectively with vector group
> * m2+m1+scalar
> * m2+scalar
> * m1+scalar
> * pure scalar
> The best one is the combination of m2+m1; it has the best performance for all source sizes.
> 
> ### K230
> 
> this implementation (m2+m1)
> 
> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
> -- | -- | -- | -- | -- | -- | -- | -- | --
> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
> 
> vector with only m2
> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px...

Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:

 - merge master
 - Merge branch 'master' into baes64-encode-integrated
 - move label
 - refine code
 - use pure scalar version when rvv is not supported
 - clean code
 - Initial commit

-------------

Changes: https://git.openjdk.org/jdk/pull/19973/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19973&range=05
  Stats: 238 lines in 3 files changed: 238 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19973.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19973/head:pull/19973

PR: https://git.openjdk.org/jdk/pull/19973

From djelinski at openjdk.org  Fri Jul 26 08:18:41 2024
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Fri, 26 Jul 2024 08:18:41 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
Message-ID: <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com>

On Fri, 26 Jul 2024 04:03:10 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
> 
> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
> 
> Testing:
>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>  - tiers 1-3 sanity testing
> 
> Thanks.

src/hotspot/share/utilities/utf8.cpp line 440:

> 438:         // Could be first or fourth byte. If fourth
> 439:         // then 2 bytes before will have second byte pattern (0b1010xxxx)
> 440:         if ((index - 3) >= 0 && ((buffer[index - 2] & 0xA0) == 0xA0)) {

Suggestion:

        if ((index - 3) >= 0 && ((buffer[index - 2] & 0xF0) == 0xA0)) {

src/hotspot/share/utilities/utf8.cpp line 442:

> 440:         if ((index - 3) >= 0 && ((buffer[index - 2] & 0xA0) == 0xA0)) {
> 441:           // it was fourth byte so truncate 3 bytes earlier
> 442:           assert(buffer[index - 3] == 0xED, "malformed sequence");

This needs to be an if, not an assert: ec-a0-80 is a [legitimate 3-byte UTF-8](https://www.compart.com/en/unicode/U+C800)
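
A small sketch to back up both comments, using only bit arithmetic. The helper names (`loose_match`, `strict_match`, `count_matches`, `decode3`) are made up for this example and do not appear in utf8.cpp.

```cpp
#include <assert.h>

// Check as written in the patch: only requires bits 7 and 5 to be set,
// so it also accepts bytes such as 0xB5 or 0xE3.
inline bool loose_match(unsigned char b) { return (b & 0xA0) == 0xA0; }

// Suggested check: the high nibble must be exactly 1010 (0b1010xxxx).
inline bool strict_match(unsigned char b) { return (b & 0xF0) == 0xA0; }

// Counts how many of the 256 possible byte values satisfy a predicate.
inline int count_matches(bool (*pred)(unsigned char)) {
  int n = 0;
  for (int b = 0; b < 256; b++) {
    if (pred(static_cast<unsigned char>(b))) {
      n++;
    }
  }
  return n;
}

// Decodes the payload bits of a 3-byte UTF-8 sequence (no validation).
inline unsigned decode3(unsigned char b0, unsigned char b1, unsigned char b2) {
  return ((b0 & 0x0Fu) << 12) | ((b1 & 0x3Fu) << 6) | (b2 & 0x3Fu);
}
```

The loose mask accepts 64 byte values, the strict one exactly the 16 values 0xA0..0xAF. And `decode3(0xEC, 0xA0, 0x80)` yields U+C800, an ordinary character, while `decode3(0xED, 0xA0, 0x80)` yields U+D800, a surrogate, which is why the first byte has to be tested with an `if` rather than asserted.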

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692684932
PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1692684622

From fgao at openjdk.org  Fri Jul 26 09:39:42 2024
From: fgao at openjdk.org (Fei Gao)
Date: Fri, 26 Jul 2024 09:39:42 GMT
Subject: RFR: 8336245: AArch64: remove extra register copy when converting
 from long to pointer
In-Reply-To: <A7LqCA84i3ml2kFafMJr2_ENuyn9yW-KjBViIryuKBU=.8efd29b0-3636-4ef7-aa2c-dc92228cefc5@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
 <A7LqCA84i3ml2kFafMJr2_ENuyn9yW-KjBViIryuKBU=.8efd29b0-3636-4ef7-aa2c-dc92228cefc5@github.com>
Message-ID: <xwzJoMkq5YYrecjs4BtCmnDdL4ngEWUTICPeEohZu-g=.6cf68070-36f5-43c4-924c-38514746c919@github.com>

On Mon, 15 Jul 2024 11:00:39 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> In the cases like:
>> 
>>   UNSAFE.putLong(address + off1 + 1030, lseed);
>>   UNSAFE.putLong(address + 1023, lseed);
>>   UNSAFE.putLong(address + off2 + 1001, lseed);
>> 
>> 
>> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
>> 
>>   ldr  R10, [R15, #120]    # int ! Field: address
>>   ldr  R11, [R16, #136]    # int ! Field: off1
>>   ldr  R12, [R16, #144]    # int ! Field: off2
>>   add  R11, R11, R10
>>   mov R11, R11    # long -> ptr
>>   add  R12, R12, R10
>>   mov R10, R10    # long -> ptr
>>   add R11, R11, #1030    # ptr
>>   str  R17, [R11]    # int
>>   add R10, R10, #1023    # ptr
>>   str  R17, [R10]    # int
>>   mov R10, R12    # long -> ptr
>>   add R10, R10, #1001    # ptr
>>   str  R17, [R10]    # int
>> 
>> 
>> In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly:
>> 
>>   ldr    x10, [x15,#120]
>>   ldp    x11, x12, [x16,#136]
>>   add    x11, x11, x10
>>   add    x12, x12, x10
>>   add    x11, x11, #0x406
>>   str    x17, [x11]
>>   add    x10, x10, #0x3ff
>>   str    x17, [x10]
>>   mov    x10, x12  <--- extra register copy
>>   add    x10, x10, #0x3e9
>>   str    x17, [x10]
>> 
>> 
>> There is still one extra register copy, which we're trying to remove in this patch.
>> 
>> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
>> 
>> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so
>> 
>> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906
>
> This will need quite a lot of testing, perhaps higher tiers and jcstress. You can test these two PRs together.

@theRealAph @adinn Thanks for your reviews!

I'll integrate it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20157#issuecomment-2252351336

From fgao at openjdk.org  Fri Jul 26 09:39:43 2024
From: fgao at openjdk.org (Fei Gao)
Date: Fri, 26 Jul 2024 09:39:43 GMT
Subject: Integrated: 8336245: AArch64: remove extra register copy when
 converting from long to pointer
In-Reply-To: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
References: <thW3Lzj_n93-oO5b_FK12iWTO8Wb-O1480uw840nR0o=.cb6e40ea-b60a-449f-a33f-ed6bc3295928@github.com>
Message-ID: <CzXwApza6QlyJkIQX0p4Ddk351Zbsu7c_GhLS331iJ8=.618d3ade-a46b-4102-9cbb-2a5744de01ec@github.com>

On Fri, 12 Jul 2024 13:44:25 GMT, Fei Gao <fgao at openjdk.org> wrote:

> In the cases like:
> 
>   UNSAFE.putLong(address + off1 + 1030, lseed);
>   UNSAFE.putLong(address + 1023, lseed);
>   UNSAFE.putLong(address + off2 + 1001, lseed);
> 
> 
> Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like:
> 
>   ldr  R10, [R15, #120]    # int ! Field: address
>   ldr  R11, [R16, #136]    # int ! Field: off1
>   ldr  R12, [R16, #144]    # int ! Field: off2
>   add  R11, R11, R10
>   mov R11, R11    # long -> ptr
>   add  R12, R12, R10
>   mov R10, R10    # long -> ptr
>   add R11, R11, #1030    # ptr
>   str  R17, [R11]    # int
>   add R10, R10, #1023    # ptr
>   str  R17, [R10]    # int
>   mov R10, R12    # long -> ptr
>   add R10, R10, #1001    # ptr
>   str  R17, [R10]    # int
> 
> 
> In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly:
> 
>   ldr    x10, [x15,#120]
>   ldp    x11, x12, [x16,#136]
>   add    x11, x11, x10
>   add    x12, x12, x10
>   add    x11, x11, #0x406
>   str    x17, [x11]
>   add    x10, x10, #0x3ff
>   str    x17, [x10]
>   mov    x10, x12  <--- extra register copy
>   add    x10, x10, #0x3e9
>   str    x17, [x10]
> 
> 
> There is still one extra register copy, which we're trying to remove in this patch.
> 
> This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition.
> 
> Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so
> 
> [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906

This pull request has now been integrated.

Changeset: d10afa26
Author:    Fei Gao <fgao at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/d10afa26e5c59475e49b353ed34e8e85d0615d92
Stats:     320 lines in 5 files changed: 297 ins; 3 del; 20 mod

8336245: AArch64: remove extra register copy when converting from long to pointer

Co-authored-by: Andrew Haley <aph at openjdk.org>
Reviewed-by: aph, adinn

-------------

PR: https://git.openjdk.org/jdk/pull/20157

From sspitsyn at openjdk.org  Fri Jul 26 10:42:33 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Fri, 26 Jul 2024 10:42:33 GMT
Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations [v2]
In-Reply-To: <yoYPcRiwlovmm5hdLcD8y1d25ABb3r5KUniSzzyfBzI=.be9f5747-fa03-4b13-ba53-4d868ea85989@github.com>
References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com>
 <yoYPcRiwlovmm5hdLcD8y1d25ABb3r5KUniSzzyfBzI=.be9f5747-fa03-4b13-ba53-4d868ea85989@github.com>
Message-ID: <o-XobOOcOSevq7Rfqt6VAKNZ1BdxEGekXLLxsMN0iR4=.276a4f9a-ed9f-472b-ab02-aa73413d1bdf@github.com>

On Thu, 25 Jul 2024 01:53:13 GMT, Alex Menkov <amenkov at openjdk.org> wrote:

>> Obsolete PreserveAllAnnotations flag which was deprecated in JDK 23.
>> 
>> Testing: tier1,tier2,tier3,tier4,hs-tier5-svc
>
> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove test

Looks good. Really nice simplification.
Do I understand correctly that all annotations are visible now, or do we just not parse/process invisible ones? If all annotations are visible, can we get rid of the `_visible` suffix?

-------------

Marked as reviewed by sspitsyn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20315#pullrequestreview-2201523609

From kevinw at openjdk.org  Fri Jul 26 11:41:36 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Fri, 26 Jul 2024 11:41:36 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v14]
In-Reply-To: <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com>
Message-ID: <X1PNORe3zCsQbH8DQhGBwUACW8f501e9_IBAmvUiUV8=.ec8e20b1-4b8e-4a92-8654-c2a8d1a9f94d@github.com>

On Thu, 25 Jul 2024 15:31:05 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adding FILE descriptor for help output

One more thing that's troubling me.  (Apologies it's now and not last week.)

I was looking at the _filename.value().get() usage and finding it uncomfortable, compared to the previous simple _filename.value() 8-)
Harder to remember and to read and understand.  Maybe we can avoid the two accessors, it really is just a char*.

These additional argument types look like part of the framework which never found an audience: MemorySizeArgument has one usage in CompilationMemoryStatisticDCmd, NanoTimeArgument looks unused -- so the two-accessor usage is only in one place until now?

Adding FileArgument as another of these might be the wrong direction, as these classes are so almost redundant.

What if we didn't add FileArgument, and kept using <char*> for _filename args/opts:

Then in DCmdArgument<char*>::parse_value(), recognise a "FILE" argument type and call Arguments::copy_expand_pid there, to set _value.

Just seeing if we can cut down some of the complexity here, as Thomas mentioned, it is already very complex for what it is!


(There is also the to_string method which seemed like it would be useful here, but it needs a buffer so is more complex than calling two accessors...  Another thing that seems to be part of the framework that was never much adopted.)
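
To illustrate the kind of substitution a "FILE" argument type would perform at parse time, here is a standalone sketch. `expand_pid` is a hypothetical helper written for this mail; it is not `Arguments::copy_expand_pid`, and it makes no claim about how that function treats other `%` sequences.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string>

// Hypothetical helper: returns a copy of pattern with every "%p" replaced by
// the decimal pid. Any other character, including a lone '%', is copied as-is.
inline std::string expand_pid(const std::string& pattern, long pid) {
  assert(pid >= 0);
  std::string out;
  for (size_t i = 0; i < pattern.size(); i++) {
    if (pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 'p') {
      out += std::to_string(pid);
      i++;  // skip the 'p' that was just consumed
    } else {
      out += pattern[i];
    }
  }
  return out;
}
```

If the expansion happened once while parsing, the command implementations could keep reading a plain string from a single accessor and would never see the unexpanded pattern.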

-------------

PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2201623426

From jsjolen at openjdk.org  Fri Jul 26 12:46:47 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Fri, 26 Jul 2024 12:46:47 GMT
Subject: RFR: 8335701: Make GrowableArray templated by an Index [v3]
In-Reply-To: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com>
References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com>
Message-ID: <LG1MWkM8a4plqZXPMsZ81I7z-_9TmZAdXEw4a97lzuE=.430dd51d-3c6b-4bbb-8328-5f0efbb67ccb@github.com>

> Hi,
> 
> Today the GrowableArray has a fixed index type of `int`. This PR lets you choose your own index type through a template parameter.
> 
> This opens up a few new design choices:
> 
> - Do you know that you have a very small array? Use a `uint8_t` for len and cap, each.
> - Do you have a very large one? Use a `uint64_t`.
> 
> The code keeps `int` as the default, to preserve identical semantics for all existing code and so that users who don't care about the index type don't have to think about it.
> 
> One "major" change that I don't want to get lost in the review: I've changed the mid-point calculation to be overflow insensitive without casting.
> 
> 
> 
> // Old 
> mid = ((max + min) / 2);
> // New
> mid = min + ((max - min) / 2);
> 
> Some semi-rigorous thinking:
> min \in [0, len)
> max \in [0, len)
> min <= max
> (max - min) / 2 \in [0, len/2)
> Maximizing min and max => len + 0
> Maximizing max, minimizing min => len/2
> Minimizing max, maximizing min => max = min => min
> 
> 
> // Proof that they're identical when m, h, l \in N
> (1) m = l + (h - l) / 2 <=>
> 2m = 2l + h - l = h + l
> 
> (2) m = (h + l) / 2 <=>
> 2m = h + l
> (1) = (2)
> QED
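
To make the overflow point above concrete, here is a minimal C++ sketch. It is not the code from this PR: `midpoint_naive` and `midpoint_safe` are hypothetical names, and `uint32_t` merely stands in for an index type narrow enough for `max + min` to wrap (assuming a platform where `unsigned int` is 32 bits wide).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not taken from GrowableArray.

// Old form: the intermediate sum max + min can exceed the range of Index.
template <typename Index>
Index midpoint_naive(Index min, Index max) {
  return static_cast<Index>((max + min) / 2);
}

// New form: max - min is never larger than max, so no intermediate value
// leaves the range of Index as long as min <= max.
template <typename Index>
Index midpoint_safe(Index min, Index max) {
  assert(min <= max);
  return static_cast<Index>(min + (max - min) / 2);
}
```

With 32-bit unsigned indices min = 3000000000 and max = 4000000000, `midpoint_safe` yields 3500000000, while the sum inside `midpoint_naive` wraps around and gives a value below `min`. For small inputs the two forms agree. With a signed index type the overflow in the old form would be undefined behaviour rather than a wrap.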

Johan Sjölen has updated the pull request incrementally with four additional commits since the last revision:

 - Fix
 - Apparently this(!)
 - This?
 - Use COMMA

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20031/files
  - new: https://git.openjdk.org/jdk/pull/20031/files/b5a87422..937f6eb6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20031&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20031&range=01-02

  Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20031.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20031/head:pull/20031

PR: https://git.openjdk.org/jdk/pull/20031

From stuefe at openjdk.org  Fri Jul 26 12:46:48 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 26 Jul 2024 12:46:48 GMT
Subject: RFR: 8335701: Make GrowableArray templated by an Index [v2]
In-Reply-To: <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com>
References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com>
 <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com>
Message-ID: <iz73YZU43_x7KZg48wTHbIEyRcudWUeyX3FTnNd3u8E=.57951eb8-5915-41bb-b6e0-9d7432761d76@github.com>

On Thu, 4 Jul 2024 13:35:36 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Hi,
>> 
>> Today the GrowableArray has a fixed index type of `int`. This PR lets you set your own index type through a template parameter.
>> 
>> This opens up a few new design choices:
>> 
>> - Do you know that you have a very small array? Use a `uint8_t` for each of len and cap.
>> - Do you have a very large one? Use a `uint64_t`.
>> 
>> The code keeps `int` as the default, so as to keep identical semantics for all existing code and so that users need not worry about the index type if they don't care.
>> 
>> One "major" change that I don't want to get lost in the review: I've changed the mid-point calculation to be overflow-insensitive without casting.
>> 
>> 
>> 
>> // Old 
>> mid = ((max + min) / 2);
>> // New
>> mid = min + ((max - min) / 2);
>> 
>> Some semi-rigorous thinking:
>> min \in [0, len)
>> max \in [0, len)
>> min <= max
>> (max - min) / 2 \in [0, len/2)
>> Maximizing min and max => len + 0
>> Maximizing max, minimizing min => len/2
>> Minimizing max, maximizing min => max = min => min
>> 
>> 
>> // Proof that they're identical when m, h, l \in N
>> (1) m = l + (h - l) / 2 <=>
>> 2m = 2l + h - l = h + l
>> 
>> (2) m = (h + l) / 2 <=>
>> 2m = h + l
>> (1) = (2)
>> QED
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Attempt at fixing GA VMStruct

If this is for src/hotspot/share/nmt/arrayWithFreeList.hpp, would it not be a lot simpler to just implement it there, and give it another backing store?

In particular because after doing all this work it still won't even support the feature I was hoping for, namely the ability to put an indexed free list atop of existing memory.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20031#issuecomment-2209091083

From jsjolen at openjdk.org  Fri Jul 26 12:46:48 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Fri, 26 Jul 2024 12:46:48 GMT
Subject: RFR: 8335701: Make GrowableArray templated by an Index [v2]
In-Reply-To: <iz73YZU43_x7KZg48wTHbIEyRcudWUeyX3FTnNd3u8E=.57951eb8-5915-41bb-b6e0-9d7432761d76@github.com>
References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com>
 <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com>
 <iz73YZU43_x7KZg48wTHbIEyRcudWUeyX3FTnNd3u8E=.57951eb8-5915-41bb-b6e0-9d7432761d76@github.com>
Message-ID: <U765IkNQyrp-UH4PUVtyT4D6GsDeqRtJK9qtGIIyO3E=.14aafa51-445c-485e-b90c-fd9bfbef5ff4@github.com>

On Thu, 4 Jul 2024 14:07:57 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> If this is for src/hotspot/share/nmt/arrayWithFreeList.hpp, would it not be a lot simpler to just implement it there, and give it another backing store?
> 
> In particular because after doing all this work it still won't even support the feature I was hoping for, namely the ability to put an indexed free list atop of existing memory.

I did that first and it sure is simpler, but I'm not sure whether it's a good idea to have to support such a backing storage. See `resizable_array` in here: https://github.com/openjdk/jdk/pull/20002

Still, it does do what you asked for, kind of; see `GrowableArrayFromArray`. I can adapt AWFL to use either `GAFA` or `GA`.

It's also not the only reason to do this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20031#issuecomment-2209118236

From jsjolen at openjdk.org  Fri Jul 26 12:46:48 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Fri, 26 Jul 2024 12:46:48 GMT
Subject: RFR: 8335701: Make GrowableArray templated by an Index [v2]
In-Reply-To: <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com>
References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com>
 <VN0fNxU6lHhckcxd-NtBrSwE8x5o52dTv89e8NuudGM=.a81cf548-3bb0-4610-a9b6-d783b6311984@github.com>
Message-ID: <MnaoMlToY92Ay91ANlCwVRF1mSyy4_tZnGtF8lbWqFE=.1d381810-1795-4b33-a91d-cb2f1bab66c7@github.com>

On Thu, 4 Jul 2024 13:35:36 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Hi,
>> 
>> Today the GrowableArray has a fixed index type of `int`. This PR lets you set your own index type through a template parameter.
>> 
>> This opens up a few new design choices:
>> 
>> - Do you know that you have a very small array? Use a `uint8_t` for each of len and cap.
>> - Do you have a very large one? Use a `uint64_t`.
>> 
>> The code keeps `int` as the default, so as to keep identical semantics for all existing code and so that users need not worry about the index type if they don't care.
>> 
>> One "major" change that I don't want to get lost in the review: I've changed the mid-point calculation to be overflow-insensitive without casting.
>> 
>> 
>> 
>> // Old 
>> mid = ((max + min) / 2);
>> // New
>> mid = min + ((max - min) / 2);
>> 
>> Some semi-rigorous thinking:
>> min \in [0, len)
>> max \in [0, len)
>> min <= max
>> (max - min) / 2 \in [0, len/2)
>> Maximizing min and max => len + 0
>> Maximizing max, minimizing min => len/2
>> Minimizing max, maximizing min => max = min => min
>> 
>> 
>> // Proof that they're identical when m, h, l \in N
>> (1) m = l + (h - l) / 2 <=>
>> 2m = 2l + h - l = h + l
>> 
>> (2) m = (h + l) / 2 <=>
>> 2m = h + l
>> (1) = (2)
>> QED
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Attempt at fixing GA VMStruct

Compiler issue in linux-x86 seems unrelated to this change.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20031#issuecomment-2252687347

From duke at openjdk.org  Fri Jul 26 12:46:48 2024
From: duke at openjdk.org (Glavo)
Date: Fri, 26 Jul 2024 12:46:48 GMT
Subject: RFR: 8335701: Make GrowableArray templated by an Index [v3]
In-Reply-To: <LG1MWkM8a4plqZXPMsZ81I7z-_9TmZAdXEw4a97lzuE=.430dd51d-3c6b-4bbb-8328-5f0efbb67ccb@github.com>
References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com>
 <LG1MWkM8a4plqZXPMsZ81I7z-_9TmZAdXEw4a97lzuE=.430dd51d-3c6b-4bbb-8328-5f0efbb67ccb@github.com>
Message-ID: <gLnsMoXLgkS5HK75auE3qmpkhkC_OJxci8AqodyteU0=.5b713ab9-5f89-4d60-aa38-e37e4dd8665b@github.com>

On Fri, 26 Jul 2024 12:44:31 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Hi,
>> 
>> Today the GrowableArray has a fixed index type of `int`. This PR lets you set your own index type through a template parameter.
>> 
>> This opens up a few new design choices:
>> 
>> - Do you know that you have a very small array? Use a `uint8_t` for each of len and cap.
>> - Do you have a very large one? Use a `uint64_t`.
>> 
>> The code keeps `int` as the default, so as to keep identical semantics for all existing code and so that users need not worry about the index type if they don't care.
>> 
>> One "major" change that I don't want to get lost in the review: I've changed the mid-point calculation to be overflow-insensitive without casting.
>> 
>> 
>> 
>> // Old 
>> mid = ((max + min) / 2);
>> // New
>> mid = min + ((max - min) / 2);
>> 
>> Some semi-rigorous thinking:
>> min \in [0, len)
>> max \in [0, len)
>> min <= max
>> (max - min) / 2 \in [0, len/2)
>> Maximizing min and max => len + 0
>> Maximizing max, minimizing min => len/2
>> Minimizing max, maximizing min => max = min => min
>> 
>> 
>> // Proof that they're identical when m, h, l \in N
>> (1) m = l + (h - l) / 2 <=>
>> 2m = 2l + h - l = h + l
>> 
>> (2) m = (h + l) / 2 <=>
>> 2m = h + l
>> (1) = (2)
>> QED
>
> Johan Sjölen has updated the pull request incrementally with four additional commits since the last revision:
> 
>  - Fix
>  - Apparently this(!)
>  - This?
>  - Use COMMA

src/hotspot/share/classfile/classFileParser.hpp line 46:

> 44: class ConstMethod;
> 45: class FieldInfo;
> 46: template<typename E, typename Index>

Suggestion:

template<typename E, typename Index = int>


Is it possible to reduce the changes by providing default parameters?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20031#discussion_r1666447498

From jsjolen at openjdk.org  Fri Jul 26 12:46:48 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Fri, 26 Jul 2024 12:46:48 GMT
Subject: RFR: 8335701: Make GrowableArray templated by an Index [v3]
In-Reply-To: <gLnsMoXLgkS5HK75auE3qmpkhkC_OJxci8AqodyteU0=.5b713ab9-5f89-4d60-aa38-e37e4dd8665b@github.com>
References: <RdHPj2BymMdh9XdDmzcAtFCxfvfPfA0jxKk5lDK-GPI=.cae72821-07f6-4092-934c-b4bbd08a8167@github.com>
 <LG1MWkM8a4plqZXPMsZ81I7z-_9TmZAdXEw4a97lzuE=.430dd51d-3c6b-4bbb-8328-5f0efbb67ccb@github.com>
 <gLnsMoXLgkS5HK75auE3qmpkhkC_OJxci8AqodyteU0=.5b713ab9-5f89-4d60-aa38-e37e4dd8665b@github.com>
Message-ID: <4H6ngLZ5pODKJSClj8mfQx2_B58hNIyA6yW878uF0zk=.fbe42340-3850-4a6a-a529-2b70a4a37b6c@github.com>

On Fri, 5 Jul 2024 07:38:12 GMT, Glavo <duke at openjdk.org> wrote:

>> Johan Sjölen has updated the pull request incrementally with four additional commits since the last revision:
>> 
>>  - Fix
>>  - Apparently this(!)
>>  - This?
>>  - Use COMMA
>
> src/hotspot/share/classfile/classFileParser.hpp line 46:
> 
>> 44: class ConstMethod;
>> 45: class FieldInfo;
>> 46: template<typename E, typename Index>
> 
> Suggestion:
> 
> template<typename E, typename Index = int>
> 
> 
> Is it possible to reduce the changes by providing default parameters?

Unfortunately, no. Forward declarations may not redefine the default template argument, even if it is the same as the one in the definition.
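
A minimal sketch of that C++ rule, using a hypothetical `DemoArray` rather than the real `GrowableArray`: a default template argument may be given by only one declaration in a scope, so the forward declaration has to name both parameters without a default.

```cpp
#include <type_traits>

// Forward declaration, as another header might carry it: both parameters
// are named, but no default is given here.
template <typename E, typename Index>
class DemoArray;

// The definition supplies the default exactly once.
template <typename E, typename Index = int>
class DemoArray {
 public:
  using index_type = Index;
};

// Repeating the default on another declaration in the same scope is
// ill-formed, even though the default is identical:
// template <typename E, typename Index = int> class DemoArray;

// Code that sees the definition can still omit the second argument.
static_assert(std::is_same<DemoArray<char>::index_type, int>::value,
              "Index defaults to int");
```

This is why each forward declaration in the patch has to spell out the extra template parameter, while users of the full definition are unaffected.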

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20031#discussion_r1666460897

From aph at openjdk.org  Fri Jul 26 15:13:06 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 26 Jul 2024 15:13:06 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v9]
In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
Message-ID: <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com>

> This patch expands the use of a hash table for secondary superclasses
> to the interpreter, C1, and runtime. It also adds a C2 implementation
> of hashed lookup in cases where the superclass isn't known at compile
> time.
> 
> HotSpot shared runtime
> ----------------------
> 
> Building hashed secondary tables is now unconditional. It takes very
> little time, and now that the shared runtime always has the tables, it
> might as well take advantage of them. The shared code is easier to
> follow now, I think.
> 
> There might be a performance issue with x86-64 in that we build
> HotSpot for a default x86-64 target that does not support popcount.
> This means that HotSpot C++ runtime on x86 always uses a software
> emulation for popcount, even though the vast majority of machines made
> for the past 20 years can do popcount in a single instruction. It
> wouldn't be terribly hard to do something about that.
> 
> Having said that, the software popcount is really not bad.
> 
> x86
> ---
> 
> x86 is rather tricky, because we still support
> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
> well as 32- and 64-bit ports. There's some further complication in
> that only `RCX` can be used as a shift count, so there's some register
> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
> 
> AArch64
> -------
> 
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
> 
> Generally
> ---------
> 
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
> 
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
> 
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
> 
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.

Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:

  Fix test failure

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19989/files
  - new: https://git.openjdk.org/jdk/pull/19989/files/248f44dc..e9581019

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=07-08

  Stats: 7 lines in 1 file changed: 0 ins; 1 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/19989.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989

PR: https://git.openjdk.org/jdk/pull/19989

From wkemper at openjdk.org  Fri Jul 26 15:32:38 2024
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 26 Jul 2024 15:32:38 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update
 mode [v2]
In-Reply-To: <Fp-wdKTTyKEIGRH8CpV38B8gThKzmvNmbrLcN9Yc4Rg=.1bd4121b-9a9a-47d8-850e-1e9bb4dbbfdb@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
 <03bSRAN8T28AU2-M4IzjsBygTwG4SHrc8HUIJYLM5TE=.e4299a87-b25f-471b-9f6e-2c08e741c6f2@github.com>
 <Fp-wdKTTyKEIGRH8CpV38B8gThKzmvNmbrLcN9Yc4Rg=.1bd4121b-9a9a-47d8-850e-1e9bb4dbbfdb@github.com>
Message-ID: <inDl-fffiHKi375I22yY_978HoveyzuZf1tx3RHC-Ks=.2ddea4a6-be29-4b81-9291-8f335583a3df@github.com>

On Thu, 25 Jul 2024 02:24:43 GMT, Y. Srinivas Ramakrishna <ysr at openjdk.org> wrote:

>> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Remove unintentional new line
>
> src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 122:
> 
>> 120: 
>> 121:       ShenandoahMarkRefsClosure<GENERATION> mark_cl(q, rp);
>> 122:       ShenandoahSATBAndRemarkThreadsClosure tc(satb_mq_set, nullptr);
> 
> Because this is the only c'tor usage of this closure, maybe get rid of the second argument altogether, and clean up its `do_thread()` method further above at lines 84-89?

Right, @shipilev also pointed this out. I've since cleaned it up.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1693258254

From wkemper at openjdk.org  Fri Jul 26 15:32:41 2024
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 26 Jul 2024 15:32:41 GMT
Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update
 mode [v4]
In-Reply-To: <B9KUTPsaCmQ0ewO73Mdh6lHXUpu_lAmCNQ8FC6_fzkU=.e2c55cef-c263-4b41-a7d7-529d7f7740e3@github.com>
References: <cf7yyzQoE0yEh-WGr29pwjB4P5TLaFro1uJhVzlRCzY=.d2eab820-1d79-4784-8406-969026113e01@github.com>
 <cNGAzck0BWF_wvjEGCwPJ4908wvwaWKVOEfql6P105Q=.e3794703-ee0b-4421-b1da-75fd9d09fc1d@github.com>
 <B9KUTPsaCmQ0ewO73Mdh6lHXUpu_lAmCNQ8FC6_fzkU=.e2c55cef-c263-4b41-a7d7-529d7f7740e3@github.com>
Message-ID: <KiP-D8PU6B1hSk7-XAr2h5eGiAzkRwDzS9MoJADv2Js=.47749f28-2094-4d86-88a4-88acc35153a5@github.com>

On Thu, 25 Jul 2024 02:27:20 GMT, Y. Srinivas Ramakrishna <ysr at openjdk.org> wrote:

>> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
>> 
>>  - Merge remote-tracking branch 'openjdk/master' into remove-incremental-update-mode
>>  - Simplify final mark now that incremental update mode is removed
>>  - Remove unintentional new line
>>  - Remove last vestiges of incremental update mode
>>  - Missed test, remove actual IU barrier flag
>>  - Remove missed iu_barrier usages for C1
>>  - Update test (all barriers can be enabled now for all modes)
>>  - WIP: Remove incremental update mode
>
> src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 339:
> 
>> 337:   product(bool, ShenandoahIUBarrier, false, DIAGNOSTIC,                     \
>> 338:           "Turn on/off I-U barriers barriers in Shenandoah")                \
>> 339:                                                                             \
> 
> Not your change, but these doc comments should really say `"Enable blah-blah ..."` rather than `"Turn on/off blah-blah..."`.

Yes, next time we're in this file.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1693259588

From aph at openjdk.org  Fri Jul 26 15:40:35 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 26 Jul 2024 15:40:35 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v8]
In-Reply-To: <2tItgZaRCa5BQrmelOWEsn6FVlHlEvY4is2L1n3HxhE=.eb2519cc-d69e-45e6-8ca3-b5b02565bb76@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <0fk0Qo0HelMbG6l1d-hxUN504qx0ehO9uNxg9JrOeJU=.0b150931-21ea-4383-b6c2-85f6c74958d1@github.com>
 <2tItgZaRCa5BQrmelOWEsn6FVlHlEvY4is2L1n3HxhE=.eb2519cc-d69e-45e6-8ca3-b5b02565bb76@github.com>
Message-ID: <mJPwuVTF3rE7s9bvenMcur1eOkelC2mMzQWgJq_Dwtk=.6f590424-d389-4721-a1aa-08774216746d@github.com>

On Thu, 25 Jul 2024 23:31:21 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> Thanks! The patch looks good, except there was one failure observed during testing with the latest patch [1]. It does look related to the latest changes you did in [54050a5](https://github.com/openjdk/jdk/pull/19989/commits/54050a5c2c0aa1d6a9e36d0240c66345765845e3) about `bitmap == SECONDARY_SUPERS_BITMAP_FULL` check.

Wow! Whoever wrote that test case deserves a bouquet of roses.

It's a super-edge case. If the hash slot of an interface is 63, and the secondaries array length of the Klass we're probing is 63, then the initial probe is at offset 63, one past the array bounds.

This bug happens because of an "optimization" created during the first round of reviews. If the secondaries array length is >= 62 (_not_ >= 63), we set the secondaries bitmap to `SECONDARY_SUPERS_BITMAP_FULL`. So, the initial probe sees a full bitmap, popcount returns 63, and we look at `secondary_supers[63]`. _Bang_.

We never noticed this problem before because there's no bounds checking in the hand-coded assembly language implementations.

The root cause of this bug is that we're not maintaining the invariant `popcount(bitmap) == secondary_supers()->length()`. There are a couple of ways to fix this. We could check `secondary_supers()->length()` before doing any probe. I'm very reluctant to add a memory load to the super-hot path for this edge case, though. It's better to take some pain in the case of an almost-full secondaries array, because those are very rare, and will never occur in most Java programs. So, I've corrected the bitmap at the point the hash table is constructed.
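
The edge case can be modelled in a few lines of C++. This is a simplified sketch, not the HotSpot code: `kBitmapFull`, `popcount64`, `initial_probe_index` and `probe_in_bounds` are hypothetical names, and the sketch assumes the first probe index is the number of set bits at or below the hash slot, minus one.

```cpp
#include <cstddef>
#include <cstdint>

// Stands in for SECONDARY_SUPERS_BITMAP_FULL (assumption: all 64 bits set).
constexpr uint64_t kBitmapFull = ~uint64_t{0};

// Portable population count (clears the lowest set bit each iteration).
inline int popcount64(uint64_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

// Index of the first probe for hash slot `slot` (0..63): the number of set
// bits at or below `slot`, minus one.
inline int initial_probe_index(uint64_t bitmap, int slot) {
  uint64_t shifted = bitmap << (63 - slot);  // discard the bits above `slot`
  return popcount64(shifted) - 1;
}

// True if the first probe stays inside an array of `length` entries.
inline bool probe_in_bounds(uint64_t bitmap, int slot, std::size_t length) {
  int index = initial_probe_index(bitmap, slot);
  return index >= 0 && static_cast<std::size_t>(index) < length;
}
```

In this model a full bitmap with slot 63 gives a first probe at index 63, which is out of bounds for a 63-entry array and in bounds only once the array has 64 entries. As long as the bitmap has exactly as many set bits as the array has entries, the probe cannot leave the array.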

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2253019028

From amenkov at openjdk.org  Fri Jul 26 17:59:41 2024
From: amenkov at openjdk.org (Alex Menkov)
Date: Fri, 26 Jul 2024 17:59:41 GMT
Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations [v2]
In-Reply-To: <o-XobOOcOSevq7Rfqt6VAKNZ1BdxEGekXLLxsMN0iR4=.276a4f9a-ed9f-472b-ab02-aa73413d1bdf@github.com>
References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com>
 <yoYPcRiwlovmm5hdLcD8y1d25ABb3r5KUniSzzyfBzI=.be9f5747-fa03-4b13-ba53-4d868ea85989@github.com>
 <o-XobOOcOSevq7Rfqt6VAKNZ1BdxEGekXLLxsMN0iR4=.276a4f9a-ed9f-472b-ab02-aa73413d1bdf@github.com>
Message-ID: <r58fJ8zGBJ158Ucsgpo3dRY3cirAx_5PsdlPozgXwjE=.afe2557e-6603-478b-932e-be72604ecc2c@github.com>

On Fri, 26 Jul 2024 10:39:28 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

> Looks good. Really nice simplification. Do I understand it right that all annotations are visible now, or we just do not parse/process invisible ones? If all annotations are visible then can we get rid of the suffix `_visible`?

We skip (do not process) invisible annotations.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20315#issuecomment-2253225273

From amenkov at openjdk.org  Fri Jul 26 17:59:42 2024
From: amenkov at openjdk.org (Alex Menkov)
Date: Fri, 26 Jul 2024 17:59:42 GMT
Subject: Integrated: 8330427: Obsolete -XX:+PreserveAllAnnotations
In-Reply-To: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com>
References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com>
Message-ID: <uzdS6siIly7pmOkHflwuUg2NBbii8FluRUYMhnH8rAM=.75bfaae5-3986-4d56-b790-f2d6182fddc2@github.com>

On Wed, 24 Jul 2024 18:01:15 GMT, Alex Menkov <amenkov at openjdk.org> wrote:

> Obsolete PreserveAllAnnotations flag which was deprecated in JDK 23.
> 
> Testing: tier1,tier2,tier3,tier4,hs-tier5-svc

This pull request has now been integrated.

Changeset: abc4ca5a
Author:    Alex Menkov <amenkov at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/abc4ca5a8c440f8f3f36a9b35036772c5b5ee7ea
Stats:     378 lines in 7 files changed: 3 ins; 339 del; 36 mod

8330427: Obsolete -XX:+PreserveAllAnnotations

Reviewed-by: dholmes, sspitsyn, coleenp

-------------

PR: https://git.openjdk.org/jdk/pull/20315

From asmehra at openjdk.org  Fri Jul 26 18:23:33 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Fri, 26 Jul 2024 18:23:33 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic
In-Reply-To: <ej5ON8iDbUMsORwZNuLzDbXERpzGJde7q_hd50vKPGo=.34c0d39e-7a85-49a7-9d10-363a9800cc4d@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
 <ej5ON8iDbUMsORwZNuLzDbXERpzGJde7q_hd50vKPGo=.34c0d39e-7a85-49a7-9d10-363a9800cc4d@github.com>
Message-ID: <1zW4OT5fJqNOIVmEJzaa75P1pkOtTDCc5o3As0Cbrfg=.37b21e54-fb16-4015-a910-40ead48c94b3@github.com>

On Fri, 26 Jul 2024 06:08:03 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)
>> 
>> Testing:
>>   test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
>>   test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java
>
> src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 118:
> 
>> 116:   if (total != _current) {
>> 117:     log_info(compilation, alloc)("WARNING!!! Total does not match current");
>> 118:   }
> 
> Why do we calculate total? Just for this test? I would then put this into an ASSERT section, and make the info log an assert. 
> 
> However, I wonder if this is really needed. The logic updating both _current and _tags_size is pretty straightforward, I don't see how there could be a mismatch.

This code should not have been there. I forgot to remove it. There is no use of `total` here.

> src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 204:
> 
>> 202:   size_t _total;
>> 203:   // usage per arena tag when total peaked
>> 204:   size_t _tags_size_at_peak[Arena::tag_count()];
> 
> Can you please make sure Arena::tag_count() evaluates to constexpr? When in doubt, just use the enum value instead.

Arena::tag_count() is declared as a constexpr. I wanted to avoid writing `static_cast<int>(Arena::Tag::tag_count)` every time I need tag_count, so I wrapped it in Arena::tag_count() and declared it with constexpr. Is that not sufficient to make it a constexpr?
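
As a small check of that question, here is a sketch with a hypothetical `DemoArena` (not the real `Arena` class): a `static constexpr` member function can be used as an array bound in another class, and a `static_assert` confirms it is evaluated at compile time.

```cpp
#include <cstddef>

// Hypothetical stand-in for Arena; the tag names are made up.
struct DemoArena {
  enum class Tag { tag_a, tag_b, tag_c, tag_count };

  // constexpr wrapper that avoids repeating the static_cast at every use.
  static constexpr int tag_count() {
    return static_cast<int>(Tag::tag_count);
  }
};

// An array bound must be a constant expression, so this only compiles if
// DemoArena::tag_count() can be evaluated at compile time.
struct DemoStats {
  std::size_t tags_size_at_peak[DemoArena::tag_count()];
};

static_assert(sizeof(DemoStats) == 3 * sizeof(std::size_t),
              "one slot per tag");
```

If the compiler accepts the array declaration, the wrapper is usable as a constant expression there. Whether the real `Arena::tag_count()` behaves the same depends on its actual declaration.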

> src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 242:
> 
>> 240:     for (int tag = 0; tag < Arena::tag_count(); tag++) {
>> 241:       st->print_cr("  " LEGEND_KEY_FMT ": %s", Arena::tag_name[tag], Arena::tag_desc[tag]);
>> 242:     }
> 
> use x macro?

What do you mean by x macro? Do you have an example that shows the use of x macro?
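
For readers unfamiliar with the term, an X macro keeps a list of entries in one macro and expands it several times with different definitions of `X`, so the enum, the names and the descriptions cannot drift apart. A minimal sketch follows; the tag list and all `demo_`/`DEMO_` names are hypothetical, not the real Arena tags.

```cpp
#include <cstddef>
#include <cstring>

// Single source of truth: each entry is X(identifier, description).
#define DEMO_TAGS_DO(X)             \
  X(alpha, "First demo arena")      \
  X(beta,  "Second demo arena")     \
  X(gamma, "Third demo arena")

// Expansion 1: the enum constants, followed by a count sentinel.
enum class DemoTag {
#define DEMO_ENUM(name, desc) name,
  DEMO_TAGS_DO(DEMO_ENUM)
#undef DEMO_ENUM
  tag_count
};

constexpr int demo_tag_count() {
  return static_cast<int>(DemoTag::tag_count);
}

// Expansion 2: printable names, produced by stringizing the identifier.
static const char* const demo_tag_name[] = {
#define DEMO_NAME(name, desc) #name,
  DEMO_TAGS_DO(DEMO_NAME)
#undef DEMO_NAME
};

// Expansion 3: human-readable descriptions.
static const char* const demo_tag_desc[] = {
#define DEMO_DESC(name, desc) desc,
  DEMO_TAGS_DO(DEMO_DESC)
#undef DEMO_DESC
};

static_assert(sizeof(demo_tag_name) / sizeof(demo_tag_name[0]) ==
                  static_cast<std::size_t>(demo_tag_count()),
              "one name per tag");

// Returns the index of the tag called `name`, or -1 if there is none.
inline int demo_find_tag(const char* name) {
  for (int i = 0; i < demo_tag_count(); i++) {
    if (std::strcmp(demo_tag_name[i], name) == 0) {
      return i;
    }
  }
  return -1;
}
```

Adding a tag then means adding one line to `DEMO_TAGS_DO`; the enum and both tables follow automatically.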

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1693443814
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1693443227
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1693445269

From vlivanov at openjdk.org  Fri Jul 26 18:43:33 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 26 Jul 2024 18:43:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v9]
In-Reply-To: <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com>
Message-ID: <FJz0qOtL2DHVrLC_zwUBm7eMG--I601KNycK4uD8SN4=.0f444c69-eb62-45cc-a92d-6551ce42bf05@github.com>

On Fri, 26 Jul 2024 15:13:06 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to to a slow path subroutine.
>> Instead, the slow patch is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix test failure

Oh, this comes as a surprise to me... I was under the impression that the first thing the hand-coded assembly variants do is check for `bitmap != SECONDARY_SUPERS_BITMAP_FULL`. At least, that was my recollection from working on [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). (And the initial version of the patch, with the check in `Klass::lookup_secondary_supers_table()` and `_bitmap(SECONDARY_SUPERS_BITMAP_FULL)`, only reassured me that's still the case.)

> The root cause of this bug is that we're not maintaining the invariant popcount(bitmap) == secondary_supers()->length().

The invariant holds only when `bitmap != SECONDARY_SUPERS_BITMAP_FULL`. It does help that even in case of non-hashed `secondary_supers` list `secondary_supers()->length() >= popcount(bitmap)`, but initial probing becomes much less efficient (a random probe followed by a full linear pass over secondary supers list).

Alternatively, all table lookups can be adjusted to start with `bitmap != SECONDARY_SUPERS_BITMAP_FULL` checks before probing the table. It does add a branch on the fast path (and slightly increases the inlined snippet's code size), but the branch is highly predictable and works on a value we need anyway. So, I would be surprised to see any performance effects from it. IMO it's easier to reason about and more flexible: `SECONDARY_SUPERS_BITMAP_FULL == bitmap` simply disables all table lookups and unconditionally falls back to linear search.
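
A rough sketch of that guarded lookup, as a simplified model rather than the HotSpot implementation: plain `int` ids stand in for `Klass*`, all names are hypothetical, and the sketch falls back to a linear scan after a failed first probe where the real code would continue probing the table.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stands in for SECONDARY_SUPERS_BITMAP_FULL (assumption: all 64 bits set).
constexpr uint64_t kFullBitmap = ~uint64_t{0};

// Portable population count.
inline int count_bits(uint64_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

// Fallback: linear pass over the whole secondary supers list.
inline bool linear_search(const std::vector<int>& supers, int k) {
  for (int s : supers) {
    if (s == k) return true;
  }
  return false;
}

// Lookup that tests for a full bitmap before touching the table.
inline bool lookup_guarded(const std::vector<int>& supers, uint64_t bitmap,
                           int slot, int k) {
  if (bitmap == kFullBitmap) {
    return linear_search(supers, k);  // table disabled: never probe
  }
  if (((bitmap >> slot) & 1) == 0) {
    return false;                     // bit clear: k cannot be present
  }
  int index = count_bits(bitmap << (63 - slot)) - 1;
  if (static_cast<std::size_t>(index) < supers.size() && supers[index] == k) {
    return true;                      // first probe hit
  }
  return linear_search(supers, k);    // simplification of the real probing
}
```

The extra comparison works on the bitmap value that the lookup has to load anyway, which is the point made above about the branch being cheap.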

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2253281836

From aph at openjdk.org  Fri Jul 26 21:36:34 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 26 Jul 2024 21:36:34 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v9]
In-Reply-To: <FJz0qOtL2DHVrLC_zwUBm7eMG--I601KNycK4uD8SN4=.0f444c69-eb62-45cc-a92d-6551ce42bf05@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com>
 <FJz0qOtL2DHVrLC_zwUBm7eMG--I601KNycK4uD8SN4=.0f444c69-eb62-45cc-a92d-6551ce42bf05@github.com>
Message-ID: <LIwHtqflDxdG8s68Pj3OMLNYpRPh19xrJfJreXtOxQc=.00a895b3-1e0f-4db8-a8f8-1d3a531b5a41@github.com>

On Fri, 26 Jul 2024 18:39:27 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> Oh, it comes as a surprise to me... I was under impression that the first thing hand-coded assembly variants do is check for `bitmap != SECONDARY_SUPERS_BITMAP_FULL`. At least, it was my recollection from working on [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). (And the initial version of the patch with the check in `Klass::lookup_secondary_supers_table()` and `_bitmap(SECONDARY_SUPERS_BITMAP_FULL)` only reassured me that's still the case.)
> 
> > The root cause of this bug is that we're not maintaining the invariant popcount(bitmap) == secondary_supers()->length().
> 
> The invariant holds only when `bitmap != SECONDARY_SUPERS_BITMAP_FULL`.

Yes, exactly so. That's what I intended to mean.

> It does help that even in case of non-hashed `secondary_supers` list `secondary_supers()->length() >= popcount(bitmap)`, but initial probing becomes much less efficient (a random probe followed by a full linear pass over secondary supers list).
> 
> Alternatively, all table lookups can be adjusted to start with `bitmap != SECONDARY_SUPERS_BITMAP_FULL` checks before probing the table. It does add a branch on the fast path (and slightly increases inlined snippet code size), but the branch is highly predictable and works on a value we need anyway.

True enough. I've been trying to move as much as I can out of the inlined code, though.

> So, I would be surprised to see any performance effects from it. IMO it's easier to reason and more flexible: `SECONDARY_SUPERS_BITMAP_FULL == bitmap` simply disables all table lookups and unconditionally falls back to linear search.

I take your point. But that seems like it's a bit of a sledgehammer to crack a walnut, don't you think? Given that it's only necessary to fix a rare edge case. But I'm not going to be precious about this choice, I'll test for a full bitmap first if you prefer. It's only a couple of instructions, but they are in the fast path that is successful in almost 99% of cases.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2253537338

From dholmes at openjdk.org  Fri Jul 26 21:37:43 2024
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 26 Jul 2024 21:37:43 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com>
Message-ID: <vz8nu18FeuJADlmZjGknJXHdBzCkuBxR6-w-18bWboI=.27e0969a-e5bf-43b8-9dc0-b018f7034fe8@github.com>

On Fri, 26 Jul 2024 05:23:28 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
>> 
>> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
>> 
>> Testing:
>>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>>  - tiers 1-3 sanity testing
>> 
>> Thanks.
>
> src/hotspot/share/utilities/exceptions.cpp line 276:
> 
>> 274:   // sequence is valid.
>> 275:   if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {
>> 276:     assert(msg[max_msg_size - 1] == '\0', "should be null terminated");
> 
> Would this always be true? For a formatting error, too?
> Maybe just to be sure, instead of asserting set the last byte to zero.

`vsnprintf` is supposed to guarantee it, and `os::vsnprintf` does IIRC, so this is just a sanity check.
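
For reference, a small sketch of the standard (C99/C++11) `vsnprintf` behaviour being relied on: with a non-zero buffer size the output is always null terminated, and a return value greater than or equal to the size signals truncation. `format_truncated` is a hypothetical helper written for this example, not `os::vsnprintf`, and older Windows CRTs are known to behave differently.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Formats into buf and reports whether the output was cut short.
// Assumes standard vsnprintf semantics.
bool format_truncated(char* buf, std::size_t size, const char* fmt, ...) {
  assert(buf != nullptr && size > 0);
  va_list ap;
  va_start(ap, fmt);
  int ret = std::vsnprintf(buf, size, fmt, ap);
  va_end(ap);
  // ret < 0: formatting error; ret >= size: the output needed more room.
  return ret < 0 || static_cast<std::size_t>(ret) >= size;
}
```

With a 6-byte buffer and the string "hello world", the buffer ends up holding "hello" plus the terminator and the function reports truncation.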

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693624938

From dholmes at openjdk.org  Fri Jul 26 21:46:41 2024
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 26 Jul 2024 21:46:41 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com>
Message-ID: <Bp9RxG0ZfwtVg7p9v_X_ZgogL1U-aG0ha7ME7nKW8c8=.49302a72-48f0-4b5b-bc16-64ff037f6006@github.com>

On Fri, 26 Jul 2024 05:36:42 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
>> 
>> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
>> 
>> Testing:
>>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>>  - tiers 1-3 sanity testing
>> 
>> Thanks.
>
> This is a neat technique, but it won't work for very short strings (e.g. consisting of just one or two multi-byte sequences, the latter being invalid). Reason is that you need a minimal buffer length to do the check safely.
> 
> What you could do alternatively is to allocate the `msg` buffer with 5 leading bytes that you don't use, just zero-initialize. So the string start would be at msg+5. But that way, you can safely overstep the array with e.g. index - 5 without corruption.

Thanks for looking at this @tstuefe  and @djelinski

> src/hotspot/share/utilities/exceptions.cpp line 275:
> 
>> 273:   // we may also have a truncated UTF-8 sequence. In such cases we need to fix the buffer so the UTF-8
>> 274:   // sequence is valid.
>> 275:   if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {
> 
> You should test for length >= 5 since it is the farthest you could access in `UTF8::truncate_to_legal_utf8` later:
> 
> 
>   for (int index = length - 2; index > 0; index--) {
> ...
>     assert(buffer[index - 3] == 0xED, "malformed sequence");

I will update `truncate_to_legal_utf8` to ensure we check for small buffers - though of course we would never expect to pass such a buffer to it in the first place.

> src/hotspot/share/utilities/exceptions.cpp line 277:
> 
>> 275:   if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {
>> 276:     assert(msg[max_msg_size - 1] == '\0', "should be null terminated");
>> 277:     UTF8::truncate_to_legal_utf8((unsigned char*)msg, max_msg_size);
> 
> Ah, I misread your patch and thought you pass in the strlen of the message to the truncation function, when in fact you pass in the hard coded message buffer size. 
> 
> But that begs the question of why you test strlen above, and more importantly, whether all cases where snprintf returns an error are truncation problems. It could have detected an invalid UTF8 sequence and aborted in the middle of it.

The `strlen` check is to skip the empty buffer you can get on Windows if vsnprintf returns -1 due to overflow of INT_MAX.

We are assuming/requiring that we start with a valid UTF8 sequence and the worst that will happen is that vsnprintf will truncate it.

If we actually got -1 for a conversion error (no way to tell the difference in the two cases) then we would unnecessarily truncate, but we do not expect any such conversion errors - in part because we type check the format specifiers and args and so should never get a mismatch.
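
To illustrate the general idea of backing up to a legal boundary after truncation, here is a minimal sketch for standard UTF-8 only. It is not the `UTF8::truncate_to_legal_utf8` from the patch: `legal_utf8_prefix` is a hypothetical name, the input is assumed to have been valid before it was cut, and the six-byte surrogate-pair encoding of modified UTF-8 is deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>

// Returns the length of the longest prefix of buf[0..len) that does not end
// in the middle of a multi-byte sequence.
std::size_t legal_utf8_prefix(const unsigned char* buf, std::size_t len) {
  assert(buf != nullptr || len == 0);
  std::size_t i = len;
  std::size_t cont = 0;
  // Step back over trailing continuation bytes (0b10xxxxxx).
  while (i > 0 && (buf[i - 1] & 0xC0) == 0x80) {
    i--;
    cont++;
  }
  if (i == 0) return 0;  // no lead byte at all (also covers len == 0)
  unsigned char lead = buf[i - 1];
  std::size_t need = 1;                       // 0xxxxxxx
  if ((lead & 0xE0) == 0xC0) need = 2;        // 110xxxxx
  else if ((lead & 0xF0) == 0xE0) need = 3;   // 1110xxxx
  else if ((lead & 0xF8) == 0xF0) need = 4;   // 11110xxx
  // Drop the final sequence only if it is incomplete.
  return (cont + 1 < need) ? i - 1 : len;
}
```

A caller would then write the terminator at the returned index. Note that the loop is bounded by `i > 0`, so very short buffers are safe here without any leading padding.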

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20345#issuecomment-2253549010
PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693627660
PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693627154

From dholmes at openjdk.org  Fri Jul 26 21:46:42 2024
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 26 Jul 2024 21:46:42 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com>
 <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com>
Message-ID: <BGSEf3h_EuLOHuRHwBJl5h_VMezDWyv7j0w4xGgZXeA=.e5919fd1-0544-44ac-b11d-62b19e1c5bc1@github.com>

On Fri, 26 Jul 2024 21:42:32 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> src/hotspot/share/utilities/utf8.cpp line 440:
>> 
>>> 438:         // Could be first or fourth byte. If fourth
>>> 439:         // then 2 bytes before will have second byte pattern (0b1010xxxx)
>>> 440:         if ((index - 3) >= 0 && ((buffer[index - 2] & 0xA0) == 0xA0)) {
>> 
>> Suggestion:
>> 
>>         if ((index - 3) >= 0 && ((buffer[index - 2] & 0xF0) == 0xA0)) {
>
> I don't understand the rationale for the suggestion sorry.

I am looking specifically for the second byte of six pattern.
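
For what it's worth, the arithmetic difference between the two masks can be checked at compile time. This is only a sketch of the bit tests themselves; `matches_loose` and `matches_strict` are names invented here, and nothing below says which test the patch should end up with.

```cpp
#include <cstdint>

// Test as written in the patch: true whenever bits 7 and 5 are both set.
constexpr bool matches_loose(uint8_t b) { return (b & 0xA0) == 0xA0; }

// Suggested test: true only when the high nibble is exactly 0b1010.
constexpr bool matches_strict(uint8_t b) { return (b & 0xF0) == 0xA0; }

static_assert(matches_loose(0xA5) && matches_strict(0xA5),
              "0b1010xxxx passes both tests");
static_assert(matches_loose(0xB0) && !matches_strict(0xB0),
              "0b1011xxxx passes only the loose test");
static_assert(matches_loose(0xED) && !matches_strict(0xED),
              "a lead byte such as 0xED also passes the loose test");
```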

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693630123

From dholmes at openjdk.org  Fri Jul 26 21:46:42 2024
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 26 Jul 2024 21:46:42 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com>
Message-ID: <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com>

On Fri, 26 Jul 2024 08:16:14 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

>> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
>> 
>> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
>> 
>> Testing:
>>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>>  - tiers 1-3 sanity testing
>> 
>> Thanks.
>
> src/hotspot/share/utilities/utf8.cpp line 440:
> 
>> 438:         // Could be first or fourth byte. If fourth
>> 439:         // then 2 bytes before will have second byte pattern (0b1010xxxx)
>> 440:         if ((index - 3) >= 0 && ((buffer[index - 2] & 0xA0) == 0xA0)) {
> 
> Suggestion:
> 
>         if ((index - 3) >= 0 && ((buffer[index - 2] & 0xF0) == 0xA0)) {

I don't understand the rationale for the suggestion sorry.

> src/hotspot/share/utilities/utf8.cpp line 442:
> 
>> 440:         if ((index - 3) >= 0 && ((buffer[index - 2] & 0xA0) == 0xA0)) {
>> 441:           // it was fourth byte so truncate 3 bytes earlier
>> 442:           assert(buffer[index - 3] == 0xED, "malformed sequence");
> 
> This needs to be an if, not an assert: ec-a0-80 is a [legitimate 3-byte UTF-8](https://www.compart.com/en/unicode/U+C800)

Will need to re-examine this part.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693629740
PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693630625

From vlivanov at openjdk.org  Fri Jul 26 22:01:38 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 26 Jul 2024 22:01:38 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v9]
In-Reply-To: <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com>
Message-ID: <6elom8uMJFqHyYuOgOk56W_YuIczI5vTlDcYKXUKr2Q=.51a5c0e5-6342-4e57-af8e-0fe9b1f3648e@github.com>

On Fri, 26 Jul 2024 15:13:06 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix test failure

Yes, I'm in favor of avoiding probing the table when `SECONDARY_SUPERS_BITMAP_FULL == bitmap`. It doesn't look right when the code treats `secondary_supers` as a table irrespective of whether it was hashed or not. IMO it unnecessarily complicates things and may continue to be a source of bugs.

Also, you can rearrange fast path checks: probe the home slot bit first, then check for `SECONDARY_SUPERS_BITMAP_FULL != bitmap` before accessing `secondary_supers`. It won't help with inlined code size increase, but negative lookups will stay mostly unaffected by the additional check.
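
A rough sketch of that ordering, on a toy layout rather than the real one: `lookup_bit_first`, `popcount64` and `BITMAP_FULL` are made-up names, ids are plain ints hashed by their low six bits, and a hashed array is assumed to be packed in ascending slot order.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint64_t BITMAP_FULL = ~uint64_t{0};  // toy stand-in: "not hashed"

// Portable population count; no compiler builtins assumed.
inline unsigned popcount64(uint64_t x) {
  unsigned n = 0;
  for (; x != 0; x &= x - 1) n++;
  return n;
}

bool lookup_bit_first(uint64_t bitmap, const int* supers, std::size_t n, int id) {
  uint64_t bit = uint64_t{1} << (static_cast<unsigned>(id) & 63u);
  // 1. Home slot bit: most negative lookups leave here without the extra check.
  if ((bitmap & bit) == 0) return false;
  // 2. Only now test for the unhashed case, before touching the array.
  if (bitmap == BITMAP_FULL) {
    for (std::size_t i = 0; i < n; i++) {
      if (supers[i] == id) return true;
    }
    return false;
  }
  // 3. Hashed case: index is the number of occupied slots below ours.
  return supers[popcount64(bitmap & (bit - 1))] == id;
}
```

Swapping the first two tests cannot change the result, because a full bitmap has every bit set and so never takes the early exit in step 1.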

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2253563566

From vlivanov at openjdk.org  Sat Jul 27 00:03:33 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Sat, 27 Jul 2024 00:03:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v9]
In-Reply-To: <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com>
Message-ID: <a8tSEj-R9U4twOMRH_3nlAKLyFT9_OiA2uvO26gaCUs=.a1097ab0-ad3d-4e73-8d3e-2f7a4c4bfc05@github.com>

On Fri, 26 Jul 2024 15:13:06 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix test failure

BTW it makes sense to assert the invariant. Here's a patch (accompanied by a minor cleanup):

diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp
index b050784dfc5..5413e29defb 100644
--- a/src/hotspot/share/oops/instanceKlass.cpp
+++ b/src/hotspot/share/oops/instanceKlass.cpp
@@ -647,7 +647,7 @@ void InstanceKlass::deallocate_contents(ClassLoaderData* loader_data) {
       !secondary_supers()->is_shared()) {
     MetadataFactory::free_array<Klass*>(loader_data, secondary_supers());
   }
-  set_secondary_supers(nullptr);
+  set_secondary_supers(nullptr, SECONDARY_SUPERS_BITMAP_EMPTY);
 
   deallocate_interfaces(loader_data, super(), local_interfaces(), transitive_interfaces());
   set_transitive_interfaces(nullptr);
diff --git a/src/hotspot/share/oops/klass.cpp b/src/hotspot/share/oops/klass.cpp
index 26ec25d1c80..b1012810be4 100644
--- a/src/hotspot/share/oops/klass.cpp
+++ b/src/hotspot/share/oops/klass.cpp
@@ -319,16 +319,16 @@ bool Klass::can_be_primary_super_slow() const {
     return true;
 }
 
-void Klass::set_secondary_supers(Array<Klass*>* secondaries) {
-  assert(!UseSecondarySupersTable || secondaries == nullptr, "");
-  set_secondary_supers(secondaries, SECONDARY_SUPERS_BITMAP_EMPTY);
-}
-
 void Klass::set_secondary_supers(Array<Klass*>* secondaries, uintx bitmap) {
 #ifdef ASSERT
   if (secondaries != nullptr) {
     uintx real_bitmap = compute_secondary_supers_bitmap(secondaries);
     assert(bitmap == real_bitmap, "must be");
+    if (bitmap != SECONDARY_SUPERS_BITMAP_FULL) {
+      assert(((uint)secondaries->length() == population_count(bitmap)), "required");
+    }
+  } else {
+    assert(bitmap == SECONDARY_SUPERS_BITMAP_EMPTY, "");
   }
 #endif
   _secondary_supers_bitmap = bitmap;
diff --git a/src/hotspot/share/oops/klass.hpp b/src/hotspot/share/oops/klass.hpp
index 2f733e11eef..a9e73e7bcbd 100644
--- a/src/hotspot/share/oops/klass.hpp
+++ b/src/hotspot/share/oops/klass.hpp
@@ -236,7 +236,6 @@ class Klass : public Metadata {
   void set_secondary_super_cache(Klass* k) { _secondary_super_cache = k; }
 
   Array<Klass*>* secondary_supers() const { return _secondary_supers; }
-  void set_secondary_supers(Array<Klass*>* k);
   void set_secondary_supers(Array<Klass*>* k, uintx bitmap);
 
   uint8_t hash_slot() const { return _hash_slot; }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2253660006

From sspitsyn at openjdk.org  Sat Jul 27 01:31:48 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Sat, 27 Jul 2024 01:31:48 GMT
Subject: RFR: 8330427: Obsolete -XX:+PreserveAllAnnotations [v2]
In-Reply-To: <r58fJ8zGBJ158Ucsgpo3dRY3cirAx_5PsdlPozgXwjE=.afe2557e-6603-478b-932e-be72604ecc2c@github.com>
References: <_2nP9Iruq7HT-LI3HAjSJYs7kubgeqRVQwgtSaLD05Q=.55ddb061-add5-48c1-92ff-53f75b396f54@github.com>
 <yoYPcRiwlovmm5hdLcD8y1d25ABb3r5KUniSzzyfBzI=.be9f5747-fa03-4b13-ba53-4d868ea85989@github.com>
 <o-XobOOcOSevq7Rfqt6VAKNZ1BdxEGekXLLxsMN0iR4=.276a4f9a-ed9f-472b-ab02-aa73413d1bdf@github.com>
 <r58fJ8zGBJ158Ucsgpo3dRY3cirAx_5PsdlPozgXwjE=.afe2557e-6603-478b-932e-be72604ecc2c@github.com>
Message-ID: <46dVI9rDgSSWdNtPFQF-J-QYMFnG-zKUSe2NlOkxUl0=.c923333c-0450-44a0-b1cb-89f5b776806c@github.com>

On Fri, 26 Jul 2024 17:56:27 GMT, Alex Menkov <amenkov at openjdk.org> wrote:

> We skip (do not process) invisible annotations

Okay, thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20315#issuecomment-2253694697

From stuefe at openjdk.org  Sat Jul 27 05:41:36 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 27 Jul 2024 05:41:36 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic
In-Reply-To: <1zW4OT5fJqNOIVmEJzaa75P1pkOtTDCc5o3As0Cbrfg=.37b21e54-fb16-4015-a910-40ead48c94b3@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
 <ej5ON8iDbUMsORwZNuLzDbXERpzGJde7q_hd50vKPGo=.34c0d39e-7a85-49a7-9d10-363a9800cc4d@github.com>
 <1zW4OT5fJqNOIVmEJzaa75P1pkOtTDCc5o3As0Cbrfg=.37b21e54-fb16-4015-a910-40ead48c94b3@github.com>
Message-ID: <3FZJyHPjSnJUN-wuslNgoJDIQu6toFSyhagzBPQ_ZV4=.ccd8162b-bb4a-4594-954e-57f43310f219@github.com>

On Fri, 26 Jul 2024 18:18:45 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote:

>> src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 204:
>> 
>>> 202:   size_t _total;
>>> 203:   // usage per arena tag when total peaked
>>> 204:   size_t _tags_size_at_peak[Arena::tag_count()];
>> 
>> Can you please make sure Arena::tag_count() evaluates to constexpr? When in doubt, just use the enum value instead.
>
> Arena::tag_count() is declared as a constexpr. I wanted to avoid writing `static_cast<int>(Arena::Tag::tag_count)` every time I need tag_count, so I wrapped it in Arena::tag_count() and declared it with constexpr. Is that not sufficient to make it a constexpr?

Okay then, that is fine.
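
As a standalone illustration of why the `constexpr` wrapper is enough: a `constexpr` static member function can be used as an array bound. The `Arena` below is a hypothetical stand-in with made-up tag names, not the class from the patch.

```cpp
#include <cstddef>

// Hypothetical stand-in for the Arena class discussed above.
struct Arena {
  enum class Tag { tag_other, tag_ra, tag_node, tag_count };
  // constexpr, so a call is usable wherever a constant expression is required.
  static constexpr int tag_count() { return static_cast<int>(Tag::tag_count); }
};

struct Peak {
  std::size_t total;
  std::size_t tags_size_at_peak[Arena::tag_count()];  // bound must be constant
};

static_assert(Arena::tag_count() == 3, "evaluated at compile time");
```

If `tag_count()` were not `constexpr`, the array declaration would fail to compile in standard C++, so the compiler already enforces the property asked about.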

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1693848291

From stuefe at openjdk.org  Sat Jul 27 05:46:34 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 27 Jul 2024 05:46:34 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic
In-Reply-To: <1zW4OT5fJqNOIVmEJzaa75P1pkOtTDCc5o3As0Cbrfg=.37b21e54-fb16-4015-a910-40ead48c94b3@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
 <ej5ON8iDbUMsORwZNuLzDbXERpzGJde7q_hd50vKPGo=.34c0d39e-7a85-49a7-9d10-363a9800cc4d@github.com>
 <1zW4OT5fJqNOIVmEJzaa75P1pkOtTDCc5o3As0Cbrfg=.37b21e54-fb16-4015-a910-40ead48c94b3@github.com>
Message-ID: <x1uZAfQc-7Dvbhv5cy7wm7CGyTGWmc8oOCs23DrnXVI=.85639be2-68c2-4c49-9ac2-12cef799f77c@github.com>

On Fri, 26 Jul 2024 18:21:05 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote:

>> src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 242:
>> 
>>> 240:     for (int tag = 0; tag < Arena::tag_count(); tag++) {
>>> 241:       st->print_cr("  " LEGEND_KEY_FMT ": %s", Arena::tag_name[tag], Arena::tag_desc[tag]);
>>> 242:     }
>> 
>> use x macro?
>
> What do you mean by x macro? Do you have an example that shows the use of x macro?

You use them already in your patch.

E.g. 


#define XX(name, whatever, desc) st->print_cr("  " LEGEND_KEY_FMT ": %s", #name, #desc);
DO_ARENA_TAG(XX)
#undef XX


Admittedly, it is not a lot less code than the for loop. Up to you.
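
For readers unfamiliar with the technique, here is a self-contained X-macro sketch. The tag list `DO_TOY_TAG`, its entries and the `legend` function are invented for the example (two columns instead of the three used above), and it builds a string rather than printing to an output stream.

```cpp
#include <string>

// Hypothetical tag list; each entry is (name, description).
#define DO_TOY_TAG(FN)             \
  FN(ra,   "Register allocation")  \
  FN(node, "Node arena")           \
  FN(type, "Type arena")

// Expand the list once per use; #name turns the identifier into a string.
std::string legend() {
  std::string out;
#define XX(name, desc) out += std::string(#name) + ": " + desc + "\n";
  DO_TOY_TAG(XX)
#undef XX
  return out;
}
```

The benefit over a loop is that the names and descriptions live in one list with no parallel arrays to keep in sync; the cost is macro code that is harder to step through.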

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1693851006

From stuefe at openjdk.org  Sat Jul 27 06:13:34 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 27 Jul 2024 06:13:34 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v14]
In-Reply-To: <X1PNORe3zCsQbH8DQhGBwUACW8f501e9_IBAmvUiUV8=.ec8e20b1-4b8e-4a92-8654-c2a8d1a9f94d@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com>
 <X1PNORe3zCsQbH8DQhGBwUACW8f501e9_IBAmvUiUV8=.ec8e20b1-4b8e-4a92-8654-c2a8d1a9f94d@github.com>
Message-ID: <d47EseDyodKwKOaWHIo_zDzOj44sQXvZCucr0V0vV8U=.9c847055-903b-45a8-b2ba-c4c27b15211e@github.com>

On Fri, 26 Jul 2024 11:39:02 GMT, Kevin Walls <kevinw at openjdk.org> wrote:

> One more thing that's troubling me. (Apologies it's now and not last week.)
> 
> I was looking at the _filename.value().get() usage and finding it uncomfortable, compared to the previous simple _filename.value() 8-) Harder to remember and to read and understand. Maybe we can avoid the two accessors, it really is just a char*.
> 
> These additional argument types look like part of the framework which never found an audience: MemorySizeArgument has one usage in CompilationMemoryStatisticDCmd, NanoTimeArgument looks unused -- so the two-accessor usage is only in one place until now?
> 
> Adding FileArgument as another of these might be the wrong direction, as these classes are so almost redundant.
> 
> What if we didn't add FileArgument, and kept using <char*> for _filename args/opts:
> 
> Then in DCmdArgument<char*>::parse_value(), recognise a "FILE" argument type and call Arguments::copy_expand_pid there, to set _value.
> 
> Just seeing if we can cut down some of the complexity here, as Thomas mentioned, it is already very complex for what it is!
> 
> (There is also the to_string method which seemed like it would be useful here, but it needs a buffer so is more complex than calling two accessors... Another thing that seems to be part of the framework that was never much adopted.)

IMHO for a functional addition we should follow the established pattern. Reworking the framework is certainly useful, but I would like it if we could get this done first (I intend to use it in other DCmds). 

And if we simplify this coding, we should think first about how to do this and what to solve. Things that come to mind:

- overuse of templates
- The argument-type-by-template-division and the runtime "type" string argument (the third argument to DCmdArgument) seem redundant
- the fact that we keep command metadata (which should be constant) together with command invocation data (values for arguments that are scoped to a single command invocation) in a single global structure, and then rewrite the latter every time we invoke a command. That is a strange concept and makes cleaning up temporary memory non-trivial
- the fact that each new command takes a ton of boilerplate coding (Just see the many many repetitions in diagnosticCommand.cpp)
- the fact that we use runtime polymorphism, which is completely fine, but then all metadata is "static". So in order to e.g. know how many arguments a command takes, you need to know the command class, since you cannot just call e.g. `num_arguments()` on a `DCmdWithParser*`. I think the whole framework could be done without templates, just using plain old virtual functions instead. This is not code where a vtable lookup really hurts.

Just my random thoughts. Maybe there is more, but my point is that if we agree this can be improved, it would be better in a separate RFE, and not mixed into functional RFEs. 

@lmesnik 
> Also I would recommend to get approval from svc team reviewer.

Who could this be, @plummercj ?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2253809524

From cjplummer at openjdk.org  Sat Jul 27 07:05:38 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Sat, 27 Jul 2024 07:05:38 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v14]
In-Reply-To: <d47EseDyodKwKOaWHIo_zDzOj44sQXvZCucr0V0vV8U=.9c847055-903b-45a8-b2ba-c4c27b15211e@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com>
 <X1PNORe3zCsQbH8DQhGBwUACW8f501e9_IBAmvUiUV8=.ec8e20b1-4b8e-4a92-8654-c2a8d1a9f94d@github.com>
 <d47EseDyodKwKOaWHIo_zDzOj44sQXvZCucr0V0vV8U=.9c847055-903b-45a8-b2ba-c4c27b15211e@github.com>
Message-ID: <V0dcMbHYRhqAtqnMM49cqRjFGzX8VLP627FER7IWXjA=.7c8e21ff-7f83-4b3c-9d53-dae883e58a53@github.com>

On Sat, 27 Jul 2024 06:11:00 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> > Also I would recommend to get approval from svc team reviewer.
> 
> Who could this be, @plummercj ?

@kevinjwalls is on the svc team and has been involved in this review, and @dholmes-ora, @lmesnik, and @AlanBateman all count as svc reviewers.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2253858030

From vlivanov at openjdk.org  Sat Jul 27 07:12:33 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Sat, 27 Jul 2024 07:12:33 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v9]
In-Reply-To: <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <LnkS1a2xutLFBgsUO0b-doRPPTDCBjRAuiMWGquAvhU=.3de28018-570d-49f8-9cc1-4a3ea577a0b9@github.com>
Message-ID: <Smmitd15ELI7ZZgx_6FqgbOv8zKs-Ye8AAgiJntOM7g=.4f7fd243-d220-4401-a6c2-a308e04b6bb5@github.com>

On Fri, 26 Jul 2024 15:13:06 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix test failure

src/hotspot/share/oops/klass.cpp line 285:

> 283: // which doesn't zero out the memory before calling the constructor.
> 284: Klass::Klass(KlassKind kind) : _kind(kind),
> 285:                                _secondary_supers_bitmap(SECONDARY_SUPERS_BITMAP_EMPTY),

Looks like it is redundant since metaspace allocation already initializes memory with zeroes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1693907175

From djelinski at openjdk.org  Sat Jul 27 07:28:36 2024
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Sat, 27 Jul 2024 07:28:36 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <BGSEf3h_EuLOHuRHwBJl5h_VMezDWyv7j0w4xGgZXeA=.e5919fd1-0544-44ac-b11d-62b19e1c5bc1@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com>
 <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com>
 <BGSEf3h_EuLOHuRHwBJl5h_VMezDWyv7j0w4xGgZXeA=.e5919fd1-0544-44ac-b11d-62b19e1c5bc1@github.com>
Message-ID: <3bljylhUd2wp8G_TSbeTaca4F4i4HxJUdfJRUKtLbrw=.f435237b-83f0-4e41-ba67-6375cd0f6a25@github.com>

On Fri, 26 Jul 2024 21:42:58 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> I don't understand the rationale for the suggestion sorry.
>
> I am looking specifically for the second byte of six pattern.

The current code matches the pattern 0b1x1xxxxx, and you want to match 0b1010xxxx
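To make the difference between the two masks concrete, here is a minimal sketch. The helper names `matches_loose` and `matches_strict` are made up for illustration and are not taken from the patch.

```cpp
#include <cstdint>

// Loose test: only bits 7 and 5 are checked, i.e. the pattern 0b1x1xxxxx.
inline bool matches_loose(uint8_t b) { return (b & 0xA0) == 0xA0; }

// Strict test: the whole high nibble must be 1010, i.e. the pattern 0b1010xxxx.
inline bool matches_strict(uint8_t b) { return (b & 0xF0) == 0xA0; }
```

With the loose mask, bytes such as 0xB0 or even 0xED pass the test, whereas the strict mask accepts only 0xA0..0xAF.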

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693912937

From djelinski at openjdk.org  Sat Jul 27 07:28:37 2024
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Sat, 27 Jul 2024 07:28:37 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com>
 <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com>
Message-ID: <sn6y6ztyTPsUW3ysMQC_685RPAxOzfutbYCE_yS1Oxw=.452272d1-43f5-4d07-8a32-9a4cd23ef67d@github.com>

On Fri, 26 Jul 2024 21:43:49 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> src/hotspot/share/utilities/utf8.cpp line 442:
>> 
>>> 440:         if ((index - 3) >= 0 && ((buffer[index - 2] & 0xA0) == 0xA0)) {
>>> 441:           // it was fourth byte so truncate 3 bytes earlier
>>> 442:           assert(buffer[index - 3] == 0xED, "malformed sequence");
>> 
>> This needs to be an if, not an assert: ec-a0-80 is a [legitimate 3-byte UTF-8](https://www.compart.com/en/unicode/U+C800)
>
> Will need to re-examine this part.

Keep in mind that 0xed a few lines above could have matched the first byte and not the fourth one.
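As an illustration of why the second byte alone is not enough, here is a small sketch that decodes a three-byte sequence by hand. `decode_three_byte` is a hypothetical helper, not code from utf8.cpp.

```cpp
#include <cstdint>

// Decode a three-byte sequence of the form 1110xxxx 10xxxxxx 10xxxxxx.
// The caller is assumed to have validated the byte patterns already.
inline uint32_t decode_three_byte(uint8_t b0, uint8_t b1, uint8_t b2) {
  return (uint32_t(b0 & 0x0F) << 12) |
         (uint32_t(b1 & 0x3F) << 6) |
          uint32_t(b2 & 0x3F);
}
```

EC A0 80 decodes to U+C800, an ordinary character, while ED A0 80 decodes to U+D800, a high surrogate. Both have 0xA0 as the second byte, so only the leading 0xED marks the surrogate case.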

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693913232

From djelinski at openjdk.org  Sat Jul 27 08:14:31 2024
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Sat, 27 Jul 2024 08:14:31 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
Message-ID: <yTmjLaPOdaxUACJLH6e_6WZpi3nJRdAclowzh55uKmc=.74a06bba-8b03-4aa7-b900-8e1624579f9b@github.com>

On Fri, 26 Jul 2024 04:03:10 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
> 
> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
> 
> Testing:
>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>  - tiers 1-3 sanity testing
> 
> Thanks.

src/hotspot/share/utilities/utf8.cpp line 398:

> 396: // byte sequence.
> 397: static bool is_starting_byte(unsigned char b) {
> 398:   return b >= 0xC0 && b <= 0xEF;;

Do you plan to use this method only for modified UTF-8 or for standard UTF-8 as well? Standard UTF-8 also uses F0-F7 as starting bytes.

Also, remove the double semicolon.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693919203

From dholmes at openjdk.org  Sat Jul 27 12:18:34 2024
From: dholmes at openjdk.org (David Holmes)
Date: Sat, 27 Jul 2024 12:18:34 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <yTmjLaPOdaxUACJLH6e_6WZpi3nJRdAclowzh55uKmc=.74a06bba-8b03-4aa7-b900-8e1624579f9b@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <yTmjLaPOdaxUACJLH6e_6WZpi3nJRdAclowzh55uKmc=.74a06bba-8b03-4aa7-b900-8e1624579f9b@github.com>
Message-ID: <qOpeJqOJUzSaqPZQtnHc_Xh8KXmgPLtaUCMy4DrwKn4=.9cfd92b6-fbf2-4600-a310-2ac32e9b228d@github.com>

On Sat, 27 Jul 2024 08:11:53 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

>> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
>> 
>> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
>> 
>> Testing:
>>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>>  - tiers 1-3 sanity testing
>> 
>> Thanks.
>
> src/hotspot/share/utilities/utf8.cpp line 398:
> 
>> 396: // byte sequence.
>> 397: static bool is_starting_byte(unsigned char b) {
>> 398:   return b >= 0xC0 && b <= 0xEF;;
> 
> Do you plan to use this method only for modified UTF-8 or for standard UTF-8 as well? Standard UTF-8 also uses F0-F7 as starting bytes.
> 
> Also, remove the double semicolon.

AFAIK the VM only deals with modified UTF-8.
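For reference, a rough sketch of how the lead-byte ranges differ between the two encodings, going purely by bit pattern. `sequence_length` and its `allow_four_byte` flag are invented for this illustration; the real `is_starting_byte` in the patch only covers the multi-byte modified UTF-8 case.

```cpp
#include <cstdint>

// Returns the sequence length implied by a lead byte's bit pattern, or 0 if
// the byte is a continuation byte or cannot start a sequence in this mode.
inline int sequence_length(uint8_t b, bool allow_four_byte) {
  if (b < 0x80) return 1;                     // 0xxxxxxx: single byte
  if (b < 0xC0) return 0;                     // 10xxxxxx: continuation byte
  if (b < 0xE0) return 2;                     // 110xxxxx: 0xC0..0xDF
  if (b < 0xF0) return 3;                     // 1110xxxx: 0xE0..0xEF
  if (b < 0xF8 && allow_four_byte) return 4;  // 11110xxx: standard UTF-8 only
  return 0;
}
```

With `allow_four_byte` set to false, the multi-byte lead bytes are exactly 0xC0..0xEF, which is the range the function under review tests.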

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693948995

From dholmes at openjdk.org  Sat Jul 27 12:18:35 2024
From: dholmes at openjdk.org (David Holmes)
Date: Sat, 27 Jul 2024 12:18:35 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <sn6y6ztyTPsUW3ysMQC_685RPAxOzfutbYCE_yS1Oxw=.452272d1-43f5-4d07-8a32-9a4cd23ef67d@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com>
 <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com>
 <sn6y6ztyTPsUW3ysMQC_685RPAxOzfutbYCE_yS1Oxw=.452272d1-43f5-4d07-8a32-9a4cd23ef67d@github.com>
Message-ID: <5prDfKO5gMQOw5SCMMpNnWFpy4rust1LIhEeR4wdNek=.dafc8f68-6019-4b53-8405-ba8451b26336@github.com>

On Sat, 27 Jul 2024 07:25:30 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

>> Will need to re-examine this part.
>
> Keep in mind that 0xed a few lines above could have matched the first byte and not the fourth one.

So ... the first three bytes of a six byte sequence can be indistinguishable from a three byte sequence! How does that work???
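One way to see it: in modified UTF-8 a supplementary character is written as its UTF-16 surrogate pair, with each surrogate encoded as an ordinary three-byte sequence. The sketch below shows that under this reading; `put_three` and `encode_supplementary` are hypothetical names and not part of the VM.

```cpp
#include <array>
#include <cstdint>

// Encode one 16-bit value in the range U+0800..U+FFFF as three bytes.
inline void put_three(uint16_t u, uint8_t* out) {
  out[0] = uint8_t(0xE0 | (u >> 12));
  out[1] = uint8_t(0x80 | ((u >> 6) & 0x3F));
  out[2] = uint8_t(0x80 | (u & 0x3F));
}

// Encode a code point in U+10000..U+10FFFF as six bytes: the high
// surrogate followed by the low surrogate, three bytes each.
inline std::array<uint8_t, 6> encode_supplementary(uint32_t cp) {
  uint32_t v = cp - 0x10000;
  uint16_t high = uint16_t(0xD800 + (v >> 10));
  uint16_t low  = uint16_t(0xDC00 + (v & 0x3FF));
  std::array<uint8_t, 6> out{};
  put_three(high, out.data());
  put_three(low, out.data() + 3);
  return out;
}
```

For U+1F600 this gives ED A0 BD ED B8 80. The first three bytes, taken alone, are a well-formed three-byte sequence for the lone surrogate U+D83D, which is why a truncation after byte three cannot be recognised from the bit patterns of those bytes alone.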

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693949216

From dholmes at openjdk.org  Sat Jul 27 12:22:30 2024
From: dholmes at openjdk.org (David Holmes)
Date: Sat, 27 Jul 2024 12:22:30 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <vz8nu18FeuJADlmZjGknJXHdBzCkuBxR6-w-18bWboI=.27e0969a-e5bf-43b8-9dc0-b018f7034fe8@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com>
 <vz8nu18FeuJADlmZjGknJXHdBzCkuBxR6-w-18bWboI=.27e0969a-e5bf-43b8-9dc0-b018f7034fe8@github.com>
Message-ID: <Qk1D6SlZr4nMtAPOi2ct8DRkIRZD2vqJI9nzXcxA2DM=.fd981aac-84e3-445b-952a-bd1f0c81e042@github.com>

On Fri, 26 Jul 2024 21:35:08 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> src/hotspot/share/utilities/exceptions.cpp line 276:
>> 
>>> 274:   // sequence is valid.
>>> 275:   if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {
>>> 276:     assert(msg[max_msg_size - 1] == '\0', "should be null terminated");
>> 
>> Would this always be true? For a formatting error, too?
>> Maybe just to be sure, instead of asserting set the last byte to zero.
>
> vsnprintf is supposed to guarantee it, and os::vsnprintf does IIRC, so this is just a sanity check.

Yep, os::vsnprintf guarantees NUL-termination.
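For anyone following along, this is roughly the contract being relied on, shown with the standard C99 `vsnprintf` rather than the HotSpot wrapper. `format_truncated` is a made-up helper for illustration only.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Formats into buf and reports whether the output did not fit, or whether
// formatting failed. With a conforming vsnprintf and size > 0, buf is
// NUL-terminated even when the output was cut short.
inline bool format_truncated(char* buf, std::size_t size, const char* fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  int ret = std::vsnprintf(buf, size, fmt, ap);
  va_end(ap);
  return ret < 0 || static_cast<std::size_t>(ret) >= size;
}
```

Note that a negative return value and a too-large return value are both treated as "did not fit" here, which mirrors the `ret == -1 || ret >= max_msg_size` test quoted above.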

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693949707

From dholmes at openjdk.org  Sat Jul 27 12:22:31 2024
From: dholmes at openjdk.org (David Holmes)
Date: Sat, 27 Jul 2024 12:22:31 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com>
Message-ID: <DxKmQKHRLgtYvDrRlj3raK94RWxL7yr25eIJ8vv_bSk=.2cdbbea7-8cd9-4dd0-af32-5edf25b610ba@github.com>

On Fri, 26 Jul 2024 05:19:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
>> 
>> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
>> 
>> Testing:
>>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>>  - tiers 1-3 sanity testing
>> 
>> Thanks.
>
> src/hotspot/share/utilities/utf8.cpp line 407:
> 
>> 405: // To avoid that the caller can choose to check for validity first.
>> 406: // The incoming buffer is still expected to be NUL-terminated.
>> 407: void UTF8::truncate_to_legal_utf8(unsigned char* buffer, int length) {
> 
> Let's make the buffer length size_t and avoid awkward casting.

No, this code uses `int` for length everywhere. Feel free to file an RFE to change it all.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693949553

From dholmes at openjdk.org  Sat Jul 27 12:33:30 2024
From: dholmes at openjdk.org (David Holmes)
Date: Sat, 27 Jul 2024 12:33:30 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <3bljylhUd2wp8G_TSbeTaca4F4i4HxJUdfJRUKtLbrw=.f435237b-83f0-4e41-ba67-6375cd0f6a25@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com>
 <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com>
 <BGSEf3h_EuLOHuRHwBJl5h_VMezDWyv7j0w4xGgZXeA=.e5919fd1-0544-44ac-b11d-62b19e1c5bc1@github.com>
 <3bljylhUd2wp8G_TSbeTaca4F4i4HxJUdfJRUKtLbrw=.f435237b-83f0-4e41-ba67-6375cd0f6a25@github.com>
Message-ID: <7hvfenQ7f8IClATzJEu_HsTcpaVPMfQj2-Vguh0uvgc=.7e2c029b-aa42-44cb-827c-9c07892e46e5@github.com>

On Sat, 27 Jul 2024 07:23:55 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

>> I am looking specifically for the second byte of six pattern.
>
> The current code matches the pattern 0b1x1xxxxx, and you want to match 0b1010xxxx

Doh! Thanks for that.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693951018

From dholmes at openjdk.org  Sat Jul 27 12:37:32 2024
From: dholmes at openjdk.org (David Holmes)
Date: Sat, 27 Jul 2024 12:37:32 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string
In-Reply-To: <qOpeJqOJUzSaqPZQtnHc_Xh8KXmgPLtaUCMy4DrwKn4=.9cfd92b6-fbf2-4600-a310-2ac32e9b228d@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <yTmjLaPOdaxUACJLH6e_6WZpi3nJRdAclowzh55uKmc=.74a06bba-8b03-4aa7-b900-8e1624579f9b@github.com>
 <qOpeJqOJUzSaqPZQtnHc_Xh8KXmgPLtaUCMy4DrwKn4=.9cfd92b6-fbf2-4600-a310-2ac32e9b228d@github.com>
Message-ID: <2ayYKgAZcb7IUO4P3YMfmWfj_fsjyUBGxj9pvuwS0bg=.3692889f-4345-456f-ab39-faf88e8d340c@github.com>

On Sat, 27 Jul 2024 12:13:55 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> src/hotspot/share/utilities/utf8.cpp line 398:
>> 
>>> 396: // byte sequence.
>>> 397: static bool is_starting_byte(unsigned char b) {
>>> 398:   return b >= 0xC0 && b <= 0xEF;;
>> 
>> Do you plan to use this method only for modified UTF-8 or for standard UTF-8 as well? Standard UTF-8 also uses F0-F7 as starting bytes.
>> 
>> Also, remove the double semicolon.
>
> AFAIK the VM only deals with modified UTF-8.

;; fixed

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1693951565

From fyang at openjdk.org  Mon Jul 29 05:02:34 2024
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 29 Jul 2024 05:02:34 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v6]
In-Reply-To: <0NpNq_wNl-qus6kEr_6J7liSQXXYdjybbWQWDJPGPmQ=.8ba0ea43-2bc7-4f01-afee-adb4a43da29c@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
 <0NpNq_wNl-qus6kEr_6J7liSQXXYdjybbWQWDJPGPmQ=.8ba0ea43-2bc7-4f01-afee-adb4a43da29c@github.com>
Message-ID: <D-LqktQwsa-Mg1zpbQvnI-lKEfL8pdbhKgejdG14OmI=.8b0cd0b7-7527-4281-993a-9f8f7a571bf1@github.com>

On Fri, 26 Jul 2024 08:10:01 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> 
>> I'm also working on a base64 decode intrinsic, but it shows a performance regression in some cases. Decode and encode are completely independent of each other, so I will send out the decode change for review in a separate PR once I have fixed the regression.
>> 
>> Thanks.
>> 
>> ## Test
>> benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32)
>> 
>> I've tried several implementations, respectively with vector group
>> * m2+m1+scalar
>> * m2+scalar
>> * m1+scalar
>> * pure scalar
>> The best one is the m2+m1 combination; it has the best performance across all source sizes.
>> 
>> ### K230
>> 
>> this implementation (m2+m1)
>> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score +intrinsic (m2+m1) | Error | Units | -intrinsic/+intrinsic
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
>> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
>> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
>> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
>> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
>> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
>> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
>> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
>> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
>> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
>> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>> 
>> vector with only m2
>> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 4...
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:
> 
>  - merge master
>  - Merge branch 'master' into baes64-encode-integrated
>  - move label
>  - refine code
>  - use pure scalar version when rvv is not supported
>  - clean code
>  - Initial commit

Hi, I will take a look. BTW: have you resolved the performance issue with the base64 decode intrinsic?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2254948125

From thartmann at openjdk.org  Mon Jul 29 05:33:51 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 29 Jul 2024 05:33:51 GMT
Subject: Integrated: 8336999: Verification for resource area allocated data
 structures in C2
In-Reply-To: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
References: <9W-oh-GRweInhl9ZMDkZYBanQ-D4pMxFe2PuqhvqmuY=.f83a09fa-c3ed-48dc-80ed-2d580954d1cb@github.com>
Message-ID: <7dRxg7c8RpnnQ3_Y13IYbc_Qj0_yQyPNjDIFFyir5wU=.f991c976-154b-4b45-8009-2023cd42717c@github.com>

On Wed, 24 Jul 2024 10:29:32 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

> Similar to [GrowableArrayNestingCheck](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/growableArray.cpp#L60), we should implement a check for C2's resource allocated data structures that verifies that reallocation happens under the same `ResourceMark` as the original allocation. Otherwise, use-after-free bugs like [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) will lead to memory corruption.
> 
> This change adds a [ReallocMark](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/allocation.cpp#L233) to all resource allocated data structures used by C2. I slightly modified it such that it checks the arena and skips verification if the data is not allocated in the resource arena. I also modified the grow methods such that we perform verification even if no reallocation is required. In addition, I changed a few `Unique_Node_List` allocations in vector.cpp from `comp_arena` to resource area allocations because they only have a short lifetime.
> 
> While testing, I hit the verification code from:
> 
> V  [libjvm.so+0x5c1ceb]  ReallocMark::check(Arena*)+0x7b  (allocation.cpp:244)
> V  [libjvm.so+0x6df2da]  Block_Array::grow(unsigned int)+0x1a  (block.cpp:43)
> V  [libjvm.so+0xb88679]  PhaseCFG::do_DFS(Tarjan*, unsigned int)+0x159  (block.hpp:72)
> V  [libjvm.so+0xb88b6b]  PhaseCFG::build_dominator_tree()+0xab  (domgraph.cpp:74)
> V  [libjvm.so+0xd75791]  PhaseCFG::do_global_code_motion()+0x11  (gcm.cpp:1635)
> V  [libjvm.so+0x9f4fd4]  Compile::Code_Gen()+0x2a4  (compile.cpp:2949)
> V  [libjvm.so+0x9f5f16]  Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, DirectiveSet*)+0xba6  (compile.cpp:991)
> 
> 
> It's a false positive because the code in `PhaseCFG::build_dominator_tree` pre-grows `PhaseCFG::_blocks` to prevent reallocation before entering the scope of a nested ResourceMark. I think that's bad practice and should be avoided. I changed the code to allocate `_blocks` in a separate arena and removed the pre-growing.
> 
> This detects [JDK-8336095](https://bugs.openjdk.org/browse/JDK-8336095) right away, even with `java -Xcomp -version`.
> 
> We should revisit the footprint impact of arena allocations in C2 with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015).
> 
> Thanks,
> Tobias

This pull request has now been integrated.

Changeset: 657c0bdd
Author:    Tobias Hartmann <thartmann at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/657c0bddf90b537ac653817571532705a6e3643a
Stats:     56 lines in 14 files changed: 32 ins; 8 del; 16 mod

8336999: Verification for resource area allocated data structures in C2

Reviewed-by: chagedorn, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/20311

From kevinw at openjdk.org  Mon Jul 29 09:41:39 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Mon, 29 Jul 2024 09:41:39 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v14]
In-Reply-To: <d47EseDyodKwKOaWHIo_zDzOj44sQXvZCucr0V0vV8U=.9c847055-903b-45a8-b2ba-c4c27b15211e@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com>
 <X1PNORe3zCsQbH8DQhGBwUACW8f501e9_IBAmvUiUV8=.ec8e20b1-4b8e-4a92-8654-c2a8d1a9f94d@github.com>
 <d47EseDyodKwKOaWHIo_zDzOj44sQXvZCucr0V0vV8U=.9c847055-903b-45a8-b2ba-c4c27b15211e@github.com>
Message-ID: <IUN4djOhB-ZaEz_AwNwcrQyyrVxdxqc-Th5nclLnPew=.1cc99b12-abc7-4556-b3c3-16219d7dec44@github.com>

On Sat, 27 Jul 2024 06:11:00 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> One more thing that's troubling me.  (Apologies it's now and not last week.)
>> 
>> I was looking at the _filename.value().get() usage and finding it uncomfortable, compared to the previous simple _filename.value() 8-)
>> Harder to remember and to read and understand.  Maybe we can avoid the two accessors, it really is just a char*.
>> 
>> These additional argument types look like part of the framework which never found an audience: MemorySizeArgument has one usage in CompilationMemoryStatisticDCmd, NanoTimeArgument looks unused -- so the two-accessor usage is only in one place until now?
>> 
>> Adding FileArgument as another of these might be the wrong direction, as these classes are so almost redundant.
>> 
>> What if we didn't add FileArgument, and kept using <char*> for _filename args/opts:
>> 
>> Then in DCmdArgument<char*>::parse_value(), recognise a "FILE" argument type and call Arguments::copy_expand_pid there, to set _value.
>> 
>> Just seeing if we can cut down some of the complexity here, as Thomas mentioned, it is already very complex for what it is!
>> 
>> 
>> (There is also the to_string method which seemed like it would be useful here, but it needs a buffer so is more complex than calling two accessors...  Another thing that seems to be part of the framework that was never much adopted.)
>
> IMHO for a functional addition we should follow the established pattern. Reworking the framework is certainly useful, but I would like it if we could get this done first (I intend to use it in other DCmds). 
> 
> And if we simplify this coding, we should think first about how to do this and what to solve. Things that come to mind:
> 
> - overuse of template
> - The argument-type-by-template-division and the runtime "type" string argument (the third argument to DCmdArgument) seem redundant
> - the fact that we keep command metadata (which should be constant) together with command invocation data (values for arguments that are scoped to a single command invocation) in a single global structure, and then rewrite the latter every time we invoke a command. That is a strange concept and makes cleaning up temporary memory non-trivial
> - the fact that each new command takes a ton of boilerplate coding (Just see the many many repetitions in diagnosticCommand.cpp)
> - the fact that we use runtime-polymorphy, which is completely fine, but then all metadata information are "static". So in order to e.g. know how many arguments a command takes, you need to know the command class, since you cannot just call e....

Thanks Thomas @tstuefe -

We're agreeing that some of this framework is overly complex, and that we aren't going to simplify the framework in this change.

But the more we adopt the obscure parts of the framework, the harder it will be to move away from it, so that's the reason for suggesting not creating the FileArgument class.  Use the simpler parts of this machine, with some special cases where necessary, like a char* argument which happens to be used for a FILEname (an input filename which gets %p substitution).
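To illustrate the kind of substitution meant here, a small standalone sketch follows. `expand_pid` is a hypothetical helper, not `Arguments::copy_expand_pid`, and the handling of `%%` as a literal percent sign is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>

// Replace every "%p" in pattern with the decimal pid; "%%" yields a
// literal '%'. Any other character, including a lone '%', is copied as-is.
inline std::string expand_pid(const std::string& pattern, long pid) {
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size()) {
      if (pattern[i + 1] == 'p') { out += std::to_string(pid); ++i; continue; }
      if (pattern[i + 1] == '%') { out += '%'; ++i; continue; }
    }
    out += pattern[i];
  }
  return out;
}
```

So a filename argument such as `dump_%p.hprof` would come out as `dump_1234.hprof` for pid 1234, and the command itself would only ever see the expanded string.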

The logic I don't follow is using this complex mechanism because it exists, when it has only one (?) actual usage.  This seems to contradict the earlier max path len notes, where it was suggested not to use a pattern established by about 140 other usages.

Apologies Sonia for dragging this out, still really pleased to get this change happening.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2255464478

From jwtang at openjdk.org  Mon Jul 29 09:42:09 2024
From: jwtang at openjdk.org (Jiawei Tang)
Date: Mon, 29 Jul 2024 09:42:09 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash when
 running with the javaagent option
Message-ID: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>

I have added a test case that reproduces the crash. I would appreciate advice on whether the code needs changing.

-------------

Commit messages:
 - 8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option

Changes: https://git.openjdk.org/jdk/pull/20373/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337331
  Stats: 135 lines in 3 files changed: 135 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20373.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373

PR: https://git.openjdk.org/jdk/pull/20373

From jwtang at openjdk.org  Mon Jul 29 09:49:11 2024
From: jwtang at openjdk.org (Jiawei Tang)
Date: Mon, 29 Jul 2024 09:49:11 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v2]
In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
Message-ID: <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com>

> I have added a test case that reproduces the crash. I would appreciate advice on whether the code needs changing.

Jiawei Tang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

  8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20373/files
  - new: https://git.openjdk.org/jdk/pull/20373/files/d768df02..00ec5887

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=00-01

  Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20373.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373

PR: https://git.openjdk.org/jdk/pull/20373

From dholmes at openjdk.org  Mon Jul 29 09:54:10 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 29 Jul 2024 09:54:10 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v2]
In-Reply-To: <Bp9RxG0ZfwtVg7p9v_X_ZgogL1U-aG0ha7ME7nKW8c8=.49302a72-48f0-4b5b-bc16-64ff037f6006@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com>
 <Bp9RxG0ZfwtVg7p9v_X_ZgogL1U-aG0ha7ME7nKW8c8=.49302a72-48f0-4b5b-bc16-64ff037f6006@github.com>
Message-ID: <6j41RLigngpx3YFjVu1Fx4btFX4_j_05PmGX4Yr6P4E=.8ec67827-1ec7-4d95-8209-f737604f432d@github.com>

On Fri, 26 Jul 2024 21:39:16 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> src/hotspot/share/utilities/exceptions.cpp line 277:
>> 
>>> 275:   if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {
>>> 276:     assert(msg[max_msg_size - 1] == '\0', "should be null terminated");
>>> 277:     UTF8::truncate_to_legal_utf8((unsigned char*)msg, max_msg_size);
>> 
>> Ah, I misread your patch and thought you pass in the strlen of the message to the truncation function, when in fact you pass in the hard coded message buffer size. 
>> 
>> But that begs the question of why you test strlen above, and more importantly, whether all cases where snprintf returns an error are truncation problems. It could have detected an invalid UTF8 sequence and aborted in the middle of it.
>
> The `strlen` check is to skip the empty buffer you can get on Windows if vsnprintf returns -1 due to overflow of INT_MAX.
> 
> We are assuming/requiring that we start with a valid UTF8 sequence and the worst that will happen is that vsnprintf will truncate it.
> 
> If we actually got -1 for a conversion error (no way to tell the difference in the two cases) then we would unnecessarily truncate, but we do not expect any such conversion errors - in part because we type check the format specifiers and args and so should never get a mismatch.

Thanks for the off-list discussion @djelinski , I now understand what you mean here. Code updated and commented.
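
For readers following the thread, the truncation concern can be illustrated with a minimal standalone sketch. It is not the HotSpot code: `format_truncated` and `trim_partial_utf8` are hypothetical names, and the sketch handles only standard UTF-8, not the JVM's modified encoding with its six-byte supplementary form. It assumes, as the discussion above does, that the formatted input is valid UTF-8 to begin with.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: drop a trailing incomplete standard UTF-8 sequence.
// Assumes buf[0..len) was cut from a valid UTF-8 string and that buf has
// room for a terminator at buf[len]. Returns the new length.
static size_t trim_partial_utf8(char* buf, size_t len) {
  if (len == 0) return 0;
  // Walk back over continuation bytes (10xxxxxx) to find the last lead byte.
  size_t i = len;
  while (i > 0 && (static_cast<unsigned char>(buf[i - 1]) & 0xC0) == 0x80) {
    i--;
  }
  if (i == 0) {  // nothing but continuation bytes: drop everything
    buf[0] = '\0';
    return 0;
  }
  unsigned char lead = static_cast<unsigned char>(buf[i - 1]);
  size_t need = 1;  // ASCII
  if      ((lead & 0xE0) == 0xC0) need = 2;
  else if ((lead & 0xF0) == 0xE0) need = 3;
  else if ((lead & 0xF8) == 0xF0) need = 4;
  size_t have = len - (i - 1);  // bytes present in the final sequence
  if (have < need) {
    len = i - 1;  // cut just before the incomplete sequence
  }
  buf[len] = '\0';
  return len;
}

// Hypothetical wrapper: format into a fixed buffer (size > 0) and, if the
// output was truncated or vsnprintf reported an error, trim the result to a
// character boundary. Returns the final string length.
static size_t format_truncated(char* buf, size_t size, const char* fmt, ...) {
  buf[0] = '\0';  // defined content even if vsnprintf writes nothing
  va_list ap;
  va_start(ap, fmt);
  int ret = vsnprintf(buf, size, fmt, ap);
  va_end(ap);
  buf[size - 1] = '\0';  // be defensive about termination
  size_t len = strlen(buf);
  if ((ret < 0 || static_cast<size_t>(ret) >= size) && len > 0) {
    len = trim_partial_utf8(buf, len);
  }
  return len;
}
```

As in the quoted code, a negative return value cannot be told apart from truncation here, so the sketch trims in both cases; on valid input the worst outcome is that nothing needs to be removed.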

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1694935121

From dholmes at openjdk.org  Mon Jul 29 09:54:09 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 29 Jul 2024 09:54:09 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v2]
In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
Message-ID: <sZyCrr8Ti9Ad6EiJrSO_1fvCYsmLlrgHgFACt_790_Q=.ac6ceba1-8ffd-46ec-9e30-9ed3e6ad3cf4@github.com>

> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
> 
> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
> 
> Testing:
>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>  - tiers 1-3 sanity testing
> 
> Thanks.

David Holmes has updated the pull request incrementally with three additional commits since the last revision:

 - Fix logic for 4th byte of 6.
 - Fix logic error and typo
 - Ensure the buffer is > 5 bytes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20345/files
  - new: https://git.openjdk.org/jdk/pull/20345/files/c1a47375..02d636a6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=00-01

  Stats: 15 lines in 1 file changed: 6 ins; 0 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/20345.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20345/head:pull/20345

PR: https://git.openjdk.org/jdk/pull/20345

From dholmes at openjdk.org  Mon Jul 29 09:56:33 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 29 Jul 2024 09:56:33 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v2]
In-Reply-To: <5prDfKO5gMQOw5SCMMpNnWFpy4rust1LIhEeR4wdNek=.dafc8f68-6019-4b53-8405-ba8451b26336@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <S1NZjbJMW41XauI6C9DQy6i4IPitvkb-1UJWz8Rp3OI=.10e0de51-fe1a-44af-b414-053faf37737b@github.com>
 <Hx2L_c-7TZ4xp3QGfZWrYAsuw35Z4f90q7pMX-SseTE=.30230b2b-efe6-4339-a4bd-6ee12a4a706d@github.com>
 <sn6y6ztyTPsUW3ysMQC_685RPAxOzfutbYCE_yS1Oxw=.452272d1-43f5-4d07-8a32-9a4cd23ef67d@github.com>
 <5prDfKO5gMQOw5SCMMpNnWFpy4rust1LIhEeR4wdNek=.dafc8f68-6019-4b53-8405-ba8451b26336@github.com>
Message-ID: <Ntb7qR15ci_xQZC9N4PWH99wvvkV_Uy_KienpbKm7jU=.2cf22b3c-74c1-4fe6-afef-c10d409f39ca@github.com>

On Sat, 27 Jul 2024 12:15:50 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Keep in mind that 0xed a few lines above could have matched the first byte and not the fourth one.
>
> So ... the first three bytes of a six byte sequence can be indistinguishable from a three byte sequence! How does that work???

github seems to have swallowed my comment so at the risk of repeating it ...

Thanks for the off-list discussion @djelinski , I now understand what you meant. The code and comments are updated.
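
The ambiguity raised in the quoted exchange can be seen by encoding a supplementary character the way the JVM's modified UTF-8 does: as two three-byte sequences, one per UTF-16 surrogate, each beginning with 0xED. The sketch below is illustrative only; `encode_supplementary_mutf8` is a hypothetical name, not a HotSpot function.

```cpp
#include <array>
#include <cstdint>

// Encode a supplementary code point (U+10000..U+10FFFF) in the six-byte
// form used by modified UTF-8: a high surrogate and a low surrogate, each
// written as an ordinary three-byte sequence.
static std::array<uint8_t, 6> encode_supplementary_mutf8(uint32_t cp) {
  uint32_t v  = cp - 0x10000;
  uint32_t hi = 0xD800 + (v >> 10);    // high surrogate
  uint32_t lo = 0xDC00 + (v & 0x3FF);  // low surrogate
  auto put3 = [](uint32_t u, uint8_t* out) {
    out[0] = static_cast<uint8_t>(0xE0 | (u >> 12));  // 0xED for any surrogate
    out[1] = static_cast<uint8_t>(0x80 | ((u >> 6) & 0x3F));
    out[2] = static_cast<uint8_t>(0x80 | (u & 0x3F));
  };
  std::array<uint8_t, 6> bytes{};
  put3(hi, bytes.data());
  put3(lo, bytes.data() + 3);
  return bytes;
}
```

For U+1F600 this yields ED A0 BD ED B8 80. Both halves start with 0xED, so a backwards scan that meets 0xED cannot tell from that byte alone whether it is the first or the fourth byte of the six. The following byte decides: 0xA0..0xAF marks a high surrogate, 0xB0..0xBF a low one, and 0x80..0x9F an ordinary three-byte character.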

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1694938900

From dholmes at openjdk.org  Mon Jul 29 10:12:32 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 29 Jul 2024 10:12:32 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v2]
In-Reply-To: <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
 <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com>
Message-ID: <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com>

On Mon, 29 Jul 2024 09:49:11 GMT, Jiawei Tang <jwtang at openjdk.org> wrote:

>> I added a test case that reproduces the crash. I would appreciate advice on whether the code needs changing.
>
> Jiawei Tang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
> 
>   8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option

Can we not just preload the problematic class so that it won't happen during the transition?

test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 2:

> 1: /*
> 2:  * Copyright (c) 2024, 2024, Oracle and/or its affiliates. All rights reserved.

Please only use a single year here.

test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 36:

> 34:  * @run main/othervm/timeout=100 -Djdk.tracePinnedThreads=full TestPinCaseWithTrace
> 35:  * @run main/othervm/timeout=100 -javaagent:TestPinCaseWithTrace.jar TestPinCaseWithTrace
> 36:  * @run main/othervm/timeout=100 -Djdk.tracePinnedThreads=full -javaagent:TestPinCaseWithTrace.jar TestPinCaseWithTrace

Unclear why we need the three variants.

Also where does the timeout value come from? How long does the test take to run?

test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 62:

> 60:     public static void main(String[] args) throws Exception{
> 61:         ExecutorService scheduler = Executors.newFixedThreadPool(1);
> 62:         Thread.Builder builder = TestPinCaseWithTrace.virtualThreadBuilder(scheduler);

Can you not just create a Virtual Thread directly rather than defining a single-threaded executor??

test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/libPinJNI.c line 28:

> 26: JNIEXPORT jint JNICALL
> 27: Java_TestPinCaseWithTrace_nativeFuncPin(JNIEnv* env, jclass klass, jint x) {
> 28:     jmethodID nativeBaz = (*env)->GetStaticMethodID(env, klass, "native2Java", "(I)I");

Suggestion: just use `m` rather than `nativeBaz`.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20373#pullrequestreview-2204496515
PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1694947729
PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1694949670
PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1694952022
PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1694954328

From alanb at openjdk.org  Mon Jul 29 10:12:34 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Mon, 29 Jul 2024 10:12:34 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v2]
In-Reply-To: <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
 <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com>
Message-ID: <1XAlnt_F0byj_fkmh5Ggy2yPINoYi9wbfceo7HRbhfM=.ac35c579-282b-4f3f-8dac-510fde9a1c8a@github.com>

On Mon, 29 Jul 2024 09:49:11 GMT, Jiawei Tang <jwtang at openjdk.org> wrote:

>> I added a test case that reproduces the crash. I would appreciate advice on whether the code needs changing.
>
> Jiawei Tang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
> 
>   8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option

test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 2:

> 1: /*
> 2:  * Copyright (c) 2024, 2024, Oracle and/or its affiliates. All rights reserved.

I assume you didn't mean to include a date range on a new test.

test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 66:

> 64:             System.out.println("call native: " + nativeFuncPin(1));
> 65:         });
> 66:     }

Does this really need to use a custom scheduler? If not, running the test with -Djdk.virtualThreadScheduler.maxPoolSize=1 would be simpler. If you really need a custom scheduler, the test can use jdk.test.lib.thread.VThreadScheduler. Also, to create a pinning scenario it can use jdk.test.lib.thread.VThreadPinner. You'll see examples of both in other tests.

test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 70:

> 68:     static int native2Java(int b) {
> 69:         try {
> 70:             Thread.sleep(500); // try yield, will pin, javaagent+tracePinnedThreads will crash here (because of the class `PinnedThreadPrinter`)

As noted in the JBS issue, -Djdk.tracePinnedThreads has been very problematic and has been removed in the loom repo as part of the object monitor changes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1694938805
PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1694941417
PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1694942324

From aph at openjdk.org  Mon Jul 29 10:35:07 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 29 Jul 2024 10:35:07 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v10]
In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
Message-ID: <P2b0tKeBww7FoIbgYj9vjJJfEI5sgo2nNsh7g61lynY=.6a90c2e6-d73f-4282-bcd9-157d56253d8c@github.com>

> This patch expands the use of a hash table for secondary superclasses
> to the interpreter, C1, and runtime. It also adds a C2 implementation
> of hashed lookup in cases where the superclass isn't known at compile
> time.
> 
> HotSpot shared runtime
> ----------------------
> 
> Building hashed secondary tables is now unconditional. It takes very
> little time, and now that the shared runtime always has the tables, it
> might as well take advantage of them. The shared code is easier to
> follow now, I think.
> 
> There might be a performance issue with x86-64 in that we build
> HotSpot for a default x86-64 target that does not support popcount.
> This means that HotSpot C++ runtime on x86 always uses a software
> emulation for popcount, even though the vast majority of machines made
> for the past 20 years can do popcount in a single instruction. It
> wouldn't be terribly hard to do something about that.
> 
> Having said that, the software popcount is really not bad.
> 
> x86
> ---
> 
> x86 is rather tricky, because we still support
> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
> well as 32- and 64-bit ports. There's some further complication in
> that only `RCX` can be used as a shift count, so there's some register
> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
> 
> AArch64
> -------
> 
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
> 
> Generally
> ---------
> 
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
> 
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
> 
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
> 
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.
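
As a rough illustration of the ideas in the description above, and not the HotSpot implementation, the sketch below shows a portable software popcount and a bitmap-indexed packed table in which popcount turns a hash slot into an array index. All names are hypothetical, and collision handling, which a real table needs, is deliberately left out.

```cpp
#include <cstdint>
#include <vector>

// Portable (SWAR) population count for targets where no popcount
// instruction can be assumed at build time.
static unsigned software_popcount(uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);
}

// Hypothetical packed table: bit s of `bitmap` is set if hash slot s is
// occupied; occupied slots are stored densely, in slot order, in `entries`.
struct PackedTable {
  uint64_t bitmap = 0;
  std::vector<int> entries;

  // Insert a value at a slot (0..63). Sketch only: assumes no collisions.
  void insert(unsigned slot, int value) {
    uint64_t below = bitmap & ((uint64_t{1} << slot) - 1);
    entries.insert(entries.begin() + software_popcount(below), value);
    bitmap |= uint64_t{1} << slot;
  }

  // Returns true and sets *out if `slot` is occupied.
  bool lookup(unsigned slot, int* out) const {
    uint64_t bit = uint64_t{1} << slot;
    if ((bitmap & bit) == 0) return false;  // fast negative answer
    *out = entries[software_popcount(bitmap & (bit - 1))];
    return true;
  }
};
```

The point of the layout is that a miss costs one bit test, and a hit costs one popcount plus one load, which is why the availability of a single-instruction popcount matters for the C++ runtime path.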

Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:

  Minor

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19989/files
  - new: https://git.openjdk.org/jdk/pull/19989/files/e9581019..329f487a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=08-09

  Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19989.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989

PR: https://git.openjdk.org/jdk/pull/19989

From alanb at openjdk.org  Mon Jul 29 10:42:30 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Mon, 29 Jul 2024 10:42:30 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v2]
In-Reply-To: <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
 <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com>
 <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com>
Message-ID: <BBJW5dJJzbgBxIu0KJaAu-ArQ_9YWfya1rFY04oH7zA=.02bdd406-a580-48d1-b6c0-abfd9a91d998@github.com>

On Mon, 29 Jul 2024 10:09:36 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Can we not just preload the problematic class so that it won't happen during the transition?

It's potentially dozens of classes as it's everything to support the StackWalker API, stream pipelines, and printing code. This diagnostic option is effectively incompatible with agents that enable the CFLH event. It has other issues and is really a leftover from early development. It has been removed in the loom repo, in favour of better JFR events.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20373#issuecomment-2255588034

From jwtang at openjdk.org  Mon Jul 29 11:21:31 2024
From: jwtang at openjdk.org (Jiawei Tang)
Date: Mon, 29 Jul 2024 11:21:31 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v2]
In-Reply-To: <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
 <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com>
 <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com>
Message-ID: <U3MujQK4Xvw97Zp57AuU4c0qA42igxV8RRX8oCqeaeI=.ea333626-7059-42b5-a802-e0c5dd550328@github.com>

On Mon, 29 Jul 2024 10:01:56 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Jiawei Tang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>> 
>>   8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option
>
> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 36:
> 
>> 34:  * @run main/othervm/timeout=100 -Djdk.tracePinnedThreads=full TestPinCaseWithTrace
>> 35:  * @run main/othervm/timeout=100 -javaagent:TestPinCaseWithTrace.jar TestPinCaseWithTrace
>> 36:  * @run main/othervm/timeout=100 -Djdk.tracePinnedThreads=full -javaagent:TestPinCaseWithTrace.jar TestPinCaseWithTrace
> 
> Unclear why we need the three variants.
> 
> Also where does the timeout value come from? How long does the test take to run?

I will remove the first two variants.
Without the fix the test never finishes, because of a deadlock in the VM. With the fix it finishes in about 1s. Considering the differences between platforms and JDK build modes (release/debug), I extended the time limit.
I'm not sure whether I should set this timeout value.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695042090

From jwtang at openjdk.org  Mon Jul 29 11:30:08 2024
From: jwtang at openjdk.org (Jiawei Tang)
Date: Mon, 29 Jul 2024 11:30:08 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v3]
In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
Message-ID: <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com>

> I added a test case that reproduces the crash. I would appreciate advice on whether the code needs changing.

Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision:

  changes according to reviewers' advice

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20373/files
  - new: https://git.openjdk.org/jdk/pull/20373/files/00ec5887..723b1ec6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=01-02

  Stats: 33 lines in 2 files changed: 1 ins; 26 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/20373.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373

PR: https://git.openjdk.org/jdk/pull/20373

From jwtang at openjdk.org  Mon Jul 29 11:30:09 2024
From: jwtang at openjdk.org (Jiawei Tang)
Date: Mon, 29 Jul 2024 11:30:09 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v2]
In-Reply-To: <1XAlnt_F0byj_fkmh5Ggy2yPINoYi9wbfceo7HRbhfM=.ac35c579-282b-4f3f-8dac-510fde9a1c8a@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
 <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com>
 <1XAlnt_F0byj_fkmh5Ggy2yPINoYi9wbfceo7HRbhfM=.ac35c579-282b-4f3f-8dac-510fde9a1c8a@github.com>
Message-ID: <YHMhVuIxVmPr6ulleAU64ZncRv1NGdbky8BIGo0tMT8=.77228095-43d5-445a-ac95-f4dba7f3b814@github.com>

On Mon, 29 Jul 2024 09:53:40 GMT, Alan Bateman <alanb at openjdk.org> wrote:

>> Jiawei Tang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>> 
>>   8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option
>
> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 2:
> 
>> 1: /*
>> 2:  * Copyright (c) 2024, 2024, Oracle and/or its affiliates. All rights reserved.
> 
> I assume you didn't mean to include a date range on a new test.

Changed.

> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 66:
> 
>> 64:             System.out.println("call native: " + nativeFuncPin(1));
>> 65:         });
>> 66:     }
> 
> Does this really need to use a custom scheduler? If not,  running the test with -Djdk.virtualThreadScheduler.maxPoolSize=1 would be simpler. If you really need a custom scheduler, the test can use jdk.test.lib.thread.VThreadScheduler.  Also to create a pinning scenario it can use jdk.test.lib.thread.VThreadPinner to avoid needing to add JNI code. You'll see examples of both in other tests.

Thanks, now I use `-Djdk.virtualThreadScheduler.maxPoolSize=1` instead.

> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 70:
> 
>> 68:     static int native2Java(int b) {
>> 69:         try {
>> 70:             Thread.sleep(500); // try yield, will pin, javaagent+tracePinnedThreads will crash here (because of the class `PinnedThreadPrinter`)
> 
> As noted in the JBS issue, -Djdk.tracePinnedThreads has been very problematic and has been removed in the loom repo as part of the object monitor changes.

I have read the code in loom, and there this issue is resolved by using a JFR event instead. But I hope this can be fixed, since using a javaagent is very common in Java applications. The root cause is that no new class should be used after the vthread is pinned: an agent can change the class bytecode, and transforming a class needs `JvmtiVTMSTransitionDisabler`. However, this vthread is in VTMS, so it cannot break out of the loop.

Using `-Djdk.tracePinnedThreads=full` will use the class `PinnedThreadPrinter`, so we end up in a deadlock.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695049376
PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695050299
PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695050607

From jwtang at openjdk.org  Mon Jul 29 11:30:09 2024
From: jwtang at openjdk.org (Jiawei Tang)
Date: Mon, 29 Jul 2024 11:30:09 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v2]
In-Reply-To: <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
 <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com>
 <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com>
Message-ID: <HxNigfeSkock1o0ZTmxR02djs0jGr5Hd9wxtB3Nct18=.ae254bc8-fde4-457b-b49c-ee5268e9c68e@github.com>

On Mon, 29 Jul 2024 10:00:23 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Jiawei Tang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>> 
>>   8337331: crash: pinned virtual thread will lead to jvm crash when running with the javaagent option
>
> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 2:
> 
>> 1: /*
>> 2:  * Copyright (c) 2024, 2024, Oracle and/or its affiliates. All rights reserved.
> 
> Please only use a single year here.

Changed.

> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 62:
> 
>> 60:     public static void main(String[] args) throws Exception{
>> 61:         ExecutorService scheduler = Executors.newFixedThreadPool(1);
>> 62:         Thread.Builder builder = TestPinCaseWithTrace.virtualThreadBuilder(scheduler);
> 
> Can you not just create a Virtual Thread directly rather than defining a single-threaded executor??

Now I use `-Djdk.virtualThreadScheduler.maxPoolSize=1` instead.

> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/libPinJNI.c line 28:
> 
>> 26: JNIEXPORT jint JNICALL
>> 27: Java_TestPinCaseWithTrace_nativeFuncPin(JNIEnv* env, jclass klass, jint x) {
>> 28:     jmethodID nativeBaz = (*env)->GetStaticMethodID(env, klass, "native2Java", "(I)I");
> 
> Suggestion: just use `m` rather than `nativeBaz`.

Changed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695050956
PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695052060
PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695052254

From jwtang at openjdk.org  Mon Jul 29 11:33:31 2024
From: jwtang at openjdk.org (Jiawei Tang)
Date: Mon, 29 Jul 2024 11:33:31 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v2]
In-Reply-To: <BBJW5dJJzbgBxIu0KJaAu-ArQ_9YWfya1rFY04oH7zA=.02bdd406-a580-48d1-b6c0-abfd9a91d998@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
 <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com>
 <jJBqPY8Oq4nyz90eWb6Nn_Kfh7g8F9yUI14S5rLHq4E=.eaac8272-9619-40fd-b804-c1933c340545@github.com>
 <BBJW5dJJzbgBxIu0KJaAu-ArQ_9YWfya1rFY04oH7zA=.02bdd406-a580-48d1-b6c0-abfd9a91d998@github.com>
Message-ID: <sCKSd7Snv3SDNxlbJgaa7Cf8HZMfagRHZFi9dtMxJhI=.930e33ee-3658-429e-b114-b671aace8585@github.com>

On Mon, 29 Jul 2024 10:40:17 GMT, Alan Bateman <alanb at openjdk.org> wrote:

> Can we not just preload the problematic class so that it won't happen during the transition?

I think that if a new agent is attached to the running process, the VM may still crash?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20373#issuecomment-2255688446

From alanb at openjdk.org  Mon Jul 29 12:37:35 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Mon, 29 Jul 2024 12:37:35 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v2]
In-Reply-To: <YHMhVuIxVmPr6ulleAU64ZncRv1NGdbky8BIGo0tMT8=.77228095-43d5-445a-ac95-f4dba7f3b814@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
 <XCGDY44bragDfsG7U5CYPgBFQRoMVtHdn13sWpRdwIE=.17f3be32-1b73-4958-a4f2-ce548299f29a@github.com>
 <1XAlnt_F0byj_fkmh5Ggy2yPINoYi9wbfceo7HRbhfM=.ac35c579-282b-4f3f-8dac-510fde9a1c8a@github.com>
 <YHMhVuIxVmPr6ulleAU64ZncRv1NGdbky8BIGo0tMT8=.77228095-43d5-445a-ac95-f4dba7f3b814@github.com>
Message-ID: <K54bM5EjNkz1hhK0_BYCAJzb9rLgv7QzXVfEaAympok=.3c958e93-93ca-4b06-bb70-d1a6d441764a@github.com>

On Mon, 29 Jul 2024 11:26:06 GMT, Jiawei Tang <jwtang at openjdk.org> wrote:

>> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTraceWithAgent/TestPinCaseWithTrace.java line 70:
>> 
>>> 68:     static int native2Java(int b) {
>>> 69:         try {
>>> 70:             Thread.sleep(500); // try yield, will pin, javaagent+tracePinnedThreads will crash here (because of the class `PinnedThreadPrinter`)
>> 
>> As noted in the JBS issue, -Djdk.tracePinnedThreads has been very problematic and has been removed in the loom repo as part of the object monitor changes.
>
> I have read the code in loom, and there this issue is resolved by using a JFR event instead. But I hope this can be fixed, since using a javaagent is very common in Java applications. The root cause is that no new class should be used after the vthread is pinned: an agent can change the class bytecode, and transforming a class needs `JvmtiVTMSTransitionDisabler`. However, this vthread is in VTMS, so it cannot break out of the loop.
> 
> Using `-Djdk.tracePinnedThreads=full` will use the class `PinnedThreadPrinter`, so we end up in a deadlock.

I have no objection to fixing JVMTI. I'm just pointing out that -Djdk.tracePinnedThreads has been very problematic, for this and many other reasons, so it will be proposed for removal when we bring the changes to the main line.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1695136030

From mbaesken at openjdk.org  Mon Jul 29 12:41:34 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 29 Jul 2024 12:41:34 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v4]
In-Reply-To: <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
 <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>
Message-ID: <H5Og5spL1v62N3FGj_p_7-3O_vboWyRJSrMUO5Ygpjs=.0c486cf3-d195-4938-96c3-a841d9172cbc@github.com>

On Thu, 25 Jul 2024 13:42:48 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> When running with ubsan-enabled binaries, some tests trigger the following report:
>> 
>> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
>>     #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
>>     #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
>>     #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
>>     #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
>>     #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
>>     #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
>>     #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
>>     #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
>> 
>> It seems that, at least in the case of class SmallRegisterMap, we fail to handle nullptr.
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add patch of Kim Barrett

Any comments on the change?
Kim / Richard, what do you think?
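
For readers following the report quoted above: calling a non-static member function through a null pointer is undefined behavior in C++ even when the function never touches `this`, and that is what UBSan flags here. Below is a minimal sketch of the general pattern and one common way around it. The names `TinyMap`, `location`, `instance` and `slot_address` are hypothetical and chosen for illustration only; this is not the HotSpot code or the patch under review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a stateless helper class; not HotSpot code.
class TinyMap {
 public:
  // The function ignores 'this', but calling it through a null pointer is
  // still undefined behavior and is reported by -fsanitize=null.
  std::intptr_t location(std::intptr_t sp, int slot) const {
    return sp + slot * static_cast<std::intptr_t>(sizeof(void*));
  }

  // One way to avoid the null call: hand out a real (empty) object.
  static const TinyMap* instance() {
    static const TinyMap the_map{};
    return &the_map;
  }
};

// Callers pass TinyMap::instance() instead of a null pointer.
std::intptr_t slot_address(const TinyMap* map, std::intptr_t sp, int slot) {
  assert(map != nullptr && "pass TinyMap::instance(), not nullptr");
  return map->location(sp, slot);
}
```

A call such as `slot_address(TinyMap::instance(), sp, 2)` then goes through a valid object, so the sanitizer has nothing to report.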

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2255822536

From rrich at openjdk.org  Mon Jul 29 12:56:33 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Mon, 29 Jul 2024 12:56:33 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v4]
In-Reply-To: <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
 <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>
Message-ID: <MH8_MNPKmpp11HlTW0L42ND2A_PanP7cpOmNqDSPLzo=.732541b1-bca6-4d04-a323-2dad31c63b96@github.com>

On Thu, 25 Jul 2024 13:42:48 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> When running with ubsan - enabled binaries, some tests trigger the following report :
>> 
>> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
>>     #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
>>     #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
>>     #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
>>     #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
>>     #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
>>     #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
>>     #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
>>     #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
>> 
>> Seems in case of (at least) class SmallRegisterMap we miss handling nullptr .
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add patch of Kim Barrett

The change looks good to me.
Thanks, Richard.

I think you should add Kim as contributor (see [here](https://wiki.openjdk.org/display/SKARA/Pull+Request+Commands#PullRequestCommands-/contributor)).

-------------

Marked as reviewed by rrich (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20296#pullrequestreview-2204849049
PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2255855359

From mbaesken at openjdk.org  Mon Jul 29 13:05:32 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 29 Jul 2024 13:05:32 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v4]
In-Reply-To: <MH8_MNPKmpp11HlTW0L42ND2A_PanP7cpOmNqDSPLzo=.732541b1-bca6-4d04-a323-2dad31c63b96@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
 <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>
 <MH8_MNPKmpp11HlTW0L42ND2A_PanP7cpOmNqDSPLzo=.732541b1-bca6-4d04-a323-2dad31c63b96@github.com>
Message-ID: <7OzXtBJVimg_9dlx5j8WOSDBU7JDIUbGwIu8Iag6s2A=.309a9594-c0c6-4ab0-9878-e4ac67749ee0@github.com>

On Mon, 29 Jul 2024 12:53:53 GMT, Richard Reingruber <rrich at openjdk.org> wrote:

> I think you should add Kim as contributor (see [here](https://wiki.openjdk.org/display/SKARA/Pull+Request+Commands#PullRequestCommands-/contributor)).

Makes sense. I hope we find another reviewer (I guess that, as a contributor, Kim cannot review).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2255879699

From aph at openjdk.org  Mon Jul 29 13:10:36 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 29 Jul 2024 13:10:36 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v10]
In-Reply-To: <P2b0tKeBww7FoIbgYj9vjJJfEI5sgo2nNsh7g61lynY=.6a90c2e6-d73f-4282-bcd9-157d56253d8c@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <P2b0tKeBww7FoIbgYj9vjJJfEI5sgo2nNsh7g61lynY=.6a90c2e6-d73f-4282-bcd9-157d56253d8c@github.com>
Message-ID: <iYkxv36NxYN4Q4gvJTmC4c9kJtl0CnY1VpRZS9utBU8=.6dd575df-d3f7-4151-99dc-119879f4ac23@github.com>

On Mon, 29 Jul 2024 10:35:07 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Minor

I promise that if you say you really want this change I will do it, but there is a cost I want to make clear.

Adding the full-bitmap test at the start of the fast-path code increases the execution time in the case of `SecondarySupersLookup.testPositive03` from 5 cycles/op to 5.5 cycles/op on average. It also adds at least 5 bytes (8 bytes for AArch64) to the inline code size, depending on how you do it.

In contrast, my proposed fix makes the invariant `popcount(bitmap) >= secondary_supers.length` truly invariant, and changes the full-bitmap test at the start of the slow path as follows, to avoid a performance regression with a nearly-full bitmap:


--- a/src/hotspot/cpu/x86/macroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
@@ -5212,8 +5212,8 @@ void MacroAssembler::lookup_secondary_supers_table_slow_path(Register r_super_kl
   // The bitmap is full to bursting.
   // Implicit invariant: BITMAP_FULL implies (length > 0)
   assert(Klass::SECONDARY_SUPERS_BITMAP_FULL == ~uintx(0), "");
-  cmpq(r_bitmap, (int32_t)-1); // sign-extends immediate to 64-bit value
-  jcc(Assembler::equal, L_huge);
+  cmpq(r_array_length, (int32_t)SECONDARY_SUPERS_TABLE_SIZE - 2);
+  jcc(Assembler::greater, L_huge);
 
--- a/src/hotspot/share/oops/klass.cpp
+++ b/src/hotspot/share/oops/klass.cpp
@@ -344,11 +370,12 @@ uintx Klass::hash_secondary_supers(Array<Klass*>* secondaries, bool rewrite) {
     return uintx(1) << hash_slot;
   }
 
-  // For performance reasons we don't use a hashed table unless there
-  // are at least two empty slots in it. If there were only one empty
-  // slot it'd take a long time to create the table and the resulting
-  // search would be no faster than linear probing.
-  if (length > SECONDARY_SUPERS_TABLE_SIZE - 2) {
+  // Invariant: _secondary_supers.length >= population_count(_secondary_supers_bitmap)
+
+  // Don't attempt to hash a table that's completely full, because in
+  // the case of an absent interface linear probing would not
+  // terminate.
+  if (length >= SECONDARY_SUPERS_TABLE_SIZE) {
     return SECONDARY_SUPERS_BITMAP_FULL;
   }
 


So, what I'm suggesting is a bit smaller, a bit faster, and less work for me. On the other hand you say

> It doesn't look right when the code treats secondary_supers as a table irrespective of whether it was hashed or not. IMO it unnecessarily complicates things and may continue to be a source of bugs.

I agree about the "It doesn't look right" part, but I'm not sure I agree about the cause of the bug. IMO, that was the failure to make `popcount(bitmap) >= secondary_supers.length` truly invariant.

Your call.
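
To make the termination argument in the klass.cpp comment concrete, here is a small sketch of a 64-slot open-addressing table guarded by a bitmap. It is a deliberately simplified model: the names (`SupersTable`, `build`, `contains`, `popcount64`), the integer keys and the unpacked slot array are assumptions for illustration, and it does not reproduce HotSpot's packed array layout or the generated assembly. It only shows why a probe for an absent key needs at least one empty slot, and that the invariant as written in the diff comment (length >= population count of the bitmap) holds in both the hashed and the full case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned kTableSize = 64;                       // one bitmap bit per slot
constexpr std::uint64_t kBitmapFull = ~std::uint64_t(0);  // "not hashed" marker

struct SupersTable {
  std::uint64_t bitmap = 0;              // bit i set <=> slots[i] is occupied
  std::uint32_t slots[kTableSize] = {};  // hashed view, unused when bitmap is full
  std::vector<std::uint32_t> all;        // every key, for the linear fallback
};

unsigned popcount64(std::uint64_t x) {
  unsigned n = 0;
  for (; x != 0; x &= x - 1) ++n;        // clears the lowest set bit each round
  return n;
}

SupersTable build(const std::vector<std::uint32_t>& keys) {
  SupersTable t;
  t.all = keys;
  // A completely full table has no empty slot to stop a probe for an
  // absent key, so it is marked "full" and searched linearly instead.
  if (keys.size() >= kTableSize) {
    t.bitmap = kBitmapFull;
    return t;
  }
  for (std::uint32_t k : keys) {
    unsigned s = k % kTableSize;
    while (t.bitmap & (std::uint64_t(1) << s)) s = (s + 1) % kTableSize;
    t.slots[s] = k;
    t.bitmap |= std::uint64_t(1) << s;
  }
  assert(t.all.size() >= popcount64(t.bitmap));  // invariant from the diff comment
  return t;
}

bool contains(const SupersTable& t, std::uint32_t key) {
  if (t.bitmap == kBitmapFull) {         // slow path: plain linear search
    for (std::uint32_t k : t.all) if (k == key) return true;
    return false;
  }
  unsigned s = key % kTableSize;
  while (t.bitmap & (std::uint64_t(1) << s)) {   // stops at the first empty slot
    if (t.slots[s] == key) return true;
    s = (s + 1) % kTableSize;
  }
  return false;
}
```

With fewer than 64 keys the probe loop in `contains` always reaches a clear bit, so it terminates for absent keys; with 64 or more keys the bitmap is all ones and only the linear fallback is used.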

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2255892483

From djelinski at openjdk.org  Mon Jul 29 13:28:32 2024
From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=)
Date: Mon, 29 Jul 2024 13:28:32 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v2]
In-Reply-To: <sZyCrr8Ti9Ad6EiJrSO_1fvCYsmLlrgHgFACt_790_Q=.ac6ceba1-8ffd-46ec-9e30-9ed3e6ad3cf4@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <sZyCrr8Ti9Ad6EiJrSO_1fvCYsmLlrgHgFACt_790_Q=.ac6ceba1-8ffd-46ec-9e30-9ed3e6ad3cf4@github.com>
Message-ID: <fyVtacPVdwpKbRQIu8icJ08uNq5MW4AqIN7V8zoeemU=.83792ea3-4c65-49e8-9f9b-bacecad37115@github.com>

On Mon, 29 Jul 2024 09:54:09 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
>> 
>> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
>> 
>> Testing:
>>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>>  - tiers 1-3 sanity testing
>> 
>> Thanks.
>
> David Holmes has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - Fix logic for 4th byte of 6.
>  - Fix logic error and typo
>  - Ensure the buffer is > 5 bytes

src/hotspot/share/utilities/exceptions.cpp line 275:

> 273:   // we may also have a truncated UTF-8 sequence. In such cases we need to fix the buffer so the UTF-8
> 274:   // sequence is valid.
> 275:   if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {

Do we need to check if `strlen(msg) == max_msg_size - 1`? If strlen is shorter, the bytes between the null terminator and max_msg_size are undefined, which might trigger an assertion while truncating.
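
To illustrate the concern, here is a hedged sketch of a trimming step that only ever inspects bytes before the NUL terminator, so it never reads the undefined tail of the buffer. It handles standard UTF-8 only; the extra cases of the JVM's modified UTF-8 (such as the six-byte surrogate form mentioned in the commit list) are deliberately left out, and `utf8_expected_len` and `trim_partial_utf8` are hypothetical names, not functions from exceptions.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Number of bytes a sequence starting with 'lead' should have, or 0 if
// 'lead' is not a valid lead byte (standard UTF-8 only).
static std::size_t utf8_expected_len(unsigned char lead) {
  if (lead < 0x80) return 1;
  if ((lead & 0xE0) == 0xC0) return 2;
  if ((lead & 0xF0) == 0xE0) return 3;
  if ((lead & 0xF8) == 0xF0) return 4;
  return 0;
}

// Removes an incomplete trailing UTF-8 sequence from the NUL-terminated
// string in 'buf' and returns the new length. Only bytes in [0, strlen(buf))
// are examined. Input that is malformed for other reasons is left unchanged.
std::size_t trim_partial_utf8(char* buf) {
  assert(buf != nullptr);
  const std::size_t len = std::strlen(buf);
  std::size_t i = len;
  std::size_t cont = 0;
  // Step back over at most three continuation bytes (10xxxxxx).
  while (i > 0 && cont < 3 &&
         (static_cast<unsigned char>(buf[i - 1]) & 0xC0) == 0x80) {
    --i;
    ++cont;
  }
  if (i == 0) return len;  // no lead byte found; nothing this sketch can fix
  const std::size_t expected =
      utf8_expected_len(static_cast<unsigned char>(buf[i - 1]));
  if (expected > cont + 1) {  // lead byte promises more bytes than are present
    buf[i - 1] = '\0';
    return i - 1;
  }
  return len;
}
```

Because the function starts from `strlen(buf)` rather than from the buffer capacity, it behaves the same whether the formatted message filled the buffer or stopped short of it.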

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1695241148

From aph at openjdk.org  Mon Jul 29 13:48:35 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 29 Jul 2024 13:48:35 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v10]
In-Reply-To: <iYkxv36NxYN4Q4gvJTmC4c9kJtl0CnY1VpRZS9utBU8=.6dd575df-d3f7-4151-99dc-119879f4ac23@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <P2b0tKeBww7FoIbgYj9vjJJfEI5sgo2nNsh7g61lynY=.6a90c2e6-d73f-4282-bcd9-157d56253d8c@github.com>
 <iYkxv36NxYN4Q4gvJTmC4c9kJtl0CnY1VpRZS9utBU8=.6dd575df-d3f7-4151-99dc-119879f4ac23@github.com>
Message-ID: <YmFTuQlVBScM1KWJiChk8pvoC1cTJs2fZJUym_ltH4E=.cac5810b-d345-49f7-9d84-419fc0440960@github.com>

On Mon, 29 Jul 2024 13:07:48 GMT, Andrew Haley <aph at openjdk.org> wrote:

> Adding the full-bitmap test at the start of the fast-path code increases the execution time...

Oh, and I think it'll slow down the slow path a little bit too, but that's much less important.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2255994481

From mli at openjdk.org  Mon Jul 29 14:49:34 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 29 Jul 2024 14:49:34 GMT
Subject: RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v6]
In-Reply-To: <D-LqktQwsa-Mg1zpbQvnI-lKEfL8pdbhKgejdG14OmI=.8b0cd0b7-7527-4281-993a-9f8f7a571bf1@github.com>
References: <ik4NwkRGTrHtnMU2Vww_OlJzC2cJSu9Ss9E-i2ucz4o=.0b30b458-c676-48f6-8ab7-933328fd41f5@github.com>
 <0NpNq_wNl-qus6kEr_6J7liSQXXYdjybbWQWDJPGPmQ=.8ba0ea43-2bc7-4f01-afee-adb4a43da29c@github.com>
 <D-LqktQwsa-Mg1zpbQvnI-lKEfL8pdbhKgejdG14OmI=.8b0cd0b7-7527-4281-993a-9f8f7a571bf1@github.com>
Message-ID: <Ua01ftmAzg2WJZFBwP2F4sjJRhxhRtllepahph5oOhM=.304a6fca-2c81-44ae-b699-032981218177@github.com>

On Mon, 29 Jul 2024 05:00:13 GMT, Fei Yang <fyang at openjdk.org> wrote:

> Hi, will take a look. 

Thanks!

> BTW: Have you resolved the performance issue of the base64 decode intrinsic?

It only brings a regression in the MIME cases. I did some investigation, but have not found the root cause yet.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2256144590

From asmehra at openjdk.org  Mon Jul 29 14:49:48 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Mon, 29 Jul 2024 14:49:48 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic [v2]
In-Reply-To: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
Message-ID: <5fyuvwoHRU_EUT2tvUsWwzCjd7dazKHMiL0rGWW8jVU=.fed6e33a-7a22-4b4c-950f-d19c18ee0eaf@github.com>

> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)
> 
> Testing:
>   test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
>   test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java

Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision:

  Address review comments by Thomas S.
  
  Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20304/files
  - new: https://git.openjdk.org/jdk/pull/20304/files/1ffbd696..008ac6b9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20304&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20304&range=00-01

  Stats: 91 lines in 4 files changed: 31 ins; 30 del; 30 mod
  Patch: https://git.openjdk.org/jdk/pull/20304.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20304/head:pull/20304

PR: https://git.openjdk.org/jdk/pull/20304

From asmehra at openjdk.org  Mon Jul 29 16:05:32 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Mon, 29 Jul 2024 16:05:32 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic
In-Reply-To: <yqXqOJCnOBQToVnGiTvMv9SRVECCZuArbWqfiVEj6VE=.eb63b66f-63a5-4c51-8757-87f2694afd98@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
 <yqXqOJCnOBQToVnGiTvMv9SRVECCZuArbWqfiVEj6VE=.eb63b66f-63a5-4c51-8757-87f2694afd98@github.com>
Message-ID: <9i2-oCVJ5XhCtD5vwX7sgpjg4kbUp-BsKpakhYgE28Q=.5013ceda-dd76-4417-9c81-a8bb3898483d@github.com>

On Wed, 24 Jul 2024 10:45:05 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)
>> 
>> Testing:
>>   test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
>>   test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java
>
> I plan to look at this later this week.

@tstuefe I have added a patch to address your review comments.

> or just write a small wrapper class holding a size_t vector and taking care of the copying.

Using a wrapper class is a good idea. I have added `ArenaTagsCounter` for that.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20304#issuecomment-2256315773

From asmehra at openjdk.org  Mon Jul 29 16:05:33 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Mon, 29 Jul 2024 16:05:33 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic [v2]
In-Reply-To: <x1uZAfQc-7Dvbhv5cy7wm7CGyTGWmc8oOCs23DrnXVI=.85639be2-68c2-4c49-9ac2-12cef799f77c@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
 <ej5ON8iDbUMsORwZNuLzDbXERpzGJde7q_hd50vKPGo=.34c0d39e-7a85-49a7-9d10-363a9800cc4d@github.com>
 <1zW4OT5fJqNOIVmEJzaa75P1pkOtTDCc5o3As0Cbrfg=.37b21e54-fb16-4015-a910-40ead48c94b3@github.com>
 <x1uZAfQc-7Dvbhv5cy7wm7CGyTGWmc8oOCs23DrnXVI=.85639be2-68c2-4c49-9ac2-12cef799f77c@github.com>
Message-ID: <UPINfxkYkLCDl7Pd-3g4wRe-99GF8Uete1tWwqJcXn4=.94c80dca-5867-42d5-b5bd-e06e859a4045@github.com>

On Sat, 27 Jul 2024 05:44:14 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> What do you mean by x macro? Do you have an example that shows the use of x macro?
>
> You use them already in your patch.
> 
> E.g. 
> 
> 
> #define XX(name, whatever, desc) st->print_cr("  " LEGEND_KEY_FMT ": " #name #desc
> DO_ARENA_TAG(XX)
> #undef XX
> 
> 
> Admittedly, it is not a lot less code than the for loop. Up to you.

I will keep the loop if you don't have a strong preference for the macro usage here.
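
For anyone unfamiliar with the term, an "X macro" keeps a list in one macro and expands it several times with different definitions of the inner macro. The sketch below is a self-contained illustration of that technique; `DO_DEMO_TAG`, the tag names and the helper functions are made up for this example and do not mirror `DO_ARENA_TAG` or the code in the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag list in X-macro form: each entry is (name, description).
#define DO_DEMO_TAG(XX)             \
  XX(ra,   "register allocation")   \
  XX(node, "node storage")          \
  XX(comp, "compiler scratch data")

// Expansion 1: an enum with one enumerator per tag, plus a count.
enum DemoTag {
#define XX(name, desc) tag_##name,
  DO_DEMO_TAG(XX)
#undef XX
  tag_count
};

// Expansion 2: a name table generated from the same list.
const char* demo_tag_name(DemoTag t) {
  static const char* const names[] = {
#define XX(name, desc) #name,
    DO_DEMO_TAG(XX)
#undef XX
  };
  assert(t >= 0 && t < tag_count);
  return names[t];
}

// Expansion 3: a legend with one "name: description" line per tag.
std::string demo_legend() {
  std::string out;
#define XX(name, desc) out += std::string(#name) + ": " + desc + "\n";
  DO_DEMO_TAG(XX)
#undef XX
  return out;
}
```

The trade-off discussed above is visible here: the macro form keeps the enum, the names and the legend in sync automatically, while a plain loop over an array is easier to read and step through in a debugger.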

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1695486252

From kbarrett at openjdk.org  Mon Jul 29 18:17:31 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Mon, 29 Jul 2024 18:17:31 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v4]
In-Reply-To: <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
 <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>
Message-ID: <5teiNpibz_Cv-qbIMPkPkoti1tG20tptXcVpaOByWZM=.7f2c67f0-c3f8-4121-b476-2232dc5ab891@github.com>

On Thu, 25 Jul 2024 13:42:48 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> When running with ubsan - enabled binaries, some tests trigger the following report :
>> 
>> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
>>     #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
>>     #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
>>     #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
>>     #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
>>     #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
>>     #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
>>     #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
>>     #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
>> 
>> Seems in case of (at least) class SmallRegisterMap we miss handling nullptr .
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add patch of Kim Barrett

It's annoying that we have all these very similar platform-specific classes of
non-trivial size.  It seems like it ought to be possible to refactor to reduce the
duplication.  But it might not be worth the trouble, and certainly not as part of
this change.

The additions @MBaesken has made to my prototype look good to me.

Looks good except missing some copyright updates.

I think when incorporating something like my suggested changes, the PR author
can be considered to have reviewed them.  The goal is to have some number of
people look over the code and approve all the pieces (an author and 2 reviewers).
At least, that's my recollection of some prior discussions of situations like this.
But I agree it can feel a little incestuous having 2 authors who are playing a
reviewer roll for the other's work, and especially when there's some back and
forth on it.

-------------

Changes requested by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20296#pullrequestreview-2205667735

From szaldana at openjdk.org  Mon Jul 29 19:01:02 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 29 Jul 2024 19:01:02 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v15]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <oVnMKyMgEthsHECfO9kDZrJ0XEzHZ40cx_x2HHO1ujw=.2ec8a7f1-3e4b-4ad9-a0b9-35008e3abc9f@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Reverting FileArgument and using char* instead
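
As a rough illustration of the `%p` substitution described in the PR text, the following sketch expands the pattern in a filename. It is an assumption-laden example: `expand_pid` is a hypothetical helper, the PID is passed in explicitly rather than queried from the OS, and treating `%%` as a literal percent sign is a choice made here, not something stated in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns 'pattern' with every "%p" replaced by the decimal PID.
// "%%" becomes a single '%' (an assumption of this sketch); any other
// character after '%' is copied through unchanged.
std::string expand_pid(const std::string& pattern, long pid) {
  assert(pid >= 0);
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size()) {
      const char next = pattern[i + 1];
      if (next == 'p') {
        out += std::to_string(pid);
        ++i;  // skip the 'p'
        continue;
      }
      if (next == '%') {
        out += '%';
        ++i;  // skip the second '%'
        continue;
      }
    }
    out += pattern[i];
  }
  return out;
}
```

For example, `expand_pid("heap_%p.hprof", 1234)` yields `heap_1234.hprof`, which is the kind of result a command such as `GC.heap_dump` would be expected to write to.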

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/71d3d140..2cccc9b4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=13-14

  Stats: 68 lines in 5 files changed: 12 ins; 41 del; 15 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From szaldana at openjdk.org  Mon Jul 29 19:05:07 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 29 Jul 2024 19:05:07 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v16]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <z3eAxxssepMc8ha_D4nbGErIPB6OCb6CfNfi94WxA5c=.aef9fa4d-2898-44bb-a94e-c0e68b5b606a@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Reverting some lingering changes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/2cccc9b4..7f22495a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=15
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=14-15

  Stats: 6 lines in 2 files changed: 2 ins; 3 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From szaldana at openjdk.org  Mon Jul 29 19:08:17 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 29 Jul 2024 19:08:17 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v17]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <fyALiuRCBIdxuyUue80jejw0G9ChAh4Y0kn--lbTTHY=.ea8dd1ae-6cec-416c-976b-fe027732dd79@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  last lingering change

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/7f22495a..ceb96eb9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=16
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=15-16

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From szaldana at openjdk.org  Mon Jul 29 19:10:34 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 29 Jul 2024 19:10:34 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v14]
In-Reply-To: <IUN4djOhB-ZaEz_AwNwcrQyyrVxdxqc-Th5nclLnPew=.1cc99b12-abc7-4556-b3c3-16219d7dec44@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <rKeKx8FnFBhN6mW30EXQDJcETtRcLimDZwu_Z3VQdyA=.5b821a7b-3753-4146-89bb-f5a64effc8c5@github.com>
 <X1PNORe3zCsQbH8DQhGBwUACW8f501e9_IBAmvUiUV8=.ec8e20b1-4b8e-4a92-8654-c2a8d1a9f94d@github.com>
 <d47EseDyodKwKOaWHIo_zDzOj44sQXvZCucr0V0vV8U=.9c847055-903b-45a8-b2ba-c4c27b15211e@github.com>
 <IUN4djOhB-ZaEz_AwNwcrQyyrVxdxqc-Th5nclLnPew=.1cc99b12-abc7-4556-b3c3-16219d7dec44@github.com>
Message-ID: <vhdHYIvzAjRezuTlEt9NNWFNfEUoYS22Ab-ICYSzh2Q=.fe7b9958-034b-498e-87a5-878fd21339d7@github.com>

On Mon, 29 Jul 2024 09:39:07 GMT, Kevin Walls <kevinw at openjdk.org> wrote:

>>> One more thing that's troubling me. (Apologies it's now and not last week.)
>>> 
>>> I was looking at the _filename.value().get() usage and finding it uncomfortable, compared to the previous simple _filename.value() 8-) Harder to remember and to read and understand. Maybe we can avoid the two accessors, it really is just a char*.
>>> 
>>> These additional argument types look like part of the framework which never found an audience: MemorySizeArgument has one usage in CompilationMemoryStatisticDCmd, NanoTimeArgument looks unused -- so the two-accessor usage is only in one place until now?
>>> 
>>> Adding FileArgument as another of these might be the wrong direction, as these classes are so almost redundant.
>>> 
>>> What if we didn't add FileArgument, and kept using <char*> for _filename args/opts:
>>> 
>>> Then in DCmdArgument<char*>::parse_value(), recognise a "FILE" argument type and call Arguments::copy_expand_pid there, to set _value.
>>> 
>>> Just seeing if we can cut down some of the complexity here, as Thomas mentioned, it is already very complex for what it is!
>>> 
>>> (There is also the to_string method which seemed like it would be useful here, but it needs a buffer so is more complex than calling two accessors... Another thing that seems to be part of the framework that was never much adopted.)
>> 
>> IMHO for a functional addition we should follow the established pattern. Reworking the framework is certainly useful, but I would like it if we could get this done first (I intend to use it in other DCmds). 
>> 
>> And if we simplify this coding, we should think first about how to do this and what to solve. Things that come to mind:
>> 
>> - overuse of template
>> - The argument-type-by-template-division and the runtime "type" string argument (the third argument to DCmdArgument) seem redundant
>> - the fact that we keep command metadata (which should be constant) together with command invocation data (values for arguments that are scoped to a single command invocation) in a single global structure, and then rewrite the latter every time we invoke a command. That is a strange concept and makes cleaning up temporary memory non-trivial
>> - the fact that each new command takes a ton of boilerplate coding (Just see the many many repetitions in diagnosticCommand.cpp)
>> - the fact that we use runtime-polymorphy, which is completely fine, but then all metadata information are "static". So in order to e.g. know how many arguments a command takes, you need to know the command c...
>
> Thanks Thomas @tstuefe -
> 
> We're agreeing that some of this framework is overly complex, and that we aren't going to simplify the framework in this change.
> 
> But the more we adopt the obscure parts of the framework, the harder it will be to move away from it, so that's the reason for suggesting not creating the FileArgument class.  Use the simpler parts of this machine, with some special cases where necessary, like a char* argument which happens to be used for a FILEname (an input filename which gets %p substitution).
> 
> The logic I don't follow is:
> Using this complex mechanism because it exists, when it only has one? actual usage.  This seems to contradict the earlier max path len notes where it's suggested not to use a pattern established by about 140 other usages.
> 
> Apologies Sonia for dragging this out, still really pleased to get this change happening.

Hi @kevinjwalls, 

I reverted back to `char*` and modified parsing based on the type `FILE`.

Really hope this reaches a consensus!
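
For readers following the thread, the `%p` handling under discussion can be illustrated with a small standalone sketch. This is not the HotSpot code (the thread points at `Arguments::copy_expand_pid` for the real work); the helper name `expand_pid` and its treatment of `%%` as a literal percent sign are assumptions made purely for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not HotSpot's implementation: replaces every "%p" in
// `pattern` with the decimal PID. As an assumption of this sketch, "%%" is
// collapsed to a single literal '%'. Any other character is copied unchanged.
std::string expand_pid(const std::string& pattern, long pid) {
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size()) {
      if (pattern[i + 1] == 'p') {
        out += std::to_string(pid);  // substitute the PID
        ++i;                         // skip the 'p'
        continue;
      }
      if (pattern[i + 1] == '%') {
        out += '%';                  // escaped percent sign
        ++i;
        continue;
      }
    }
    out += pattern[i];
  }
  return out;
}
```

With this sketch, a file name such as `heap_%p.hprof` passed for a process with PID 1234 would become `heap_1234.hprof`, while a name without the pattern is left untouched.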

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2256698014

From lmesnik at openjdk.org  Mon Jul 29 19:29:35 2024
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Mon, 29 Jul 2024 19:29:35 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v17]
In-Reply-To: <fyALiuRCBIdxuyUue80jejw0G9ChAh4Y0kn--lbTTHY=.ea8dd1ae-6cec-416c-976b-fe027732dd79@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <fyALiuRCBIdxuyUue80jejw0G9ChAh4Y0kn--lbTTHY=.ea8dd1ae-6cec-416c-976b-fe027732dd79@github.com>
Message-ID: <6gCx1ciA8eMVyM90LMRHr2YcKyTZuPCn8423YIT88aU=.35b32524-4c9d-4974-a67d-2eab1146c258@github.com>

On Mon, 29 Jul 2024 19:08:17 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   last lingering change

Looks good also. The main goal of my request was to unify argument parsing. Using type FILE for 'char *' seems enough, and there is no need to introduce a new filename type for values.
Thanks for fixing.

-------------

Marked as reviewed by lmesnik (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2205814072

From dholmes at openjdk.org  Mon Jul 29 22:02:10 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 29 Jul 2024 22:02:10 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v3]
In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
Message-ID: <jaHgSfRfceQDzSZUwxoGgSGEPto-ev11jLPUPB29lLU=.0a882ecd-aa7f-4385-8979-4e5cb1a91a16@github.com>

> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
> 
> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
> 
> Testing:
>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>  - tiers 1-3 sanity testing
> 
> Thanks.

David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:

 - Merge branch 'master' into 8325002-fthrow
 - Fix logic for 4th byte of 6.
 - Fix logic error and typo
 - Ensure the buffer is > 5 bytes
 - Fixed typo
 - 8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20345/files
  - new: https://git.openjdk.org/jdk/pull/20345/files/02d636a6..794da826

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=01-02

  Stats: 13051 lines in 379 files changed: 7576 ins; 4001 del; 1474 mod
  Patch: https://git.openjdk.org/jdk/pull/20345.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20345/head:pull/20345

PR: https://git.openjdk.org/jdk/pull/20345

From sspitsyn at openjdk.org  Mon Jul 29 22:39:36 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Mon, 29 Jul 2024 22:39:36 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v3]
In-Reply-To: <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
 <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com>
Message-ID: <A8wM3bukziE67E6BEq1fo8wM-fXbbvME_k4zoQTGtSY=.e967a17a-ec4d-4d38-83b5-e5578a05d2b6@github.com>

On Mon, 29 Jul 2024 11:30:08 GMT, Jiawei Tang <jwtang at openjdk.org> wrote:

>> I added a test case that reproduces the crash. I would appreciate advice on whether the code needs changing.
>
> Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   changes according to reviewers' advice

src/hotspot/share/prims/jvmtiExport.cpp line 970:

> 968:     if (_thread->is_in_any_VTMS_transition()) {
> 969:       return; // no events should be posted if thread is in any VTMS transition
> 970:     }

This is not the right place to fix it.

This would be better:

@@ -1091,8 +1091,8 @@ bool JvmtiExport::post_class_file_load_hook(Symbol* h_name,
   if (JvmtiEnv::get_phase() < JVMTI_PHASE_PRIMORDIAL) {
     return false;
   }
-  if (JavaThread::current()->is_in_tmp_VTMS_transition()) {
-    return false; // skip CFLH events in tmp VTMS transition
+  if (thread->is_in_any_VTMS_transition()) {
+    return false; // no events should be posted if thread is in any VTMS transition
   }
 
   JvmtiClassFileLoadHookPoster poster(h_name, class_loader,


Also, there is a check in the constructor `JvmtiClassFileLoadHookPoster()`:

    if (_thread->is_in_any_VTMS_transition()) {
      return; // no events should be posted if thread is in any VTMS transition
    }

It is better to replace it with an assert. With the right check in `JvmtiExport::post_class_file_load_hook()` we should never call this constructor and `poster.post()`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1696039799

From sspitsyn at openjdk.org  Mon Jul 29 23:01:31 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Mon, 29 Jul 2024 23:01:31 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v3]
In-Reply-To: <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
 <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com>
Message-ID: <yBWdB5qfG39speceqxReLp2SRTzlOk3bWt1rjGK83lA=.041249fc-d4b5-4c81-9dc8-4193d82e3a28@github.com>

On Mon, 29 Jul 2024 11:30:08 GMT, Jiawei Tang <jwtang at openjdk.org> wrote:

>> I added a test case that reproduces the crash. I would appreciate advice on whether the code needs changing.
>
> Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   changes according to reviewers' advice

Could you please do some test renaming/refactoring?
We have a number of `.c` JVMTI agents in the testbase. The plan is to convert them to `.cpp` in the future.
Can you convert the test to use `.cpp` as well?
Also, I suggest naming the test directory and files as below:
- TestPinCaseWithCFLH/TestPinCaseWithCFLH.java
- TestPinCaseWithCFLH/libTestPinCaseWithCFLH.cpp

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20373#issuecomment-2257149690

From vlivanov at openjdk.org  Mon Jul 29 23:12:36 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Mon, 29 Jul 2024 23:12:36 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v10]
In-Reply-To: <P2b0tKeBww7FoIbgYj9vjJJfEI5sgo2nNsh7g61lynY=.6a90c2e6-d73f-4282-bcd9-157d56253d8c@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
 <P2b0tKeBww7FoIbgYj9vjJJfEI5sgo2nNsh7g61lynY=.6a90c2e6-d73f-4282-bcd9-157d56253d8c@github.com>
Message-ID: <fJXkUbnJ4S1vZ1qR-MS6oDQZ63DmFjqwEnM9zL0U_nI=.38ef928d-b916-45ad-b64f-b0a0bc3affe3@github.com>

On Mon, 29 Jul 2024 10:35:07 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to a slow path subroutine.
>> Instead, the slow path is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Minor

Thanks for the numbers, Andrew. I agree that the fix you propose is simple and conservative which makes it very appealing.

First of all, does the bug have to be addressed separately? It affects 23, so we need to backport the fix anyway. 

Also, I took a closer look at the implementation.

A couple of observations:
* as of now, there are 4 platforms which support secondary supers table lookup (so, all of them have to be adjusted if any platform-specific changes are needed);
* there are existing implicit assumptions on `SECONDARY_SUPERS_BITMAP_FULL` (e.g., `secondary_supers_bitmap == SECONDARY_SUPERS_BITMAP_FULL => secondary_supers->length() > 0`).
 
I thought about use cases for `SECONDARY_SUPERS_BITMAP_FULL` as a kill switch for table lookups, but don't see anything important (once secondary supers are hashed unconditionally).

Let's do the fix you propose for now.

We can reconsider the decision later if any interesting use cases show up. The downside is that there'll be more platform-specific code to touch, but that looks like a fair price to pay.
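
As background for the popcount remarks in the quoted description, here is a generic sketch of how a bitmap plus a population count can index a packed table. It is not the HotSpot secondary supers lookup; `popcount64` and `packed_index` are hypothetical names, and the software popcount shown is just one portable variant.

```cpp
#include <cstdint>

// Portable software population count (clears the lowest set bit each round).
// Illustrative only; real code would normally use a compiler builtin.
int popcount64(std::uint64_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;  // drop the lowest set bit
    ++n;
  }
  return n;
}

// For a packed array that stores one entry per set bit of `bitmap`, returns
// the array index belonging to `slot` (0..63), or -1 if that bit is clear.
// The index is the number of set bits strictly below `slot`.
int packed_index(std::uint64_t bitmap, unsigned slot) {
  const std::uint64_t bit = std::uint64_t{1} << slot;
  if ((bitmap & bit) == 0) {
    return -1;  // slot is empty, nothing stored for it
  }
  return popcount64(bitmap & (bit - 1));
}
```

The point of the sketch is only that an empty slot is rejected with a single bit test, and an occupied slot maps to its packed position with one popcount, which is why a fast popcount matters for this style of table.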

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2257160673

From dholmes at openjdk.org  Mon Jul 29 23:18:35 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 29 Jul 2024 23:18:35 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v2]
In-Reply-To: <fyVtacPVdwpKbRQIu8icJ08uNq5MW4AqIN7V8zoeemU=.83792ea3-4c65-49e8-9f9b-bacecad37115@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <sZyCrr8Ti9Ad6EiJrSO_1fvCYsmLlrgHgFACt_790_Q=.ac6ceba1-8ffd-46ec-9e30-9ed3e6ad3cf4@github.com>
 <fyVtacPVdwpKbRQIu8icJ08uNq5MW4AqIN7V8zoeemU=.83792ea3-4c65-49e8-9f9b-bacecad37115@github.com>
Message-ID: <lh8krMqdhecXEkx5-8ndf88RVyKTpK32hkrQt9_POP0=.439c434c-0074-48ed-ad97-ce146f76c236@github.com>

On Mon, 29 Jul 2024 13:26:12 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

>> David Holmes has updated the pull request incrementally with three additional commits since the last revision:
>> 
>>  - Fix logic for 4th byte of 6.
>>  - Fix logic error and typo
>>  - Ensure the buffer is > 5 bytes
>
> src/hotspot/share/utilities/exceptions.cpp line 275:
> 
>> 273:   // we may also have a truncated UTF-8 sequence. In such cases we need to fix the buffer so the UTF-8
>> 274:   // sequence is valid.
>> 275:   if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {
> 
> Do we need to check if `strlen(msg) == max_msg_size - 1`? If strlen is shorter, the bytes between the null terminator and max_msg_size are undefined, which might trigger an assertion while truncating.

In fact we know it may be shorter than `max_msg_size - 1` - that is what we get on macOS if the string is huge and exceeds `INT_MAX` causing `vsnprintf` to return -1. I originally had an assert that failed due to that.

I need to fix this case as well. <sigh>. Good catch.
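
To make the truncation problem concrete, here is a minimal sketch of trimming a cut-off buffer back to a character boundary. It assumes the text was valid standard UTF-8 before truncation and does not handle the 6-byte surrogate-pair form of modified UTF-8 that the actual fix has to consider; `utf8_safe_length` is a hypothetical name, not the function in the PR.

```cpp
#include <cstddef>

// Returns the length of the longest prefix of s[0..len) that does not end in
// the middle of a multi-byte sequence. Assumption of this sketch: the bytes
// were valid standard UTF-8 (1 to 4 byte sequences) before being cut at len.
std::size_t utf8_safe_length(const char* s, std::size_t len) {
  std::size_t i = len;
  // Step back over trailing continuation bytes (bit pattern 10xxxxxx).
  while (i > 0 && (static_cast<unsigned char>(s[i - 1]) & 0xC0) == 0x80) {
    --i;
  }
  if (i == 0) {
    return 0;  // empty input, or nothing but continuation bytes
  }
  const unsigned char lead = static_cast<unsigned char>(s[i - 1]);
  std::size_t need;  // total bytes the final sequence should have
  if (lead < 0x80) {
    need = 1;
  } else if ((lead & 0xE0) == 0xC0) {
    need = 2;
  } else if ((lead & 0xF0) == 0xE0) {
    need = 3;
  } else if ((lead & 0xF8) == 0xF0) {
    need = 4;
  } else {
    return i - 1;  // not a valid lead byte: drop it
  }
  const std::size_t have = len - (i - 1);  // bytes actually present
  return have >= need ? len : i - 1;       // drop an incomplete sequence
}
```

A caller would write the terminating NUL at the returned length. Note that the sketch works from the length it is given, so the caller is responsible for passing the real string length rather than the buffer size, which is the case raised in the review comment above.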

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1696067908

From nprasad at openjdk.org  Mon Jul 29 23:27:48 2024
From: nprasad at openjdk.org (Neethu Prasad)
Date: Mon, 29 Jul 2024 23:27:48 GMT
Subject: RFR: 8334230: Optimize C2 classes layout [v2]
In-Reply-To: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
Message-ID: <ROSSy11IiafrTR2QH6Ig_yr9f5iC6lmJ5IIKNlMOJ6k=.f1083873-ff85-4dc3-9266-304db685be05@github.com>

> **Notes**
> 
> Rearrange C2 class fields to optimize footprint.
> 
> 
> **Verification**
> 
> 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests.
> 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes.
> 
> | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding |
> | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- |
> | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 ->  0 | 11 -> 0  | 56 bytes -> 48 | 0 -> 3 |
> | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 ->  0 | 5 -> 0  | 24 bytes -> 16 | 7 -> 4 |
> | C2Access | 56 -> 48 | 1-> 1 | 42 -> 0 | 1 ->  0 | 7 -> 0  | 56 bytes -> 48 | 7 -> 6 |
> | VectorSet| 32 -> 24 | 1-> 1 | 24 -> 0 | 1 ->  0 | 8 -> 0  | 32 bytes -> 24 | 1 -> 1 |
> 
> class ArrayPointer {
> 	const class Node  *        _pointer;             /*     0     8 */
> 	const class Node  *        _base;                /*     8     8 */
> 	const jlong                _constant_offset;     /*    16     8 */
> 	const class Node  *        _int_offset;          /*    24     8 */
> 	const class GrowableArray<Node*>  * _other_offsets; /*    32     8 */
> 	const jint                 _int_offset_shift;    /*    40     4 */
> 	const bool                 _is_valid;            /*    44     1 */
> public:
> 
> 
> 	/* size: 48, cachelines: 1, members: 7 */
> 	/* padding: 3 */
> 	/* last cacheline: 48 bytes */
> };
> 
> 
> 
> class CallJavaNode : public CallNode {
> public:
> 
> 	/* class CallNode            <ancestor>; */      /*     0   128 */
> protected:
> 
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class ciMethod *           _method;              /*   128     8 */
> 	bool                       _optimized_virtual;   /*   136     1 */
> 	bool                       _method_handle_invoke; /*   137     1 */
> 	bool                       _override_symbolic_info; /*   138     1 */
> 	bool                       _arg_escape;          /*   139     1 */
> public:
> 
> protected:
> 
> public:
> 
> 
> 	/* size: 144, cachelines: 3, members: 6 */
> 	/* padding: 4 */
> 	/* last cacheline: 16 bytes */
> 
> 	/* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
> };
> 
> 
> 
> class C2Access : public StackObj {
> public:
> 
> 	/* class StackObj            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX last struct has 1 byte of padding */
> 
> 	int ()(void) * *           _vptr.C2Access;       /*     0     8 */
> protected:
> 
> 	DecoratorSet               _decorators;          /*     8  ...

Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision:

  Undo constructor arg order change

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19861/files
  - new: https://git.openjdk.org/jdk/pull/19861/files/f5309ce4..668f8c66

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19861&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19861&range=00-01

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/19861.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19861/head:pull/19861

PR: https://git.openjdk.org/jdk/pull/19861

From kbarrett at openjdk.org  Mon Jul 29 23:28:33 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Mon, 29 Jul 2024 23:28:33 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v4]
In-Reply-To: <5teiNpibz_Cv-qbIMPkPkoti1tG20tptXcVpaOByWZM=.7f2c67f0-c3f8-4121-b476-2232dc5ab891@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
 <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>
 <5teiNpibz_Cv-qbIMPkPkoti1tG20tptXcVpaOByWZM=.7f2c67f0-c3f8-4121-b476-2232dc5ab891@github.com>
Message-ID: <EEVGWDmtho4fv8fWW9gkAczQpuPEsOWkIbGviHXAEtA=.5dc72232-d3ce-4b1e-a342-3065a99cbb38@github.com>

On Mon, 29 Jul 2024 18:14:59 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> I think when incorporating something like my suggested changes, the PR author can be considered to have reviewed them. The goal is to have some number of people look over the code and approve all the pieces (an author and 2 reviewers). At least, that's my recollection of some prior discussions of situations like this. But I agree it can feel a little incestuous having 2 authors who are playing a reviewer role for the other's work, and especially when there's some back and forth on it.

I did some asking around about this, and it seems my old info is stale and we should usually have reviewers who are
distinct from any of the contributors.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2257175455

From nprasad at openjdk.org  Tue Jul 30 00:53:04 2024
From: nprasad at openjdk.org (Neethu Prasad)
Date: Tue, 30 Jul 2024 00:53:04 GMT
Subject: RFR: 8334230: Optimize C2 classes layout [v3]
In-Reply-To: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
Message-ID: <9Xqj8lhtk5xtM-NHRl-GBFTZSzQcNw8yYo_ket5U0aM=.a752c907-e9c7-47cc-87c1-bf6bf0a3d642@github.com>

> **Notes**
> 
> Rearrange C2 class fields to optimize footprint.
> 
> 
> **Verification**
> 
> 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests.
> 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes.
> 
> | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding |
> | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- |
> | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 ->  0 | 11 -> 0  | 56 bytes -> 48 | 0 -> 3 |
> | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 ->  0 | 5 -> 0  | 24 bytes -> 16 | 7 -> 4 |
> | C2Access | 56 -> 48 | 1-> 1 | 42 -> 0 | 1 ->  0 | 7 -> 0  | 56 bytes -> 48 | 7 -> 6 |
> | VectorSet| 32 -> 24 | 1-> 1 | 24 -> 0 | 1 ->  0 | 8 -> 0  | 32 bytes -> 24 | 1 -> 1 |
> 
> class ArrayPointer {
> 	const class Node  *        _pointer;             /*     0     8 */
> 	const class Node  *        _base;                /*     8     8 */
> 	const jlong                _constant_offset;     /*    16     8 */
> 	const class Node  *        _int_offset;          /*    24     8 */
> 	const class GrowableArray<Node*>  * _other_offsets; /*    32     8 */
> 	const jint                 _int_offset_shift;    /*    40     4 */
> 	const bool                 _is_valid;            /*    44     1 */
> public:
> 
> 
> 	/* size: 48, cachelines: 1, members: 7 */
> 	/* padding: 3 */
> 	/* last cacheline: 48 bytes */
> };
> 
> 
> 
> class CallJavaNode : public CallNode {
> public:
> 
> 	/* class CallNode            <ancestor>; */      /*     0   128 */
> protected:
> 
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class ciMethod *           _method;              /*   128     8 */
> 	bool                       _optimized_virtual;   /*   136     1 */
> 	bool                       _method_handle_invoke; /*   137     1 */
> 	bool                       _override_symbolic_info; /*   138     1 */
> 	bool                       _arg_escape;          /*   139     1 */
> public:
> 
> protected:
> 
> public:
> 
> 
> 	/* size: 144, cachelines: 3, members: 6 */
> 	/* padding: 4 */
> 	/* last cacheline: 16 bytes */
> 
> 	/* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
> };
> 
> 
> 
> class C2Access : public StackObj {
> public:
> 
> 	/* class StackObj            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX last struct has 1 byte of padding */
> 
> 	int ()(void) * *           _vptr.C2Access;       /*     0     8 */
> protected:
> 
> 	DecoratorSet               _decorators;          /*     8  ...

Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision:

  Address constructor order issue for C2OptAccess
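
For anyone unfamiliar with why reordering fields shrinks a class, a small standalone sketch may help. The structs below are hypothetical and unrelated to the C2 classes in the PR; exact sizes depend on the ABI, so the comments only describe what a typical 64-bit layout does.

```cpp
#include <cstddef>

// Hypothetical layout with small members interleaved between larger ones.
// On a typical 64-bit ABI the compiler inserts padding after `a` and `b`
// so that `p` and `n` are suitably aligned.
struct Original {
  bool  a;
  void* p;
  bool  b;
  int   n;
};

// Same members, ordered from largest alignment to smallest. The two bools
// now share the tail of the object and only trailing padding remains.
struct Reordered {
  void* p;
  int   n;
  bool  a;
  bool  b;
};

// Reports how many bytes the reordering saves on the current platform.
std::size_t bytes_saved() {
  return sizeof(Original) - sizeof(Reordered);
}
```

One caveat that ties back to the commit message above: C++ initializes non-static members in declaration order, not in the order they appear in the constructor's initializer list, so moving fields can change initialization order and may trigger `-Wreorder` style warnings until the initializer list is updated to match.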

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19861/files
  - new: https://git.openjdk.org/jdk/pull/19861/files/668f8c66..490c381e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19861&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19861&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19861.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19861/head:pull/19861

PR: https://git.openjdk.org/jdk/pull/19861

From amenkov at openjdk.org  Tue Jul 30 02:04:59 2024
From: amenkov at openjdk.org (Alex Menkov)
Date: Tue, 30 Jul 2024 02:04:59 GMT
Subject: RFR: 8331015: Obsolete -XX:+UseNotificationThread
Message-ID: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com>

Obsolete UseNotificationThread flag which was deprecated in JDK 23.

Testing: tier1..tier5

-------------

Commit messages:
 - fix

Changes: https://git.openjdk.org/jdk/pull/20381/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20381&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8331015
  Stats: 41 lines in 7 files changed: 1 ins; 31 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/20381.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20381/head:pull/20381

PR: https://git.openjdk.org/jdk/pull/20381

From dholmes at openjdk.org  Tue Jul 30 02:48:39 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 02:48:39 GMT
Subject: RFR: 8331015: Obsolete -XX:+UseNotificationThread
In-Reply-To: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com>
References: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com>
Message-ID: <Xfw-QCRB3ppSexvfQI5AaZ0Pt1baX5g9M98znhWnc2g=.f2fbf09b-c9e3-4e6f-80f8-c9e3775689d2@github.com>

On Tue, 30 Jul 2024 01:57:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote:

> Obsolete UseNotificationThread flag which was deprecated in JDK 23.
> 
> Testing: tier1..tier5

Looks good!

Thanks

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20381#pullrequestreview-2206468050

From kbarrett at openjdk.org  Tue Jul 30 03:39:58 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 30 Jul 2024 03:39:58 GMT
Subject: RFR: 8337416: Fix -Wzero-as-null-pointer-constant warnings in misc.
 runtime code
Message-ID: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com>

Please review this (perhaps trivial?) change that removes some uses of literal
0 as a null pointer constant in misc. runtime code.  Most are changed to use
nullptr. 

Testing: mach5 tier1
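
As background on why a literal 0 used as a pointer is worth cleaning up, here is a minimal sketch. The `describe` overloads are hypothetical and are not taken from the HotSpot sources; they only show that `0` is an integer first, while `nullptr` can only ever be a pointer.

```cpp
#include <cassert>

// Hypothetical overload pair, for illustration only (not HotSpot code).
inline int describe(int)         { return 1; }  // selected for integer arguments
inline int describe(const char*) { return 2; }  // selected for pointer arguments

// A literal 0 is an exact match for int, so the integer overload is chosen.
inline int pick_with_zero()    { return describe(0); }

// nullptr converts only to pointer types, so the pointer overload is chosen.
inline int pick_with_nullptr() { return describe(nullptr); }

inline void usage_example() {
  assert(pick_with_zero() == 1);
  assert(pick_with_nullptr() == 2);
}
```

With `nullptr` the intent is visible at the call site, which is what `-Wzero-as-null-pointer-constant` is nudging towards.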

-------------

Commit messages:
 - fix simple runtime

Changes: https://git.openjdk.org/jdk/pull/20383/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20383&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337416
  Stats: 17 lines in 10 files changed: 0 ins; 0 del; 17 mod
  Patch: https://git.openjdk.org/jdk/pull/20383.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20383/head:pull/20383

PR: https://git.openjdk.org/jdk/pull/20383

From dholmes at openjdk.org  Tue Jul 30 04:12:11 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 04:12:11 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v4]
In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
Message-ID: <ZVf8-PqRdsaSkXDK_PGBeQIWxNV9zY3Id57z8TBP78Q=.e5d32a11-db6f-468d-af60-54d2c91b11a0@github.com>

> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
> 
> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
> 
> Testing:
>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>  - tiers 1-3 sanity testing
> 
> Thanks.

David Holmes has updated the pull request incrementally with one additional commit since the last revision:

  Fix logic to allow for the buffer being partially filled.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20345/files
  - new: https://git.openjdk.org/jdk/pull/20345/files/794da826..77079058

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=02-03

  Stats: 11 lines in 1 file changed: 8 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20345.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20345/head:pull/20345

PR: https://git.openjdk.org/jdk/pull/20345

From kbarrett at openjdk.org  Tue Jul 30 04:18:02 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 30 Jul 2024 04:18:02 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in prims
 code
Message-ID: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>

Please review this change that removes some uses of literal 0 as a null
pointer constant in prims code.

Testing: mach5 tier1

-------------

Commit messages:
 - fix warnings in prims

Changes: https://git.openjdk.org/jdk/pull/20385/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20385&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337418
  Stats: 26 lines in 7 files changed: 0 ins; 3 del; 23 mod
  Patch: https://git.openjdk.org/jdk/pull/20385.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20385/head:pull/20385

PR: https://git.openjdk.org/jdk/pull/20385

From dholmes at openjdk.org  Tue Jul 30 04:19:32 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 04:19:32 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v2]
In-Reply-To: <lh8krMqdhecXEkx5-8ndf88RVyKTpK32hkrQt9_POP0=.439c434c-0074-48ed-ad97-ce146f76c236@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <sZyCrr8Ti9Ad6EiJrSO_1fvCYsmLlrgHgFACt_790_Q=.ac6ceba1-8ffd-46ec-9e30-9ed3e6ad3cf4@github.com>
 <fyVtacPVdwpKbRQIu8icJ08uNq5MW4AqIN7V8zoeemU=.83792ea3-4c65-49e8-9f9b-bacecad37115@github.com>
 <lh8krMqdhecXEkx5-8ndf88RVyKTpK32hkrQt9_POP0=.439c434c-0074-48ed-ad97-ce146f76c236@github.com>
Message-ID: <XlPDI-Ag825A9SAeOMivR4dv_ClpklfuAPOoceX35CI=.50ee33cc-cb36-4498-abbf-5087703208be@github.com>

On Mon, 29 Jul 2024 23:16:14 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> src/hotspot/share/utilities/exceptions.cpp line 275:
>> 
>>> 273:   // we may also have a truncated UTF-8 sequence. In such cases we need to fix the buffer so the UTF-8
>>> 274:   // sequence is valid.
>>> 275:   if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {
>> 
>> Do we need to check if `strlen(msg) == max_msg_size - 1`? If strlen is shorter, the bytes between the null terminator and max_msg_size are undefined, which might trigger an assertion while truncating.
>
> In fact we know it may be shorter than `max_msg_size - 1` - that is what we get on macOS if the string is huge and exceeds `INT_MAX` causing `vsnprintf` to return -1. I originally had an assert that failed due to that.
> 
> I need to fix this case as well. <sigh>. Good catch.

I did some experimentation here and it seems in practice that if the buffer is only partially filled then it will contain valid data as `vsnprintf` would stop filling at a well-defined point. But as it is not a clearly specified area we pass the buffer through anyway, using `strlen(msg)` . Most of the time a partially filled buffer will end with an ASCII character anyway and so the utf8 truncation operation will be a no-op.
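
To make the "truncate to a valid sequence" idea concrete, here is a minimal sketch. It is not the HotSpot `UTF8::truncate_to_legal_utf8` code; `trim_partial_utf8_tail` and `trimmed_length` are hypothetical names, and the sketch assumes standard UTF-8 that was valid before it was cut off (HotSpot's modified UTF-8 is not handled).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper. Returns the new length of the buffer; if the buffer
// ends in a partial multi-byte sequence, that sequence is removed by writing
// a NUL over its lead byte.
inline std::size_t trim_partial_utf8_tail(char* buf, std::size_t len) {
  if (len == 0) return 0;
  std::size_t lead = len - 1;
  // Step back over up to three continuation bytes (bit pattern 10xxxxxx).
  for (int steps = 0; steps < 3 && lead > 0 &&
       (static_cast<unsigned char>(buf[lead]) & 0xC0) == 0x80; ++steps) {
    --lead;
  }
  const unsigned char c = static_cast<unsigned char>(buf[lead]);
  std::size_t need = 1;                    // ASCII, or a byte we leave alone
  if ((c & 0xE0) == 0xC0) need = 2;        // 110xxxxx starts a 2-byte sequence
  else if ((c & 0xF0) == 0xE0) need = 3;   // 1110xxxx starts a 3-byte sequence
  else if ((c & 0xF8) == 0xF0) need = 4;   // 11110xxx starts a 4-byte sequence
  const std::size_t have = len - lead;     // bytes present from the lead byte on
  if (have < need) {
    buf[lead] = '\0';                      // drop the incomplete sequence
    return lead;
  }
  return len;                              // tail is complete: no-op
}

// Convenience wrapper for experiments: works on a private copy of `text`.
inline std::size_t trimmed_length(const char* text) {
  std::string copy(text);
  if (copy.empty()) return 0;
  return trim_partial_utf8_tail(&copy[0], copy.size());
}
```

As noted above, a buffer that ends in an ASCII character passes through unchanged, so `trimmed_length("abc")` is simply 3.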

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1696265017

From dholmes at openjdk.org  Tue Jul 30 04:19:33 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 04:19:33 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v4]
In-Reply-To: <Bp9RxG0ZfwtVg7p9v_X_ZgogL1U-aG0ha7ME7nKW8c8=.49302a72-48f0-4b5b-bc16-64ff037f6006@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <wHY5e9XeMFpUyA7Zr0RKG2zIXC3rB5dqklIuzb8TnAQ=.55cc765a-6ec8-46dc-8cf1-4fe49d4aa476@github.com>
 <Bp9RxG0ZfwtVg7p9v_X_ZgogL1U-aG0ha7ME7nKW8c8=.49302a72-48f0-4b5b-bc16-64ff037f6006@github.com>
Message-ID: <UjgJ4jR3nx62K-v1JVXXQYxFxstfns-d_DaNdQgdraI=.51c4881c-b431-4793-b2ca-8081d7760129@github.com>

On Fri, 26 Jul 2024 21:39:16 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> src/hotspot/share/utilities/exceptions.cpp line 277:
>> 
>>> 275:   if ((ret == -1 || ret >= max_msg_size) && strlen(msg) > 0) {
>>> 276:     assert(msg[max_msg_size - 1] == '\0', "should be null terminated");
>>> 277:     UTF8::truncate_to_legal_utf8((unsigned char*)msg, max_msg_size);
>> 
>> Ah, I misread your patch and thought you pass in the strlen of the message to the truncation function, when in fact you pass in the hard coded message buffer size. 
>> 
>> But that begs the question of why you test strlen above, and more importantly, whether all cases where snprintf returns an error are truncation problems. It could have detected an invalid UTF8 sequence and aborted in the middle of it.
>
> The `strlen` check is to skip the empty buffer you can get on Windows if vsnprintf returns -1 due to overflow of INT_MAX.
> 
> We are assuming/requiring that we start with a valid UTF8 sequence and the worst that will happen is that vsnprintf will truncate it.
> 
> If we actually got -1 for a conversion error (no way to tell the difference in the two cases) then we would unnecessarily truncate, but we do not expect any such conversion errors - in part because we type check the format specifiers and args and so should never get a mismatch.

Note this has been updated now to pass `strlen(msg)`.
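
For readers less familiar with the return-value convention being discussed, here is a small sketch of how a caller can classify the result of `vsnprintf`. The names `FormatResult`, `format_into` and `classify` are hypothetical, and the 8-byte buffer is deliberately tiny; this is not the code in `exceptions.cpp`.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

enum class FormatResult { Fits, Truncated, Error };

// Formats into buf and reports what happened. Per C99, vsnprintf returns the
// number of characters that would have been written without the size limit,
// or a negative value on error.
inline FormatResult format_into(char* buf, std::size_t size, const char* fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  const int ret = std::vsnprintf(buf, size, fmt, ap);
  va_end(ap);
  if (ret < 0) return FormatResult::Error;  // cannot tell overflow from a conversion error
  if (static_cast<std::size_t>(ret) >= size) return FormatResult::Truncated;
  return FormatResult::Fits;
}

// Convenience wrapper using a hypothetical 8-byte buffer (7 chars plus NUL).
inline FormatResult classify(const char* text) {
  char buf[8];
  return format_into(buf, sizeof(buf), "%s", text);
}
```

Both the `Truncated` and the `Error` outcome correspond to the `ret == -1 || ret >= max_msg_size` condition quoted earlier, which is why the truncation fix-up runs in either case.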

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1696264323

From dholmes at openjdk.org  Tue Jul 30 04:34:31 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 04:34:31 GMT
Subject: RFR: 8337416: Fix -Wzero-as-null-pointer-constant warnings in
 misc. runtime code
In-Reply-To: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com>
References: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com>
Message-ID: <Z60HHFyeI-xKnJXCK3Y5VDJzAjtjbPtPK1l6yEmPPIk=.5d4b8acd-4bcd-464c-aef8-4cfd707846f4@github.com>

On Tue, 30 Jul 2024 03:34:18 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this (perhaps trivial?) change that removes some uses of literal
> 0 as a null pointer constant in misc. runtime code.  Most are changed to use
> nullptr. 
> 
> Testing: mach5 tier1

This looks fine, and I think trivial.

I think there is an existing bug but probably better to file a separate JBS issue for that.

Thanks

src/hotspot/share/oops/constantPool.cpp line 2068:

> 2066:   }
> 2067:   printf("Cpool size: %d\n", size);
> 2068:   fflush(nullptr);

This looks like a bug. I think someone used 0 aka fd0 when they needed stdout for fflush.
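
For reference, a minimal sketch of the two forms side by side. Standard C defines `fflush` with a null pointer argument as flushing all open output streams, so the call is well-defined whatever the original intent was; the wrapper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Flushes only stdout. Returns 0 on success.
inline int flush_stdout_only() {
  return std::fflush(stdout);
}

// A null stream argument asks the C library to flush every open output
// stream, not just stdout. Returns 0 on success.
inline int flush_all_output_streams() {
  return std::fflush(nullptr);
}

inline void usage_example() {
  std::printf("Cpool size: %d\n", 42);  // 42 is a placeholder value
  assert(flush_stdout_only() == 0);
  assert(flush_all_output_streams() == 0);
}
```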

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20383#pullrequestreview-2206608763
PR Review Comment: https://git.openjdk.org/jdk/pull/20383#discussion_r1696275148

From stuefe at openjdk.org  Tue Jul 30 04:55:33 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 30 Jul 2024 04:55:33 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v17]
In-Reply-To: <fyALiuRCBIdxuyUue80jejw0G9ChAh4Y0kn--lbTTHY=.ea8dd1ae-6cec-416c-976b-fe027732dd79@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <fyALiuRCBIdxuyUue80jejw0G9ChAh4Y0kn--lbTTHY=.ea8dd1ae-6cec-416c-976b-fe027732dd79@github.com>
Message-ID: <YOZuklnaueS0NGWEomZaOfh-ic1ALDoP_395eGL4ITM=.1762685e-a1da-44cf-ad18-c61e01b5f48a@github.com>

On Mon, 29 Jul 2024 19:08:17 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   last lingering change

Okay.
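
For anyone skimming the thread, the `%p` substitution described in the quoted PR text can be sketched roughly as follows. `expand_pid` is a hypothetical helper, not the function used in the PR; it takes the PID as a parameter so the sketch stays platform-neutral, and it ignores `%t` and any escaping rules.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces every "%p" in pattern with the decimal PID. All other characters,
// including a lone '%', are copied through unchanged.
inline std::string expand_pid(const std::string& pattern, long pid) {
  const std::string pid_text = std::to_string(pid);
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 'p') {
      out += pid_text;
      ++i;  // skip the 'p' as well
    } else {
      out += pattern[i];
    }
  }
  return out;
}

inline void usage_example() {
  assert(expand_pid("heap_%p.hprof", 1234) == "heap_1234.hprof");
}
```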

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2206633057

From stuefe at openjdk.org  Tue Jul 30 05:16:34 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 30 Jul 2024 05:16:34 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v4]
In-Reply-To: <ZVf8-PqRdsaSkXDK_PGBeQIWxNV9zY3Id57z8TBP78Q=.e5d32a11-db6f-468d-af60-54d2c91b11a0@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <ZVf8-PqRdsaSkXDK_PGBeQIWxNV9zY3Id57z8TBP78Q=.e5d32a11-db6f-468d-af60-54d2c91b11a0@github.com>
Message-ID: <XpUTvbOxM7zHYsI-5yqbIfRo-e-_mev-dNBf1nNnY7s=.ed0f4c8f-9775-4aeb-bd64-761522a9d514@github.com>

On Tue, 30 Jul 2024 04:12:11 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
>> 
>> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
>> 
>> Testing:
>>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>>  - tiers 1-3 sanity testing
>> 
>> Thanks.
>
> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix logic to allow for the buffer being partially filled.

ok

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20345#pullrequestreview-2206655823

From stuefe at openjdk.org  Tue Jul 30 05:22:32 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 30 Jul 2024 05:22:32 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic [v2]
In-Reply-To: <5fyuvwoHRU_EUT2tvUsWwzCjd7dazKHMiL0rGWW8jVU=.fed6e33a-7a22-4b4c-950f-d19c18ee0eaf@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
 <5fyuvwoHRU_EUT2tvUsWwzCjd7dazKHMiL0rGWW8jVU=.fed6e33a-7a22-4b4c-950f-d19c18ee0eaf@github.com>
Message-ID: <lsdnO__d3kqEFpSJJVZOz7JSRaSQXjxT6xwC0kc1MxI=.ec76bf46-635c-411d-9d0c-918d286f0f0b@github.com>

On Mon, 29 Jul 2024 14:49:48 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote:

>> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)
>> 
>> Testing:
>>   test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
>>   test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java
>
> Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Address review comments by Thomas S.
>   
>   Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>

Minor naming nit, otherwise good.

src/hotspot/share/compiler/compilationMemoryStatistic.hpp line 40:

> 38: 
> 39: // Helper class to wrap the array of arena tags for easier processing
> 40: class ArenaTagsCounter {

Sorry for being a stickler for precise names, but I would like plural for counters here - it is not a single counter, it's a series/vector/array of counters.
Any of these work for me: ArenaCountersByTag - ArenaCountersByTagVector - ArenaTagCounterVector - ArenaTagCounters

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20304#pullrequestreview-2206660184
PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1696322176

From jwaters at openjdk.org  Tue Jul 30 05:26:31 2024
From: jwaters at openjdk.org (Julian Waters)
Date: Tue, 30 Jul 2024 05:26:31 GMT
Subject: RFR: 8337416: Fix -Wzero-as-null-pointer-constant warnings in
 misc. runtime code
In-Reply-To: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com>
References: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com>
Message-ID: <EeNLijHM045qSCsVlfkuAvzvyEknaPkdEa5XZnMszHw=.5120ec93-af97-4219-848d-db99043c2e1a@github.com>

On Tue, 30 Jul 2024 03:34:18 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this (perhaps trivial?) change that removes some uses of literal
> 0 as a null pointer constant in misc. runtime code.  Most are changed to use
> nullptr. 
> 
> Testing: mach5 tier1

Looks Good!

-------------

Marked as reviewed by jwaters (Committer).

PR Review: https://git.openjdk.org/jdk/pull/20383#pullrequestreview-2206668559

From dholmes at openjdk.org  Tue Jul 30 05:32:32 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 05:32:32 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in
 prims code
In-Reply-To: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
Message-ID: <Em7Cdv0NCUyAnZtjOQQaTNJleGEvIi4mWAFAuUVCz24=.4eebb188-2eaa-484e-8f5d-557ac99fd67d@github.com>

On Tue, 30 Jul 2024 04:12:33 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this change that removes some uses of literal 0 as a null
> pointer constant in prims code.
> 
> Testing: mach5 tier1

Couple of queries on this one.

Thanks

src/hotspot/share/prims/jni.cpp line 1151:

> 1149: \
> 1150:   EntryProbe; \
> 1151:   ResultType ret{}; \

This looks bogus. ResultType is just a macro variable and could be a primitive type. Does the local need initializing?

src/hotspot/share/prims/methodHandles.cpp line 439:

> 437:   default:
> 438:     fatal("unexpected intrinsic id: %d %s", vmIntrinsics::as_int(iid), vmIntrinsics::name_at(iid));
> 439:     return 0;

Do we no longer need these returns after `fatal` to keep compilers happy?

-------------

PR Review: https://git.openjdk.org/jdk/pull/20385#pullrequestreview-2206671959
PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1696328696
PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1696329565

From dholmes at openjdk.org  Tue Jul 30 05:41:08 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 05:41:08 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v5]
In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
Message-ID: <I5ohuzIDvghA8wDhpSAQTppCO3Kqsbp9mGeDvxO6G4U=.1adfae97-c72c-4c53-a465-982e2d398873@github.com>

> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
> 
> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
> 
> Testing:
>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>  - tiers 1-3 sanity testing
> 
> Thanks.

David Holmes has updated the pull request incrementally with one additional commit since the last revision:

  Fix off-by-one error

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20345/files
  - new: https://git.openjdk.org/jdk/pull/20345/files/77079058..d7b65bbb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20345&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20345.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20345/head:pull/20345

PR: https://git.openjdk.org/jdk/pull/20345

From gcao at openjdk.org  Tue Jul 30 05:47:57 2024
From: gcao at openjdk.org (Gui Cao)
Date: Tue, 30 Jul 2024 05:47:57 GMT
Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191
Message-ID: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>

Hi, please help review this patch that fixes the client VM build failure for riscv.

For the error log of the client VM build, see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421)

The root cause is that MaxVectorSize is defined only under COMPILER2 or JVMCI, neither of which is included in client mode. In addition, I have moved the initialization code that uses UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics and UseAdler32Intrinsics under the control of the COMPILER2 macro, and made the related adjustments in VM_Version::c2_initialize().

### Testing
- [x] linux-riscv client VM fastdebug native build

-------------

Commit messages:
 - 8337421: RISC-V: client VM build failure after JDK-8335191

Changes: https://git.openjdk.org/jdk/pull/20386/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20386&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337421
  Stats: 200 lines in 2 files changed: 100 ins; 99 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20386.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20386/head:pull/20386

PR: https://git.openjdk.org/jdk/pull/20386

From gcao at openjdk.org  Tue Jul 30 05:47:57 2024
From: gcao at openjdk.org (Gui Cao)
Date: Tue, 30 Jul 2024 05:47:57 GMT
Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191
In-Reply-To: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
Message-ID: <tNosSLnTmI6wOQsQboQ53uNpOO8BaE3c6RvGJJWOOtg=.b1fa62f7-8c6f-4fe2-a3e2-ccaf274982b3@github.com>

On Tue, 30 Jul 2024 05:41:45 GMT, Gui Cao <gcao at openjdk.org> wrote:

> Hi, please help review this patch that fixes the client VM build failure for riscv.
> 
> For the error log of the client VM build, see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421)
> 
> The root cause is that MaxVectorSize is defined only under COMPILER2 or JVMCI, neither of which is included in client mode. In addition, I have moved the initialization code that uses UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics and UseAdler32Intrinsics under the control of the COMPILER2 macro, and made the related adjustments in VM_Version::c2_initialize().
> 
> ### Testing
> - [x] linux-riscv client VM fastdebug native build

@Hamlin-Li : May I ask if this makes sense to you? Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20386#issuecomment-2257512053

From dholmes at openjdk.org  Tue Jul 30 06:35:32 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 06:35:32 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v4]
In-Reply-To: <XpUTvbOxM7zHYsI-5yqbIfRo-e-_mev-dNBf1nNnY7s=.ed0f4c8f-9775-4aeb-bd64-761522a9d514@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <ZVf8-PqRdsaSkXDK_PGBeQIWxNV9zY3Id57z8TBP78Q=.e5d32a11-db6f-468d-af60-54d2c91b11a0@github.com>
 <XpUTvbOxM7zHYsI-5yqbIfRo-e-_mev-dNBf1nNnY7s=.ed0f4c8f-9775-4aeb-bd64-761522a9d514@github.com>
Message-ID: <K1v8QyK5R_ntO7mI74NnRLZ150UmuX2Ji3r3dOMDLg4=.f58db593-372a-4561-8a2b-c0aec888b462@github.com>

On Tue, 30 Jul 2024 05:14:06 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix logic to allow for the buffer being partially filled.
>
> ok

Thanks for the review @tstuefe

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20345#issuecomment-2257572292

From jwtang at openjdk.org  Tue Jul 30 06:46:20 2024
From: jwtang at openjdk.org (Jiawei Tang)
Date: Tue, 30 Jul 2024 06:46:20 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v4]
In-Reply-To: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
Message-ID: <Og5U6MWsTWn6yVFHLPi4Fovp1Nke8Lk41qCwReD0BIU=.5d3848df-da28-48f4-8801-3ad184e8762f@github.com>

> I added a test case that reproduces the crash. I would appreciate advice on whether the code needs changing.

Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision:

  refactor testcase and change the location of fix codes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20373/files
  - new: https://git.openjdk.org/jdk/pull/20373/files/723b1ec6..1b0de486

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20373&range=02-03

  Stats: 230 lines in 5 files changed: 117 ins; 111 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20373.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20373/head:pull/20373

PR: https://git.openjdk.org/jdk/pull/20373

From alanb at openjdk.org  Tue Jul 30 06:49:31 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Tue, 30 Jul 2024 06:49:31 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v3]
In-Reply-To: <yBWdB5qfG39speceqxReLp2SRTzlOk3bWt1rjGK83lA=.041249fc-d4b5-4c81-9dc8-4193d82e3a28@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
 <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com>
 <yBWdB5qfG39speceqxReLp2SRTzlOk3bWt1rjGK83lA=.041249fc-d4b5-4c81-9dc8-4193d82e3a28@github.com>
Message-ID: <NUYBzJVMKqywg4-jWaehrYyh76pE84JYyD4n_iYnL0k=.c6682c01-9e20-4ebf-996e-7a715a53d0d7@github.com>

On Mon, 29 Jul 2024 22:58:09 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>  Can you convert the test to use `.cpp` instead of `.c` as well? 

or maybe it could use VThreadPinner which allows calling through a native frame for tests like this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20373#issuecomment-2257591543

From kbarrett at openjdk.org  Tue Jul 30 06:56:33 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 30 Jul 2024 06:56:33 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in
 prims code
In-Reply-To: <Em7Cdv0NCUyAnZtjOQQaTNJleGEvIi4mWAFAuUVCz24=.4eebb188-2eaa-484e-8f5d-557ac99fd67d@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
 <Em7Cdv0NCUyAnZtjOQQaTNJleGEvIi4mWAFAuUVCz24=.4eebb188-2eaa-484e-8f5d-557ac99fd67d@github.com>
Message-ID: <GHKiuJLY8J-ixCnxqrGAOyAJm0wdZUOGI6sbioUCNS8=.45d5fe4b-e059-4473-885f-ade0efae9cb5@github.com>

On Tue, 30 Jul 2024 05:27:37 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Please review this change that removes some uses of literal 0 as a null
>> pointer constant in prims code.
>> 
>> Testing: mach5 tier1
>
> src/hotspot/share/prims/jni.cpp line 1151:
> 
>> 1149: \
>> 1150:   EntryProbe; \
>> 1151:   ResultType ret{}; \
> 
> This looks bogus. ResultType is just a macro variable and could be a primitive type. Does the local need initializing?

This is value-initialization syntax.  Value-initialization of a primitive type is zero-initialization.

However, I think we don't need the local variable at all.  Here and in the other 5(?) similar places, rather than

  ResultType ret{};
  ...
  ret = jvalue.get_##ResultType();
  return ret;

I think we could just have

  ...
  return jvalue.get_##ResultType();
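
The value-initialization rule mentioned above can be checked in isolation with a small sketch. `value_initialized` is a hypothetical template, standing in for the macro's `ResultType` parameter.

```cpp
#include <cassert>

// T stands in for the macro parameter ResultType. For arithmetic and pointer
// types, `T ret{}` value-initializes, which means zero-initialization.
template <typename T>
inline T value_initialized() {
  T ret{};
  return ret;
}

inline void usage_example() {
  assert(value_initialized<int>() == 0);
  assert(value_initialized<double>() == 0.0);
  assert(value_initialized<bool>() == false);
  assert(value_initialized<void*>() == nullptr);
}
```

So `ResultType ret{};` is well-formed for primitive result types, even if the local turns out to be unnecessary.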

> src/hotspot/share/prims/methodHandles.cpp line 439:
> 
>> 437:   default:
>> 438:     fatal("unexpected intrinsic id: %d %s", vmIntrinsics::as_int(iid), vmIntrinsics::name_at(iid));
>> 439:     return 0;
> 
> Do we no longer need these returns after `fatal` to keep compilers happy?

Now that we have, and are using, `[[noreturn]]` on all platforms, we no longer need that dead code.
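
A minimal sketch of why the trailing `return 0;` becomes dead code. `fatal_sketch` and `size_of_kind` are hypothetical stand-ins, not HotSpot's `fatal` or the function in `methodHandles.cpp`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a fatal-error reporter. The attribute tells the
// compiler that control never returns to the caller.
[[noreturn]] inline void fatal_sketch(const char* msg) {
  std::fprintf(stderr, "fatal: %s\n", msg);
  std::abort();
}

inline int size_of_kind(int kind) {
  switch (kind) {
    case 0: return 1;
    case 1: return 2;
    default:
      fatal_sketch("unexpected kind");
      // No `return 0;` needed here: because fatal_sketch is [[noreturn]],
      // the compiler can see that this path never falls off the end.
  }
}

inline void usage_example() {
  assert(size_of_kind(0) == 1);
  assert(size_of_kind(1) == 2);
}
```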

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1696408217
PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1696408335

From jwtang at openjdk.org  Tue Jul 30 06:56:34 2024
From: jwtang at openjdk.org (Jiawei Tang)
Date: Tue, 30 Jul 2024 06:56:34 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v3]
In-Reply-To: <A8wM3bukziE67E6BEq1fo8wM-fXbbvME_k4zoQTGtSY=.e967a17a-ec4d-4d38-83b5-e5578a05d2b6@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
 <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com>
 <A8wM3bukziE67E6BEq1fo8wM-fXbbvME_k4zoQTGtSY=.e967a17a-ec4d-4d38-83b5-e5578a05d2b6@github.com>
Message-ID: <B6mclOMVdt2B3T53Qsh1j3IQCslA5tdle7KPf-0bDF0=.ed0f9930-f107-41ca-b588-cf1ff31d1ae7@github.com>

On Mon, 29 Jul 2024 22:34:46 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   changes according to reviewers' advice
>
> src/hotspot/share/prims/jvmtiExport.cpp line 970:
> 
>> 968:     if (_thread->is_in_any_VTMS_transition()) {
>> 969:       return; // no events should be posted if thread is in any VTMS transition
>> 970:     }
> 
> This is not right place to fix it.
> 
> This would be better:
> 
> @@ -1091,8 +1091,8 @@ bool JvmtiExport::post_class_file_load_hook(Symbol* h_name,
>    if (JvmtiEnv::get_phase() < JVMTI_PHASE_PRIMORDIAL) {
>      return false;
>    }
> -  if (JavaThread::current()->is_in_tmp_VTMS_transition()) {
> -    return false; // skip CFLH events in tmp VTMS transition
> +  if (thread->is_in_any_VTMS_transition()) {
> +    return false; // no events should be posted if thread is in any VTMS transition
>    }
>  
>    JvmtiClassFileLoadHookPoster poster(h_name, class_loader,
> 
> 
> Also, there is a check in the constructor `JvmtiClassFileLoadHookPoster()`:
> 
>     if (_thread->is_in_any_VTMS_transition()) {
>       return; // no events should be posted if thread is in any VTMS transition
>     }
> 
> It is better to replace it with assert. With the right check in the `JvmtiExport::post_class_file_load_hook()` we should never call this constructor and `poster.post()` when in a VTMS transition.

Change it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1696407776

From mli at openjdk.org  Tue Jul 30 07:03:31 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 30 Jul 2024 07:03:31 GMT
Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191
In-Reply-To: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
Message-ID: <selusQPJMBBgbqUG7i0dxDxZkXJfHwyZJ5-LBMP3Q2c=.dc77ee0b-5abc-4b03-9b92-55e8c4d3a940@github.com>

On Tue, 30 Jul 2024 05:41:45 GMT, Gui Cao <gcao at openjdk.org> wrote:

> Hi, please help review this patch that fixes the client VM build failure for riscv.
> 
> For the error log of the client VM build, see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421)
> 
> The root cause is that MaxVectorSize is defined only under COMPILER2 or JVMCI, neither of which is included in client mode. In addition, I have moved the initialization code that uses UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics and UseAdler32Intrinsics under the control of the COMPILER2 macro, and made the related adjustments in VM_Version::c2_initialize().
> 
> ### Testing
> - [x] linux-riscv client VM fastdebug native build

Thanks for catching this. Looks good to me.

Just one minor comment, which is quite subjective, so it's your call.

Suggested changes:


void VM_Version::initialize() {
  common_initialize();
#ifdef COMPILER2
  c2_initialize();
#endif // COMPILER2
}

void VM_Version::common_initialize() {
  ...
}

#ifdef COMPILER2
void VM_Version::c2_initialize() {
  ...
}
#endif // COMPILER2
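
A self-contained sketch of the same split, for illustration only. `DemoVMVersion` and the `DEMO_COMPILER2` macro are hypothetical; the point is that the compiler-specific part is both declared and called only when the macro is defined, so a build without it never references the missing symbols.

```cpp
#include <string>
#include <vector>

struct DemoVMVersion {
  // Records which initialization steps ran, so the effect is observable.
  static std::vector<std::string> steps;

  static void initialize() {
    steps.clear();
    common_initialize();
#ifdef DEMO_COMPILER2
    c2_initialize();  // only compiled when the optional part exists
#endif  // DEMO_COMPILER2
  }

  static void common_initialize() { steps.push_back("common"); }

#ifdef DEMO_COMPILER2
  static void c2_initialize() { steps.push_back("c2"); }
#endif  // DEMO_COMPILER2
};

std::vector<std::string> DemoVMVersion::steps;
```

Building with `-DDEMO_COMPILER2` adds the second step; without it, only the common step exists.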

-------------

Marked as reviewed by mli (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20386#pullrequestreview-2206812795

From kbarrett at openjdk.org  Tue Jul 30 07:18:33 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 30 Jul 2024 07:18:33 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in
 prims code
In-Reply-To: <GHKiuJLY8J-ixCnxqrGAOyAJm0wdZUOGI6sbioUCNS8=.45d5fe4b-e059-4473-885f-ade0efae9cb5@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
 <Em7Cdv0NCUyAnZtjOQQaTNJleGEvIi4mWAFAuUVCz24=.4eebb188-2eaa-484e-8f5d-557ac99fd67d@github.com>
 <GHKiuJLY8J-ixCnxqrGAOyAJm0wdZUOGI6sbioUCNS8=.45d5fe4b-e059-4473-885f-ade0efae9cb5@github.com>
Message-ID: <AYeZPI3ANHsd29eP2-PHll2yUn8KT1HL4S_2KaFUon0=.3dda4769-20ef-4653-aaeb-eec3f568925f@github.com>

On Tue, 30 Jul 2024 06:54:04 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> src/hotspot/share/prims/jni.cpp line 1151:
>> 
>>> 1149: \
>>> 1150:   EntryProbe; \
>>> 1151:   ResultType ret{}; \
>> 
>> This looks bogus. ResultType is just a macro variable and could be a primitive type. ?? Does the local need initializing?
>
> This is value-initialization syntax.  Value-initialization of a primitive type is zero-initialization.
> 
> However, I think we don't need the local variable at all.  Here and in the other 5(?) similar places, rather than
> 
>   ResultType ret{};
>   ...
>   ret = jvalue.get_##ResultType();
>   return ret;
> 
> I think we could just have
> 
>   ...
>   return jvalue.get_##ResultType();

Looks like eliminating the variable doesn't work.  It gets used in a `DT_RETURN_MARK_FOR` form, which
needs the address of the return value.  That address is obtained using a reference.  Taking a reference
to an uninitialized variable is (I think) okay, so long as one doesn't attempt to use the uninitialized value.
But then the assignment could be problematic if it's uninitialized and the assignment operator is non-trivial.
I expect the compiler will optimize away a trivial zero initialization if it's not needed.  So ensuring it is
value-initialized seems like the cleanest thing to do.
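
A small sketch of why value-initialization is the safe choice here. `DemoReturnMark` is a hypothetical stand-in for a return-mark helper that holds a reference to the local and reads it on scope exit; it is not the actual `DT_RETURN_MARK_FOR` machinery.

```cpp
// Hypothetical return mark: keeps a reference to the result variable
// and copies its value to a sink when the scope is left.
template <typename T>
struct DemoReturnMark {
  const T& value;
  T* sink;
  DemoReturnMark(const T& v, T* s) : value(v), sink(s) {}
  ~DemoReturnMark() { *sink = value; }
};

template <typename T>
T demo_call(bool fail_early, T computed, T* sink) {
  T ret{};                            // value-initialized: zero for primitives
  DemoReturnMark<T> mark(ret, sink);  // reference taken before assignment
  if (fail_early) {
    return ret;                       // the mark reads a well-defined zero
  }
  ret = computed;
  return ret;
}
```

Without the `{}`, the early-exit path would make the mark read an indeterminate value for primitive `T`.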

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1696441217

From kbarrett at openjdk.org  Tue Jul 30 07:27:37 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 30 Jul 2024 07:27:37 GMT
Subject: RFR: 8337416: Fix -Wzero-as-null-pointer-constant warnings in
 misc. runtime code
In-Reply-To: <Z60HHFyeI-xKnJXCK3Y5VDJzAjtjbPtPK1l6yEmPPIk=.5d4b8acd-4bcd-464c-aef8-4cfd707846f4@github.com>
References: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com>
 <Z60HHFyeI-xKnJXCK3Y5VDJzAjtjbPtPK1l6yEmPPIk=.5d4b8acd-4bcd-464c-aef8-4cfd707846f4@github.com>
Message-ID: <8inmHHwwUefDxv-O0Ltki71aI177Wz7yb_mAkt5tEr8=.c66c77f7-8290-4d17-98fa-498d2ef06180@github.com>

On Tue, 30 Jul 2024 04:32:09 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Please review this (perhaps trivial?) change that removes some uses of literal
>> 0 as a null pointer constant in misc. runtime code.  Most are changed to use
>> nullptr. 
>> 
>> Testing: mach5 tier1
>
> This looks fine, and I think trivial.
> 
> I think there is an existing bug but probably better to file a separate JBS issue for that.
> 
> Thanks

Thanks for reviews @dholmes-ora and @TheShermanTanker .

I'll file a followup bug for the pre-existing fflush argument that @dholmes-ora pointed out.

> src/hotspot/share/oops/constantPool.cpp line 2068:
> 
>> 2066:   }
>> 2067:   printf("Cpool size: %d\n", size);
>> 2068:   fflush(nullptr);
> 
> This looks like a bug. I think someone used 0 aka fd0 when they needed stdout for fflush.

I assumed there was some reason for flushing all here, but you are right, this is probably a bug.
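
For reference, a sketch of what the null argument means: `fflush` with a null pointer flushes every open output stream, whereas `fflush(stdout)` flushes only `stdout`. The helper name `demo_size_after_flush_all` is hypothetical, and the sketch assumes a POSIX system for `fileno` and `fstat`.

```cpp
#include <stdio.h>
#include <sys/stat.h>  // POSIX: fstat

// Writes `text` to a fully buffered temporary file, flushes all streams,
// and returns the on-disk size of the file (or -1 on error).
long demo_size_after_flush_all(const char* text) {
  FILE* f = tmpfile();
  if (f == nullptr) {
    return -1;
  }
  setvbuf(f, nullptr, _IOFBF, 4096);  // keep the data in the stdio buffer
  fputs(text, f);
  fflush(nullptr);                    // flushes all open output streams
  struct stat st;
  long size = (fstat(fileno(f), &st) == 0) ? static_cast<long>(st.st_size) : -1;
  fclose(f);
  return size;
}
```

The temporary file is never passed to `fflush` directly, yet its contents reach the file, which is the "flush all" behavior mentioned above.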

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20383#issuecomment-2257653298
PR Review Comment: https://git.openjdk.org/jdk/pull/20383#discussion_r1696447458

From kbarrett at openjdk.org  Tue Jul 30 07:27:37 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 30 Jul 2024 07:27:37 GMT
Subject: Integrated: 8337416: Fix -Wzero-as-null-pointer-constant warnings in
 misc. runtime code
In-Reply-To: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com>
References: <d5tyrnQNDwidRG11CtHlC_dWlGOHRQPDi-xRS389boU=.29346f8f-d7cd-4de7-99dd-d504dac01b5e@github.com>
Message-ID: <3ojt0S-OE_7u3dFaUtQ7zGyTuTu9AR_wLCfm6rUJNJQ=.10175182-f63e-441d-8a18-8630ff7ade52@github.com>

On Tue, 30 Jul 2024 03:34:18 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this (perhaps trivial?) change that removes some uses of literal
> 0 as a null pointer constant in misc. runtime code.  Most are changed to use
> nullptr. 
> 
> Testing: mach5 tier1

This pull request has now been integrated.

Changeset: bc7c255b
Author:    Kim Barrett <kbarrett at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/bc7c255b156bf3bb3fd8c3f622b8127ab27e7c7a
Stats:     17 lines in 10 files changed: 0 ins; 0 del; 17 mod

8337416: Fix -Wzero-as-null-pointer-constant warnings in misc. runtime code

Reviewed-by: dholmes, jwaters

-------------

PR: https://git.openjdk.org/jdk/pull/20383

From clanger at openjdk.org  Tue Jul 30 07:48:36 2024
From: clanger at openjdk.org (Christoph Langer)
Date: Tue, 30 Jul 2024 07:48:36 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v4]
In-Reply-To: <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
 <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>
Message-ID: <7cxoF1PDzUFxZzMM29aVQKU87VH50Xpp42nyEk8oFvg=.c7c4e1ed-6bb4-499f-854e-ce16fcaac091@github.com>

On Thu, 25 Jul 2024 13:42:48 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> When running with ubsan-enabled binaries, some tests trigger the following report:
>> 
>> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
>>     #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
>>     #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
>>     #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
>>     #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
>>     #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
>>     #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
>>     #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
>>     #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
>> 
>> It seems that, at least for class SmallRegisterMap, we fail to handle nullptr.
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add patch of Kim Barrett

LGTM
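
As background on the class of problem ubsan reports here: calling a member function through a null pointer is undefined behavior even when the function never touches `this`. One common way to avoid it for a stateless class is to hand out the address of a real static object. The sketch below uses a hypothetical `DemoSmallMap` and is not necessarily what the patch does.

```cpp
// Hypothetical stateless helper class.
struct DemoSmallMap {
  // Does not read any member data, but still needs a valid object.
  int location(int reg) const { return reg * 8; }

  // Returns a pointer to a real object instead of a null placeholder.
  static const DemoSmallMap* instance() {
    static const DemoSmallMap the_instance{};
    return &the_instance;
  }
};

int demo_lookup(int reg) {
  const DemoSmallMap* map = DemoSmallMap::instance();
  return map->location(reg);  // well-defined: map points to an object
}
```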

-------------

Marked as reviewed by clanger (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20296#pullrequestreview-2206911734

From gcao at openjdk.org  Tue Jul 30 07:52:03 2024
From: gcao at openjdk.org (Gui Cao)
Date: Tue, 30 Jul 2024 07:52:03 GMT
Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191
 [v2]
In-Reply-To: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
Message-ID: <W6_6tk_93Tdgi18jxyNhKJVTbfzgrJmVTXcUdRa5GYo=.5566062a-5ca8-4ff1-b040-98d9ef7536cf@github.com>

> Hi, please help review this patch that fixes the client VM build failure for riscv.
> 
> For the error log of the client VM build, see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421)
> 
> The root cause is that MaxVectorSize is only defined for COMPILER2 or JVMCI, neither of which is included in client mode. In addition, I have moved the initialization code that uses UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics and UseAdler32Intrinsics under the control of the COMPILER2 macro, and made related adjustments in VM_Version::c2_initialize().
> 
> ### Testing
> - [x] linux-riscv client VM fastdebug native build

Gui Cao has updated the pull request incrementally with one additional commit since the last revision:

  Fix for review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20386/files
  - new: https://git.openjdk.org/jdk/pull/20386/files/6f8b6883..d9055e6f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20386&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20386&range=00-01

  Stats: 14 lines in 2 files changed: 9 ins; 5 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20386.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20386/head:pull/20386

PR: https://git.openjdk.org/jdk/pull/20386

From gcao at openjdk.org  Tue Jul 30 07:52:04 2024
From: gcao at openjdk.org (Gui Cao)
Date: Tue, 30 Jul 2024 07:52:04 GMT
Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191
 [v2]
In-Reply-To: <selusQPJMBBgbqUG7i0dxDxZkXJfHwyZJ5-LBMP3Q2c=.dc77ee0b-5abc-4b03-9b92-55e8c4d3a940@github.com>
References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
 <selusQPJMBBgbqUG7i0dxDxZkXJfHwyZJ5-LBMP3Q2c=.dc77ee0b-5abc-4b03-9b92-55e8c4d3a940@github.com>
Message-ID: <yNPc7JZLQ1UCsIP3MkCnb3XX8SNZeAGDJduWbMr_ua0=.baa032d8-f434-4273-98a1-8a7c7f0bcd9a@github.com>

On Tue, 30 Jul 2024 07:00:27 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Thanks for catching this. Looks good to me.
> 
> Just one minor comment, which is quite subjective, so it's your call.
> 
> Suggested changes:
> 
> ```
> void VM_Version::initialize() {
>   common_initialize();
> #ifdef COMPILER2
>   c2_initialize();
> #endif // COMPILER2
> }
> 
> void VM_Version::common_initialize() {
>   ...
> }
> 
> #ifdef COMPILER2
> void VM_Version::c2_initialize() {
>   ...
> }
> #endif // COMPILER2
> ```

Thanks for the review. Fixed

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20386#issuecomment-2257695188

From duke at openjdk.org  Tue Jul 30 07:55:38 2024
From: duke at openjdk.org (Yuri Gaevsky)
Date: Tue, 30 Jul 2024 07:55:38 GMT
Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic
In-Reply-To: <dxSBhJiLeVkLF8PvHW3MMg69vwXU0VshECCMz5HnhhI=.e0cbda8b-f7f6-44ff-806b-1f21496911be@github.com>
References: <dxSBhJiLeVkLF8PvHW3MMg69vwXU0VshECCMz5HnhhI=.e0cbda8b-f7f6-44ff-806b-1f21496911be@github.com>
Message-ID: <_0CrA8Qa71vtP2DRk3o4yb9F80-czEU-D7lEb7stkHk=.a45226ab-0f6c-4f25-acd4-657fcc29ca93@github.com>

On Wed, 7 Feb 2024 14:35:55 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:

> Hello All,
> 
> Please review these changes to enable the `_vectorizedMismatch` intrinsic on RISC-V platforms that support RVV instructions.
> 
> Thank you,
> -Yuri Gaevsky
> 
> **Correctness checks:**
>   hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4.

.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2257705299

From dholmes at openjdk.org  Tue Jul 30 07:57:31 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 07:57:31 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in
 prims code
In-Reply-To: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
Message-ID: <QctZn5PDD3uluZJ-W_CG3Ffo0f02PsY7Zlx5neUOICQ=.7ca80211-e028-4553-87f5-f27f17d903ea@github.com>

On Tue, 30 Jul 2024 04:12:33 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this change that removes some uses of literal 0 as a null
> pointer constant in prims code.
> 
> Testing: mach5 tier1

Okay - looks good. Thanks.
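
A short illustration of why the warning is worth fixing: a literal `0` is an `int` first and a null pointer constant only by conversion, while `nullptr` can only become a pointer. The overload set `demo_which` is hypothetical.

```cpp
#include <string>

// Two overloads that make the chosen interpretation visible.
std::string demo_which(int) { return "int"; }
std::string demo_which(const char*) { return "pointer"; }

// demo_which(0) selects the int overload (exact match),
// demo_which(nullptr) selects the pointer overload.
std::string demo_literal_zero() { return demo_which(0); }
std::string demo_null_pointer() { return demo_which(nullptr); }
```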

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20385#pullrequestreview-2206930451

From dholmes at openjdk.org  Tue Jul 30 07:57:32 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 07:57:32 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in
 prims code
In-Reply-To: <AYeZPI3ANHsd29eP2-PHll2yUn8KT1HL4S_2KaFUon0=.3dda4769-20ef-4653-aaeb-eec3f568925f@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
 <Em7Cdv0NCUyAnZtjOQQaTNJleGEvIi4mWAFAuUVCz24=.4eebb188-2eaa-484e-8f5d-557ac99fd67d@github.com>
 <GHKiuJLY8J-ixCnxqrGAOyAJm0wdZUOGI6sbioUCNS8=.45d5fe4b-e059-4473-885f-ade0efae9cb5@github.com>
 <AYeZPI3ANHsd29eP2-PHll2yUn8KT1HL4S_2KaFUon0=.3dda4769-20ef-4653-aaeb-eec3f568925f@github.com>
Message-ID: <2HT3saxNUjevXOwHYDEDT2dIsjjzI6OS8ps6z9oF_nY=.c50ca0b5-5eed-4ce3-b124-fe5c9995fa46@github.com>

On Tue, 30 Jul 2024 07:16:21 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> This is value-initialization syntax.  Value-initialization of a primitive type is zero-initialization.
>> 
>> However, I think we don't need the local variable at all.  Here and in the other 5(?) similar places, rather than
>> 
>>   ResultType ret{};
>>   ...
>>   ret = jvalue.get_##ResultType();
>>   return ret;
>> 
>> I think we could just have
>> 
>>   ...
>>   return jvalue.get_##ResultType();
>
> Looks like eliminating the variable doesn't work.  It gets used in a `DT_RETURN_MARK_FOR` form, which
> needs the address of the return value.  That address is obtained using a reference.  Taking a reference
> to an uninitialized variable is (I think) okay, so long as one doesn't attempt to use the uninitialized value.
> But then the assignment could be problematic if it's uninitialized and the assignment operator is non-trivial.
> I expect the compiler will optimize away a trivial zero initialization if it's not needed.  So ensuring it is
> value-initialized seems like the cleanest thing to do.

One day I will remember what this syntax is and does.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1696496369

From fyang at openjdk.org  Tue Jul 30 08:13:33 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 30 Jul 2024 08:13:33 GMT
Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191
 [v2]
In-Reply-To: <W6_6tk_93Tdgi18jxyNhKJVTbfzgrJmVTXcUdRa5GYo=.5566062a-5ca8-4ff1-b040-98d9ef7536cf@github.com>
References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
 <W6_6tk_93Tdgi18jxyNhKJVTbfzgrJmVTXcUdRa5GYo=.5566062a-5ca8-4ff1-b040-98d9ef7536cf@github.com>
Message-ID: <gqAdNg8JIOb7Tk9ZO-jKPYa8dGW9DTDU57FIyv02Rqg=.27da6419-ca78-47d6-a658-df0fec754e83@github.com>

On Tue, 30 Jul 2024 07:52:03 GMT, Gui Cao <gcao at openjdk.org> wrote:

>> Hi, please help review this patch that fixes the client VM build failure for riscv.
>> 
>> For the error log of the client VM build, see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421)
>> 
>> The root cause is that MaxVectorSize is only defined for COMPILER2 or JVMCI, neither of which is included in client mode. In addition, I have moved the initialization code that uses UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics and UseAdler32Intrinsics under the control of the COMPILER2 macro, and made related adjustments in VM_Version::c2_initialize().
>> 
>> ### Testing
>> - [x] linux-riscv client VM fastdebug native build
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix for review comments

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6007:

> 6005:     generate_compare_long_strings();
> 6006: 
> 6007:     generate_string_indexof_stubs();

I think we can put these two under the macro `COMPILER2` too. Then we can also remove the check for the macro `COMPILER2_OR_JVMCI` in this function. I don't think these stubs are ever used by JVMCI, which is only partially implemented on this platform for now.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20386#discussion_r1696524759

From gcao at openjdk.org  Tue Jul 30 08:16:44 2024
From: gcao at openjdk.org (Gui Cao)
Date: Tue, 30 Jul 2024 08:16:44 GMT
Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191
 [v3]
In-Reply-To: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
Message-ID: <HMZWo-m5vmc27XgW5E5SE6Gx_rr8nBnoglxOJ8J64Uw=.803cd228-4977-46c7-a267-3b2a9ffce03b@github.com>

> Hi, please help review this patch that fixes the client VM build failure for riscv.
> 
> For the error log of the client VM build, see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421)
> 
> The root cause is that MaxVectorSize is only defined for COMPILER2 or JVMCI, neither of which is included in client mode. In addition, I have moved the initialization code that uses UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics and UseAdler32Intrinsics under the control of the COMPILER2 macro, and made related adjustments in VM_Version::c2_initialize().
> 
> ### Testing
> - [x] linux-riscv client VM fastdebug native build

Gui Cao has updated the pull request incrementally with one additional commit since the last revision:

  Remove check for macro COMPILER2_OR_JVMCI in generate_compiler_stubs function

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20386/files
  - new: https://git.openjdk.org/jdk/pull/20386/files/d9055e6f..9d3b6d29

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20386&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20386&range=01-02

  Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20386.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20386/head:pull/20386

PR: https://git.openjdk.org/jdk/pull/20386

From gcao at openjdk.org  Tue Jul 30 08:16:44 2024
From: gcao at openjdk.org (Gui Cao)
Date: Tue, 30 Jul 2024 08:16:44 GMT
Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191
 [v2]
In-Reply-To: <gqAdNg8JIOb7Tk9ZO-jKPYa8dGW9DTDU57FIyv02Rqg=.27da6419-ca78-47d6-a658-df0fec754e83@github.com>
References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
 <W6_6tk_93Tdgi18jxyNhKJVTbfzgrJmVTXcUdRa5GYo=.5566062a-5ca8-4ff1-b040-98d9ef7536cf@github.com>
 <gqAdNg8JIOb7Tk9ZO-jKPYa8dGW9DTDU57FIyv02Rqg=.27da6419-ca78-47d6-a658-df0fec754e83@github.com>
Message-ID: <wH2MgatEWTnw2loLBXJEdPtKcrXo6cSNG_n45pAsJi8=.b95827f2-7eca-413f-8fe0-6d858bffc2dc@github.com>

On Tue, 30 Jul 2024 08:10:24 GMT, Fei Yang <fyang at openjdk.org> wrote:

> I think we can put these two under the macro `COMPILER2` too. Then we can also remove the check for the macro `COMPILER2_OR_JVMCI` in this function. I don't think these stubs are ever used by JVMCI, which is only partially implemented on this platform for now.

Thanks for your review. Fixed

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20386#discussion_r1696530195

From lucy at openjdk.org  Tue Jul 30 08:25:39 2024
From: lucy at openjdk.org (Lutz Schmidt)
Date: Tue, 30 Jul 2024 08:25:39 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well
 [v17]
In-Reply-To: <NQ1QNuTBkNsmBReCpdhY1lrdIYz9s8UiNd1As1sLQ7M=.17c8f789-2bf1-4beb-891f-debccad29164@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
 <NQ1QNuTBkNsmBReCpdhY1lrdIYz9s8UiNd1As1sLQ7M=.17c8f789-2bf1-4beb-891f-debccad29164@github.com>
Message-ID: <N83svaDOiUQdrjynAb0K834OxualQ_3FSJkKxL_0B3c=.5e5ffc29-2088-4345-aba2-23cdd9ae9817@github.com>

On Mon, 1 Jul 2024 14:14:50 GMT, Amit Kumar <amitkumar at openjdk.org> wrote:

>> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>> 
>> I ran the `tier1` tests with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in a fastdebug VM and didn't see any new failures, except the one I mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693), which is reproducible on every other architecture with these flags.
>> 
>> 
>> Without Patch: 
>> 
>> SecondarySuperCacheHits.test  avgt   15  0.929 ± 0.010  ns/op
>> 
>> SecondarySuperCacheInterContention.test     avgt   15  1.413 ± 0.007  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  1.415 ± 0.016  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  1.410 ± 0.017  ns/op
>> 
>> Benchmark                             Mode  Cnt   Score   Error  Units
>> SecondarySupersLookup.testNegative00  avgt   15   1.806 ± 0.325  ns/op
>> SecondarySupersLookup.testNegative01  avgt   15   2.364 ± 0.236  ns/op
>> SecondarySupersLookup.testNegative02  avgt   15   2.903 ± 0.215  ns/op
>> SecondarySupersLookup.testNegative03  avgt   15   3.417 ± 0.199  ns/op
>> SecondarySupersLookup.testNegative04  avgt   15   3.758 ± 0.102  ns/op
>> SecondarySupersLookup.testNegative05  avgt   15   4.352 ± 0.123  ns/op
>> SecondarySupersLookup.testNegative06  avgt   15   4.800 ± 0.099  ns/op
>> SecondarySupersLookup.testNegative07  avgt   15   5.365 ± 0.060  ns/op
>> SecondarySupersLookup.testNegative08  avgt   15   6.316 ± 0.092  ns/op
>> SecondarySupersLookup.testNegative09  avgt   15   6.669 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative10  avgt   15   7.041 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative16  avgt   15   9.336 ± 0.185  ns/op
>> SecondarySupersLookup.testNegative20  avgt   15  11.373 ± 0.029  ns/op
>> SecondarySupersLookup.testNegative30  avgt   15  15.236 ± 0.051  ns/op
>> SecondarySupersLookup.testNegative32  avgt   15  16.031 ± 0.091  ns/op
>> SecondarySupersLookup.testNegative40  avgt   15  19.197 ± 0.279  ns/op
>> SecondarySupersLookup.testNegative50  avgt   15  23.804 ± 2.387  ns/op
>> SecondarySupersLookup.testNegative55  avgt   15  25.610 ± 1.155  ns/op
>> SecondarySupersLookup.testNegative56  avgt   15  26.128 ± 2.203  ns/op
>> SecondarySupersLookup.testNegative57  avgt   15  26.126 ± 0.881  ns/op
>> SecondarySupersLookup.testNegative58  avgt   15  26.314 ± 0.521  ns/op
>> SecondarySupersLookup.testNegative59  avgt   15  26.750 ± 0.837  ns/op
>> SecondarySupersLookup.testNegative60  avgt   15  27.118 ± 0.557 ...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update src/hotspot/cpu/s390/macroAssembler_s390.cpp
>   
>   Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>

Changes look good.
Sorry for the poor response time.

-------------

Marked as reviewed by lucy (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19544#pullrequestreview-2207003971

From shade at openjdk.org  Tue Jul 30 08:28:34 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 30 Jul 2024 08:28:34 GMT
Subject: RFR: 8334230: Optimize C2 classes layout [v3]
In-Reply-To: <9Xqj8lhtk5xtM-NHRl-GBFTZSzQcNw8yYo_ket5U0aM=.a752c907-e9c7-47cc-87c1-bf6bf0a3d642@github.com>
References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
 <9Xqj8lhtk5xtM-NHRl-GBFTZSzQcNw8yYo_ket5U0aM=.a752c907-e9c7-47cc-87c1-bf6bf0a3d642@github.com>
Message-ID: <DU3m0oaK_eqlDJhlUjT9EW97-HApsm0FAO6UYOVwviM=.0c490ea0-96e6-4408-b149-8caea3447ecc@github.com>

On Tue, 30 Jul 2024 00:53:04 GMT, Neethu Prasad <nprasad at openjdk.org> wrote:

>> **Notes**
>> 
>> Rearrange C2 class fields to optimize footprint.
>> 
>> 
>> **Verification**
>> 
>> 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests.
>> 2. Ran pahole on 64 bit machine post re-ordering and verified that there are no holes / reduction in total bytes.
>> 
>> | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding |
>> | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- |
>> | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 -> 0 | 11 -> 0 | 56 bytes -> 48 | 0 -> 3 |
>> | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 -> 0 | 5 -> 0 | 24 bytes -> 16 | 7 -> 4 |
>> | C2Access | 56 -> 48 | 1 -> 1 | 42 -> 0 | 1 -> 0 | 7 -> 0 | 56 bytes -> 48 | 7 -> 6 |
>> | VectorSet | 32 -> 24 | 1 -> 1 | 24 -> 0 | 1 -> 0 | 8 -> 0 | 32 bytes -> 24 | 1 -> 1 |
>> 
>> class ArrayPointer {
>> 	const class Node  *        _pointer;             /*     0     8 */
>> 	const class Node  *        _base;                /*     8     8 */
>> 	const jlong                _constant_offset;     /*    16     8 */
>> 	const class Node  *        _int_offset;          /*    24     8 */
>> 	const class GrowableArray<Node*>  * _other_offsets; /*    32     8 */
>> 	const jint                 _int_offset_shift;    /*    40     4 */
>> 	const bool                 _is_valid;            /*    44     1 */
>> public:
>> 
>> 
>> 	/* size: 48, cachelines: 1, members: 7 */
>> 	/* padding: 3 */
>> 	/* last cacheline: 48 bytes */
>> };
>> 
>> 
>> 
>> class CallJavaNode : public CallNode {
>> public:
>> 
>> 	/* class CallNode            <ancestor>; */      /*     0   128 */
>> protected:
>> 
>> 	/* --- cacheline 2 boundary (128 bytes) --- */
>> 	class ciMethod *           _method;              /*   128     8 */
>> 	bool                       _optimized_virtual;   /*   136     1 */
>> 	bool                       _method_handle_invoke; /*   137     1 */
>> 	bool                       _override_symbolic_info; /*   138     1 */
>> 	bool                       _arg_escape;          /*   139     1 */
>> public:
>> 
>> protected:
>> 
>> public:
>> 
>> 
>> 	/* size: 144, cachelines: 3, members: 6 */
>> 	/* padding: 4 */
>> 	/* last cacheline: 16 bytes */
>> 
>> 	/* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
>> };
>> 
>> 
>> 
>> class C2Access : public StackObj {
>> public:
>> 
>> 	/* class StackObj            <ancestor>; */      /*     0     0 */
>> 
>> 	/* XXX last struct has 1 byte of padding */
>> 
> ...
>
> Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Address constructor order issue for C2OptAccess

Marked as reviewed by shade (Reviewer).
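
For readers unfamiliar with the technique, a minimal sketch of how field order affects object size. The two structs are hypothetical and hold the same members; only the order differs. On a typical 64-bit ABI the first one is 24 bytes and the second 16, but the exact numbers are an assumption about the target.

```cpp
#include <cstddef>
#include <cstdint>

// Small fields interleaved with larger ones: each bool is followed by
// alignment padding before the next, more strictly aligned member.
struct DemoBefore {
  bool flag_a;
  void* ptr;
  bool flag_b;
  std::int32_t count;
};

// Same members, ordered from largest to smallest alignment:
// the small fields share what used to be padding.
struct DemoAfter {
  void* ptr;
  std::int32_t count;
  bool flag_a;
  bool flag_b;
};

// Bytes saved per object by the reordering on the current target.
std::size_t demo_bytes_saved() {
  return sizeof(DemoBefore) - sizeof(DemoAfter);
}
```

A tool such as pahole reports exactly these holes and the trailing padding for real classes.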

-------------

PR Review: https://git.openjdk.org/jdk/pull/19861#pullrequestreview-2207011005

From djelinski at openjdk.org  Tue Jul 30 08:42:37 2024
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Tue, 30 Jul 2024 08:42:37 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v5]
In-Reply-To: <I5ohuzIDvghA8wDhpSAQTppCO3Kqsbp9mGeDvxO6G4U=.1adfae97-c72c-4c53-a465-982e2d398873@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <I5ohuzIDvghA8wDhpSAQTppCO3Kqsbp9mGeDvxO6G4U=.1adfae97-c72c-4c53-a465-982e2d398873@github.com>
Message-ID: <ecGNOoxKho_Go2gZcszTdimzEVHA2NkzQ6XlX97xmoA=.63c2ca29-e65f-4ffb-b178-c87567650241@github.com>

On Tue, 30 Jul 2024 05:41:08 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Exceptions::fthrow uses a 1024-byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. When that happens, we should ensure we truncate to a valid UTF-8 sequence.
>> 
>> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
>> 
>> Testing:
>>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>>  - tiers 1-3 sanity testing
>> 
>> Thanks.
>
> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix off-by-one error

src/hotspot/share/utilities/exceptions.cpp line 278:

> 276:   if (ret == -1 || ret >= max_msg_size) {
> 277:     int len = (int) strlen(msg);
> 278:     if (len > 0) {

`truncate` asserts that len > 5, so you might need to adjust this check.
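
For context, a sketch of the general idea of truncating to a valid UTF-8 boundary. This is not the code under review: `demo_truncate_utf8` is a hypothetical helper that assumes well-formed standard UTF-8 input and does not attempt to cover any additional cases the actual change handles.

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
static bool demo_is_continuation(unsigned char b) {
  return (b & 0xC0) == 0x80;
}

// Returns the longest prefix of `s` that is at most `max_bytes` long
// and does not end in the middle of a multi-byte sequence.
std::string demo_truncate_utf8(const std::string& s, std::size_t max_bytes) {
  if (s.size() <= max_bytes) {
    return s;
  }
  std::size_t cut = max_bytes;
  // s[cut] is the first byte that would be dropped. If it is a
  // continuation byte, the cut splits a character, so back up to
  // the lead byte of that character and drop it entirely.
  while (cut > 0 && demo_is_continuation(static_cast<unsigned char>(s[cut]))) {
    --cut;
  }
  return s.substr(0, cut);
}
```

The result can be shorter than `max_bytes` by up to three bytes, which is the price of never emitting a partial character.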

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1696567086

From gcao at openjdk.org  Tue Jul 30 08:48:11 2024
From: gcao at openjdk.org (Gui Cao)
Date: Tue, 30 Jul 2024 08:48:11 GMT
Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191
 [v4]
In-Reply-To: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
Message-ID: <qkkur4f8aYsaOYftYIA67lXS3sFtoYJGjIonPoGsD4s=.d26a6434-2fd2-4f19-b355-3b6cfdb1fa49@github.com>

> Hi, please help review this patch that fixes the client VM build failure for riscv.
> 
> For the error log of the client VM build, see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421)
> 
> The root cause is that MaxVectorSize is only defined for COMPILER2 or JVMCI, neither of which is included in client mode. In addition, I have moved the initialization code that uses UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics and UseAdler32Intrinsics under the control of the COMPILER2 macro, and made related adjustments in VM_Version::c2_initialize().
> 
> ### Testing
> - [x] linux-riscv client VM fastdebug native build

Gui Cao has updated the pull request incrementally with one additional commit since the last revision:

  Polishing

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20386/files
  - new: https://git.openjdk.org/jdk/pull/20386/files/9d3b6d29..edf16e07

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20386&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20386&range=02-03

  Stats: 8 lines in 1 file changed: 4 ins; 4 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20386.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20386/head:pull/20386

PR: https://git.openjdk.org/jdk/pull/20386

From fyang at openjdk.org  Tue Jul 30 08:48:11 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 30 Jul 2024 08:48:11 GMT
Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191
 [v4]
In-Reply-To: <qkkur4f8aYsaOYftYIA67lXS3sFtoYJGjIonPoGsD4s=.d26a6434-2fd2-4f19-b355-3b6cfdb1fa49@github.com>
References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
 <qkkur4f8aYsaOYftYIA67lXS3sFtoYJGjIonPoGsD4s=.d26a6434-2fd2-4f19-b355-3b6cfdb1fa49@github.com>
Message-ID: <vuPlLM3zCQEWYUVmgtZ95YJJrGwg7wh4gk9l7ABSED0=.bf84d10e-c2ad-4c2c-9920-98e7439a5633@github.com>

On Tue, 30 Jul 2024 08:44:36 GMT, Gui Cao <gcao at openjdk.org> wrote:

>> Hi, please help review this patch that fixes the client VM build failure for riscv.
>> 
>> For the client VM build error log, see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421)
>> 
>> The root cause is that MaxVectorSize is defined in COMPILER2 or JVMCI, which is not included in client mode. In addition to this, I have adjusted the functions related to initialization using UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, UseAdler32Intrinsics to be under the control of the COMPILER2 macro.  And made related adjustments in VM_Version::c2_initialize().
>> 
>> ### Testing
>> - [x] linux-riscv client VM fastdebug native build
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Polishing

Marked as reviewed by fyang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20386#pullrequestreview-2207054048

From fyang at openjdk.org  Tue Jul 30 08:48:11 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 30 Jul 2024 08:48:11 GMT
Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191
 [v3]
In-Reply-To: <HMZWo-m5vmc27XgW5E5SE6Gx_rr8nBnoglxOJ8J64Uw=.803cd228-4977-46c7-a267-3b2a9ffce03b@github.com>
References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
 <HMZWo-m5vmc27XgW5E5SE6Gx_rr8nBnoglxOJ8J64Uw=.803cd228-4977-46c7-a267-3b2a9ffce03b@github.com>
Message-ID: <KoSCi1gr2QcrBOgsvYD14IHQijNV-iS4dYqiXNQghrg=.b0b49097-28f7-448c-a16c-d0ce816b616e@github.com>

On Tue, 30 Jul 2024 08:16:44 GMT, Gui Cao <gcao at openjdk.org> wrote:

>> Hi, please help review this patch that fixes the client VM build failure for riscv.
>> 
>> For the client VM build error log, see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421)
>> 
>> The root cause is that MaxVectorSize is defined in COMPILER2 or JVMCI, which is not included in client mode. In addition to this, I have adjusted the functions related to initialization using UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, UseAdler32Intrinsics to be under the control of the COMPILER2 macro.  And made related adjustments in VM_Version::c2_initialize().
>> 
>> ### Testing
>> - [x] linux-riscv client VM fastdebug native build
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove check for macro COMPILER2_OR_JVMCI in generate_compiler_stubs function

src/hotspot/cpu/riscv/vm_version_riscv.cpp line 333:

> 331:   // NOTE: Make sure codes dependent on UseRVV are put after MaxVectorSize initialize,
> 332:   //       as there are extra checks inside it which could disable UseRVV
> 333:   //       in some situations.

Please also move this code comment to immediately after initialization of MaxVectorSize. Otherwise looks good.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20386#discussion_r1696567820

From clanger at openjdk.org  Tue Jul 30 09:33:34 2024
From: clanger at openjdk.org (Christoph Langer)
Date: Tue, 30 Jul 2024 09:33:34 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured
In-Reply-To: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
Message-ID: <6WiWP5JSoQ6KQbe5FAg84BGAcRp0XtojWan0nyGaXjo=.562bc939-4751-47c8-a739-3b76cb67b710@github.com>

On Wed, 26 Jun 2024 13:32:32 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> Currently, when we run with ubsan-enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work.
> 
> We find this in the test output
> 
> [STDOUT]
> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory
> 
> The container where the test is executed does not contain the ubsan package; we might skip the test in this case.

I would prefer to add the required ubsan libraries to the test container when testing a ubsan-enabled build. Can we achieve this?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2257905835

From amitkumar at openjdk.org  Tue Jul 30 09:35:38 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 30 Jul 2024 09:35:38 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well
 [v17]
In-Reply-To: <G2IfSoXv1DKf69H_Gr5O_L-FTkQQgYGBS15UCNMoVt0=.acf2acd9-337c-4d45-8321-1c1be4e3316e@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
 <NQ1QNuTBkNsmBReCpdhY1lrdIYz9s8UiNd1As1sLQ7M=.17c8f789-2bf1-4beb-891f-debccad29164@github.com>
 <G2IfSoXv1DKf69H_Gr5O_L-FTkQQgYGBS15UCNMoVt0=.acf2acd9-337c-4d45-8321-1c1be4e3316e@github.com>
Message-ID: <IVpAB244ihXt0tdqTHQMTSknGgfLjFZvptzpVmTG1Wg=.eca9e293-728e-4e7f-89b7-5cab6a6d40ef@github.com>

On Thu, 4 Jul 2024 15:26:22 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update src/hotspot/cpu/s390/macroAssembler_s390.cpp
>>   
>>   Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>
>
> Looks good.

thank you @theRealAph @TheRealMDoerr @RealLucy for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19544#issuecomment-2257905725

From amitkumar at openjdk.org  Tue Jul 30 09:35:38 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 30 Jul 2024 09:35:38 GMT
Subject: Integrated: 8331126: [s390x] secondary_super_cache does not scale well
In-Reply-To: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
References: <WQPmUhOYimCaLKdnDzFUfTvuKbM99-fcJfp90JjfP34=.4b62e47f-e6f1-42fb-808e-e233c4975803@github.com>
Message-ID: <yVlMdHoceR8O1gGSRhgNq9IVsiz51AHlqiFvvRh2c50=.d89a9bea-7c55-4823-8435-efd21dfa7683@github.com>

On Tue, 4 Jun 2024 15:19:51 GMT, Amit Kumar <amitkumar at openjdk.org> wrote:

> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
> 
> I ran the `tier1` tests with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in a fastdebug VM and didn't see any new failures, except the one I mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693), which is reproducible on every other architecture with these flags.
> 
> 
> Without Patch: 
> 
> SecondarySuperCacheHits.test  avgt   15  0.929 ± 0.010  ns/op
> 
> SecondarySuperCacheInterContention.test     avgt   15  1.413 ± 0.007  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15  1.415 ± 0.016  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15  1.410 ± 0.017  ns/op
> 
> Benchmark                             Mode  Cnt   Score   Error  Units
> SecondarySupersLookup.testNegative00  avgt   15   1.806 ± 0.325  ns/op
> SecondarySupersLookup.testNegative01  avgt   15   2.364 ± 0.236  ns/op
> SecondarySupersLookup.testNegative02  avgt   15   2.903 ± 0.215  ns/op
> SecondarySupersLookup.testNegative03  avgt   15   3.417 ± 0.199  ns/op
> SecondarySupersLookup.testNegative04  avgt   15   3.758 ± 0.102  ns/op
> SecondarySupersLookup.testNegative05  avgt   15   4.352 ± 0.123  ns/op
> SecondarySupersLookup.testNegative06  avgt   15   4.800 ± 0.099  ns/op
> SecondarySupersLookup.testNegative07  avgt   15   5.365 ± 0.060  ns/op
> SecondarySupersLookup.testNegative08  avgt   15   6.316 ± 0.092  ns/op
> SecondarySupersLookup.testNegative09  avgt   15   6.669 ± 0.164  ns/op
> SecondarySupersLookup.testNegative10  avgt   15   7.041 ± 0.164  ns/op
> SecondarySupersLookup.testNegative16  avgt   15   9.336 ± 0.185  ns/op
> SecondarySupersLookup.testNegative20  avgt   15  11.373 ± 0.029  ns/op
> SecondarySupersLookup.testNegative30  avgt   15  15.236 ± 0.051  ns/op
> SecondarySupersLookup.testNegative32  avgt   15  16.031 ± 0.091  ns/op
> SecondarySupersLookup.testNegative40  avgt   15  19.197 ± 0.279  ns/op
> SecondarySupersLookup.testNegative50  avgt   15  23.804 ± 2.387  ns/op
> SecondarySupersLookup.testNegative55  avgt   15  25.610 ± 1.155  ns/op
> SecondarySupersLookup.testNegative56  avgt   15  26.128 ± 2.203  ns/op
> SecondarySupersLookup.testNegative57  avgt   15  26.126 ± 0.881  ns/op
> SecondarySupersLookup.testNegative58  avgt   15  26.314 ± 0.521  ns/op
> SecondarySupersLookup.testNegative59  avgt   15  26.750 ± 0.837  ns/op
> SecondarySupersLookup.testNegative60  avgt   15  27.118 ± 0.557  ns/op
> SecondarySupersLookup.testNegative61  avgt   15  27.763 ± 1.628  ns...

This pull request has now been integrated.

Changeset: 7ac53118
Author:    Amit Kumar <amitkumar at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/7ac531181c25815577ba2f6f426e1da270e4f589
Stats:     429 lines in 6 files changed: 426 ins; 0 del; 3 mod

8331126: [s390x] secondary_super_cache does not scale well

Reviewed-by: lucy, aph, mdoerr

-------------

PR: https://git.openjdk.org/jdk/pull/19544

From kevinw at openjdk.org  Tue Jul 30 10:14:37 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Tue, 30 Jul 2024 10:14:37 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v17]
In-Reply-To: <fyALiuRCBIdxuyUue80jejw0G9ChAh4Y0kn--lbTTHY=.ea8dd1ae-6cec-416c-976b-fe027732dd79@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <fyALiuRCBIdxuyUue80jejw0G9ChAh4Y0kn--lbTTHY=.ea8dd1ae-6cec-416c-976b-fe027732dd79@github.com>
Message-ID: <54ySFb85fkY1XfU-2IWvCwIWKijd_F8xS-vWm_wO7KY=.b9713fd9-9b3b-49d2-93fc-57054de1e190@github.com>

On Mon, 29 Jul 2024 19:08:17 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   last lingering change

Thanks Sonia, and thanks Thomas!

I did just see a problem with DumpPerfMapAtExit that I didn't notice before.  When -XX:+DumpPerfMapAtExit causes a call to CodeCache::write_perf_map, there's now no %p substitution, so /tmp/perf-%p.map gets created.

We all hate duplication but CodeCache::write_perf_map has two very different callers.  It could do something like this (feel free to adjust/correct/do something else):

src/hotspot/share/code/codeCache.cpp


 #ifdef LINUX
 void CodeCache::write_perf_map(const char* filename, outputStream* st) {
   MutexLocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
+  if (filename == nullptr) {
+    st->print_cr("Warning: Not writing perf map as null filename provided.");
+    return;
+  }
+  char fname[JVM_MAXPATHLEN];
+  if (strstr(filename, "%p") != nullptr) {
+    // Unnecessary if filename contains %%p but will be a rare waste of time:
+    if (!Arguments::copy_expand_pid(filename, strlen(filename), fname, JVM_MAXPATHLEN)) {
+      st->print_cr("Warning: Not writing perf map as substitution failed.");
+      return;
+    }
+    filename = fname;
+  }
+
   fileStream fs(filename, "w");


JVM_MAXPATHLEN leaves a lot of slack space there: if the filename contains %p, it really should be the default filename, so you could go with a lower value.
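
For readers of the archive, the substitution being discussed can be sketched in standalone C++. This is only an illustration under stated assumptions: `expand_pid` is a hypothetical helper, not HotSpot's `Arguments::copy_expand_pid`, and it assumes that `%p` expands to the PID while `%%` yields a literal `%` (which is why `%%p` needs no PID substitution).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only (not the HotSpot API):
// copies `pattern`, replacing "%p" with `pid` and "%%" with a literal '%'.
std::string expand_pid(const std::string& pattern, long pid) {
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '%' && i + 1 < pattern.size()) {
      if (pattern[i + 1] == 'p') {   // "%p" -> process id
        out += std::to_string(pid);
        ++i;                         // skip the 'p'
        continue;
      }
      if (pattern[i + 1] == '%') {   // "%%" -> literal '%'
        out += '%';
        ++i;                         // skip the second '%'
        continue;
      }
    }
    out += pattern[i];               // ordinary character, copied as-is
  }
  return out;
}
```

With this sketch, `expand_pid("/tmp/perf-%p.map", 1234)` produces `/tmp/perf-1234.map`, whereas a name without `%p` is returned unchanged.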

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2257986965

From shade at openjdk.org  Tue Jul 30 10:22:32 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 30 Jul 2024 10:22:32 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in
 prims code
In-Reply-To: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
Message-ID: <oftVH8FweQHlfgduKuMopunOkTBQ30f8s0j5dB0AnQo=.6ab59e3a-906c-41d9-93c1-d614209531e9@github.com>

On Tue, 30 Jul 2024 04:12:33 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this change that removes some uses of literal 0 as a null
> pointer constant in prims code.
> 
> Testing: mach5 tier1

All right, this looks fine. (I am somewhat allergic to `{}` syntax, but it is what it is.)

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20385#pullrequestreview-2207285678

From kevinw at openjdk.org  Tue Jul 30 11:07:30 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Tue, 30 Jul 2024 11:07:30 GMT
Subject: RFR: 8331015: Obsolete -XX:+UseNotificationThread
In-Reply-To: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com>
References: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com>
Message-ID: <BY2HT1UNrrQ4e7TAJruG2X9Caqwgb5bQyT0JOuuakpQ=.b342d0c2-0386-47e8-9026-d1e8b5ad9a7a@github.com>

On Tue, 30 Jul 2024 01:57:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote:

> Obsolete UseNotificationThread flag which was deprecated in JDK 23.
> 
> Testing: tier1..tier5

Looks good

-------------

Marked as reviewed by kevinw (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20381#pullrequestreview-2207376511

From sspitsyn at openjdk.org  Tue Jul 30 11:23:32 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Tue, 30 Jul 2024 11:23:32 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v4]
In-Reply-To: <Og5U6MWsTWn6yVFHLPi4Fovp1Nke8Lk41qCwReD0BIU=.5d3848df-da28-48f4-8801-3ad184e8762f@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
 <Og5U6MWsTWn6yVFHLPi4Fovp1Nke8Lk41qCwReD0BIU=.5d3848df-da28-48f4-8801-3ad184e8762f@github.com>
Message-ID: <-4ohGO-ytRMr_I-4SRpWX6QDeZQCIhVho9mTQadK3MQ=.dff1bb8d-907c-4844-9e1b-801d69984d49@github.com>

On Tue, 30 Jul 2024 06:46:20 GMT, Jiawei Tang <jwtang at openjdk.org> wrote:

>> I added a test case that reproduces the crash. I would appreciate some advice on whether the code needs changing.
>
> Jiawei Tang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   refactor testcase and change the location of fix codes

Changes requested by sspitsyn (Reviewer).

src/hotspot/share/prims/jvmtiExport.cpp line 1098:

> 1096:   if (JavaThread::current()->is_in_any_VTMS_transition()) {
> 1097:     return false; // no events should be posted if thread is in any VTMS transition
> 1098:   }

Sorry, I was not clear: the three lines above (1093-1095) should be replaced with the new lines 1096-1098.
The check for `is_in_any_VTMS_transition()` includes the check for `is_in_tmp_VTMS_transition()`.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20373#pullrequestreview-2207409392
PR Review Comment: https://git.openjdk.org/jdk/pull/20373#discussion_r1696791636

From sspitsyn at openjdk.org  Tue Jul 30 11:27:32 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Tue, 30 Jul 2024 11:27:32 GMT
Subject: RFR: 8337331: crash: pinned virtual thread will lead to jvm crash
 when running with the javaagent option [v3]
In-Reply-To: <NUYBzJVMKqywg4-jWaehrYyh76pE84JYyD4n_iYnL0k=.c6682c01-9e20-4ebf-996e-7a715a53d0d7@github.com>
References: <9hxaRK_d2_alDaHWhl3ilx_M-9TIoi7QiXQ4Lc_LYOo=.3fe67617-7953-4d57-851b-e31959144e0c@github.com>
 <Pq3717t6CcEZuvhb8V34_CyTW6eHdVtPs_u_nGRwib8=.2883d513-24b6-4d38-ae4d-90b0e78e7eac@github.com>
 <yBWdB5qfG39speceqxReLp2SRTzlOk3bWt1rjGK83lA=.041249fc-d4b5-4c81-9dc8-4193d82e3a28@github.com>
 <NUYBzJVMKqywg4-jWaehrYyh76pE84JYyD4n_iYnL0k=.c6682c01-9e20-4ebf-996e-7a715a53d0d7@github.com>
Message-ID: <bZ-8wYbIe6rPMB0cv05KHVs3uajJdi_Lrqzg8A6bZVc=.daf9fada-9fe5-45c9-9572-a5660c2c4520@github.com>

On Tue, 30 Jul 2024 06:46:38 GMT, Alan Bateman <alanb at openjdk.org> wrote:

> >  Can you convert the test to use .cpp instead of .c as well?

> or maybe it could use VThreadPinner which allows calling through a native frame for tests like this.

This is a good suggestion, I was also thinking about it.
An example can be found in the test:
 `test/hotspot/jtreg/serviceability/jvmti/vthread/GetThreadState/GetThreadStateTest.java`

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20373#issuecomment-2258119426

From jwaters at openjdk.org  Tue Jul 30 11:54:33 2024
From: jwaters at openjdk.org (Julian Waters)
Date: Tue, 30 Jul 2024 11:54:33 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in
 prims code
In-Reply-To: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
Message-ID: <b3UnImUMMfRpNDfiV2Td8ysgmt53Zxas8DTLFnt2ieM=.3da375d1-b0d7-49a8-b947-72fa35bee6ee@github.com>

On Tue, 30 Jul 2024 04:12:33 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this change that removes some uses of literal 0 as a null
> pointer constant in prims code.
> 
> Testing: mach5 tier1

Looks Good!

-------------

Marked as reviewed by jwaters (Committer).

PR Review: https://git.openjdk.org/jdk/pull/20385#pullrequestreview-2207470079

From jwaters at openjdk.org  Tue Jul 30 11:54:34 2024
From: jwaters at openjdk.org (Julian Waters)
Date: Tue, 30 Jul 2024 11:54:34 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in
 prims code
In-Reply-To: <GHKiuJLY8J-ixCnxqrGAOyAJm0wdZUOGI6sbioUCNS8=.45d5fe4b-e059-4473-885f-ade0efae9cb5@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
 <Em7Cdv0NCUyAnZtjOQQaTNJleGEvIi4mWAFAuUVCz24=.4eebb188-2eaa-484e-8f5d-557ac99fd67d@github.com>
 <GHKiuJLY8J-ixCnxqrGAOyAJm0wdZUOGI6sbioUCNS8=.45d5fe4b-e059-4473-885f-ade0efae9cb5@github.com>
Message-ID: <TqktOSyEh_Ox9Ex39pPlxyn_LBy3msYU05uizfb4tHc=.8d065351-2145-46d2-bc18-203cf3be6865@github.com>

On Tue, 30 Jul 2024 06:54:10 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> src/hotspot/share/prims/methodHandles.cpp line 439:
>> 
>>> 437:   default:
>>> 438:     fatal("unexpected intrinsic id: %d %s", vmIntrinsics::as_int(iid), vmIntrinsics::name_at(iid));
>>> 439:     return 0;
>> 
>> Do we no longer need these returns after `fatal` to keep compilers happy?
>
> Now that we have, and are using, `[[noreturn]]` on all platforms, we no longer need that dead code.

I'll admit, I do prefer having a return to end all possible control flows in a non-void method, but oh well.
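
As a minimal standalone sketch of the point under discussion (the names `fatal_sketch` and `size_for_kind` are hypothetical and not taken from HotSpot): once the fatal-error routine is declared `[[noreturn]]`, the compiler knows control never leaves it, so a non-void function does not need a dummy `return` after the call.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a fatal-error routine; [[noreturn]] tells the
// compiler that this function never returns to its caller.
[[noreturn]] void fatal_sketch(const char* msg) {
  std::fprintf(stderr, "fatal: %s\n", msg);
  std::abort();
}

// Hypothetical non-void function whose default branch ends in a fatal call.
int size_for_kind(int kind) {
  switch (kind) {
    case 0:
      return 4;
    case 1:
      return 8;
    default:
      fatal_sketch("unexpected kind");
      // No dead "return 0;" is required here: the call above cannot return.
  }
}
```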

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1696829614

From dholmes at openjdk.org  Tue Jul 30 12:34:33 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 12:34:33 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v5]
In-Reply-To: <ecGNOoxKho_Go2gZcszTdimzEVHA2NkzQ6XlX97xmoA=.63c2ca29-e65f-4ffb-b178-c87567650241@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <I5ohuzIDvghA8wDhpSAQTppCO3Kqsbp9mGeDvxO6G4U=.1adfae97-c72c-4c53-a465-982e2d398873@github.com>
 <ecGNOoxKho_Go2gZcszTdimzEVHA2NkzQ6XlX97xmoA=.63c2ca29-e65f-4ffb-b178-c87567650241@github.com>
Message-ID: <5r9kuyj3yAUYArP73qpEE9Mkb0kNlboRq2m3CWNduIg=.d893b616-6c3a-4b2d-9b1b-ab9ece37e3ea@github.com>

On Tue, 30 Jul 2024 08:39:51 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

>> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix off-by-one error
>
> src/hotspot/share/utilities/exceptions.cpp line 278:
> 
>> 276:   if (ret == -1 || ret >= max_msg_size) {
>> 277:     int len = (int) strlen(msg);
>> 278:     if (len > 0) {
> 
> `truncate` asserts that len>5, you might need to adjust that.

We know we only got here because the message was either huge (-1) or > 1K. We only get a zero length if we got -1 and are on Windows. Any length < max_msg_size means we got -1 and are on macOS and we have truncated prior to the conversion that caused the INT_MAX overflow. In theory it could be a single "%s" format but in practice we don't call fthrow that way. Also if we got the -1 then there are actually very few circumstances that can get us to this point because it needs to be an exception message that can relate to huge strings (which at the moment is trying to look up a class with an illegally long name - something that will soon be handled on the Java side before we get to the VM.)

So if we somehow were to trigger the len>5 assert, that is fine as it indicates something unusual/unexpected that we want to catch.
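
The general idea of truncating to a valid UTF-8 boundary can be sketched as follows. This is a simplified illustration, not the code in the PR: `utf8_safe_prefix_len` is a hypothetical name, the sketch assumes the input is well-formed standard UTF-8 (HotSpot's own handling, including any modified-UTF-8 details and the `len>5` assert, is not reproduced here).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the largest prefix length <= max_bytes that
// does not cut a multi-byte UTF-8 sequence in half. Assumes `s` is valid UTF-8.
std::size_t utf8_safe_prefix_len(const std::string& s, std::size_t max_bytes) {
  if (s.size() <= max_bytes) {
    return s.size();  // nothing to truncate
  }
  std::size_t end = max_bytes;  // s[end] is the first byte that would be dropped
  // If the first dropped byte is a continuation byte (10xxxxxx), the cut falls
  // inside a sequence; step back to that sequence's lead byte and drop it too.
  while (end > 0 && (static_cast<unsigned char>(s[end]) & 0xC0) == 0x80) {
    --end;
  }
  return end;
}
```

For the 6-byte string "h\xC3\xA9llo", a limit of 2 bytes would split the two-byte character, so the sketch returns 1; a limit of 3 falls on a character boundary and is returned unchanged.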

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20345#discussion_r1696883436

From djelinski at openjdk.org  Tue Jul 30 12:38:33 2024
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Tue, 30 Jul 2024 12:38:33 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v5]
In-Reply-To: <I5ohuzIDvghA8wDhpSAQTppCO3Kqsbp9mGeDvxO6G4U=.1adfae97-c72c-4c53-a465-982e2d398873@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <I5ohuzIDvghA8wDhpSAQTppCO3Kqsbp9mGeDvxO6G4U=.1adfae97-c72c-4c53-a465-982e2d398873@github.com>
Message-ID: <0DGLCSfGuXAZI6APR2EiKoRxAlxZo6OIRpDtHHTJ-is=.20ad55c6-830a-475c-bbcf-a0a9c84e771e@github.com>

On Tue, 30 Jul 2024 05:41:08 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Exceptions::fthrow uses a 1024 byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF8 sequence.
>> 
>> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
>> 
>> Testing:
>>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>>  - tiers 1-3 sanity testing
>> 
>> Thanks.
>
> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix off-by-one error

LGTM

-------------

Marked as reviewed by djelinski (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20345#pullrequestreview-2207567151

From mbaesken at openjdk.org  Tue Jul 30 12:43:35 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 30 Jul 2024 12:43:35 GMT
Subject: RFR: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap' [v4]
In-Reply-To: <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
 <ATYMTAD044cPjr_Oph_i29cpfKR6cf8PfnumpFWl_FM=.81e6597c-5af7-4d19-9e96-fe1ddd8a7ebd@github.com>
Message-ID: <4ZJEarDQH0c_N4bwAtFQw3lG_WCwsa-c2QmHAmFD0J0=.ece94411-8818-4a29-8934-21ca8f60db1c@github.com>

On Thu, 25 Jul 2024 13:42:48 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> When running with ubsan-enabled binaries, some tests trigger the following report:
>> 
>> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
>>     #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
>>     #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
>>     #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
>>     #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
>>     #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
>>     #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
>>     #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
>>     #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
>> 
>> It seems that, at least for class SmallRegisterMap, we miss handling nullptr.
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add patch of Kim Barrett

Thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20296#issuecomment-2258251969

From mbaesken at openjdk.org  Tue Jul 30 12:43:36 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 30 Jul 2024 12:43:36 GMT
Subject: Integrated: 8333354: ubsan: frame.inline.hpp:91:25: and
 src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call
 on null pointer of type 'const struct SmallRegisterMap'
In-Reply-To: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
References: <6apJS69Nf0cZrzMg0H6oC86Fyz2pfiFJB6lBqUjhPWA=.fbeb700a-b2b0-41ce-a9a5-89e81084aee9@github.com>
Message-ID: <W-pWj7nt20S7Ovrl0hnUWnP_SKbzwDt2wx0cemi4p9I=.dcd9cc26-f079-48a9-aea9-0c57893597c4@github.com>

On Tue, 23 Jul 2024 09:49:38 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> When running with ubsan-enabled binaries, some tests trigger the following report:
> 
> src/hotspot/share/runtime/frame.inline.hpp:91:25: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'
>     #0 0x7fc1df86071e in unsigned char* frame::oopmapreg_to_location<SmallRegisterMap>(VMRegImpl*, SmallRegisterMap const*) const src/hotspot/share/runtime/frame.inline.hpp:91
>     #1 0x7fc1df86071e in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::iterate_oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:106
>     #2 0x7fc1df8611df in void OopMapDo<OopClosure, DerivedOopClosure, IncludeAllValues>::oops_do<SmallRegisterMap>(frame const*, SmallRegisterMap const*, ImmutableOopMap const*) src/hotspot/share/compiler/oopMap.inline.hpp:157
>     #3 0x7fc1df8611df in FrameOopIterator<SmallRegisterMap>::oops_do(OopClosure*) src/hotspot/share/oops/stackChunkOop.cpp:63
>     #4 0x7fc1dcfc8745 in BarrierSetStackChunk::encode_gc_mode(stackChunkOopDesc*, OopIterator*) src/hotspot/share/gc/shared/barrierSetStackChunk.cpp:85
>     #5 0x7fc1df854080 in bool TransformStackChunkClosure::do_frame<(ChunkFrames)0, SmallRegisterMap>(StackChunkFrameStream<(ChunkFrames)0> const&, SmallRegisterMap const*) src/hotspot/share/oops/stackChunkOop.cpp:319
>     #6 0x7fc1df854080 in void stackChunkOopDesc::iterate_stack<(ChunkFrames)0, TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:233
>     #7 0x7fc1df82f184 in void stackChunkOopDesc::iterate_stack<TransformStackChunkClosure>(TransformStackChunkClosure*) src/hotspot/share/oops/stackChunkOop.inline.hpp:199
> 
> It seems that, at least for class SmallRegisterMap, we miss handling nullptr.

This pull request has now been integrated.

Changeset: 81628328
Author:    Matthias Baesken <mbaesken at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/8162832832ac6e8c17f942e718e309a3489e0da6
Stats:     131 lines in 10 files changed: 50 ins; 64 del; 17 mod

8333354: ubsan: frame.inline.hpp:91:25: and src/hotspot/share/runtime/frame.inline.hpp:88:29: runtime error: member call on null pointer of type 'const struct SmallRegisterMap'

Co-authored-by: Kim Barrett <kbarrett at openjdk.org>
Reviewed-by: rrich, clanger

-------------

PR: https://git.openjdk.org/jdk/pull/20296

From mbaesken at openjdk.org  Tue Jul 30 13:03:07 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 30 Jul 2024 13:03:07 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured
 [v2]
In-Reply-To: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
Message-ID: <QlWywf3yso2GdzP_yf85VTKSE_n45c9PowGDpNbV_9c=.6e180952-944f-4de7-a0f3-4121a444cb31@github.com>

> Currently, when we run with ubsan-enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work.
> 
> We find this in the test output
> 
> [STDOUT]
> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory
> 
> The container where the test is executed does not contain the ubsan package; we might skip the test in this case.

Matthias Baesken has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:

 - Merge remote-tracking branch 'origin/master' into JDK-8333144
 - JDK-8333144

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19907/files
  - new: https://git.openjdk.org/jdk/pull/19907/files/35163ff7..ba4f63be

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19907&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19907&range=00-01

  Stats: 29403 lines in 1043 files changed: 19226 ins; 5688 del; 4489 mod
  Patch: https://git.openjdk.org/jdk/pull/19907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19907/head:pull/19907

PR: https://git.openjdk.org/jdk/pull/19907

From mbaesken at openjdk.org  Tue Jul 30 13:26:45 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 30 Jul 2024 13:26:45 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured
 [v3]
In-Reply-To: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
Message-ID: <WOwcaWSeF_X020nBqsY6rs7STGxZmZVuZAyeA3nt1Tg=.a16acf38-8c6d-429a-b184-8c5c04ac9ceb@github.com>

> Currently when we run with ubsan-enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work.
> 
> We find this in the test output
> 
> [STDOUT]
> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory
> 
> The container where the test is executed does not contain the ubsan package; we might skip the test in this case.

Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:

  install libubsan1 into test container

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19907/files
  - new: https://git.openjdk.org/jdk/pull/19907/files/ba4f63be..4a792430

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19907&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19907&range=01-02

  Stats: 12 lines in 2 files changed: 1 ins; 11 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19907/head:pull/19907

PR: https://git.openjdk.org/jdk/pull/19907

From mbaesken at openjdk.org  Tue Jul 30 13:30:31 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 30 Jul 2024 13:30:31 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured
In-Reply-To: <6WiWP5JSoQ6KQbe5FAg84BGAcRp0XtojWan0nyGaXjo=.562bc939-4751-47c8-a739-3b76cb67b710@github.com>
References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
 <6WiWP5JSoQ6KQbe5FAg84BGAcRp0XtojWan0nyGaXjo=.562bc939-4751-47c8-a739-3b76cb67b710@github.com>
Message-ID: <5sIltLs4ES9qaTNsz1m1O2W1yz-ZfPWuUAGZM877XNA=.c118065b-6cf3-44ab-871c-937e373d3dc7@github.com>

On Tue, 30 Jul 2024 09:31:15 GMT, Christoph Langer <clanger at openjdk.org> wrote:

> I would prefer to add the required ubsan libraries to the container used for testing when testing an ubsan enabled build. Can we achieve this?

I added libubsan1 to the container (tested it and it works nicely; it should do no harm if we test a non-ubsan build).
Should we go this way?
If so, I could remove the WhiteBox-related changes (or keep them for other uses).
I also tried to add a WhiteBox-based check to 'DockerTestUtils.java', but this does not seem to work. As I said, though, it is probably not necessary.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2258351989

From asmehra at openjdk.org  Tue Jul 30 13:35:35 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Tue, 30 Jul 2024 13:35:35 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic [v2]
In-Reply-To: <lsdnO__d3kqEFpSJJVZOz7JSRaSQXjxT6xwC0kc1MxI=.ec76bf46-635c-411d-9d0c-918d286f0f0b@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
 <5fyuvwoHRU_EUT2tvUsWwzCjd7dazKHMiL0rGWW8jVU=.fed6e33a-7a22-4b4c-950f-d19c18ee0eaf@github.com>
 <lsdnO__d3kqEFpSJJVZOz7JSRaSQXjxT6xwC0kc1MxI=.ec76bf46-635c-411d-9d0c-918d286f0f0b@github.com>
Message-ID: <IeStexqMLw1WnPuf5RpzLaFFRraSGUV39IRN-Zr1N1k=.41eab3b5-5640-481a-8b53-7a7072489da2@github.com>

On Tue, 30 Jul 2024 05:18:17 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Address review comments by Thomas S.
>>   
>>   Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>
> src/hotspot/share/compiler/compilationMemoryStatistic.hpp line 40:
> 
>> 38: 
>> 39: // Helper class to wrap the array of arena tags for easier processing
>> 40: class ArenaTagsCounter {
> 
> Sorry for being a stickler for precise names, but I would like plural for counters here - it is not a single counter, it's a series/vector/array of counters.
> Any of these work for me: ArenaCountersByTag - ArenaCountersByTagVector - ArenaTagCounterVector - ArenaTagCounters

I am pretty bad at naming things, so I welcome these suggestions. I will go with ArenaCountersByTag.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20304#discussion_r1696973723

From nprasad at openjdk.org  Tue Jul 30 13:52:33 2024
From: nprasad at openjdk.org (Neethu Prasad)
Date: Tue, 30 Jul 2024 13:52:33 GMT
Subject: RFR: 8334230: Optimize C2 classes layout [v3]
In-Reply-To: <9Xqj8lhtk5xtM-NHRl-GBFTZSzQcNw8yYo_ket5U0aM=.a752c907-e9c7-47cc-87c1-bf6bf0a3d642@github.com>
References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
 <9Xqj8lhtk5xtM-NHRl-GBFTZSzQcNw8yYo_ket5U0aM=.a752c907-e9c7-47cc-87c1-bf6bf0a3d642@github.com>
Message-ID: <cqc5eZCkK8915jLHjCDOiSGUJA74tTBZz7PYMg1czFc=.65685c5a-951b-4f55-b9a6-6228522f6eaf@github.com>

On Tue, 30 Jul 2024 00:53:04 GMT, Neethu Prasad <nprasad at openjdk.org> wrote:

>> **Notes**
>> 
>> Rearrange C2 class fields to optimize footprint.
>> 
>> 
>> **Verification**
>> 
>> 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests.
>> 2. Ran pahole on a 64-bit machine after reordering and verified that there are no holes and that the total size in bytes is reduced.
>> 
>> | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding |
>> | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- |
>> | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 ->  0 | 11 -> 0  | 56 bytes -> 48 | 0 -> 3 |
>> | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 ->  0 | 5 -> 0  | 24 bytes -> 16 | 7 -> 4 |
>> | C2Access | 56 -> 48 | 1-> 1 | 42 -> 0 | 1 ->  0 | 7 -> 0  | 56 bytes -> 48 | 7 -> 6 |
>> | VectorSet| 32 -> 24 | 1-> 1 | 24 -> 0 | 1 ->  0 | 8 -> 0  | 32 bytes -> 24 | 1 -> 1 |
>> 
>> class ArrayPointer {
>> 	const class Node  *        _pointer;             /*     0     8 */
>> 	const class Node  *        _base;                /*     8     8 */
>> 	const jlong                _constant_offset;     /*    16     8 */
>> 	const class Node  *        _int_offset;          /*    24     8 */
>> 	const class GrowableArray<Node*>  * _other_offsets; /*    32     8 */
>> 	const jint                 _int_offset_shift;    /*    40     4 */
>> 	const bool                 _is_valid;            /*    44     1 */
>> public:
>> 
>> 
>> 	/* size: 48, cachelines: 1, members: 7 */
>> 	/* padding: 3 */
>> 	/* last cacheline: 48 bytes */
>> };
>> 
>> 
>> 
>> class CallJavaNode : public CallNode {
>> public:
>> 
>> 	/* class CallNode            <ancestor>; */      /*     0   128 */
>> protected:
>> 
>> 	/* --- cacheline 2 boundary (128 bytes) --- */
>> 	class ciMethod *           _method;              /*   128     8 */
>> 	bool                       _optimized_virtual;   /*   136     1 */
>> 	bool                       _method_handle_invoke; /*   137     1 */
>> 	bool                       _override_symbolic_info; /*   138     1 */
>> 	bool                       _arg_escape;          /*   139     1 */
>> public:
>> 
>> protected:
>> 
>> public:
>> 
>> 
>> 	/* size: 144, cachelines: 3, members: 6 */
>> 	/* padding: 4 */
>> 	/* last cacheline: 16 bytes */
>> 
>> 	/* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
>> };
>> 
>> 
>> 
>> class C2Access : public StackObj {
>> public:
>> 
>> 	/* class StackObj            <ancestor>; */      /*     0     0 */
>> 
>> 	/* XXX last struct has 1 byte of padding */
>> 
> ...
>
> Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Address constructor order issue for C2OptAccess

Thanks for the review & approval.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19861#issuecomment-2258399973

From duke at openjdk.org  Tue Jul 30 13:52:33 2024
From: duke at openjdk.org (duke)
Date: Tue, 30 Jul 2024 13:52:33 GMT
Subject: RFR: 8334230: Optimize C2 classes layout [v3]
In-Reply-To: <9Xqj8lhtk5xtM-NHRl-GBFTZSzQcNw8yYo_ket5U0aM=.a752c907-e9c7-47cc-87c1-bf6bf0a3d642@github.com>
References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
 <9Xqj8lhtk5xtM-NHRl-GBFTZSzQcNw8yYo_ket5U0aM=.a752c907-e9c7-47cc-87c1-bf6bf0a3d642@github.com>
Message-ID: <uWFDikC4hzr1vv0BkTHm4YfYGcuYm086t5CHNsU6HB4=.2475c8c2-a712-4716-b3e2-c78eabf8433e@github.com>

On Tue, 30 Jul 2024 00:53:04 GMT, Neethu Prasad <nprasad at openjdk.org> wrote:

>> **Notes**
>> 
>> Rearrange C2 class fields to optimize footprint.
>> 
>> 
>> **Verification**
>> 
>> 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests.
>> 2. Ran pahole on a 64-bit machine after reordering and verified that there are no holes and that the total size in bytes is reduced.
>> 
>> | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding |
>> | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- |
>> | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 ->  0 | 11 -> 0  | 56 bytes -> 48 | 0 -> 3 |
>> | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 ->  0 | 5 -> 0  | 24 bytes -> 16 | 7 -> 4 |
>> | C2Access | 56 -> 48 | 1-> 1 | 42 -> 0 | 1 ->  0 | 7 -> 0  | 56 bytes -> 48 | 7 -> 6 |
>> | VectorSet| 32 -> 24 | 1-> 1 | 24 -> 0 | 1 ->  0 | 8 -> 0  | 32 bytes -> 24 | 1 -> 1 |
>> 
>> class ArrayPointer {
>> 	const class Node  *        _pointer;             /*     0     8 */
>> 	const class Node  *        _base;                /*     8     8 */
>> 	const jlong                _constant_offset;     /*    16     8 */
>> 	const class Node  *        _int_offset;          /*    24     8 */
>> 	const class GrowableArray<Node*>  * _other_offsets; /*    32     8 */
>> 	const jint                 _int_offset_shift;    /*    40     4 */
>> 	const bool                 _is_valid;            /*    44     1 */
>> public:
>> 
>> 
>> 	/* size: 48, cachelines: 1, members: 7 */
>> 	/* padding: 3 */
>> 	/* last cacheline: 48 bytes */
>> };
>> 
>> 
>> 
>> class CallJavaNode : public CallNode {
>> public:
>> 
>> 	/* class CallNode            <ancestor>; */      /*     0   128 */
>> protected:
>> 
>> 	/* --- cacheline 2 boundary (128 bytes) --- */
>> 	class ciMethod *           _method;              /*   128     8 */
>> 	bool                       _optimized_virtual;   /*   136     1 */
>> 	bool                       _method_handle_invoke; /*   137     1 */
>> 	bool                       _override_symbolic_info; /*   138     1 */
>> 	bool                       _arg_escape;          /*   139     1 */
>> public:
>> 
>> protected:
>> 
>> public:
>> 
>> 
>> 	/* size: 144, cachelines: 3, members: 6 */
>> 	/* padding: 4 */
>> 	/* last cacheline: 16 bytes */
>> 
>> 	/* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
>> };
>> 
>> 
>> 
>> class C2Access : public StackObj {
>> public:
>> 
>> 	/* class StackObj            <ancestor>; */      /*     0     0 */
>> 
>> 	/* XXX last struct has 1 byte of padding */
>> 
> ...
>
> Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Address constructor order issue for C2OptAccess

@neethu-prasad 
Your change (at version 490c381ee37ec38774fd08b1239d28ad11ad7aa6) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19861#issuecomment-2258401588

From nprasad at openjdk.org  Tue Jul 30 14:10:37 2024
From: nprasad at openjdk.org (Neethu Prasad)
Date: Tue, 30 Jul 2024 14:10:37 GMT
Subject: Integrated: 8334230: Optimize C2 classes layout
In-Reply-To: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
References: <ZhGZc1261TFoU0MEzTHpz0ldXbRPEycH-Ed9-En_wvI=.d25fb953-c48c-4e1e-af6b-dacaa9bb5abb@github.com>
Message-ID: <i47cPhAlUn1TTJhuzpzLGJ6m1nmvgLv1tNDlgq4X7jY=.f47d4aa7-4599-4a8a-8947-f656aaa2c0b4@github.com>

On Mon, 24 Jun 2024 15:53:24 GMT, Neethu Prasad <nprasad at openjdk.org> wrote:

> **Notes**
> 
> Rearrange C2 class fields to optimize footprint.
> 
> 
> **Verification**
> 
> 1. Ran tier2_compiler, hotspot_compiler, tier 1 & tier 2 tests.
> 2. Ran pahole on a 64-bit machine after reordering and verified that there are no holes and that the total size in bytes is reduced.
> 
> | Class | Size | Cachelines | Sum Members | Holes | Sum holes | Last Cacheline | Padding |
> | ----- | ----- | ---------- | --------------- | ----- | ---------- | --------------- | -------- |
> | ArrayPointer | 56 -> 48 | 1 -> 1 | 45 -> 0 | 2 ->  0 | 11 -> 0  | 56 bytes -> 48 | 0 -> 3 |
> | CallJavaNode | 152 -> 144 | 3 -> 3 | 12 -> 0 | 1 ->  0 | 5 -> 0  | 24 bytes -> 16 | 7 -> 4 |
> | C2Access | 56 -> 48 | 1-> 1 | 42 -> 0 | 1 ->  0 | 7 -> 0  | 56 bytes -> 48 | 7 -> 6 |
> | VectorSet| 32 -> 24 | 1-> 1 | 24 -> 0 | 1 ->  0 | 8 -> 0  | 32 bytes -> 24 | 1 -> 1 |
> 
> class ArrayPointer {
> 	const class Node  *        _pointer;             /*     0     8 */
> 	const class Node  *        _base;                /*     8     8 */
> 	const jlong                _constant_offset;     /*    16     8 */
> 	const class Node  *        _int_offset;          /*    24     8 */
> 	const class GrowableArray<Node*>  * _other_offsets; /*    32     8 */
> 	const jint                 _int_offset_shift;    /*    40     4 */
> 	const bool                 _is_valid;            /*    44     1 */
> public:
> 
> 
> 	/* size: 48, cachelines: 1, members: 7 */
> 	/* padding: 3 */
> 	/* last cacheline: 48 bytes */
> };
> 
> 
> 
> class CallJavaNode : public CallNode {
> public:
> 
> 	/* class CallNode            <ancestor>; */      /*     0   128 */
> protected:
> 
> 	/* --- cacheline 2 boundary (128 bytes) --- */
> 	class ciMethod *           _method;              /*   128     8 */
> 	bool                       _optimized_virtual;   /*   136     1 */
> 	bool                       _method_handle_invoke; /*   137     1 */
> 	bool                       _override_symbolic_info; /*   138     1 */
> 	bool                       _arg_escape;          /*   139     1 */
> public:
> 
> protected:
> 
> public:
> 
> 
> 	/* size: 144, cachelines: 3, members: 6 */
> 	/* padding: 4 */
> 	/* last cacheline: 16 bytes */
> 
> 	/* BRAIN FART ALERT! 144 bytes != 12 (member bytes) + 0 (member bits) + 0 (byte holes) + 0 (bit holes), diff = 1024 bits */
> };
> 
> 
> 
> class C2Access : public StackObj {
> public:
> 
> 	/* class StackObj            <ancestor>; */      /*     0     0 */
> 
> 	/* XXX last struct has 1 byte of padding */
> 
> 	int ()(void) * *           _vptr.C2Access;       /*     0     8 */
> protected:
> 
> 	DecoratorSet               _decorators;          /*     8  ...

This pull request has now been integrated.

Changeset: 1cb27f7e
Author:    Neethu Prasad <nprasad at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/1cb27f7e2355ccb911bb274fc004e5bc57fd5dc9
Stats:     17 lines in 4 files changed: 8 ins; 8 del; 1 mod

8334230: Optimize C2 classes layout

Reviewed-by: shade, kvn, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/19861
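
The effect of reordering fields that the pahole output above reports can be reproduced in a small standalone sketch. The two structs below are hypothetical and are not HotSpot classes; they only illustrate how declaring members in roughly decreasing alignment order removes interior holes. The names `Scattered`, `Packed`, `kMemberBytes` and `wasted_bytes` are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical structs for illustration only; these are not HotSpot classes.
// Both hold the same five members, only the declaration order differs.
struct Scattered {
  bool         flag_a;  // small member first forces padding before ptr
  void*        ptr;
  bool         flag_b;  // another small member followed by padding
  std::int64_t offset;
  std::int32_t shift;   // tail padding rounds the size up to the alignment
};

struct Packed {
  void*        ptr;     // widest members first
  std::int64_t offset;
  std::int32_t shift;
  bool         flag_a;  // small members share the final word
  bool         flag_b;
};

// Bytes occupied by the members themselves, independent of ordering.
constexpr std::size_t kMemberBytes =
    sizeof(void*) + sizeof(std::int64_t) + sizeof(std::int32_t) + 2 * sizeof(bool);

// Bytes in a struct of the given total size that hold no member data.
constexpr std::size_t wasted_bytes(std::size_t total_size) {
  return total_size - kMemberBytes;
}

static_assert(sizeof(Packed) <= sizeof(Scattered),
              "ordering members by decreasing alignment should not grow the struct");
```

On a typical LP64 ABI such as x86-64 Linux, `sizeof(Scattered)` is 40 and `sizeof(Packed)` is 24; the exact numbers depend on the ABI, which is why the sketch only asserts the relative ordering. Tail padding can remain after reordering, as it does in the pahole output for the real classes.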

From szaldana at openjdk.org  Tue Jul 30 14:33:10 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Tue, 30 Jul 2024 14:33:10 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v18]
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <kEEMF9wHZB2AKvO2w3Mg_o249GBDeZ8PlWMQFIejh7k=.ca4cd356-7c4f-4344-bb40-b336c717dae4@github.com>

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:

  Fixing invocation outside of jcmd

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20198/files
  - new: https://git.openjdk.org/jdk/pull/20198/files/ceb96eb9..564349e3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=17
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20198&range=16-17

  Stats: 12 lines in 2 files changed: 11 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20198.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20198/head:pull/20198

PR: https://git.openjdk.org/jdk/pull/20198

From kevinw at openjdk.org  Tue Jul 30 14:52:35 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Tue, 30 Jul 2024 14:52:35 GMT
Subject: RFR: 8334492: DiagnosticCommands (jcmd) should accept %p in output
 filenames and substitute PID [v18]
In-Reply-To: <kEEMF9wHZB2AKvO2w3Mg_o249GBDeZ8PlWMQFIejh7k=.ca4cd356-7c4f-4344-bb40-b336c717dae4@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
 <kEEMF9wHZB2AKvO2w3Mg_o249GBDeZ8PlWMQFIejh7k=.ca4cd356-7c4f-4344-bb40-b336c717dae4@github.com>
Message-ID: <vYX77VW0Aj5Uj7sLL1tTsCHkNgGO9fZ0wEkD5gsyR7s=.99bcd306-43da-43b1-a56b-2fba622a9b3f@github.com>

On Tue, 30 Jul 2024 14:33:10 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

>> Hi all, 
>> 
>> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
>> 
>> This PR addresses the following diagnostic commands: 
>> - [x] Compiler.perfmap 
>> - [x] GC.heap_dump
>> - [x] System.dump_map
>> - [x] Thread.dump_to_file
>> - [x] VM.cds
>> 
>> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
>> 
>> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
>> 
>> 
>> filename         (Optional) Name of the file to which the flight recording data is
>>                    written when the recording is stopped. If no filename is given, a
>>                    filename is generated from the PID and the current date and is
>>                    placed in the directory where the process was started. The
>>                    filename may also be a directory in which case, the filename is
>>                    generated from the PID and the current date in the specified
>>                    directory. (STRING, no default value)
>> 
>>                    Note: If a filename is given, '%p' in the filename will be
>>                    replaced by the PID, and '%t' will be replaced by the time in
>>                    'yyyy_MM_dd_HH_mm_ss' format.
>> 
>> 
>> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
>> 
>> Testing: 
>> 
>> - [x] Added test case passes. 
>> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
>> 
>> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
>> 
>> Cheers, 
>> Sonia
>
> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fixing invocation outside of jcmd

OK great, thanks for that last update, nothing more to say! 8-)

I will do the man page update and get it out for review soon.

-------------

Marked as reviewed by kevinw (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20198#pullrequestreview-2207924716
PR Comment: https://git.openjdk.org/jdk/pull/20198#issuecomment-2258543294

From asmehra at openjdk.org  Tue Jul 30 15:02:51 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Tue, 30 Jul 2024 15:02:51 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic [v3]
In-Reply-To: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
Message-ID: <-9CKmLLhUHanmV4kGh3Mzti8od9vxyQAQ_t2hvQVLX4=.c50db040-3a76-42de-ae55-27e4db236fda@github.com>

> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)
> 
> Testing:
>   test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
>   test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java

Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision:

  Rename ArenaTagsCounter to ArenaCountersByTag
  
  Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20304/files
  - new: https://git.openjdk.org/jdk/pull/20304/files/008ac6b9..70762110

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20304&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20304&range=01-02

  Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/20304.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20304/head:pull/20304

PR: https://git.openjdk.org/jdk/pull/20304
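
The naming discussion in this thread concerns a helper that wraps an array of per-tag counters. A minimal sketch of that general shape follows. It is not the code from the pull request: the enum `ArenaTag`, its enumerators, and the class `CountersByTag` with its methods are all hypothetical names chosen for this example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical tag set; the real arena tags in HotSpot are not shown here.
enum class ArenaTag : std::size_t { Ra, Node, Other, Count };

// One counter per tag, hence the plural in the name.
class CountersByTag {
  std::array<std::size_t, static_cast<std::size_t>(ArenaTag::Count)> _counts{};

  static constexpr std::size_t idx(ArenaTag tag) {
    return static_cast<std::size_t>(tag);
  }

 public:
  // Add `bytes` to the counter that belongs to `tag`.
  void add(ArenaTag tag, std::size_t bytes) { _counts[idx(tag)] += bytes; }

  // Current value of the counter for `tag`.
  std::size_t get(ArenaTag tag) const { return _counts[idx(tag)]; }

  // Sum over all tags.
  std::size_t total() const {
    std::size_t sum = 0;
    for (std::size_t c : _counts) sum += c;
    return sum;
  }
};
```

Keeping the array private and indexing only through the enum means callers cannot pass an out-of-range index, which is one plausible reason to wrap the raw array at all.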

From asmehra at openjdk.org  Tue Jul 30 15:02:51 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Tue, 30 Jul 2024 15:02:51 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic
In-Reply-To: <yqXqOJCnOBQToVnGiTvMv9SRVECCZuArbWqfiVEj6VE=.eb63b66f-63a5-4c51-8757-87f2694afd98@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
 <yqXqOJCnOBQToVnGiTvMv9SRVECCZuArbWqfiVEj6VE=.eb63b66f-63a5-4c51-8757-87f2694afd98@github.com>
Message-ID: <PGwdTyL6oCyFp513QBNKaVsBE9dKstmCKacfcxYf3jg=.0c9730d7-37a5-4912-b57a-0a69696fd18b@github.com>

On Wed, 24 Jul 2024 10:45:05 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)
>> 
>> Testing:
>>   test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
>>   test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java
>
> I plan to look at this later this week.

@tstuefe renamed ArenaTagsCounter to ArenaCountersByTag

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20304#issuecomment-2258563618

From kvn at openjdk.org  Tue Jul 30 17:10:33 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 30 Jul 2024 17:10:33 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic [v3]
In-Reply-To: <-9CKmLLhUHanmV4kGh3Mzti8od9vxyQAQ_t2hvQVLX4=.c50db040-3a76-42de-ae55-27e4db236fda@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
 <-9CKmLLhUHanmV4kGh3Mzti8od9vxyQAQ_t2hvQVLX4=.c50db040-3a76-42de-ae55-27e4db236fda@github.com>
Message-ID: <_XRsVlxQxbH7kHV8gIwxeK6S2Cy3oT0frrcIzOgQESY=.202dcbf2-3258-417a-933e-eab10e3d7fbc@github.com>

On Tue, 30 Jul 2024 15:02:51 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote:

>> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)
>> 
>> Testing:
>>   test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
>>   test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java
>
> Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename ArenaTagsCounter to ArenaCountersByTag
>   
>   Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>

Looks reasonable. I submitted it to our testing to make sure the tests pass.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20304#pullrequestreview-2208245618

From szaldana at openjdk.org  Tue Jul 30 18:43:41 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Tue, 30 Jul 2024 18:43:41 GMT
Subject: Integrated: 8334492: DiagnosticCommands (jcmd) should accept %p in
 output filenames and substitute PID
In-Reply-To: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
References: <8kEqL61aS6ZZeLtvifidQhURa2tenl92m5uIAtXAxcE=.31d2d492-7212-4637-99bd-eeff4773a18b@github.com>
Message-ID: <7Dk-ApfeC6fpoIAMknLszeWlrg0G-h728jC6iaHn2S4=.e2253687-64ca-48cf-bf2e-9f1a680049c4@github.com>

On Tue, 16 Jul 2024 16:25:50 GMT, Sonia Zaldana Calles <szaldana at openjdk.org> wrote:

> Hi all, 
> 
> This PR addresses [8334492](https://bugs.openjdk.org/browse/JDK-8334492) enabling jcmd diagnostic commands that issue an output file to accept the `%p` pattern in the file name and substitute it for the PID. 
> 
> This PR addresses the following diagnostic commands: 
> - [x] Compiler.perfmap 
> - [x] GC.heap_dump
> - [x] System.dump_map
> - [x] Thread.dump_to_file
> - [x] VM.cds
> 
> Note that some jcmd diagnostic commands already enable this functionality (`JFR.configure, JFR.dump, JFR.start and JFR.stop`). 
> 
> I propose opening a separate issue to track updating the man page similarly to how it's done for the JFR diagnostic commands. For example, 
> 
> 
> filename         (Optional) Name of the file to which the flight recording data is
>                    written when the recording is stopped. If no filename is given, a
>                    filename is generated from the PID and the current date and is
>                    placed in the directory where the process was started. The
>                    filename may also be a directory in which case, the filename is
>                    generated from the PID and the current date in the specified
>                    directory. (STRING, no default value)
> 
>                    Note: If a filename is given, '%p' in the filename will be
>                    replaced by the PID, and '%t' will be replaced by the time in
>                    'yyyy_MM_dd_HH_mm_ss' format.
> 
> 
> Unfortunately, per [8276265](https://bugs.openjdk.org/browse/JDK-8276265), sources for the jcmd manpage remain in Oracle internal repos so this PR can't address that. 
> 
> Testing: 
> 
> - [x] Added test case passes. 
> - [x] Modified existing VM.cds tests to also check for `%p` filenames. 
> 
> Looking forward to your comments and addressing any diagnostic commands I might have missed (if any). 
> 
> Cheers, 
> Sonia

This pull request has now been integrated.

Changeset: f5c9e8f1
Author:    Sonia Zaldana Calles <szaldana at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/f5c9e8f1229f3aa00743119a2dee3e15d57048d8
Stats:     173 lines in 11 files changed: 134 ins; 16 del; 23 mod

8334492: DiagnosticCommands (jcmd) should accept %p in output filenames and substitute PID

Reviewed-by: kevinw, stuefe, lmesnik

-------------

PR: https://git.openjdk.org/jdk/pull/20198
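
The `%p` substitution described in this change can be illustrated with a standalone sketch. This is not the HotSpot implementation; `expand_pid` is a hypothetical helper, and it assumes that every occurrence of `%p` is replaced and that any other use of `%` is copied through unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not HotSpot code: returns `pattern` with every "%p"
// replaced by the decimal form of `pid`. Other characters are copied as-is.
std::string expand_pid(const std::string& pattern, long pid) {
  const std::string pid_text = std::to_string(pid);
  std::string out;
  out.reserve(pattern.size() + pid_text.size());
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    const bool is_pid_token =
        pattern[i] == '%' && i + 1 < pattern.size() && pattern[i + 1] == 'p';
    if (is_pid_token) {
      out += pid_text;
      ++i;  // skip the 'p' that belongs to the token
    } else {
      out += pattern[i];
    }
  }
  return out;
}
```

With this sketch, `expand_pid("heap_%p.hprof", 1234)` yields `heap_1234.hprof`, so dumps from several JVMs started with the same command line end up in distinct files.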

From kvn at openjdk.org  Tue Jul 30 21:08:34 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 30 Jul 2024 21:08:34 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic [v3]
In-Reply-To: <-9CKmLLhUHanmV4kGh3Mzti8od9vxyQAQ_t2hvQVLX4=.c50db040-3a76-42de-ae55-27e4db236fda@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
 <-9CKmLLhUHanmV4kGh3Mzti8od9vxyQAQ_t2hvQVLX4=.c50db040-3a76-42de-ae55-27e4db236fda@github.com>
Message-ID: <TCzh2VR4vb__B5WC8W5yGFITgdOjkquLymnj5pzbSac=.fec19a8b-e77a-4846-b916-ba5c51fd9801@github.com>

On Tue, 30 Jul 2024 15:02:51 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote:

>> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)
>> 
>> Testing:
>>   test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
>>   test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java
>
> Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename ArenaTagsCounter to ArenaCountersByTag
>   
>   Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>

My testing passed.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20304#pullrequestreview-2208698602

From dholmes at openjdk.org  Tue Jul 30 22:38:36 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 22:38:36 GMT
Subject: RFR: 8325002: Exceptions::fthrow needs to ensure it truncates to a
 valid utf8 string [v5]
In-Reply-To: <0DGLCSfGuXAZI6APR2EiKoRxAlxZo6OIRpDtHHTJ-is=.20ad55c6-830a-475c-bbcf-a0a9c84e771e@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
 <I5ohuzIDvghA8wDhpSAQTppCO3Kqsbp9mGeDvxO6G4U=.1adfae97-c72c-4c53-a465-982e2d398873@github.com>
 <0DGLCSfGuXAZI6APR2EiKoRxAlxZo6OIRpDtHHTJ-is=.20ad55c6-830a-475c-bbcf-a0a9c84e771e@github.com>
Message-ID: <gqsiIGMIc2Z68KOimRr82y0DFUkOxHrJKBy5m4r6D44=.a4469385-ec21-4c12-b51d-05e8fe19acf9@github.com>

On Tue, 30 Jul 2024 12:36:01 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

>> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix off-by-one error
>
> LGTM

Thanks for the review and the assistance @djelinski !

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20345#issuecomment-2259316733

From dholmes at openjdk.org  Tue Jul 30 22:38:37 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 22:38:37 GMT
Subject: Integrated: 8325002: Exceptions::fthrow needs to ensure it truncates
 to a valid utf8 string
In-Reply-To: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
References: <NeYPxTjRR65RKQPjxfxskGHvOoJOq-VZazOuC8xeKTo=.7a947e5d-e437-46f2-86b9-b0a32ad1e070@github.com>
Message-ID: <owL-YVQEFHKjBUu4GxNs1KZFFe6w4R0994d0b37sQQk=.4ed07c8b-4a98-4cce-82a0-6ccae6d0d18a@github.com>

On Fri, 26 Jul 2024 04:03:10 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Exceptions::fthrow uses a 1024-byte buffer to format the incoming exception message string, but this may not be large enough, leading to truncation. However, we should ensure we truncate to a valid UTF-8 sequence.
> 
> The process is explained in the code. Thanks to @RogerRiggs and @djelinski for their suggestions on how to tackle this.
> 
> Testing:
>  - new gtest exercises the truncation code with the different possibilities for bad truncation
>  - tiers 1-3 sanity testing
> 
> Thanks.
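
To illustrate the truncation problem described above, here is a minimal sketch, assuming plain well-formed UTF-8 input. It is not the code from the changeset; the function name `utf8_truncation_point` and its parameters are hypothetical. The idea is that a cut must not land on a continuation byte (`10xxxxxx`), so the cut point is moved back until it sits at the start of a character.

```cpp
#include <cstddef>

// Hypothetical helper, not the HotSpot implementation.
// Returns the largest length <= max_len at which buf[0..length) does not end
// in the middle of a multi-byte UTF-8 sequence. Assumes buf[0..len) is
// well-formed UTF-8.
size_t utf8_truncation_point(const char* buf, size_t len, size_t max_len) {
  if (len <= max_len) {
    return len;  // everything fits, nothing to truncate
  }
  size_t cut = max_len;
  // buf[cut] is the first byte that would be dropped. If it is a continuation
  // byte (bit pattern 10xxxxxx), the cut splits a character, so step back
  // until the first dropped byte starts a character.
  while (cut > 0 && (static_cast<unsigned char>(buf[cut]) & 0xC0) == 0x80) {
    --cut;
  }
  return cut;
}
```

For the 6-byte string "h\xC3\xA9llo" and a limit of 2 bytes, the sketch keeps only 1 byte, because keeping 2 would split the two-byte character.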

This pull request has now been integrated.

Changeset: 5b7bb40d
Author:    David Holmes <dholmes at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/5b7bb40d1f5a8e1261cc252db2a09b5e4f07e5f0
Stats:     194 lines in 4 files changed: 191 ins; 0 del; 3 mod

8325002: Exceptions::fthrow needs to ensure it truncates to a valid utf8 string

Reviewed-by: djelinski, stuefe

-------------

PR: https://git.openjdk.org/jdk/pull/20345

From dholmes at openjdk.org  Tue Jul 30 23:07:56 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 Jul 2024 23:07:56 GMT
Subject: RFR: 8337515: JVM_DumpAllStacks is dead code
Message-ID: <bj6I0sIWwBuxo-YC7xW59uo-mEN1RByejEDgf2nKa6w=.8655fc0c-6264-4af1-be7b-02966460e3f4@github.com>

Trivial cleanup of long unused code.

Thanks.

-------------

Commit messages:
 - 8337515: JVM_DumpAllStacks is dead code

Changes: https://git.openjdk.org/jdk/pull/20396/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20396&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337515
  Stats: 11 lines in 2 files changed: 0 ins; 11 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20396.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20396/head:pull/20396

PR: https://git.openjdk.org/jdk/pull/20396

From kvn at openjdk.org  Wed Jul 31 00:32:34 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 31 Jul 2024 00:32:34 GMT
Subject: RFR: 8337515: JVM_DumpAllStacks is dead code
In-Reply-To: <bj6I0sIWwBuxo-YC7xW59uo-mEN1RByejEDgf2nKa6w=.8655fc0c-6264-4af1-be7b-02966460e3f4@github.com>
References: <bj6I0sIWwBuxo-YC7xW59uo-mEN1RByejEDgf2nKa6w=.8655fc0c-6264-4af1-be7b-02966460e3f4@github.com>
Message-ID: <xbSCjeDRwb0CWIhmEcQlhhZpHuwWrjosn3OulE4bhe8=.fc77088d-bedc-47c4-98b2-7ac74f368d31@github.com>

On Tue, 30 Jul 2024 23:03:43 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Trivial cleanup of long unused code.
> 
> Thanks.

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20396#pullrequestreview-2208930663

From dholmes at openjdk.org  Wed Jul 31 01:01:49 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 31 Jul 2024 01:01:49 GMT
Subject: RFR: 8337515: JVM_DumpAllStacks is dead code
In-Reply-To: <xbSCjeDRwb0CWIhmEcQlhhZpHuwWrjosn3OulE4bhe8=.fc77088d-bedc-47c4-98b2-7ac74f368d31@github.com>
References: <bj6I0sIWwBuxo-YC7xW59uo-mEN1RByejEDgf2nKa6w=.8655fc0c-6264-4af1-be7b-02966460e3f4@github.com>
 <xbSCjeDRwb0CWIhmEcQlhhZpHuwWrjosn3OulE4bhe8=.fc77088d-bedc-47c4-98b2-7ac74f368d31@github.com>
Message-ID: <ZJ-yiycoeCDJQdCnLTYlhSE8DrHb5YD-g9XC4pcs1ck=.ddf8bb17-817b-494f-8514-eb399b943038@github.com>

On Wed, 31 Jul 2024 00:30:05 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Trivial cleanup of long unused code.
>> 
>> Thanks.
>
> Good.

Thanks for the review @vnkozlov

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20396#issuecomment-2259436253

From dholmes at openjdk.org  Wed Jul 31 01:01:49 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 31 Jul 2024 01:01:49 GMT
Subject: Integrated: 8337515: JVM_DumpAllStacks is dead code
In-Reply-To: <bj6I0sIWwBuxo-YC7xW59uo-mEN1RByejEDgf2nKa6w=.8655fc0c-6264-4af1-be7b-02966460e3f4@github.com>
References: <bj6I0sIWwBuxo-YC7xW59uo-mEN1RByejEDgf2nKa6w=.8655fc0c-6264-4af1-be7b-02966460e3f4@github.com>
Message-ID: <l17smZve1znr7ezNHLNjv3-ScmzFzIA5OATXYBMD3xs=.18ed23c4-c56a-4828-8523-9ad9317b2519@github.com>

On Tue, 30 Jul 2024 23:03:43 GMT, David Holmes <dholmes at openjdk.org> wrote:

> Trivial cleanup of long unused code.
> 
> Thanks.

This pull request has now been integrated.

Changeset: 1c6fef8f
Author:    David Holmes <dholmes at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/1c6fef8f1cd12b26de9d31799a6516ce4399313f
Stats:     11 lines in 2 files changed: 0 ins; 11 del; 0 mod

8337515: JVM_DumpAllStacks is dead code

Reviewed-by: kvn

-------------

PR: https://git.openjdk.org/jdk/pull/20396

From asmehra at openjdk.org  Wed Jul 31 01:38:44 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Wed, 31 Jul 2024 01:38:44 GMT
Subject: RFR: 8337031: Improvements to CompilationMemoryStatistic
In-Reply-To: <yqXqOJCnOBQToVnGiTvMv9SRVECCZuArbWqfiVEj6VE=.eb63b66f-63a5-4c51-8757-87f2694afd98@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
 <yqXqOJCnOBQToVnGiTvMv9SRVECCZuArbWqfiVEj6VE=.eb63b66f-63a5-4c51-8757-87f2694afd98@github.com>
Message-ID: <rC-wpaU8uWZFb1Eq185T4yfXrH4C_Oea9qHRJMgnnf0=.47602ddd-f226-493f-8b39-57408dfc2daf@github.com>

On Wed, 24 Jul 2024 10:45:05 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)
>> 
>> Testing:
>>   test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
>>   test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java
>
> I plan to look at this later this week.

thanks @tstuefe @vnkozlov for reviewing and testing

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20304#issuecomment-2259466393

From asmehra at openjdk.org  Wed Jul 31 01:38:45 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Wed, 31 Jul 2024 01:38:45 GMT
Subject: Integrated: 8337031: Improvements to CompilationMemoryStatistic
In-Reply-To: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
References: <H5B7Rup6aiEiiRC56wq4H5zfB8_jq2NF8be2ei-9dDs=.e89fe689-128d-4174-bce8-d6774332c7ba@github.com>
Message-ID: <V9CEV-z2FNrqekAnMk8JPDsz6RHRRPZB2sXPhITAkio=.f159a6fa-f1b5-4f5f-a16a-e1e6e1119732@github.com>

On Tue, 23 Jul 2024 21:46:50 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote:

> Some minor improvements to CompilationMemoryStatistic. More details are in [JDK-8337031](https://bugs.openjdk.org/browse/JDK-8337031)
> 
> Testing:
>   test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java
>   test/hotspot/jtreg/serviceability/dcmd/compiler/CompilerMemoryStatisticTest.java

This pull request has now been integrated.

Changeset: e63d0165
Author:    Ashutosh Mehra <asmehra at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/e63d01654e0b702b3a8c0c4de97a6bb6893fbd1f
Stats:     182 lines in 6 files changed: 84 ins; 27 del; 71 mod

8337031: Improvements to CompilationMemoryStatistic

Reviewed-by: kvn, stuefe

-------------

PR: https://git.openjdk.org/jdk/pull/20304

From kbarrett at openjdk.org  Wed Jul 31 01:56:31 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 31 Jul 2024 01:56:31 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in
 prims code
In-Reply-To: <TqktOSyEh_Ox9Ex39pPlxyn_LBy3msYU05uizfb4tHc=.8d065351-2145-46d2-bc18-203cf3be6865@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
 <Em7Cdv0NCUyAnZtjOQQaTNJleGEvIi4mWAFAuUVCz24=.4eebb188-2eaa-484e-8f5d-557ac99fd67d@github.com>
 <GHKiuJLY8J-ixCnxqrGAOyAJm0wdZUOGI6sbioUCNS8=.45d5fe4b-e059-4473-885f-ade0efae9cb5@github.com>
 <TqktOSyEh_Ox9Ex39pPlxyn_LBy3msYU05uizfb4tHc=.8d065351-2145-46d2-bc18-203cf3be6865@github.com>
Message-ID: <sQbWWUWpc4CwcM-ufoIrVblSxPAjH1dBTLXDYheqo2U=.a3505ba7-fedc-4699-b60e-c0adc37f34c9@github.com>

On Tue, 30 Jul 2024 11:51:37 GMT, Julian Waters <jwaters at openjdk.org> wrote:

>> Now that we have, and are using, `[[noreturn]]` on all platforms, we no longer need that dead code.
>
> I'll admit, I do prefer having a return to end all possible control flows in a non void method, but oh well

I would rather it not look like it can return null (or some other manufactured "default") when it actually can't.
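
A small sketch of the point, with hypothetical names (`fatal_error`, `name_of`) that do not come from the PR: once the failure path is marked `[[noreturn]]`, the function can end without a manufactured `return nullptr;`.

```cpp
#include <cstdlib>

// Hypothetical stand-in for a fatal-error routine that never returns.
[[noreturn]] void fatal_error() {
  std::abort();
}

// Every valid input returns from inside the switch. The invalid case ends in
// a [[noreturn]] call, so no dummy "return nullptr;" is needed, and the
// signature does not suggest that null is a possible result.
const char* name_of(int kind) {
  switch (kind) {
    case 0: return "zero";
    case 1: return "one";
  }
  fatal_error();
}
```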

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20385#discussion_r1697794269

From kbarrett at openjdk.org  Wed Jul 31 02:07:42 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 31 Jul 2024 02:07:42 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in
 prims code
In-Reply-To: <oftVH8FweQHlfgduKuMopunOkTBQ30f8s0j5dB0AnQo=.6ab59e3a-906c-41d9-93c1-d614209531e9@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
 <oftVH8FweQHlfgduKuMopunOkTBQ30f8s0j5dB0AnQo=.6ab59e3a-906c-41d9-93c1-d614209531e9@github.com>
Message-ID: <4vjtLlNagS_WsEeSPaVUHZJsItLG-WvdHxIFTFnEvgk=.1a631179-543a-414e-ba8e-5c72a2f3c976@github.com>

On Tue, 30 Jul 2024 10:19:59 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> All right, this looks fine. (I am somewhat allergic to `{}` syntax, but it is what it is.)

The hoops one had to go through to get guaranteed value-initialization before we had brace initialization are really
not pretty. See
https://www.boost.org/doc/libs/1_85_0/libs/utility/doc/html/utility/utilities/value_init.html
and its associated implementation.

It might help if we were to commit to using direct brace initialization whenever appropriate, but that hasn't happened.
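
For readers less familiar with the syntax, a minimal sketch of what `{}` buys in generic code. The names `Counters` and `default_result` are made up for illustration and are not from the PR.

```cpp
// Hypothetical aggregate used only for this illustration.
struct Counters {
  int hits;
  int misses;
  void* owner;
};

// Empty braces value-initialize the variable for any ResultType: scalars
// become zero, pointers become null, and the members of a simple aggregate
// like Counters are zeroed. A plain "ResultType ret;" would leave an int or
// a pointer uninitialized.
template <typename ResultType>
ResultType default_result() {
  ResultType ret{};
  return ret;
}
```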

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20385#issuecomment-2259500102

From aph at openjdk.org  Wed Jul 31 06:31:15 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 31 Jul 2024 06:31:15 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v11]
In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
Message-ID: <ZjZKhf-9TRmDtCdy22FudPupipyCsHsdXEEgE-P-nu8=.b514d4ec-6305-4b96-90cf-393b198a2a5d@github.com>

> This patch expands the use of a hash table for secondary superclasses
> to the interpreter, C1, and runtime. It also adds a C2 implementation
> of hashed lookup in cases where the superclass isn't known at compile
> time.
> 
> HotSpot shared runtime
> ----------------------
> 
> Building hashed secondary tables is now unconditional. It takes very
> little time, and now that the shared runtime always has the tables, it
> might as well take advantage of them. The shared code is easier to
> follow now, I think.
> 
> There might be a performance issue with x86-64 in that we build
> HotSpot for a default x86-64 target that does not support popcount.
> This means that HotSpot C++ runtime on x86 always uses a software
> emulation for popcount, even though the vast majority of machines made
> for the past 20 years can do popcount in a single instruction. It
> wouldn't be terribly hard to do something about that.
> 
> Having said that, the software popcount is really not bad.
> 
> x86
> ---
> 
> x86 is rather tricky, because we still support
> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
> well as 32- and 64-bit ports. There's some further complication in
> that only `RCX` can be used as a shift count, so there's some register
> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
> 
> AArch64
> -------
> 
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
> 
> Generally
> ---------
> 
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
> 
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
> 
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
> 
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.
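
The general idea of a bitmap-indexed hashed table with a software popcount can be sketched as follows. This is a simplified illustration under stated assumptions, not HotSpot's actual data layout or probing scheme: the type `SupersTable`, the hash in `home_slot`, and the use of 32-bit ids in place of class pointers are all hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable software popcount (SWAR), usable where no popcount instruction
// can be assumed at build time.
unsigned popcount64(uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);
}

// A 64-slot virtual hash table stored compactly: bit s of the bitmap says
// whether slot s is occupied, and only occupied slots are kept in 'packed'.
struct SupersTable {
  uint64_t bitmap = 0;
  std::vector<uint32_t> packed;
};

// Hypothetical hash: the top 6 bits of a multiplicative hash give a slot 0..63.
unsigned home_slot(uint32_t id) {
  return (id * 2654435761u) >> 26;
}

// Builds the table with linear probing. Assumes distinct ids; entries beyond
// the 64th are ignored in this sketch.
SupersTable build_table(const std::vector<uint32_t>& ids) {
  std::array<uint32_t, 64> slots{};
  SupersTable t;
  size_t count = 0;
  for (uint32_t id : ids) {
    if (count == 64) break;
    unsigned s = home_slot(id);
    while (t.bitmap & (1ULL << s)) s = (s + 1) & 63;  // find a free slot
    slots[s] = id;
    t.bitmap |= 1ULL << s;
    ++count;
  }
  for (unsigned s = 0; s < 64; ++s) {
    if (t.bitmap & (1ULL << s)) t.packed.push_back(slots[s]);
  }
  return t;
}

// Lookup: a clear bit is an immediate miss. Otherwise the number of set bits
// below slot s is the index of that slot's entry in the packed array.
bool contains(const SupersTable& t, uint32_t id) {
  unsigned s = home_slot(id);
  for (unsigned probes = 0; probes < 64; ++probes, s = (s + 1) & 63) {
    uint64_t bit = 1ULL << s;
    if ((t.bitmap & bit) == 0) return false;
    unsigned index = popcount64(t.bitmap & (bit - 1));
    if (t.packed[index] == id) return true;
  }
  return false;
}
```

In this sketch, most negative lookups finish after a single bitmap test, which is where the popcount cost matters least; the popcount is only needed once a candidate slot is occupied.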

Andrew Haley has updated the pull request incrementally with three additional commits since the last revision:

 - Review comments
 - Experiment: test bitmap upfront.
 - Experiment: test bitmap upfront.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19989/files
  - new: https://git.openjdk.org/jdk/pull/19989/files/329f487a..5cca1cc2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=09-10

  Stats: 26 lines in 4 files changed: 5 ins; 10 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/19989.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989

PR: https://git.openjdk.org/jdk/pull/19989

From aph at openjdk.org  Wed Jul 31 06:45:05 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 31 Jul 2024 06:45:05 GMT
Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and
 interpreter [v12]
In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com>
Message-ID: <IgSEghEHB3YOa1xbX84bNChiSEr7V9hRXPdHP2Q8RQI=.d9f881f3-3693-4815-bf0b-e2393eced10d@github.com>

> This patch expands the use of a hash table for secondary superclasses
> to the interpreter, C1, and runtime. It also adds a C2 implementation
> of hashed lookup in cases where the superclass isn't known at compile
> time.
> 
> HotSpot shared runtime
> ----------------------
> 
> Building hashed secondary tables is now unconditional. It takes very
> little time, and now that the shared runtime always has the tables, it
> might as well take advantage of them. The shared code is easier to
> follow now, I think.
> 
> There might be a performance issue with x86-64 in that we build
> HotSpot for a default x86-64 target that does not support popcount.
> This means that HotSpot C++ runtime on x86 always uses a software
> emulation for popcount, even though the vast majority of machines made
> for the past 20 years can do popcount in a single instruction. It
> wouldn't be terribly hard to do something about that.
> 
> Having said that, the software popcount is really not bad.
> 
> x86
> ---
> 
> x86 is rather tricky, because we still support
> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
> well as 32- and 64-bit ports. There's some further complication in
> that only `RCX` can be used as a shift count, so there's some register
> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
> rather gnarly, with multiple levels of conditionals at compile time
> and runtime.
> 
> AArch64
> -------
> 
> AArch64 is considerably more straightforward. We always have a
> popcount instruction and (thankfully) no 32-bit code to worry about.
> 
> Generally
> ---------
> 
> I would dearly love simply to rip out the "old" secondary supers cache
> support, but I've left it in just in case someone has a performance
> regression.
> 
> The versions of `MacroAssembler::lookup_secondary_supers_table` that
> work with variable superclasses don't take a fixed set of temp
> registers, and neither do they call out to a slow path subroutine.
> Instead, the slow path is expanded inline.
> 
> I don't think this is necessarily bad. Apart from the very rare cases
> where C2 can't determine the superclass to search for at compile time,
> this code is only used for generating stubs, and it seemed to me
> ridiculous to have stubs calling other stubs.
> 
> I've followed the guidance from @iwanowww not to obsess too much about
> the performance of C1-compiled secondary supers lookups, and to prefer
> simplicity over absolute performance. Nonetheless, this is a
> complicated patch that touches many areas.

Andrew Haley has updated the pull request incrementally with three additional commits since the last revision:

 - Merge branch 'JDK-8331658-work' of https://github.com/theRealAph/jdk into JDK-8331658-work
 - Fix AArch64
 - Review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19989/files
  - new: https://git.openjdk.org/jdk/pull/19989/files/5cca1cc2..2769d9e7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=10-11

  Stats: 4 lines in 2 files changed: 0 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/19989.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989

PR: https://git.openjdk.org/jdk/pull/19989

From clanger at openjdk.org  Wed Jul 31 07:22:41 2024
From: clanger at openjdk.org (Christoph Langer)
Date: Wed, 31 Jul 2024 07:22:41 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured
 [v3]
In-Reply-To: <WOwcaWSeF_X020nBqsY6rs7STGxZmZVuZAyeA3nt1Tg=.a16acf38-8c6d-429a-b184-8c5c04ac9ceb@github.com>
References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
 <WOwcaWSeF_X020nBqsY6rs7STGxZmZVuZAyeA3nt1Tg=.a16acf38-8c6d-429a-b184-8c5c04ac9ceb@github.com>
Message-ID: <mc7jusNH20OOjMvwVhpsetjm0A1li1OjUlccr7qiOxc=.45faf772-7e8b-42ab-a7e7-a7aef5833cac@github.com>

On Tue, 30 Jul 2024 13:26:45 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> Currently when we run with ubsan-enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work.
>> 
>> We find this in the test output
>> 
>> [STDOUT]
>> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory
>> 
>> The container where the test is executed does not contain the ubsan package; we might skip the test in this case.
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   install libubsan1 into test container

I think adding libubsan1 to the test container is the best way to go. If it cannot be made conditional on ubsan builds, then so be it. In that case the WhiteBox changes should obviously be removed.

test/hotspot/jtreg/containers/docker/DockerBasicTest.java line 34:

> 32:  *          jdk.jartool/sun.tools.jar
> 33:  * @build HelloDocker
> 34:  * @run driver DockerBasicTest

You removed the @build and @run directives from the test, which is probably not desired.

-------------

Changes requested by clanger (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19907#pullrequestreview-2209366794
PR Review Comment: https://git.openjdk.org/jdk/pull/19907#discussion_r1698020822

From sspitsyn at openjdk.org  Wed Jul 31 08:21:34 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Wed, 31 Jul 2024 08:21:34 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in
 prims code
In-Reply-To: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
Message-ID: <Ykmc8f7ocECyFzqaoQLO7uOx4yio6cqTR8-l-KA8nCk=.8dbfdc5a-bc55-4db9-b580-98b171d9600a@github.com>

On Tue, 30 Jul 2024 04:12:33 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this change that removes some uses of literal 0 as a null
> pointer constant in prims code.
> 
> Testing: mach5 tier1
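
As a reminder of why a literal 0 is a poor null pointer constant, here is a small sketch. The overload set `which` is invented for illustration and is not from the prims code.

```cpp
// Hypothetical overloads used only to show how the two spellings differ.
int which(int)         { return 1; }
int which(const char*) { return 2; }

// which(0) selects the int overload, because 0 is first of all an integer
// literal. which(nullptr) can only convert to a pointer type, so it selects
// the const char* overload and states the intent unambiguously.
int call_with_zero()    { return which(0); }
int call_with_nullptr() { return which(nullptr); }
```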

Looks good. Thank you for fixing this!
The `ResultType ret{};` syntax is a little bit unusual but I'm okay with that. :)

-------------

Marked as reviewed by sspitsyn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20385#pullrequestreview-2209490825

From sspitsyn at openjdk.org  Wed Jul 31 08:27:31 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Wed, 31 Jul 2024 08:27:31 GMT
Subject: RFR: 8331015: Obsolete -XX:+UseNotificationThread
In-Reply-To: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com>
References: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com>
Message-ID: <EjVshUcAGazHU4sYDhPK0JZJIJAHpOChL7qHhistaqM=.bec6c786-4687-4794-bcda-ffefc9a8b832@github.com>

On Tue, 30 Jul 2024 01:57:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote:

> Obsolete UseNotificationThread flag which was deprecated in JDK 23.
> 
> Testing: tier1..tier5

Marked as reviewed by sspitsyn (Reviewer).

Looks good.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20381#pullrequestreview-2209502256
PR Review: https://git.openjdk.org/jdk/pull/20381#pullrequestreview-2209502965

From ayang at openjdk.org  Wed Jul 31 11:32:00 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Wed, 31 Jul 2024 11:32:00 GMT
Subject: RFR: 8337546: Remove unused GCCause::_adaptive_size_policy
Message-ID: <R02fNqBAuGz5WSD5tVAonh0GB2j7Apo1j24MbRjPArA=.66527688-96e6-4fa5-aa28-c82af17efc0b@github.com>

Trivial removal of an unused GC cause; it was previously used only by Parallel.

-------------

Commit messages:
 - remove-gc-cause

Changes: https://git.openjdk.org/jdk/pull/20403/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20403&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337546
  Stats: 13 lines in 4 files changed: 0 ins; 11 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20403.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20403/head:pull/20403

PR: https://git.openjdk.org/jdk/pull/20403

From coleenp at openjdk.org  Wed Jul 31 12:26:33 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 31 Jul 2024 12:26:33 GMT
Subject: RFR: 8331015: Obsolete -XX:+UseNotificationThread
In-Reply-To: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com>
References: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com>
Message-ID: <steBn0CiWBFUT9xZOgikAo_W_krsGNUtjmw8uY4qF-Y=.d67294ef-2fd8-43c0-adf4-96c8f43d1d58@github.com>

On Tue, 30 Jul 2024 01:57:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote:

> Obsolete UseNotificationThread flag which was deprecated in JDK 23.
> 
> Testing: tier1..tier5

Thank you for doing this.

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20381#pullrequestreview-2210020383

From tschatzl at openjdk.org  Wed Jul 31 13:12:33 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Wed, 31 Jul 2024 13:12:33 GMT
Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate [v3]
In-Reply-To: <F7o8T0rJbkysjNnN-pcQuRNXYfvRM1EyHrPSKBVcQ0Q=.79de0f66-f206-4e2a-ba1c-9d0f06bd025e@github.com>
References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com>
 <F7o8T0rJbkysjNnN-pcQuRNXYfvRM1EyHrPSKBVcQ0Q=.79de0f66-f206-4e2a-ba1c-9d0f06bd025e@github.com>
Message-ID: <WmFingximOMyc7ZwJfBYrTAd9rzbvgMVFeUnCV4r2tM=.83b7ac15-4e91-40c1-a39c-20762d0f270e@github.com>

On Thu, 25 Jul 2024 07:44:45 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Simply obsoletes a Parallel GC product flag.
>
> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   copyright

Marked as reviewed by tschatzl (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20299#pullrequestreview-2210127266

From kbarrett at openjdk.org  Wed Jul 31 13:14:36 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 31 Jul 2024 13:14:36 GMT
Subject: RFR: 8337418: Fix -Wzero-as-null-pointer-constant warnings in
 prims code
In-Reply-To: <QctZn5PDD3uluZJ-W_CG3Ffo0f02PsY7Zlx5neUOICQ=.7ca80211-e028-4553-87f5-f27f17d903ea@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
 <QctZn5PDD3uluZJ-W_CG3Ffo0f02PsY7Zlx5neUOICQ=.7ca80211-e028-4553-87f5-f27f17d903ea@github.com>
Message-ID: <PJRhnn2ICjJpBJcqrQevZe42VSf5E4VgnpJTnuUCLHg=.814e73d4-977c-4d29-84a6-e64242d74bbe@github.com>

On Tue, 30 Jul 2024 07:54:43 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Please review this change that removes some uses of literal 0 as a null
>> pointer constant in prims code.
>> 
>> Testing: mach5 tier1
>
> Okay - looks good. Thanks.

Thanks for reviews @dholmes-ora , @shipilev , @TheShermanTanker , and @sspitsyn .

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20385#issuecomment-2260494132

From kbarrett at openjdk.org  Wed Jul 31 13:18:46 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 31 Jul 2024 13:18:46 GMT
Subject: Integrated: 8337418: Fix -Wzero-as-null-pointer-constant warnings in
 prims code
In-Reply-To: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
References: <yVCkVKo8tL4ijPwZ4-gztAP1j8wBMyn09t0ya9hrwww=.8a3ad992-d15c-49fe-8f73-a72a8f248332@github.com>
Message-ID: <XfzbhHQy-zY2flp7kgZ6beSFVJiVzOJL-JYFkn41ZQ4=.d6448dc0-5f8c-490e-91de-6851dc6228d8@github.com>

On Tue, 30 Jul 2024 04:12:33 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this change that removes some uses of literal 0 as a null
> pointer constant in prims code.
> 
> Testing: mach5 tier1

This pull request has now been integrated.

Changeset: 07dd7250
Author:    Kim Barrett <kbarrett at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/07dd725025a54035436a76ac4c0a8bb2b12e264a
Stats:     26 lines in 7 files changed: 0 ins; 3 del; 23 mod

8337418: Fix -Wzero-as-null-pointer-constant warnings in prims code

Reviewed-by: dholmes, shade, jwaters, sspitsyn

-------------

PR: https://git.openjdk.org/jdk/pull/20385

From tschatzl at openjdk.org  Wed Jul 31 13:29:32 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Wed, 31 Jul 2024 13:29:32 GMT
Subject: RFR: 8337546: Remove unused GCCause::_adaptive_size_policy
In-Reply-To: <R02fNqBAuGz5WSD5tVAonh0GB2j7Apo1j24MbRjPArA=.66527688-96e6-4fa5-aa28-c82af17efc0b@github.com>
References: <R02fNqBAuGz5WSD5tVAonh0GB2j7Apo1j24MbRjPArA=.66527688-96e6-4fa5-aa28-c82af17efc0b@github.com>
Message-ID: <vWWWgQbg3HbkjWXGT3hq1wj-wR9xWYAn9X2Yndd3Dc8=.1da20e11-5c7b-4989-9a01-7c5d04e8840e@github.com>

On Wed, 31 Jul 2024 11:25:50 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

> Trivial removal of an unused GC cause; it was previously used only by Parallel.

Good.

-------------

Marked as reviewed by tschatzl (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20403#pullrequestreview-2210168192

From kbarrett at openjdk.org  Wed Jul 31 13:29:33 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 31 Jul 2024 13:29:33 GMT
Subject: RFR: 8337546: Remove unused GCCause::_adaptive_size_policy
In-Reply-To: <R02fNqBAuGz5WSD5tVAonh0GB2j7Apo1j24MbRjPArA=.66527688-96e6-4fa5-aa28-c82af17efc0b@github.com>
References: <R02fNqBAuGz5WSD5tVAonh0GB2j7Apo1j24MbRjPArA=.66527688-96e6-4fa5-aa28-c82af17efc0b@github.com>
Message-ID: <5oVJOB_OmdBtra0NImrqPvDCfneIQJnN7dW6YyOx3zc=.608a67f3-ad10-4946-9372-da4355862e45@github.com>

On Wed, 31 Jul 2024 11:25:50 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

> Trivial removal of an unused GC cause; it was previously used only by Parallel.

Looks good, and trivial.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20403#pullrequestreview-2210171121

From mbaesken at openjdk.org  Wed Jul 31 14:07:46 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 31 Jul 2024 14:07:46 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured
 [v4]
In-Reply-To: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
Message-ID: <WY7JVdGq2cEFGFQ5cypCV_cjVNlp73ZWVkjkr2kpzxA=.a67f045b-8fc4-4426-8ac7-a513942731ed@github.com>

> Currently when we run with ubsan-enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work.
> 
> We find this in the test output
> 
> [STDOUT]
> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory
> 
> The container where the test is executed does not contain the ubsan package; we might skip the test in this case.

Matthias Baesken has updated the pull request incrementally with two additional commits since the last revision:

 - remove method from WhiteBox.java
 - remove WB_isUbsanEnabled, fix test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19907/files
  - new: https://git.openjdk.org/jdk/pull/19907/files/4a792430..c4c3fdbe

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19907&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19907&range=02-03

  Stats: 15 lines in 3 files changed: 2 ins; 11 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/19907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19907/head:pull/19907

PR: https://git.openjdk.org/jdk/pull/19907

From mbaesken at openjdk.org  Wed Jul 31 14:07:46 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 31 Jul 2024 14:07:46 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured
 [v3]
In-Reply-To: <WOwcaWSeF_X020nBqsY6rs7STGxZmZVuZAyeA3nt1Tg=.a16acf38-8c6d-429a-b184-8c5c04ac9ceb@github.com>
References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
 <WOwcaWSeF_X020nBqsY6rs7STGxZmZVuZAyeA3nt1Tg=.a16acf38-8c6d-429a-b184-8c5c04ac9ceb@github.com>
Message-ID: <mNj85L8m-PP8I0YzYN3m_Fg25COYN_06eexKsUhSBUQ=.ee5974bc-dee8-4849-8285-29c5d734805a@github.com>

On Tue, 30 Jul 2024 13:26:45 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> Currently when we run with ubsan-enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work.
>> 
>> We find this in the test output
>> 
>> [STDOUT]
>> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory
>> 
>> The container where the test is executed does not contain the ubsan package; we might skip the test in this case.
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   install libubsan1 into test container

I removed the WhiteBox stuff.
Maybe David could give the change a try in the Oracle CI, if that's possible?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19907#issuecomment-2260608472

From gcao at openjdk.org  Wed Jul 31 14:20:39 2024
From: gcao at openjdk.org (Gui Cao)
Date: Wed, 31 Jul 2024 14:20:39 GMT
Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191
 [v4]
In-Reply-To: <qkkur4f8aYsaOYftYIA67lXS3sFtoYJGjIonPoGsD4s=.d26a6434-2fd2-4f19-b355-3b6cfdb1fa49@github.com>
References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
 <qkkur4f8aYsaOYftYIA67lXS3sFtoYJGjIonPoGsD4s=.d26a6434-2fd2-4f19-b355-3b6cfdb1fa49@github.com>
Message-ID: <6Hb5Gtgjtpb9cZyyvCsHABv4X7kc2M-GT7ugPKFxcdw=.f76d7b81-cfff-46ee-b8ba-4b1d183a9fb5@github.com>

On Tue, 30 Jul 2024 08:48:11 GMT, Gui Cao <gcao at openjdk.org> wrote:

>> Hi, please help review this patch that fixes the client VM build failure for riscv.
>> 
>> For the client VM build error log, see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421)
>> 
>> The root cause is that MaxVectorSize is only defined for COMPILER2 or JVMCI, neither of which is included in client mode. In addition, I have placed the initialization functions that use UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, and UseAdler32Intrinsics under the control of the COMPILER2 macro, and made related adjustments in VM_Version::c2_initialize().
>> 
>> ### Testing
>> - [x] linux-riscv client VM fastdebug native build
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Polishing

Thanks all for the review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20386#issuecomment-2260636217

From duke at openjdk.org  Wed Jul 31 14:20:39 2024
From: duke at openjdk.org (duke)
Date: Wed, 31 Jul 2024 14:20:39 GMT
Subject: RFR: 8337421: RISC-V: client VM build failure after JDK-8335191
 [v4]
In-Reply-To: <qkkur4f8aYsaOYftYIA67lXS3sFtoYJGjIonPoGsD4s=.d26a6434-2fd2-4f19-b355-3b6cfdb1fa49@github.com>
References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
 <qkkur4f8aYsaOYftYIA67lXS3sFtoYJGjIonPoGsD4s=.d26a6434-2fd2-4f19-b355-3b6cfdb1fa49@github.com>
Message-ID: <Z4P_mdHz2QKlHfnjwVdGWQHbxvdUEIC5Y1gwST99E64=.9e07730b-2d52-48d2-8536-2166d9580ccc@github.com>

On Tue, 30 Jul 2024 08:48:11 GMT, Gui Cao <gcao at openjdk.org> wrote:

>> Hi, please help review this patch that fixes the client VM build failure for riscv.
>> 
>> For the client VM build error log, see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421)
>> 
>> The root cause is that MaxVectorSize is only defined for COMPILER2 or JVMCI, neither of which is included in client mode. In addition, I have placed the initialization functions that use UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, and UseAdler32Intrinsics under the control of the COMPILER2 macro, and made related adjustments in VM_Version::c2_initialize().
>> 
>> ### Testing
>> - [x] linux-riscv client VM fastdebug native build
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Polishing

@zifeihan 
Your change (at version edf16e07daf5bf644afb7bc0111e8ddb9ff32ffe) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20386#issuecomment-2260637846

From gcao at openjdk.org  Wed Jul 31 14:46:37 2024
From: gcao at openjdk.org (Gui Cao)
Date: Wed, 31 Jul 2024 14:46:37 GMT
Subject: Integrated: 8337421: RISC-V: client VM build failure after JDK-8335191
In-Reply-To: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
References: <OzO21iwlaFanOXHsKREA_9VdX9fFo-KPm1LXpz1Dgdc=.21c067cb-5337-4f7a-8ab9-638872da22df@github.com>
Message-ID: <fvrLYt8b3LHEAPyCO4jM7rznhJIKuTKC58r7hzWiKw8=.5417b931-a3a2-4909-976c-d4d95a6208a5@github.com>

On Tue, 30 Jul 2024 05:41:45 GMT, Gui Cao <gcao at openjdk.org> wrote:

> Hi, please help review this patch that fixes the client VM build failure for riscv.
> 
> For the client VM build error log, see: [JDK-8337421](https://bugs.openjdk.org/browse/JDK-8337421)
> 
> The root cause is that MaxVectorSize is only defined for COMPILER2 or JVMCI, neither of which is included in client mode. In addition, I have placed the initialization functions that use UseSHA256Intrinsics, UseSHA512Intrinsics, UseMD5Intrinsics, UseChaCha20Intrinsics, UseSHA1Intrinsics, and UseAdler32Intrinsics under the control of the COMPILER2 macro, and made related adjustments in VM_Version::c2_initialize().
> 
> ### Testing
> - [x] linux-riscv client VM fastdebug native build
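
The root cause quoted above can be pictured with a minimal sketch, assuming a flag that only exists when the `COMPILER2` macro is defined. `VersionSketch`, its member, and the value 16 are hypothetical stand-ins chosen for illustration; this is not the code from the patch.

```cpp
// Hypothetical stand-ins; not the HotSpot definitions.
// Build with -DCOMPILER2 to model a server VM; omit it to model a client VM.
#ifdef COMPILER2
static int MaxVectorSize = 16;  // exists only when the C2 compiler is built in
#endif

struct VersionSketch {
  bool use_vector_intrinsics = false;

#ifdef COMPILER2
  // Everything that reads MaxVectorSize lives behind the same guard,
  // so a build without COMPILER2 never references the missing symbol.
  void c2_initialize() {
    use_vector_intrinsics = (MaxVectorSize >= 16);
  }
#endif

  void initialize() {
    // Setup that is valid for every VM variant would go here.
#ifdef COMPILER2
    c2_initialize();
#endif
  }
};
```

Without the guards, a build that lacks `COMPILER2` would fail to compile at the first use of `MaxVectorSize`, which is the kind of failure the description attributes to the client VM build.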

This pull request has now been integrated.

Changeset: 7121d71b
Author:    Gui Cao <gcao at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/7121d71b516b415c7c11e3643731cd32d4057aa6
Stats:     207 lines in 3 files changed: 103 ins; 100 del; 4 mod

8337421: RISC-V: client VM build failure after JDK-8335191

Reviewed-by: fyang, mli

-------------

PR: https://git.openjdk.org/jdk/pull/20386

From shade at openjdk.org  Wed Jul 31 14:48:36 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 31 Jul 2024 14:48:36 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2]
In-Reply-To: <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
 <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com>
Message-ID: <NF4pT2sfZgR7LUZ8E9679vvcsjAFmF3kalBlJ55NJLE=.5c53a13f-4d6e-4161-92c6-60bfd620fba1@github.com>

On Mon, 22 Jul 2024 08:49:12 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> See bug for more discussion.
>> 
>> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
>> 
>> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice:
>> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
>> 
>> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
>> 
>> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
>> 
>> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
>> 
>> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
>> 
>> Additional testing:
>>  - [x] New IR tests
>>  - [x] Linux x86_64 server fastdebug, `all`
>>  - [x] Linux AArch64 server fastdebug, `all`
>>  - [x...
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
> 
>  - Merge branch 'master' into JDK-8333791-stable-field-barrier
>  - Variant 2: Only final-field like semantics for stable inits
>  - Variant 3: Handle everything, including reads by compilers

...and still looking for formal reviews, pretty please :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19635#issuecomment-2260695675

From ayang at openjdk.org  Wed Jul 31 16:26:36 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Wed, 31 Jul 2024 16:26:36 GMT
Subject: RFR: 8337027: Parallel: Obsolete BaseFootPrintEstimate [v3]
In-Reply-To: <F7o8T0rJbkysjNnN-pcQuRNXYfvRM1EyHrPSKBVcQ0Q=.79de0f66-f206-4e2a-ba1c-9d0f06bd025e@github.com>
References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com>
 <F7o8T0rJbkysjNnN-pcQuRNXYfvRM1EyHrPSKBVcQ0Q=.79de0f66-f206-4e2a-ba1c-9d0f06bd025e@github.com>
Message-ID: <EDNGq-8NnG8qrhXzN4Q5qxAVx4ru1UY_91xCy85X5F0=.e02c6ddc-1d6e-467c-91c2-fc6b3f0ca443@github.com>

On Thu, 25 Jul 2024 07:44:45 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Simply obsoleting a Parallel GC product flag.
>
> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   copyright

Thanks for review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20299#issuecomment-2260902728

From ayang at openjdk.org  Wed Jul 31 16:26:36 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Wed, 31 Jul 2024 16:26:36 GMT
Subject: Integrated: 8337027: Parallel: Obsolete BaseFootPrintEstimate
In-Reply-To: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com>
References: <wULp2EAECh8W75aA83GCDEq9GzldQzBwwe16SqY6phk=.902d4251-a271-4575-8ac3-4f2224ca453c@github.com>
Message-ID: <SWLP_jvgK4lPKKvkSIImHz15Nxjd1yFQuV8yT5C2vmI=.fa6b3390-5d1f-4709-91da-7751dfff5840@github.com>

On Tue, 23 Jul 2024 14:11:20 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

> Simply obsoleting a Parallel GC product flag.

This pull request has now been integrated.

Changeset: e4c7850c
Author:    Albert Mingkun Yang <ayang at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/e4c7850c177899a5da6f5050cb0647a6e1a75d31
Stats:     34 lines in 7 files changed: 2 ins; 25 del; 7 mod

8337027: Parallel: Obsolete BaseFootPrintEstimate

Reviewed-by: tschatzl, kbarrett

-------------

PR: https://git.openjdk.org/jdk/pull/20299

From coleenp at openjdk.org  Wed Jul 31 18:40:56 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 31 Jul 2024 18:40:56 GMT
Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive
Message-ID: <qzky8BLyBVaSJmI00_ovSHZQxyudEMAGbnF6DHa97MI=.475e81bd-d939-451d-8960-a1afbc6c2126@github.com>

How does this rename look?  Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable().
Tested with tier1 on Oracle supported platforms.

-------------

Commit messages:
 - Fix indent.
 - 8335059: Consider renaming ClassLoaderData::keep_alive

Changes: https://git.openjdk.org/jdk/pull/20408/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20408&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335059
  Stats: 37 lines in 10 files changed: 2 ins; 0 del; 35 mod
  Patch: https://git.openjdk.org/jdk/pull/20408.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20408/head:pull/20408

PR: https://git.openjdk.org/jdk/pull/20408

From vlivanov at openjdk.org  Wed Jul 31 18:52:36 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 31 Jul 2024 18:52:36 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2]
In-Reply-To: <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
 <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com>
Message-ID: <Gx0ZMYP1XcwD2T53qKz2Ww52IIYWbo8PxJlH0GUNDnM=.7a1fe94d-51c9-4ab0-a04b-072dd33fc2a6@github.com>

On Mon, 22 Jul 2024 08:49:12 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> See bug for more discussion.
>> 
>> Currently, C2 puts a `Release` barrier at exit of _every_ method that writes a `@Stable` field. This is a problem for high-performance code that initializes the stable field like this: https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/Enum.java#L182-L193
>> 
>> A more egregious example is here, which means that every `String` constructor actually does `Release` barrier for `@Stable` field write, while only a `StoreStore` for `final` field store would suffice:
>> https://github.com/openjdk/jdk/blob/79a23017fc7154738c375fbb12a997525c3bf9e7/src/java.base/share/classes/java/lang/String.java#L159-L160
>> 
>> AFAICS, the original intent for Release barrier in constructor for stable fields was to match the memory semantics of final fields better. `@Stable` are in some sense "super-finals": they are foldable like static finals or non-static trusted finals, but can be written anywhere. The `@Stable` machinery is intrinsically safe under races: either a compiler sees a component of stable subgraph in initialized state and folds it, or it sees a default value for the component and leaves it alone.
>> 
>> I [performed an audit](https://bugs.openjdk.org/browse/JDK-8333791?focusedId=14688000&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14688000) of current `@Stable` uses for fields that are not currently `final` or `volatile`, and there are cases where we write into `@Stable` fields in constructors. AFAICS, they are covered by final-field-like semantics by accident of having adjacent `final` fields.
>> 
>> Current PR implements Variant 2 from the discussion: makes sure stable fields are as memory-safe as finals, and that's it. I believe this is all-around a good compromise for both mainline and the backports: the performance is improved in one of the paths that matter, and we still have some safety margin in the face of accidental removals of adjacent `final`-s, or in case I missed some spots during the audit.
>> 
>> C1 did not do anything special for `@Stable` fields at all, fixed those to match C2. Both Zero and template interpreters for non-TSO arches put barriers at every `return` (with notable exception of [ARM32](https://bugs.openjdk.org/browse/JDK-8333957)), which handles everything in an overkill manner.
>> 
>> Additional testing:
>>  - [x] New IR tests
>>  - [x] Linux x86_64 server fastdebug, `all`
>>  - [x] Linux AArch64 server fastdebug, `all`
>>  - [x...
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
> 
>  - Merge branch 'master' into JDK-8333791-stable-field-barrier
>  - Variant 2: Only final-field like semantics for stable inits
>  - Variant 3: Handle everything, including reads by compilers

src/hotspot/share/runtime/globals.hpp line 1997:

> 1995:           "Use a terrible hash function in order to generate many collisions.") \
> 1996:                                                                             \
> 1997:   develop(bool, RestrictStable, true,                                       \

What's the use case for the flag? Solely for testing purposes (since it's develop)? 
Alternatively, you could place the test classes on boot class path and enable test execution with product binaries.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1698961280

From shade at openjdk.org  Wed Jul 31 18:58:36 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 31 Jul 2024 18:58:36 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2]
In-Reply-To: <Gx0ZMYP1XcwD2T53qKz2Ww52IIYWbo8PxJlH0GUNDnM=.7a1fe94d-51c9-4ab0-a04b-072dd33fc2a6@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
 <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com>
 <Gx0ZMYP1XcwD2T53qKz2Ww52IIYWbo8PxJlH0GUNDnM=.7a1fe94d-51c9-4ab0-a04b-072dd33fc2a6@github.com>
Message-ID: <Xndb4WvNE5J6vsrSFwbfQ6J0R07GQNw39q_LTLweraU=.d740fb08-9f38-4de5-9518-1db54fc005be@github.com>

On Wed, 31 Jul 2024 18:49:47 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
>> 
>>  - Merge branch 'master' into JDK-8333791-stable-field-barrier
>>  - Variant 2: Only final-field like semantics for stable inits
>>  - Variant 3: Handle everything, including reads by compilers
>
> src/hotspot/share/runtime/globals.hpp line 1997:
> 
>> 1995:           "Use a terrible hash function in order to generate many collisions.") \
>> 1996:                                                                             \
>> 1997:   develop(bool, RestrictStable, true,                                       \
> 
> What's the use case for the flag? Solely for testing purposes (since it's develop)? 
> Alternatively, you could place the test classes on boot class path and enable test execution with product binaries.

Yes, to access the annotation from test, like `RestrictContended` nearby:
https://github.com/openjdk/jdk/blob/97f7c03dd0ff389abefed7ea2a7257bcb42e0754/src/hotspot/share/classfile/classFileParser.cpp#L1960

Not sure if putting test classes on bootclasspath would work well with IR tests that are now running in driver mode. I'd prefer to keep the develop flag and keep tests in driver mode and without additional fluff.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1698965131

From stefank at openjdk.org  Wed Jul 31 19:04:30 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 31 Jul 2024 19:04:30 GMT
Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive
In-Reply-To: <qzky8BLyBVaSJmI00_ovSHZQxyudEMAGbnF6DHa97MI=.475e81bd-d939-451d-8960-a1afbc6c2126@github.com>
References: <qzky8BLyBVaSJmI00_ovSHZQxyudEMAGbnF6DHa97MI=.475e81bd-d939-451d-8960-a1afbc6c2126@github.com>
Message-ID: <DptNYmKH9gv3oyw9FMkTD7xTMIEcztvLNeyhXnTqURA=.af2fce7a-e394-4b87-ae2e-d414bbb8acac@github.com>

On Wed, 31 Jul 2024 18:35:12 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> How does this rename look?  Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable().
> Tested with tier1 on Oracle supported platforms.

There's a risk that someone incorrectly interprets:

- _strongly_reachable 0


to mean that the class loader isn't strongly reachable.

In the bug entry I suggested a name `_strong_count` and tried to avoid the word "reachable" because it already has a meaning for the GC. When talking about the "strong" property we both talk about strongly reachable and strong roots. The property for this CLD is that it is a strong root from the GC's perspective. Maybe we can use that instead. What do you think about
`_strong_root` and `is_strong_root`? Or maybe even `_root` and `is_root`?
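
The naming concern can be illustrated with a minimal sketch, assuming the field is a plain reference count as the PR description says. `LoaderDataSketch` and its method names are hypothetical and are not the HotSpot declarations; `_strong_count` is borrowed from the suggestion above purely as a label.

```cpp
#include <cassert>

// Hypothetical sketch; names are illustrative only.
class LoaderDataSketch {
  int _strong_count = 0;  // outstanding requests to keep this object alive

 public:
  void inc_strong_count() { ++_strong_count; }

  void dec_strong_count() {
    assert(_strong_count > 0 && "unbalanced decrement");
    --_strong_count;
  }

  // True only while at least one request is outstanding. A false result
  // just means nobody has pinned the object through this counter; it makes
  // no statement about whether other references still reach it.
  bool is_kept_alive() const { return _strong_count > 0; }
};
```

In this sketch a zero count is the default state of every object, which is why a field name that reads like a boolean reachability answer could be misread.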

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20408#issuecomment-2261207888

From vlivanov at openjdk.org  Wed Jul 31 19:28:33 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 31 Jul 2024 19:28:33 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2]
In-Reply-To: <Xndb4WvNE5J6vsrSFwbfQ6J0R07GQNw39q_LTLweraU=.d740fb08-9f38-4de5-9518-1db54fc005be@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
 <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com>
 <Gx0ZMYP1XcwD2T53qKz2Ww52IIYWbo8PxJlH0GUNDnM=.7a1fe94d-51c9-4ab0-a04b-072dd33fc2a6@github.com>
 <Xndb4WvNE5J6vsrSFwbfQ6J0R07GQNw39q_LTLweraU=.d740fb08-9f38-4de5-9518-1db54fc005be@github.com>
Message-ID: <2IdxXlsbkFOF9BnHuiSXm96Fil-4YoA0GCdKOIz2tPE=.c596ab28-a346-44f6-9e80-7ee76a2aa20b@github.com>

On Wed, 31 Jul 2024 18:53:43 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> src/hotspot/share/runtime/globals.hpp line 1997:
>> 
>>> 1995:           "Use a terrible hash function in order to generate many collisions.") \
>>> 1996:                                                                             \
>>> 1997:   develop(bool, RestrictStable, true,                                       \
>> 
>> What's the use case for the flag? Solely for testing purposes (since it's develop)? 
>> Alternatively, you could place the test classes on boot class path and enable test execution with product binaries.
>
> Yes, to access the annotation from test, like `RestrictContended` nearby:
> https://github.com/openjdk/jdk/blob/97f7c03dd0ff389abefed7ea2a7257bcb42e0754/src/hotspot/share/classfile/classFileParser.cpp#L1960
> 
> Not sure if putting test classes on bootclasspath would work well with IR tests that are now running in driver mode. I'd prefer to keep the develop flag and keep tests in driver mode and without additional fluff.

`RestrictContended` and `RestrictReservedStack` are product flags.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1698997608

From vlivanov at openjdk.org  Wed Jul 31 19:28:34 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 31 Jul 2024 19:28:34 GMT
Subject: RFR: 8333791: Fix memory barriers for @Stable fields [v2]
In-Reply-To: <2IdxXlsbkFOF9BnHuiSXm96Fil-4YoA0GCdKOIz2tPE=.c596ab28-a346-44f6-9e80-7ee76a2aa20b@github.com>
References: <evOfIZ9GrX6MWLVfSnEfuEGkJ9kHTZaNFfaPA15ufbk=.3d8f5d66-4728-4de6-8aa1-bafc97ce2fa6@github.com>
 <ZJj4fYHqnd5jkIRau4mSsU409_JidyOnKLTpqbNqoFY=.78a4eb10-1311-4d15-a148-f4e3fec17bd3@github.com>
 <Gx0ZMYP1XcwD2T53qKz2Ww52IIYWbo8PxJlH0GUNDnM=.7a1fe94d-51c9-4ab0-a04b-072dd33fc2a6@github.com>
 <Xndb4WvNE5J6vsrSFwbfQ6J0R07GQNw39q_LTLweraU=.d740fb08-9f38-4de5-9518-1db54fc005be@github.com>
 <2IdxXlsbkFOF9BnHuiSXm96Fil-4YoA0GCdKOIz2tPE=.c596ab28-a346-44f6-9e80-7ee76a2aa20b@github.com>
Message-ID: <oXUvUuFN_ItNrmxvXSQM5JsEKzGndG_SmENEQ44epR4=.19762cb1-ae8e-4b41-a626-a2a23012fa3e@github.com>

On Wed, 31 Jul 2024 19:22:35 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Yes, to access the annotation from test, like `RestrictContended` nearby:
>> https://github.com/openjdk/jdk/blob/97f7c03dd0ff389abefed7ea2a7257bcb42e0754/src/hotspot/share/classfile/classFileParser.cpp#L1960
>> 
>> Not sure if putting test classes on bootclasspath would work well with IR tests that are now running in driver mode. I'd prefer to keep the develop flag and keep tests in driver mode and without additional fluff.
>
> `RestrictContended` and `RestrictReservedStack` are product flags.

I'm not saying that `RestrictStable` should be made product. It was a deliberate decision to limit it only to trusted classes. 

There are existing tests for `@Stable` (under `test/hotspot/jtreg/compiler/stable/`) and they don't require any special assistance from the JVM.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19635#discussion_r1699000542

From coleen.phillimore at oracle.com  Wed Jul 31 19:46:43 2024
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Wed, 31 Jul 2024 15:46:43 -0400
Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive
In-Reply-To: <DptNYmKH9gv3oyw9FMkTD7xTMIEcztvLNeyhXnTqURA=.af2fce7a-e394-4b87-ae2e-d414bbb8acac@github.com>
References: <qzky8BLyBVaSJmI00_ovSHZQxyudEMAGbnF6DHa97MI=.475e81bd-d939-451d-8960-a1afbc6c2126@github.com>
 <DptNYmKH9gv3oyw9FMkTD7xTMIEcztvLNeyhXnTqURA=.af2fce7a-e394-4b87-ae2e-d414bbb8acac@github.com>
Message-ID: <ddea7a55-e60c-4fd7-b6ea-0c08748ad205@oracle.com>



On 7/31/24 3:04 PM, Stefan Karlsson wrote:
> On Wed, 31 Jul 2024 18:35:12 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:
>
>> How does this rename look?  Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable().
>> Tested with tier1 on Oracle supported platforms.
> There's a risk that someone incorrectly interprets:
>
> - _strongly_reachable 0
>
>
> to mean that the class loader isn't strongly reachable.

That is what _strongly_reachable == 0 means, that the CLD isn't strongly
reachable. (?)

>
> In the bug entry I suggested a name `_strong_count` and tried to avoid the word "reachable" because it already has a meaning for the GC. When talking about the "strong" property we both talk about strongly reachable and strong roots. The property for this CLD is that it is a strong root from the GC's perspective. Maybe we can use that instead. What do you think about
> `_strong_root` and `is_strong_root`? Or maybe even `_root` and `is_root`?

"strong" has meaning for GC with the addition of "root" or "reachable",
but on its own has no meaning for CLDG. "strong" can mean a whole host
of things. It doesn't help me know why we're setting this flag for this
CLD.

I want the attribute to tell me that GC can't unload this CLD!
>
> -------------
>
> PR Comment: https://git.openjdk.org/jdk/pull/20408#issuecomment-2261207888


From coleen.phillimore at oracle.com  Wed Jul 31 19:49:40 2024
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Wed, 31 Jul 2024 15:49:40 -0400
Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive
In-Reply-To: <ddea7a55-e60c-4fd7-b6ea-0c08748ad205@oracle.com>
References: <qzky8BLyBVaSJmI00_ovSHZQxyudEMAGbnF6DHa97MI=.475e81bd-d939-451d-8960-a1afbc6c2126@github.com>
 <DptNYmKH9gv3oyw9FMkTD7xTMIEcztvLNeyhXnTqURA=.af2fce7a-e394-4b87-ae2e-d414bbb8acac@github.com>
 <ddea7a55-e60c-4fd7-b6ea-0c08748ad205@oracle.com>
Message-ID: <e4f4922f-0bc9-4ee5-88ee-728c9c477589@oracle.com>



On 7/31/24 3:46 PM, coleen.phillimore at oracle.com wrote:
>
>
> On 7/31/24 3:04 PM, Stefan Karlsson wrote:
>> On Wed, 31 Jul 2024 18:35:12 GMT, Coleen Phillimore 
>> <coleenp at openjdk.org> wrote:
>>
>>> How does this rename look? Instead of ClassLoaderData::keep_alive()
>>> and a _keep_alive refcount, it's been renamed to _strongly_reachable 
>>> and is_strongly_reachable().
>>> Tested with tier1 on Oracle supported platforms.
>> There's a risk that someone incorrectly interprets:
>>
>> - _strongly_reachable 0
>>
>>
>> to mean that the class loader isn't strongly reachable.
>
> That is what _strongly_reachable == 0 means, that the CLD isn't 
> strongly reachable. (?)
>
>>
>> In the bug entry I suggested a name `_strong_count` and tried to 
>> avoid the word "reachable" because it already has a meaning for the
>> GC. When talking about the "strong" property we both talk about 
>> strongly reachable and strong roots. The property for this CLD is 
>> that it is a strong root from the GC's perspective. Maybe we can use 
>> that instead. What do you think about
>> `_strong_root` and `is_strong_root`? Or maybe even `_root` and 
>> `is_root`?
>
> "strong" has meaning for GC with the addition of "root" or 
> "reachable", but on its own has no meaning for CLDG. "strong" can
> mean a whole host of things. It doesn't help me know why we're
> setting this flag for this CLD.
>
> I want the attribute to tell me that GC can't unload this CLD!

How about _gc_root and is_gc_root()?
>>
>> -------------
>>
>> PR Comment: 
>> https://git.openjdk.org/jdk/pull/20408#issuecomment-2261207888
>


From amenkov at openjdk.org  Wed Jul 31 20:05:37 2024
From: amenkov at openjdk.org (Alex Menkov)
Date: Wed, 31 Jul 2024 20:05:37 GMT
Subject: Integrated: 8331015: Obsolete -XX:+UseNotificationThread
In-Reply-To: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com>
References: <bLUGHCTJHF_LiwVu0wVJ2onQG6wD5_k_RnDstWMkkhw=.5b5d3af1-f406-41f4-b9b5-1137cab9fa8c@github.com>
Message-ID: <hpTcWsFUasOZ7VebYQlsV8-d6lmZD2pvuK759k8-1PE=.899119f7-9d6c-4bba-b918-0e6c24fcb73e@github.com>

On Tue, 30 Jul 2024 01:57:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote:

> Obsolete UseNotificationThread flag which was deprecated in JDK 23.
> 
> Testing: tier1..tier5

This pull request has now been integrated.

Changeset: 8af2ef35
Author:    Alex Menkov <amenkov at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/8af2ef35b6f9399b6d25ff7a4a72ad283df63f03
Stats:     41 lines in 7 files changed: 1 ins; 31 del; 9 mod

8331015: Obsolete -XX:+UseNotificationThread

Reviewed-by: dholmes, kevinw, sspitsyn, coleenp

-------------

PR: https://git.openjdk.org/jdk/pull/20381

From coleenp at openjdk.org  Wed Jul 31 20:41:47 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 31 Jul 2024 20:41:47 GMT
Subject: RFR: 8335059: Consider renaming ClassLoaderData::keep_alive [v2]
In-Reply-To: <qzky8BLyBVaSJmI00_ovSHZQxyudEMAGbnF6DHa97MI=.475e81bd-d939-451d-8960-a1afbc6c2126@github.com>
References: <qzky8BLyBVaSJmI00_ovSHZQxyudEMAGbnF6DHa97MI=.475e81bd-d939-451d-8960-a1afbc6c2126@github.com>
Message-ID: <GyNKTm70AgDEk0aji4kZnBWp5wdQ4o1sojpOs0S5Y7s=.fc5ed1b4-bae7-4322-a772-2c2cabf1389d@github.com>

> How does this rename look?  Instead of ClassLoaderData::keep_alive() and a _keep_alive refcount, it's been renamed to _strongly_reachable and is_strongly_reachable().
> Tested with tier1 on Oracle supported platforms.

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Rename to keep_alive_ref_count

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20408/files
  - new: https://git.openjdk.org/jdk/pull/20408/files/fc69e2e1..38f036f7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20408&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20408&range=00-01

  Stats: 29 lines in 10 files changed: 0 ins; 0 del; 29 mod
  Patch: https://git.openjdk.org/jdk/pull/20408.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20408/head:pull/20408

PR: https://git.openjdk.org/jdk/pull/20408

From clanger at openjdk.org  Wed Jul 31 20:59:33 2024
From: clanger at openjdk.org (Christoph Langer)
Date: Wed, 31 Jul 2024 20:59:33 GMT
Subject: RFR: 8333144: docker tests do not work when ubsan is configured
 [v4]
In-Reply-To: <WY7JVdGq2cEFGFQ5cypCV_cjVNlp73ZWVkjkr2kpzxA=.a67f045b-8fc4-4426-8ac7-a513942731ed@github.com>
References: <ZvbABYMRyAzsduPjTnYhPBs3v5b06J6p0z0rHvfVAjE=.508e7351-d483-4a99-8115-79dd51d24586@github.com>
 <WY7JVdGq2cEFGFQ5cypCV_cjVNlp73ZWVkjkr2kpzxA=.a67f045b-8fc4-4426-8ac7-a513942731ed@github.com>
Message-ID: <3OFH8Fhnbd7pkBTFmQ0cH5FjWTmfIV26C2gqQSI-Vlc=.63d3fb39-f116-4b5d-b392-10069a987a7f@github.com>

On Wed, 31 Jul 2024 14:07:46 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> Currently, when we run with ubsan-enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work.
>> 
>> We find this in the test output
>> 
>> [STDOUT]
>> /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory
>> 
>> The container where the test is executed does not contain the ubsan package; we might skip the test in this case.
>
> Matthias Baesken has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - remove method from WhiteBox.java
>  - remove WB_isUbsanEnabled, fix test

Fine for me.

-------------

Marked as reviewed by clanger (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19907#pullrequestreview-2211120542

From azafari at openjdk.org  Wed Jul 31 23:20:07 2024
From: azafari at openjdk.org (Afshin Zafari)
Date: Wed, 31 Jul 2024 23:20:07 GMT
Subject: RFR: 8333151: Investigate if the Hotspot Arena chunk pools still make
 sense
Message-ID: <nDdImFYUsNwB-8V2TVDpyUKBNWr3gt7h6tgCiWybfFA=.0cf55e88-69cb-4a4e-ad26-f9d9fa8231e2@github.com>

This PR investigates whether using `ChunkPool` still pays off in terms of time and memory consumption.
In the tests, using `ChunkPool` shows neither better speed nor a smaller memory footprint.
Memory usage is taken from the RSS values reported by the Linux API.
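
The description does not say which Linux interface was used to read RSS. As a minimal sketch, assuming the `VmRSS` field of `/proc/self/status` is an acceptable source, the footprint before and after a workload could be sampled as below. The helper names `parse_vm_rss_kb`, `current_rss_kb` and `rss_delta_kb` are hypothetical and are not part of the PR.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: scans text in /proc/<pid>/status format and returns
// the VmRSS value in kB, or -1 if no parsable VmRSS line is found.
long parse_vm_rss_kb(std::istream& in) {
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, 6, "VmRSS:") == 0) {
      std::istringstream fields(line.substr(6));
      long kb = -1;
      if (fields >> kb) {
        return kb;
      }
      return -1;
    }
  }
  return -1;
}

// Hypothetical helper: resident set size of the current process in kB.
// Returns -1 where /proc/self/status is unavailable (e.g. non-Linux systems).
long current_rss_kb() {
  std::ifstream status("/proc/self/status");
  return parse_vm_rss_kb(status);
}

// Hypothetical helper: footprint change between two successful samples.
long rss_delta_kb(long before_kb, long after_kb) {
  assert(before_kb >= 0 && after_kb >= 0);  // both samples must be valid
  return after_kb - before_kb;
}
```

Calling `current_rss_kb()` once before and once after the measured work, and passing both values to `rss_delta_kb()`, gives a rough footprint comparison between a run with the pool and a run without it.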

-------------

Commit messages:
 - 8333151: Investigate if the Hotspot Arena chunk pools still make sense
 - Merge branch '_8333151_chunk_pool_test' of http://github.com/afshin-zafari/jdk into _8333151_chunk_pool_test
 - add memory footprint measurement
 - 8333151: Investigate if the Hotspot Arena chunk pools still make sense
 - rebase master
 - compare the memory footprint
 - add memory footprint measurement
 - 8333151: Investigate if the Hotspot Arena chunk pools still make sense

Changes: https://git.openjdk.org/jdk/pull/20411/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20411&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8333151
  Stats: 119 lines in 3 files changed: 113 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/20411.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20411/head:pull/20411

PR: https://git.openjdk.org/jdk/pull/20411

From azafari at openjdk.org  Wed Jul 31 23:38:02 2024
From: azafari at openjdk.org (Afshin Zafari)
Date: Wed, 31 Jul 2024 23:38:02 GMT
Subject: RFR: 8333151: Investigate if the Hotspot Arena chunk pools still
 make sense [v2]
In-Reply-To: <nDdImFYUsNwB-8V2TVDpyUKBNWr3gt7h6tgCiWybfFA=.0cf55e88-69cb-4a4e-ad26-f9d9fa8231e2@github.com>
References: <nDdImFYUsNwB-8V2TVDpyUKBNWr3gt7h6tgCiWybfFA=.0cf55e88-69cb-4a4e-ad26-f9d9fa8231e2@github.com>
Message-ID: <uSrCn5pmcLBmOhoFkJ0hH2nEOLABLhMCJOjaucfiJKE=.5691ad8a-8ff3-4036-ac30-b0807786f78a@github.com>

> This PR investigates whether using `ChunkPool` still pays off in terms of time and memory consumption.
> In the tests, using `ChunkPool` shows neither better speed nor a smaller memory footprint.
> Memory usage is taken from the RSS values reported by the Linux API.

Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision:

  fixes after merge glitches

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20411/files
  - new: https://git.openjdk.org/jdk/pull/20411/files/081cf0a2..dc6be286

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20411&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20411&range=00-01

  Stats: 8 lines in 2 files changed: 0 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/20411.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20411/head:pull/20411

PR: https://git.openjdk.org/jdk/pull/20411

From kvn at openjdk.org  Wed Jul 31 23:46:45 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 31 Jul 2024 23:46:45 GMT
Subject: RFR: 8337396: Cleanup usage of ExternalAddess
Message-ID: <MbVHDHN0Sd9JmSr8tzsk_TW9patwxXiBBkUpFhqPOD8=.2f5a1013-6d44-4e88-8f44-31bfcb5ef5bc@github.com>

`ExternalAddress` should be used only for data loads. For call (and jump) instructions we should use `RuntimeAddress`, which uses `runtime_call_Relocation`.

I found a few places where `ExternalAddress` is used incorrectly and fixed them.

I also added code to print the "hottest" (most referenced) `ExternalAddress` addresses in the global table, so that they can be moved into the static global tables which will be introduced by [JDK-8334691](https://bugs.openjdk.org/browse/JDK-8334691) and [JDK-8337519](https://bugs.openjdk.org/browse/JDK-8337519).

Here is the current output from a debug VM on a MacBook M1 (aarch64):

External addresses table: 6 entries, 324 accesses
0:      158 0x00000001082de0f0 : extn: vmClasses::_klasses+480
1:       84 0x00000001082ddf20 : extn: vmClasses::_klasses+16
2:       40 0x00000001082c4790 : extn: SharedRuntime::_partial_subtype_ctr
3:       24 0x00000001082bdb04 : extn: JvmtiExport::_should_notify_object_alloc
4:       18 0x0000000118384080 : stub: forward exception


on macOS-x64:

External addresses table: 143 entries, 44405 accesses
0:    11766 0x00000001047922a0 : extn: CompressedOops::_narrow_oop
1:    11002 0x0000000104474384 : 'should not reach here'
2:     9672 0x0000000104581a90 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+882068
3:     2447 0x0000000104508005 : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+383753
4:     1916 0x000000010458188e : extn: ClassLoader::file_name_for_class_name(char const*, int)::class_suffix+881554


and on linux-x64:

External addresses table: 143 entries, 77297 accesses
0:    22334 0x00007f35d5b9c000 : ''
1:    19789 0x00007f35d55eea1f : 'should not reach here'
2:    18366 0x00007f35d5747bb8 : 'MacroAssembler::decode_heap_oop: heap base corrupted?'
3:     5036 0x00007f35d56e4d40 : 'uncommon trap returned which should never happen'
4:     3643 0x00007f35d57479f8 : 'MacroAssembler::encode_heap_oop: heap base corrupted?'
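
How such a "hottest addresses" listing can be produced is sketched below: count every reference to an address, then sort by count and print the top rows. This is only an illustration. The class `AddressAccessTable` and its methods are hypothetical stand-ins, not HotSpot's `ExternalsRecorder`, and the sketch omits both symbol lookup and locking.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical reference counter for addresses (illustration only).
class AddressAccessTable {
 public:
  // Record one reference to addr.
  void record(std::uintptr_t addr) {
    assert(addr != 0 && "null address is never recorded");
    ++_counts[addr];
    ++_total;
  }

  // Up to top_n (count, address) rows, most referenced first.
  // Ties are ordered by ascending address to keep the result deterministic.
  std::vector<std::pair<std::size_t, std::uintptr_t>> hottest(std::size_t top_n) const {
    std::vector<std::pair<std::size_t, std::uintptr_t>> rows;
    for (const auto& entry : _counts) {
      rows.emplace_back(entry.second, entry.first);
    }
    std::sort(rows.begin(), rows.end(),
              [](const std::pair<std::size_t, std::uintptr_t>& a,
                 const std::pair<std::size_t, std::uintptr_t>& b) {
                if (a.first != b.first) return a.first > b.first;
                return a.second < b.second;
              });
    if (rows.size() > top_n) {
      rows.resize(top_n);
    }
    return rows;
  }

  // Header line plus one "index: count address" line per row.
  std::string report(std::size_t top_n) const {
    std::ostringstream out;
    out << "External addresses table: " << _counts.size() << " entries, "
        << _total << " accesses\n";
    std::size_t index = 0;
    for (const auto& row : hottest(top_n)) {
      out << index++ << ": " << std::setw(8) << row.first << " 0x"
          << std::hex << std::setfill('0') << std::setw(16) << row.second
          << std::dec << std::setfill(' ') << "\n";
    }
    return out.str();
  }

 private:
  std::unordered_map<std::uintptr_t, std::size_t> _counts;
  std::size_t _total = 0;
};
```

Because only the top rows are printed, the access counts shown can add up to less than the total in the header line.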


A few points about the differences in output:
1. aarch64 does not use `ExternalAddress` or any relocation for messages (strings).
2. `stub: forward exception` corresponds to `StubRoutines::forward_exception_entry()` for which C2 generates a tail-call from [C2's stubs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/generateOptoStub.cpp#L258C48-L258C87). It is difficult to convert it to `RuntimeAddress` because of how relocations for constants in C2 are handled.
3. The linux-x64 implementation of `dladdr()`, which I used to print the C++ symbol names, only works for functions:
    `0x00007f35d5b9c000` points to `CompressedOops::_narrow_oop._base`, referenced from code in [MacroAssembler::verify_heapbase()](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5760). On aarch64, [verify_heapbase()](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2959) is empty (guarded by `#if 0`).
4. I think `ClassLoader::file_name_for_class_name()+...` on macOS-x64 corresponds to the strings on linux-x64.

Additionally, I moved the asserts before the locks in `ExternalsRecorder` methods.

Tested: tier1-3, xcomp, stress

-------------

Commit messages:
 - 8337396: Cleanup usage of ExternalAddess

Changes: https://git.openjdk.org/jdk/pull/20412/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20412&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337396
  Stats: 104 lines in 5 files changed: 86 ins; 3 del; 15 mod
  Patch: https://git.openjdk.org/jdk/pull/20412.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20412/head:pull/20412

PR: https://git.openjdk.org/jdk/pull/20412